If I understand correctly, both the staging database and the production database shared the same volume, so production data was gone as well once the volume was deleted.
You don't. You are missing the part where the LLM had a token whose access was blocked, as expected. The LLM then searched the code base, found a different token with delete privileges, and used that one.
PS: That warning happens in staging envs too; by design, the LLM doesn't know which env is which.
Huh, that's not what I gathered from the tweet at all. If I were to write a five-whys analysis, the immediate cause is that the LLM wrongly decided to delete a volume, while the root cause is the bad design of co-locating staging and production data in the same volume. The writing was quite vague, though; let's wait for a response from Railway.
Based on the previous release schedule for 3.5, my optimistic take is that they distill the small models from the 397B one, and a sparse A3B model is much faster to distill. Hopefully the other variants will be released in the coming days.
Very nice TG improvement from the Flash Attention KQ fusion. Is that something that was already done in ik_llama.cpp? If not, it will be a welcome addition for hybrid CPU/GPU inference.
90% of what you pay for in agentic coding is cached reads, which are free with local inference serving one user. This has been well known in r/LocalLLaMA for ages, and an article about it also hit the HN front page a few weeks ago.
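Back-of-envelope sketch of why the cached-read line item dominates; the prices and per-turn token counts below are assumptions for illustration, not anyone's actual bill:

    # Assumed Sonnet-like prices, $/token
    PRICE_FRESH_IN = 3.00 / 1e6
    PRICE_CACHE_READ = 0.30 / 1e6
    PRICE_OUT = 15.00 / 1e6

    # Assumed long agentic session: 500 tool-call turns, each re-reading
    # a ~200k-token context from cache, with small fresh input and output.
    turns = 500
    cached_tok = 200_000 * turns
    fresh_tok = 1_000 * turns
    out_tok = 300 * turns

    cost_cached = cached_tok * PRICE_CACHE_READ                    # $30.00
    cost_rest = fresh_tok * PRICE_FRESH_IN + out_tok * PRICE_OUT   # $3.75
    print(f"cached-read share: {cost_cached / (cost_cached + cost_rest):.0%}")  # ~89%

Locally, those same re-reads are just the KV cache sitting in RAM/VRAM, so their marginal cost for a single user is roughly zero.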
What about the VRAM requirement for the KV cache? That may matter more than memory bandwidth. With these GPUs, compute capacity is more plentiful than memory bandwidth, which in turn is more plentiful than VRAM.
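For a sense of scale, a quick sizing sketch; the model shape is assumed (roughly a 70B-class dense model with GQA), not any particular hosted model:

    def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
        # K and V tensors, fp16/bf16 by default
        return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

    # Assumed 70B-class shape: 80 layers, 8 KV heads (GQA), head dim 128
    per_seq = kv_cache_bytes(80, 8, 128, 128_000)
    print(f"{per_seq / 2**30:.1f} GiB per 128k-token sequence")  # ~39 GiB

And that is per sequence, before the weights are even counted, so batch size and context length eat VRAM fast.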
DeepSeek got MLA and then DSA. Qwen got gated delta-net. These inventions allow efficient inference both at home and at scale. If Anthropic has nothing comparable, their inference cost could be much higher.
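Rough per-token KV footprint comparison, vanilla MHA vs an MLA-style compressed latent; the shape numbers are my recollection of the DeepSeek-V2 paper (60 layers, 128 heads of dim 128, a 512-dim latent plus a 64-dim decoupled RoPE key), so treat them as illustrative:

    n_layers, n_heads, head_dim = 60, 128, 128   # assumed DeepSeek-V2-like shape
    bytes_per_elem = 2                            # fp16/bf16

    mha_per_token = 2 * n_heads * head_dim * n_layers * bytes_per_elem
    mla_per_token = (512 + 64) * n_layers * bytes_per_elem  # latent + RoPE key

    print(mha_per_token // 1024, "KiB/token with full MHA KV")   # 3840 KiB
    print(mla_per_token // 1024, "KiB/token with MLA latent")    # 67 KiB

Very roughly a 50x smaller cache per token, which is the kind of thing that makes long-context serving (and home inference) affordable.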
DeepSeek also has 3FS (https://github.com/deepseek-ai/3FS), which makes cached reads a lot cheaper with a much longer TTL. If Anthropic didn't invent anything comparable and instead uses an expensive solution like Redis, as the short TTL suggests, that also contributes to higher inference cost.
1st hint - the API call only contains one volume:
2nd hint - this gem from the tweet:

> No "this volume contains production data, are you sure?"