
By the way, a corollary I came across, I think in connection with one of the recent AWS meltdowns: a paradoxical consequence of designing for "reliability" is that when something does go wrong, it is almost guaranteed to be bad, because the reliability engineering has done a good job of masking all the smaller faults.

Which means: 1. anything that gets through is, almost by definition, bad enough to escape the safeguards, and 2. once things are bad enough to escape the safeguards, the failure will likely expose an avalanche of things that were already in a failure state but were being mitigated.

The takeaway, which I'm not really sure how to put to practical use, was that if a system isn't observably failing in small ways now and then, one day it's going to fail in a big way instead.

I don't think that's been rigorously proven, but I do think of it sometimes when I'm facing some mess.



That's a fairly common pattern: as the frequency of incidents goes down, the severity of the average incident goes up. There has to be some underlying mechanism for this (maybe the one you describe, but I'm not sure that's the whole story).
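
For what it's worth, the pure selection-effect part of this is easy to sketch. The snippet below is a toy illustration, not a model of any real system: the fault magnitudes are made up, and `visible_incidents` and `safeguard_threshold` are hypothetical names. It assumes a safeguard silently absorbs every fault at or below some threshold, and shows that raising the threshold lowers the number of visible incidents while raising their mean severity.

```python
from statistics import mean

def visible_incidents(fault_sizes, safeguard_threshold):
    """Return the faults large enough to escape a (hypothetical) safeguard.

    Assumption: anything at or below the threshold is silently mitigated.
    """
    return [size for size in fault_sizes if size > safeguard_threshold]

# Made-up fault magnitudes, purely illustrative (not real incident data).
faults = [1, 1, 2, 2, 3, 3, 5, 8, 13, 40]

# A higher threshold stands in for "better" reliability engineering.
for threshold in (0, 2, 5, 10):
    seen = visible_incidents(faults, threshold)
    print(f"threshold={threshold:>2}  incidents={len(seen):>2}  "
          f"mean severity={mean(seen):.1f}")
```

This only captures point 1 above (what gets through is bad by construction). It says nothing about point 2, the pile of already-mitigated failures that gets exposed at once, which may well be the other part of the story.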



