Are there any benchmarks/evals to see if this particular one does anything better than, let's say, plan mode? How do you measure that it actually works and that you aren't wasting tokens and your personal time?
I fail to see any backing for the claims of 'boosting performance' and 'keeping costs low'.
When plan + code mode works, there's no need to change it. When it doesn't, because the feature is complicated, then we need something else. That's when SDD is applicable. I use it for mid-size and larger projects only.
Measuring is a bit subjective here. But when plan mode + code does not work and SDD does (because of its double decomposition), you get what you need.
Token consumption is lower because you can wipe your context after every step or subtask is implemented. The scope of specs to deliver is bigger, however. But confusion is way lower, as your context is focused on a single step or subtask.
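To put rough numbers on the token argument, here is a minimal back-of-the-envelope sketch. It assumes a simplified cost model where every step re-sends its whole context as input tokens; the function names and all the figures are made up for illustration and are not measurements from any real tool.

```python
def tokens_with_growing_context(steps: int, tokens_per_step: int) -> int:
    """Input tokens when each step re-sends the whole history so far."""
    total = 0
    context = 0
    for _ in range(steps):
        context += tokens_per_step  # history keeps accumulating
        total += context            # the full context is sent again
    return total


def tokens_with_reset_context(steps: int, tokens_per_step: int, spec_tokens: int) -> int:
    """Input tokens when the context is wiped and the spec is reloaded each step."""
    return steps * (spec_tokens + tokens_per_step)


if __name__ == "__main__":
    # Hypothetical numbers: 10 subtasks, 5k tokens of work each, 8k-token spec.
    steps, per_step, spec = 10, 5_000, 8_000
    print("growing context:", tokens_with_growing_context(steps, per_step))
    print("reset per step: ", tokens_with_reset_context(steps, per_step, spec))
```

Under these assumptions the growing context costs more once there are enough steps, because its total grows quadratically. For a short task (say 2 steps) reloading the spec every time is the more expensive option, which matches the point that the spec overhead is bigger.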
Every crypto firm I know personally is in the BVI (but there are other popular places, like the UAE and Switzerland, I believe), so these guys are not an outlier in a fishy space.
MS is more like assembling the surfaces, it's not the same as having a full vertical integration like Apple is trying to achieve with their contracts with Intel, in-house modem, etc.
I think that's what the author implies by this sentence in the intro:
> It’s still your responsibility to understand your system and define what “correctness” means, and you need a high-level understanding of temporal logic.
OTOH if you’re developing a service with a frontend and a backend you are firmly in the distributed system territory even if you only have one user, so it doesn’t have to work all the time; it’s enough if it works approximately more often than not to be an improvement.
I'll wait for the follow-up posts "setting up reasonable budgets for ai-native organization" and "validating (benchmarks/evals) that your ai-native setup has any benefit over a simplistic ai-assisted org".
How do you make sure the model doesn't forget to follow those rules and skills? How do you make it actually understand the architecture and constraints? You can't; current models don't work that way.
That's actually a very interesting idea. At least it could be the cherry on the cake, given he lost the lawsuit with OpenAI.