zihotki · 2026-05-22T20:10:19 1779480619

> To screw with OpenAI's IPO?

That's actually a very interesting idea. At the very least it could be the cherry on the cake, given he lost the lawsuit with OpenAI.

zihotki · 2026-05-22T07:52:37 1779436357

Are there any benchmarks/evals showing that this particular one does any good compared to, let's say, plan mode? How do you measure that it actually works and that you aren't wasting tokens and your personal time?

I fail to see any backing for the claims of 'boosting performance' and 'keeping costs low'.

sermakarevich · 2026-05-22T08:02:49 1779436969

fair

here are slides explaining it in more detail: https://docs.google.com/presentation/d/1SjKXF7hkoqyiN9-3tBGY...

when plan + code mode works, there is no need to change it. when it does not, because the feature is complicated, then we need something else. That's when sdd is applicable. I use it for mid+ size projects only.

Measuring is a somewhat subjective thing here. But when plan mode + code does not work and sdd works (because of double decomposition), you get what you need.

Token consumption is lower because you can wipe your context after every step or subtask is implemented. The scope needed to deliver specs is bigger, however. But confusion is way lower, as your context is focused on a single step or subtask.
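The token argument above can be illustrated with some rough arithmetic. This is only a sketch under stated assumptions: every turn resends the whole accumulated context as input, each turn adds a fixed number of tokens, and the spec has to be reloaded after each wipe. The function name `total_input_tokens`, the turn sizes, and the spec size are all hypothetical and not taken from the slides.

```python
def total_input_tokens(turn_sizes, reset_every=None, spec_tokens=0):
    """Sum input tokens over a session where each turn resends the full context.

    turn_sizes  -- tokens added to the context by each turn (assumed values)
    reset_every -- wipe the context every N turns; None means never wipe
    spec_tokens -- tokens of spec reloaded at the start and after each wipe
    """
    total = 0
    context = spec_tokens
    for i, size in enumerate(turn_sizes):
        # Wiping the context drops the history but the spec must be reloaded.
        if reset_every is not None and i > 0 and i % reset_every == 0:
            context = spec_tokens
        context += size
        total += context  # the whole context is sent as input on this turn
    return total


# Assumed workload: 12 turns of 1000 tokens each.
turns = [1000] * 12

# One long session, no spec, no wipes: context grows on every turn.
no_reset = total_input_tokens(turns)

# Wipe every 3 turns (one subtask), paying 2000 tokens of spec each time.
with_reset = total_input_tokens(turns, reset_every=3, spec_tokens=2000)

print(no_reset, with_reset)
```

With these made-up numbers the single long session sends 78000 input tokens and the wiped one 48000, even though it carries the extra spec. It says nothing about output quality, which is the part that still needs real evals.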

zihotki · 2026-05-21T16:07:05 1779379625

Wait, that's actually a great feature. Let me contact a friend at Google and make a suggestion..

zihotki · 2026-05-20T10:21:19 1779272479

> Crypto is fast. Markets move violently. Complexity is accelerating.
> Most users discover risk only after damage is done.

and then

> Resonance Labs (BVI) Limited
> 2nd Floor, Ellen L. Skelton Building,
> Fishers Lane, Road Town,
> Tortola, British Virgin Islands, VG 1110

This sounds very fishy

ekjhgkejhgk · 2026-05-20T10:26:44 1779272804

Crypto itself is fishy.

Every crypto firm I know personally is in the BVI (though there are other popular places, like, I believe, the UAE and Switzerland), so these guys are not an outlier in a fishy space.

zihotki · 2026-05-19T15:06:03 1779203163

MS is more like assembling the surfaces, it's not the same as having a full vertical integration like Apple is trying to achieve with their contracts with Intel, in-house modem, etc.

zihotki · 2026-05-19T14:58:34 1779202714

I think that's what the author implies by this sentence in the intro:

> It’s still your responsibility to understand your system and define what “correctness” means, and you need a high-level understanding of temporal logic.

nyrikki · 2026-05-19T17:30:06 1779211806

As TLA+ uses state machines that can define infinite state spaces, checking arbitrary temporal logic formulas is undecidable in the general case.

When you use the TLC model checker to verify invariants or properties, the cost of finding a valid counterexample can scale non-elementarily.

In fact, in some situations it can end up being Ackermann-complete[0], but even recursively enumerable problems can be Tower-complete.

I would say that unless you focus on more detailed temporal logic pitfalls you may have issues.

So IMHO this will stay a use case specific solution, that you choose based on context.

Even a common solution like adding Circumscription causes counterintuitive changes [1][2].

IMHO, if you want to use TLA+ as a primary method, you will need some depth or be ready to abandon it by time boxing etc…

Remember that we know the open domain frame problem in [2] is equivalent to HALT, so it will not universally apply.

It is just another tool that works well when it works well.

[0] https://arxiv.org/abs/2104.13866

[1] https://arxiv.org/abs/2407.20822

[2] https://en.wikipedia.org/wiki/Circumscription_(logic)
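To make the invariant checking discussed above concrete, here is a minimal sketch of explicit-state exploration: a breadth-first search over reachable states that returns a shortest trace to a state violating the invariant. It is not TLC's implementation, and the toy system (two processes doing a non-atomic increment), the state encoding, and the names `check_invariant`, `next_states`, and `invariant` are assumptions made up for illustration. It only terminates because this toy state space is finite, which is exactly what is not guaranteed in the general case.

```python
from collections import deque


def check_invariant(init, next_states, invariant):
    """Breadth-first search over reachable states.

    Returns a shortest trace (list of states) ending in a state that
    violates the invariant, or None if every reachable state satisfies it.
    """
    parent = {init: None}  # doubles as the visited set
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in next_states(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Toy system (hypothetical): two processes increment a shared counter
# non-atomically. State is (counter, (pc0, tmp0), (pc1, tmp1)) where
# pc 0 = about to read, 1 = about to write, 2 = done.
def next_states(state):
    counter, procs = state[0], state[1:]
    for i, (pc, tmp) in enumerate(procs):
        if pc == 0:    # read the shared counter into a local copy
            new_counter, new_proc = counter, (1, counter)
        elif pc == 1:  # write back local copy + 1
            new_counter, new_proc = tmp + 1, (2, tmp)
        else:          # this process is finished
            continue
        new_procs = list(procs)
        new_procs[i] = new_proc
        yield (new_counter, *new_procs)


def invariant(state):
    """Once both processes are done, the counter must equal 2."""
    counter, procs = state[0], state[1:]
    all_done = all(pc == 2 for pc, _ in procs)
    return not all_done or counter == 2


init = (0, (0, 0), (0, 0))
trace = check_invariant(init, next_states, invariant)
for step in trace or []:
    print(step)
```

The search finds the classic lost update: both processes read 0 before either writes, so the run ends with the counter at 1. The visited set `parent` is where the cost lives; with an unbounded counter or more processes it grows until you cap the model or run out of time.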

baq · 2026-05-19T19:36:58 1779219418

OTOH if you’re developing a service with a frontend and a backend you are firmly in the distributed system territory even if you only have one user, so it doesn’t have to work all the time; it’s enough if it works approximately more often than not to be an improvement.

zihotki · 2026-05-19T07:20:13 1779175213

It's an unreadable word soup in general

zihotki · 2026-05-16T11:03:21 1778929401

I'll wait for the follow-up posts "setting up reasonable budgets for an ai-native organization" and "validating (benchmarks/evals) that your ai-native setup has any benefit over a simplistic ai-assisted org"

zihotki · 2026-05-16T08:45:08 1778921108

> Powered by Gemini 1.5 Pro

full slop

zihotki · 2026-05-15T10:44:04 1778841844

How do you make it so that the model doesn't forget to follow those rules and skills? How do you make it actually understand the architecture and constraints? You can't; current models don't work in a way that makes this possible.