It's like those ridiculous people who try to make a PBJ without knowing anything about glycemic indexes, peanut smut, or the historical origins of breadmaking.

Kids these days just want to use prefab libraries and frameworks with a million dependencies doing god knows what and written by randos.

(Unrelated to how commenters these days just want an excuse to use the term "peanut smut".)


Living where? If it's in the GPU, then it's still taking up precious space that could be used for serving other sessions. If it's not in the GPU, then it doesn't help.

> Why does it take the company that is probably the best at agentic coding more than a month to find and solve such large regressions, even with customers complaining about them?

My unfounded suspicion: because this is the tradeoff we're all facing and for the most part refusing to accept when transitioning over to LLM-driven coding. This is exactly how we're being trained to work by the strengths and limitations of this new technology.

We used to depend on maintaining a global if incomplete understanding of a whole system. That understanding let us see at a glance whether specs, tests, and actual behavior made sense; it guided our thinking and told us where to look. With agentic coding, the brutal truth is that this is now a much less "efficient" approach and we'll ship more features per day by letting that go and relying on external signs of behavior like test suites and an agent's analysis with respect to a spec. It enables accomplishing lots of things we wouldn't have done before, often simply because it would have been too much friction to integrate them properly -- write tests, check performance, adjust the conceptual understanding to minimize added complexity, whatever.

So in order to be effective with these new tools, we're naturally trained to let go of many of the things we formerly depended on to keep quality up. Mistakes that would have formerly been evidence of stupidity or laziness are now the price to pay for accelerated productivity, and they're traded off against the "mistakes" that we formerly made that were less visible, often because they were in the form of opportunity cost.

Simple example: say you're writing a simple CLI in Python. Formerly, you might take in a fixed sequence of positional arguments, or even if you did use argparse, you might not bother writing help strings for each one. Now because it's no harder, the command-line processing will be complete and flexible and the full `--help` message will cover everything. Instead, you might have a `--cache-dir=DIR` option that doesn't actually do anything because you didn't write a test for it and there's no visible behavioral change other than worse performance.
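
To make the example concrete, here's a minimal sketch (the tool, names, and behavior are invented by me, not pulled from anywhere real): the generated CLI dutifully documents --cache-dir, so --help looks complete, but nothing ever reads the value and the option silently does nothing.

    import argparse

    def resize_all(input_dir: str, width: int) -> None:
        ...  # the actual work, elided

    def main() -> None:
        parser = argparse.ArgumentParser(description="Resize images in a directory.")
        parser.add_argument("input_dir", help="Directory of images to process.")
        parser.add_argument("--width", type=int, default=800, help="Target width in pixels.")
        parser.add_argument("--cache-dir", metavar="DIR",
                            help="Directory for cached thumbnails.")  # parsed and documented...
        args = parser.parse_args()
        resize_all(args.input_dir, args.width)  # ...but args.cache_dir is never used

    if __name__ == "__main__":
        main()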

Closely related, what do you do with user feedback and complaints? Formerly they might be one of your main signals. Now you've found that you need dependable, deterministic results in your test suite that the agent is executing or it doesn't help. User input is very very noisy. We're being trained away from that. There'll probably be a startup tomorrow that digests user input and boils out the noise to provide a robust enough signal to guide some monitoring agent, and it'll help some cases, and train us to be even worse at others.


I am not sure this approach can take you very far.

In my experience, CC makes it very very easy to _add_ things, resulting in much more code / features.

CC can obviously read/understand a codebase much faster than we do, but this also has a limit (how much context we can feed into it) - I think your approach is, in essence, a bet that future models' ability to read/understand code (size of context) improves as fast as or faster than current models' ability to create new code.


>There'll probably be a startup tomorrow that digests user input and boils out the noise to provide a robust enough signal to guide some monitoring agent, and it'll help some cases, and train us to be even worse at others.

This sounds like Enterpret.


> you might have a `--cache-dir=DIR` option that doesn't actually do anything

Working in enterprise software, it's surprising how long an option that doesn't actually do anything can go unnoticed. And that was before AI, even with thousands of customers using the product.

This same problem happens with documentation all the time. You end up with paragraphs or examples that simply don't reflect what the product actually does.


Where I work, options that don't do anything are seen as good engineering practice. You see, you can't break your users' scripts. Your CLI arguments are part of your stable API. If your tool used to have a cache_dir CLI option, and now no longer needs it, you still have to keep accepting cache_dir and treat it as a no-op until you are confident your users have migrated away from it.
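
A minimal sketch of that pattern (hypothetical tool, names made up): keep accepting the retired flag so existing scripts don't break, treat it as a no-op, and optionally warn.

    import argparse
    import warnings

    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("--cache-dir", metavar="DIR",
                        help=argparse.SUPPRESS)  # hidden from --help, still accepted
    parser.add_argument("--output", required=True, help="Where to write results.")
    args = parser.parse_args()

    if args.cache_dir is not None:
        # Deliberate no-op: kept only so old invocations keep working.
        warnings.warn("--cache-dir is deprecated and ignored", DeprecationWarning)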

I've been working on this problem coming from the program synthesis school of thought over at https://promptless.ai (which you would have no clue just from looking at the website, because it's targeted at tech writers).

I'm quite fond of the idea of incremental mutation of agent trajectories to move/embody some of the reasoning steps from LLM tokens into a program. Imagine you have a long agent transcript/trajectory and a magic wand that replaces a run of messages with "and now I'll call this script, which gives me exactly the information I need," and you then see whether the rewritten trajectory is stable.

To give credit where it's due, it's an overly complicated restatement of what Manny Silva has been saying with docs-as-tests https://www.docsastests.com/. Once you describe some user flow to humans (your "docs"), you can "compile" or translate part or all of those steps into deterministic test programs that perform and validate state transitions. Ideally you compile an agent trajectory all the way.

So: working with coding agents, you've cranked up the defect rate in exchange for speed, so let's try testing all the important flows. The first thing you try is: ok, I've got these user guides, I guess I'll have the agent follow along and try to do it. And that works! But it's a little expensive and slow.

So I go, ok I'll have the agent do it once, and if it finds a trajectory through a product that works, we can reflect on that transcript and make some helper scripts to automate some or all of those state transitions, then store these next to our docs.
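
To sketch what one of those stored helpers might look like (the endpoints, payloads, and guide steps here are all invented for illustration), the agent's exploratory run gets frozen into a deterministic script that performs the same state transitions the user guide describes and asserts the documented outcome:

    import requests

    BASE_URL = "https://staging.example.com/api"  # hypothetical test environment

    def test_invite_flow_matches_docs():
        # Guide step 1: create a project.
        project = requests.post(f"{BASE_URL}/projects", json={"name": "demo"}).json()

        # Guide step 2: invite a teammate.
        requests.post(f"{BASE_URL}/projects/{project['id']}/invites",
                      json={"email": "teammate@example.com"}).raise_for_status()

        # Guide step 3: the docs say the teammate now shows up as "pending".
        members = requests.get(f"{BASE_URL}/projects/{project['id']}/members").json()
        assert any(m["email"] == "teammate@example.com" and m["status"] == "pending"
                   for m in members)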

And then you say: ok, if I ship a product change, can I have my coding agent update those testing scripts to save the expense and time of re-running the original follow-along? Also an obvious thing to do, and you can totally build it yourself with Claude Code in a GitHub Action. But I think there is a lot of complexity in how you go about doing this -- what kind of incremental computation you can do to keep the LLM costs of all this under a couple hundred bucks a month for teams shipping 20 changes a day with 200 pages of docs.

The most polished open source "compiler/translator" I've seen exploring these ideas so far is Doc Detective (https://doc-detective.com) by Manny.


> Closely related, what do you do with user feedback and complaints? Formerly they might be one of your main signals. Now you've found that you need dependable, deterministic results in your test suite that the agent is executing or it doesn't help. User input is very very noisy.

I don't even use Claude, and it has been rather clear to me that their service has not been working properly for some time now.


  > digests user input and boils out the noise to provide a robust enough signal to guide some monitoring agent
Not to sound uncharitable, but this seems like the absolute worst way to run a business; your customers are basically lab rats... why should they pay for anything in this scenario?

I just said someone's gonna build it, not that it's a good idea!

To be fair [to myself], this is scale-dependent. I work on a product with hundreds of millions of users. We're not going to be reading and pondering every bit of feedback we get. We have automation for stripping out some of the noise (e.g. the number of crash reports we get from bit flips due to faulty RAM is quite significant at this scale). We have lines of defense set up to screen things down -- though if you file a well-researched and documented bug, we'll pay attention. (We won't necessarily do what you want, but we'll pay attention.)

When I worked at a much smaller and earlier stage company, we begged our users for feedback. We begged potential users for feedback. We implemented some things purely to try to get someone excited enough that they would be motivated to give feedback.

Anthropic, OpenAI, Google? They have a lot of users.

Also, this automation would be in addition to the other channels by which you'd pay attention to feedback.

Also also, the ship has sailed. We're all lab rats now. We're randomly chosen to be A/B tested on. We are upgraded early as part of a staged rollout. We're region-locked. Geocoded. Tracked as part of the cohort that has bought formula or diapers recently. Maybe we live in the worst of all possible worlds?


It was, which is why it makes such a perfect analogy.

Surveillance has lots of good and bad uses, and is morally neutral itself. Powerful but neutral. The problem comes when the users use it for bad purposes, and in fact it is so tempting that they can't help using it for more and more bad purposes. If every palantir (either one) user was a "good guy" who refused to use it for bad purposes, it would be a potent force for good, and that's why they were created in the first place.


I thoroughly disagree. Surveillance is an invasive tool of control, and as such intrinsically immoral. Just like a slew of other immoral actions, it may be a net positive when applied for a greater good, but even if it's not used for anything, it's still evil.

This is trivially true to most common moral understandings. If my neighbor installs a camera pointing through my window and into my shower, applying some fancy technique to see through clouded glass, most of us would justly think that was immoral of him, even in complete absence of any other immoral actions facilitated by that surveillance.


That depends on the definition of "surveillance". Should a foreman not pay close attention to his workers? Should a hospital not track its patients' locations and vital stats while within the hospital? Are cameras in a jewelry shop morally wrong?

Your neighbor's surveillance of you is bad because they're violating your privacy, and using the tool of surveillance to do it. If you lived in a foggy area and they were monitoring their front walkway with a camera that was good at seeing through fog, and they happened to get a corner of your property in the camera's field of view, then you might have something to complain about but I wouldn't call it morally wrong.

I agree that surveillance is a tool of control. So are fences. It's ok to control some things.

I also agree that surveillance gets into sticky territory very, very quickly. I definitely don't have a clean dividing line between what I'd like the police to be able to see and what they shouldn't. (Especially when the temptation to share that data is so strong and frequently succumbed to.) I would probably say in some useless abstract sense, mass surveillance is also morally neutral. But given that it's proven to be pretty much impossible to implement in a way that doesn't end up serving more evil than good, I wouldn't object to calling it immoral.


IMHO surveillance is a problem when it is asymmetric, which is obviously the case here. Governments, for example, are watching everyone inside and outside, but the people being monitored simply cannot really watch the people watching them. Don't you agree?

In this view, maybe an ultra-radical solution to privacy issues is: no privacy at all, for no one. Complete and total transparency of everyone to everyone. Now the question is how to implement that. That's obviously impossible, because someone in power will always have something to hide. So maybe a true democracy where everyone holds exactly the same amount of power could work? Same issue, because it is impossible to implement too. Oh well.


That is a "justifies extreme violence to prevent" type suggestion. Privacy is a basic human right. The problem is power. No one should be in a position to spy on everyone.

If it's ok for the foreman to control the workers, would you then say it's ok for the foreman to hold the workers at gunpoint while they work?

I'd say the control is immoral, in all forms. Voluntary agreement and consent are fine, but then it's not surveillance, it's people saying where they are. The patient wants the doctor to know where they are and what they are doing, rather than the doctor deciding on their own what to know.

The worker wants the foreman to know that they are present and working, in fulfilment of their contract together. It's not surveillance either.

The jewelry store itself is immoral, but private property and control thereof is a tradeoff we've made.


Again, there are plenty of instances where enough good comes from surveillance that it outweighs the intrinsic negative, but denying that it is, in and of itself, intrinsically negative suggests that some creepy dude monitoring everyone's every move is just fine, as long as he's not doing anything else.

A more obvious parallel is violence. To trip over Godwin's law, shooting Hitler would have been a moral action, but not because "shooting people" is amoral. Shooting a random person is definitely immoral. We constantly do immoral things for the greater good, but it is a mistake to thusly assume those actions are amoral.


> So are fences

Good fences make good neighbors... If I could put a notion in his head: Why do they make good neighbors?


So should the US simply not pursue any tax evasion cases? Because catching tax evasion necessarily requires surveillance.

The palantiri weren't created for spying; they were created so that the various kingdoms of Middle-earth could stay in contact with each other. The palantiri are a party line. It just got real sketchy when Minas Ithil fell (and became Minas Morgul) and Sauron got possession of the orb, after which the kings of Gondor stopped using them.

The palantiri were created by Fëanor. The kinslayer whose pride, rage, and desire for vengeance drove most of his people to their doom. The potential to corrupt was always present in them.

In the LotR, Aragorn bends a palantir to his will and uses it for good with great difficulty. He manages to do that, because he is (in addition to everything else) the trueborn king and the palantiri are his birthright. Denethor, on the other hand, succumbs to corruption. While he is a powerful lord with good intentions, he is only a steward, not a king. The right to use the palantiri is not inherent in his being, because he only wields power in someone else's name.


I think that's also one of those problems in stories like this: there's a powerful tool with the power to corrupt, but only a chosen few, by right, can wield it without becoming corrupted.

It implies that there will always be people ordained through some manner which are incorruptible and therefore can use these things to "fight evil". The usual suspects in our world are people in Government, and by extension in military and law enforcement.

One example of someone actually destroying such a device rather than letting themselves be corrupted by it was Batman, after he finished using it to locate the Joker. But of course this was in a fictional movie, and in no way represents the real world.


Surveillance creates leverage over people. It's not neutral if it creates a power imbalance, especially since it's used by the wealthy on the poor.

You can't do surveillance and not learn the bad knowledge, and once you've created the bad knowledge it's just a matter of time before it gets into nefarious hands.

A "bad guy" could still hack the "good guys" or palantir itself, and get access to all the bad data the "good guys" have created.


It’s not morally neutral; the very existence of surveillance has a chilling effect on dissenting opinions.

There are morally neutral technologies, but the unique quality of surveillance data containing PII (and tools to correlate across time and space) means that it's only morally neutral until it is used in any capacity. Which is to say, it is not morally neutral.

You've already made a pretty big leap from surveillance to storing surveillance data persistently, and another to the tools. I'm not going to argue that mass surveillance is morally neutral.[1]

Tolkien's Palantirs let you see and communicate and influence across vast distances. That's no more immoral than a videophone. Of course, that's also not surveillance; that'd be a telescope. But surely telescopes aren't immoral?

[1] I mean, I would, but (1) you can't create a mass surveillance system from a morally neutral or positive place, and (2) it seems nearly impossible to implement a mass surveillance system without creating more harm than benefit. So it becomes a boring semantics argument as to whether mass surveillance is fundamentally immoral or not.


If Palantir itself gets hacked, all the data and analysis will be scooped up by others.

> Corn subsidies are a few billions of dollars a year, that’s pretty cheap for food security.

A few billions a year to destroy farming capacity in the rest of the world, and even within our country for growing anything non-corn (because it has to compete with subsidized ethanol production). You could get more benefit and do less harm by using those billions to maintain production capacity for other crops (even if you're not even growing anything but a cover crop!), plus generate far more energy from solar production.

I'd say it's pretty expensive for food insecurity plus opportunity cost.

> Ethanol in gasoline is food security policy that exists to have something to use the corn for rather than throw it away.

That's just false. The mandate (the Renewable Fuel Standard) forces ethanol production. The law says you have to overproduce. If we wanted to preserve capacity, we wouldn't grow the corn, we'd subsidize maintaining the ability to grow it -- and other crops -- which would be way cheaper and also provide more food security.


That can explain a little. Not the 40% of all corn grown that is used for ethanol.

Which would be better for the nation's security? Having all this ethanol, or having 31x the energy provided by that ethanol via solar production? We couldn't actually use that much solar power right now, but that's part of the opportunity cost: we aren't gearing up to make use of it because we're generating all of this ethanol that we don't need instead! The capacity maintenance argument works both ways: pay to maintain the capacity to grow vastly more corn than we'll ever need, or pay to maintain the capacity to generate tons more energy that we're far more likely to need.

(Also, taking land that has been largely destroyed by industrial corn farming and changing it into land that's growing some more valuable food crop isn't just a matter of changing your mind about what to grow the next year.)


Nope, that's the cover story. The US subsidizes production, not capacity, which results in lots of excess crop that gets dumped on the market and depresses prices and impoverishes competitors. The ethanol mandates were created partly as a response to the problems that this created. But they are mandates for blending in a certain amount of ethanol, producing artificial demand, and putting us in the ridiculous situation where 40% of corn production goes to ethanol that nobody needs. It's the dumbest thing ever and makes no sense, but is very popular with farm states for obvious reasons.

If we actually wanted to maintain spare production capacity, it would look very different. We'd have to pay to keep land capable of growing food even when not growing any. We'd subsidize the inputs (irrigation, drainage, soil) instead of the outputs. We'd avoid overproduction instead of encouraging it, since it's a form of "inflation" that lowers prices and drives out farmers (other than the ones printing money... er, growing unneeded corn).


Alternatively, you can have a fully safe language, and then to get certain things done you add fundamentally unsafe FFI[1]. Or you use IPC to a process written in an unsafe language. Again, you're "back in C++ land".

It seems like your complaint is that Rust is referred to as a safe language. Which is fine; it's more correct to use the phrase "in safe Rust" rather than assuming that "in Rust" fully implies "safe". That is true, but that's a crack in a sidewalk compared to the chasm of difference between Rust and C++. Why obsess over that crack?

Should we all refer to "Python without FFI or any extensions written in C or another unsafe language" instead of "Python", to avoid asserting that Python-as-it-is-used is a safe language?
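
(For what it's worth, a toy illustration of my own, not something from the thread: ctypes alone is enough to punch that same hole in Python -- one bad length and you're past the interpreter's safety net, with a hard process crash instead of an exception.)

    import ctypes

    libc = ctypes.CDLL(None)  # handle to the already-loaded C library (POSIX)

    buf = ctypes.create_string_buffer(8)
    libc.memcpy(buf, b"hi", 2)               # fine: stays within the 8-byte buffer
    # libc.memcpy(buf, b"x" * 4096, 4096)    # undefined behavior: overruns the buffer,
    #                                        # typically crashes the whole process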

[1] Assuming it's FFI to an unsafe language, and that's the main purpose of FFI.


I agree that GC code can be and usually is slower. But as a counterpoint: allocation can be very slow. Bump allocation is fast. If you know your lifetimes and can free all of your bump-allocated memory in one go, that's pretty much the best you can do. If not, then you can still bump-allocate, but copy away the live stuff while discarding the rest. You keep bump allocation, you may even improve locality by eliminating the fragmentation in your bump allocation arena. On the other hand, you've written a garbage collector and thus inherit the disadvantages of GC.

My point is that you can start out without a GC and do a series of sane and effective optimizations that end you up with a GC. Just as you can start with a GC and optimize by moving more and more of your allocations to non-GC memory and end up without a GC. Which endpoint is faster depends on the workload.
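
For concreteness, a toy sketch of the bump-allocation idea from the first paragraph (pure illustration; a real arena hands out raw memory, not Python memoryviews): allocation is just advancing an offset, and freeing everything is resetting it.

    class BumpArena:
        def __init__(self, size: int):
            self.buf = bytearray(size)
            self.top = 0

        def alloc(self, n: int) -> memoryview:
            if self.top + n > len(self.buf):
                raise MemoryError("arena exhausted")
            off = self.top
            self.top += n                  # allocation is a single pointer bump
            return memoryview(self.buf)[off:off + n]

        def reset(self) -> None:
            self.top = 0                   # every allocation's lifetime ends at once

    arena = BumpArena(1 << 20)
    a, b = arena.alloc(64), arena.alloc(128)
    arena.reset()  # "free" everything in one go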


Oh, it definitely can, in a way very similar to the way you can dramatically increase doctor's success rates by being selective about who you treat.

Specifically: take the most disruptive students and eat them. (Be stealthy about it, the point is not fear of punishment.) The productivity difference between a classroom that spends 90% of its time on instruction vs 90% of its time on classroom management is massive.

That's why you have to be careful about applying business notions like "productivity" to governmental duties like education and mail and highways. (I dearly wanted to include healthcare or at least hospitals in the list, but I live in the US.) Businesses can and should be selective and take higher risks. For governmental tasks, productivity isn't even well-defined. If you're failing (or eating) 20% of your students but the other 80% are doing amazingly well, is that better or worse than 99% of everyone doing just okay? How about if everyone's test scores go up and practical ability goes to shit? (This is not a hypothetical, not where the kids have figured out how to use ChatGPT even for the tests. Which is a lot of places.)

Teaching is nowhere near Pareto optimal right now, so I'm not arguing in favor of the status quo. I'm just saying you have to be very, very careful when pushing for "productivity".

