I love how these articles drop, and all of a sudden HN is filled with people who think engineering productivity is simple to measure.
Yes, productivity implies revenue (or cost reduction), and revenue is measurable.
However:
1. You spend money today to build features that drive revenue in the future, so when expenses go up rapidly today, you don’t yet have the revenue to measure.
2. It’s inherently a counterfactual consideration: you have these features completed today, using AI. You’re profitable/unprofitable. So AI is productive/unproductive, right? No. You have to estimate what you would’ve gotten done without AI, and how much revenue you would’ve had then (a toy calculation below makes this concrete).
3. Business is often a Red Queen’s race. If you don’t make improvements, it’s often the case that you’ll lose revenue, as competitors take advantage.
4. Most likely, AI use is a mixture of working on things that matter and people throwing shit against the wall “because it’s easy now.” Actually measuring the potential productivity improvements means figuring out how to keep the first category and avoid the second.
This isn’t me arguing for or against AI. It’s just me telling you not to be lazy and say “if it were productive you’d be able to measure it.”
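To make point 2 concrete, here is a toy calculation. Every number in it is invented; the point is only the shape of the problem, namely that the naive revenue view omits the unobserved baseline.

```python
# Toy illustration of the counterfactual problem. All numbers invented.
revenue_with_ai = 1_100_000         # observed
ai_spend = 300_000                  # observed
revenue_without_ai_est = 1_050_000  # NOT observed; must be estimated

naive_profit_view = revenue_with_ai - ai_spend
true_lift = (revenue_with_ai - revenue_without_ai_est) - ai_spend

print(naive_profit_view)  # 800000: says nothing about whether AI helped
print(true_lift)          # -250000: negative vs. the estimated baseline
```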
> HN is filled with people who think engineering productivity is simple to measure.
I think the prevailing (correct) consensus is that developer productivity is actually very hard to measure, and every time measurement is attempted, the measure is immediately made a target, making the whole thing pointless even if it had been a solid measurement (which it wasn't).
IDK where you're getting the idea here that measuring productivity of anyone who isn't a factory worker is easy.
Is it easy to measure a factory worker's productivity? It would seem surprising and interesting if every job's productivity is hard to measure except for one particular kind.
This is the message that somehow the tech industry is constitutionally incapable of absorbing. The "innovation impulse" is cancer. I have no idea why tech managers keep harping on about "innovating", it's so bizarre.
I mean, the choice is not between zero productivity and some productivity: it could be negative.
We doubt the productivity because we have enough experience with Claude Code to know that flooding your organization with that many tokens isn't just unproductive, it's actively harmful.
Minor shifts in productivity are hard to measure. Major jumps in productivity would be obvious. I think it’s clear that, if AI is affecting productivity, it’s to a minor degree at best.
Maybe. It also lets people build things that never would have existed before. My hobby is competitive pinball. There are multiple new stat and tournament tracking apps that have been vibe coded by people who never would have written code by hand.
If it was genuinely worth building before, you would have built it. Having some kind of cost involved is a force of nature that compels one to decide whether something is worth doing at all.
Moreover, these activities only serve to enhance the wealth and interests of the few. Congrats. Don't forget to look in the mirror.
If it were 10x productive, you'd be able to measure it indirectly; you'd be unable to avoid measuring it. So the initial claims were clearly lies. The research question is:
Is it >1.0x productive?
I agree that's very hard to measure. But given what this shit costs, it had better be answerable, and the multiple had better justify the cost.
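A back-of-envelope way to frame "the multiple had better justify the cost" is below. The salary and tooling figures are made up; only the ratio matters.

```python
# Back-of-envelope break-even multiple; both figures are invented.
dev_cost = 200_000   # fully loaded annual cost of one engineer
ai_cost = 30_000     # annual AI tooling spend for that engineer

# The engineer-plus-AI combo must out-produce the plain engineer by at
# least the ratio of total spend to baseline spend just to break even.
breakeven_multiple = (dev_cost + ai_cost) / dev_cost
print(breakeven_multiple)  # 1.15 -> needs >=15% real productivity gain
```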
Yeah, as I was toggling "blue" / "green" / "blue" / "green" I had the distinct sensation that it might just show me that I was in a region where I couldn't even make a consistent distinction.
> “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says. But now he and Tao have shortened the proof so that it better distills the LLM’s key insight.
Interestingly, it was an elegant technique, but the proof still required a lot of work.
Which is difficult, because the fact that you can come up with your example questions tells us they're probably not very dangerous. Plenty of ink has been spilled about how LLMs could help people create bioweapons. The basic idea "you could do dangerous things with an LLM" is already pop culture, and you're not doing anything dangerous by giving easy example questions.
A dangerous question would have to be along the lines of "Could I use unobtanium with the Tony Stark process to produce explosives much more powerful than nuclear weapons?" so that the question itself contains some insight that gets you closer to doing something dangerous.
Perhaps the reason for not publishing the questions is twofold:
1) they want a universal jailbreak that can get the model to answer any "bad" question.
2) they don't want the bad publicity when someone not under NDA jailbreaks their model and gets it to answer those questions.
That's a real question; maybe the changes are useful, though I think I'd like to see some examples. I do not trust cognitive complexity metrics, but it is a little interesting that the changes seem to reliably increase cognitive complexity.
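For readers who haven't met the metric: under a SonarSource-style cognitive complexity count, each branch adds a point and nesting adds extra points, so equivalent code can score quite differently. A small illustration (the scores in the comments are my approximation of that definition):

```python
# Two equivalent clamp functions. Under a SonarSource-style cognitive
# complexity count (+1 per branch, +1 extra per nesting level), the
# flat version scores ~2 and the nested version ~4.
def clamp_flat(x):
    if x < 0:        # +1
        return 0
    if x > 100:      # +1
        return 100
    return x

def clamp_nested(x):
    if x >= 0:           # +1
        if x <= 100:     # +1, plus +1 for nesting
            return x
        else:            # +1
            return 100
    return 0
```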
I haven't previously thought about this, but I think words over a commutative monoid are equivalent to vectors of non-negative integers, at which point you have vector addition systems, and I believe those are decidable, though still computationally incredibly hard: https://www.quantamagazine.org/an-easy-sounding-problem-yiel....
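A minimal sketch of that equivalence (this is just the standard Parikh-vector construction, nothing from the linked article): once letters commute, a word is determined by how many times each letter occurs, and concatenation becomes vector addition.

```python
from collections import Counter

def parikh(word: str, alphabet: str) -> tuple:
    """Map a word to its Parikh vector: the count of each letter.
    Over a commutative monoid, two words are equal iff these
    vectors are equal, since letter order no longer matters."""
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)

# "abba" and "baab" are distinct strings but the same element of the
# free commutative monoid over {a, b}: both map to (2, 2).
assert parikh("abba", "ab") == parikh("baab", "ab") == (2, 2)

# Concatenation of words becomes vector addition:
u, v = "ab", "abb"
uv = tuple(x + y for x, y in zip(parikh(u, "ab"), parikh(v, "ab")))
assert uv == parikh(u + v, "ab")
```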
There is a difference: someone smoking nearby automatically harms the people around them. With alcohol, the effect is more unpredictable, but it is equally real.
Alcohol is a factor in many automobile crashes, and a factor in a significant proportion of violent crime, especially domestic violence (https://www.cato-unbound.org/2008/09/17/mark-kleiman/taxatio... edit: this source isn't great; Kleiman has written elsewhere about the subject, but Google is failing me). If we could wave a magic wand and cause drinking to cease to exist, many lives would be saved.
Note: I do in fact drink, I am not a teetotaler. But what I said above is factual. I personally believe that prohibition would be worse, and it's reasonable for individuals to make their own choices. But that does not entail denying that it goes very badly for many.
Second-hand smoke does affect people around you. It is how people get addicted to nicotine. It is how new smokers are created.
And there are some people who are more sensitive to temporary exposure to smoke (and pollution in general) than others.
That is why smoking tends to be banned around hospitals and day care centers: those are places where you will find those people.
My father was one of them. After his larynx was removed for throat cancer, following decades of smoking, he could not bear even small amounts of second-hand smoke, because the breathing hole in his throat would get irritated, fill up with mucus, and have to be cleaned with a suction device.
And if you drink alcohol next to me, it does not make my clothes and my hair stink so much afterwards that I will want to wash my hair and change my clothes before going to bed.
This is an interesting analysis, but "are the costs of AI agents also rising exponentially?" is a very bad question that this doesn't answer.
What's rising exponentially is the price of the most ambitious thing cutting edge agents can do.
But to answer whether the cost of AI agents is rising in general, you would take a fixed set of problems, and for each of them, ask "once it's solvable, how does the price change?"
For that latter question, there isn't a lot of data in these charts because there aren't enough curves for models of the same family over time, but it does look like there are a number of points where newer models solve the same problems at lower prices. Look at GPT5 vs. the older GPT models--the curve for GPT5 is shifted left.
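Here is a sketch of the per-problem query I mean. The models, years, and prices below are all invented, since the charts don't publish enough same-family curves to do this for real.

```python
# Hypothetical per-problem prices (USD); every number here is invented.
price = {
    # (model, release_year): {problem_id: cost to solve, None if unsolved}
    ("model-v1", 2023): {"p1": 4.00, "p2": None,  "p3": None},
    ("model-v2", 2024): {"p1": 0.90, "p2": 12.00, "p3": None},
    ("model-v3", 2025): {"p1": 0.15, "p2": 2.50,  "p3": 40.00},
}

# For each fixed problem, track price from the first model that solves it.
for pid in ("p1", "p2", "p3"):
    series = [(year, costs[pid])
              for (model, year), costs in sorted(price.items(),
                                                 key=lambda kv: kv[0][1])
              if costs[pid] is not None]
    print(pid, series)
# p1 [(2023, 4.0), (2024, 0.9), (2025, 0.15)] -- falls once solvable
# p3 [(2025, 40.0)]                           -- new frontier, high price
```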
The cost of models is decreasing almost exponentially with time.
The author commits a non sequitur by muddling two concepts of time. They say costs are getting “unsustainable,” which is not a conclusion that follows.
What is true is that, at a given point in time, the cost to perform a task is exponentially related to the human time the task takes. But that does not mean it will remain that way; far from it.
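One way to keep the two time axes separate is to write the claim as an explicit functional form. This is entirely an assumed shape, not a fitted model: exponential in task length at a fixed date, deflating exponentially in calendar time.

```python
import math

def cost(task_hours: float, years_from_now: float,
         c0: float = 1.0, a: float = 0.5, b: float = 1.2) -> float:
    """Illustrative cost model, not a fitted one: at a fixed date, cost
    grows exponentially in task length (the a term), while the whole
    curve deflates exponentially over calendar time (the b term).
    All constants are invented."""
    return c0 * math.exp(a * task_hours) * math.exp(-b * years_from_now)

# An 8-hour task today vs. the same task two years from now:
print(cost(8, 0))   # ~54.6: expensive now
print(cost(8, 2))   # ~4.95: roughly 11x cheaper if the deflation holds
```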