hyperpape's comments — Hacker News

I love how these articles drop, and all of a sudden HN is filled with people who think engineering productivity is simple to measure.

Yes, productivity implies revenue (or cost reduction), and revenue is measurable.

However:

1. You spend money today to build features that drive revenue in the future, so when expenses go up rapidly today, you don’t yet have the revenue to measure.

2. It’s inherently a counterfactual consideration: you have these features completed today, using AI. You’re profitable/unprofitable. So AI is productive/unproductive, right? No. You have to estimate what you would’ve gotten done without AI, and how much revenue you would’ve had then.

3. Business is often a Red Queen’s race. If you don’t make improvements, it’s often the case that you’ll lose revenue, as competitors take advantage.

4. Most likely, AI use is a mixture of working on things that matter and people throwing shit against the wall “because it’s easy now.” Actually measuring the potential productivity improvements means figuring out how to keep the first category and avoid the second.

This isn’t me arguing for or against AI. It’s just me telling you not to be lazy and say “if it were productive you’d be able to measure it.”


> HN is filled with people who think engineering productivity is simple to measure.

I think the prevailing (correct) consensus is that developer productivity is actually very hard to measure, and every time it is attempted the measure is immediately made a target making the whole thing pointless even if it had been a solid measurement- which it wasn't.

IDK where you're getting the idea here that measuring productivity of anyone who isn't a factory worker is easy.


I do not think it is easy, like I said. I am saying other people are acting like it’s easy.

See the second comment on this article. https://news.ycombinator.com/item?id=47976781

See @emp17344 responding to me.


That second comment isn't making that statement though.

It's saying that: cost vs revenue is something we can see.

If I buy a plow for $2,500 and it enables growth of $5,000, then arguing "the plow was expensive" is a moot point.

It doesn't make any argument about measured productivity, only investment vs return.


The difficulty in measuring productivity is the attribution. How do you know the new plow enabled growth?

Is it easy to measure a factory worker's productivity? It would seem surprising and interesting if every job's productivity is hard to measure except for one particular kind.

Any job where there's a definable output can be measured. Factory workers are one type.

Farmers are another: their output is x tonnes of usable crops from y acres.


> You spend money today to build features that drive revenue in the future

Totally, but new features in their app or better software are not going to increase Uber's revenue/profit significantly.


This is the message that somehow the tech industry is constitutionally incapable of absorbing. The "innovation impulse" is cancer. I have no idea why tech managers keep harping on about "innovating", it's so bizarre.

I mean, the option is not zero productivity or some productivity: it could be negative.

We doubt the productivity because we have enough experience with Claude Code to know that flooding your organization with that many tokens isn't just unproductive, it's actively harmful.


Minor shifts in productivity are hard to measure. Major jumps in productivity would be obvious. I think it’s clear that, if AI is affecting productivity, it’s to a minor degree at best.

I think it will make things go backwards.

The big leaps in productivity come from really great ideas that are formalised into concepts that then take form.

That comes from being in a meditative state, not from blasting output at a higher rate.


Maybe. It also lets people build things that never would have existed before. My hobby is competitive pinball. There are multiple new stat and tournament tracking apps that have been vibe coded by people who never would have written code by hand.

So..?

If it were genuinely worth building before, you would have built it. Having some cost involved forces one to decide whether a thing is worth doing at all.

Moreover these activities only serve to enhance the wealth and interests of the few. Congrats. Don’t forget to look in the mirror.


If it were 10x productive you'd be able to measure it indirectly; you'd be unable to avoid measuring it. So the initial claims were clearly lies. The research question is:

  Is it >1.0x productive?
I agree that's very hard to measure. But given what this shit costs, it had better be answerable, and the multiple had better justify the cost.
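One way to make "the multiple had better justify the cost" concrete is a break-even calculation. A minimal sketch, with entirely hypothetical figures:

```python
def break_even_multiple(salary, ai_cost):
    """Productivity multiple at which paying for AI merely breaks even:
    total spend rises to salary + ai_cost, so output must rise in the
    same proportion for the spend to pay for itself."""
    return (salary + ai_cost) / salary

# Hypothetical example: a $200k engineer with $20k/yr of AI spend
# must be at least 1.1x as productive just to break even.
m = break_even_multiple(200_000, 20_000)
```

Anything between 1.0x and that multiple is real productivity that still loses money, which is why "is it >1.0x?" alone doesn't settle the question.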

I think this site is doing a binary search, so that you narrow down on a boundary.

It would be much funnier, and also more insightful, if it didn't do this and let you contradict yourself.
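A boundary search like that can be sketched as a bisection over hue. All names and the hue range here are hypothetical; the site's actual logic isn't described in the thread:

```python
def find_boundary(ask, lo=160.0, hi=250.0, tol=1.0):
    """Bisect the hue range [lo, hi] (degrees) to locate where the
    viewer's answer flips from "green" to "blue".

    ask(hue) -> True if the viewer calls that hue "blue".
    Assumes ask(lo) is False and ask(hi) is True, and (unlike a real,
    inconsistent human) that answers are deterministic."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ask(mid):
            hi = mid  # still "blue": the boundary is at or below mid
        else:
            lo = mid  # still "green": the boundary is above mid
    return (lo + hi) / 2

# Example with a deterministic viewer whose boundary sits at hue 195:
boundary = find_boundary(lambda h: h >= 195)
```

With an inconsistent respondent the invariant breaks, which is exactly the "contradict yourself" behavior the comment wishes the site would expose.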


Yeah, as I was toggling "blue" / "green" / "blue" / "green" I had the distinct sensation that it might just show me that I was in a region where I couldn't even make a consistent distinction.

Unless kids have gotten a lot faster in the past 25 years, I think that's a lot better than a typical 2,000-person high school.

> “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says. But now he and Tao have shortened the proof so that it better distills the LLM’s key insight.

Interestingly, it was an elegant technique, but the proof still required a lot of work.


Which is difficult, because the fact that you can come up with your example questions tells us they're probably not very dangerous. Plenty of ink has been spilled about how LLMs could help people create bioweapons. The basic idea "you could do dangerous things with an LLM" is already pop culture, and you're not doing anything dangerous by giving easy example questions.

A dangerous question would have to be along the lines of "Could I use unobtanium with the Tony Stark process to produce explosives much more powerful than nuclear weapons?" so that the question itself contains some insight that gets you closer to doing something dangerous.

Perhaps the reason for not publishing the questions is twofold: 1) they want a universal jailbreak that can get the model to answer any "bad" question. 2) they don't want bad publicity when someone not under NDA jailbreaks their model and answers their question


> because the fact that you can come up with your example questions tells us they're probably not very dangerous

maybe I know more about this field than you think

there are biologists on video saying that present day models have expert level wet-lab knowledge and can guide a novice through whole procedures

models were also able to tweak DNA sequences to make them bypass DNA-printing companies' filters

> they don't want bad publicity when someone not under NDA jailbreaks their model and answers their question

just like people now pay $500k for Chrome vulnerabilities, soon people will pay similar amounts to jailbreak models into doing bad things


>there are biologists on video

Link handy?


[flagged]


Who are you quoting?

Well, yes. If you set up a nuclear lab in your house next door to me I'm calling the feds.

Things that are potentially dangerous to others when mishandled get regulated because some individual or some company abuses it and harms others.


That's a real question, maybe the changes are useful, though I think I'd like to see some examples. I do not trust cognitive complexity metrics, but it is a little interesting that the changes seem to reliably increase cognitive complexity.

I haven't previously thought about this, but I think words over a commutative monoid are equivalent to a vector of non-negative integers, at which point you have vector addition systems, and I believe those are decidable, though still computationally incredibly hard: https://www.quantamagazine.org/an-easy-sounding-problem-yiel....
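The equivalence is just the Parikh image: in a commutative monoid the order of letters doesn't matter, so a word is determined by its vector of letter counts, and concatenation becomes vector addition. A minimal sketch:

```python
from collections import Counter

def parikh(word, alphabet="abc"):
    """Map a word to its vector of letter counts (its Parikh image).
    Over a commutative monoid, two words are equal iff these vectors are."""
    counts = Counter(word)
    return tuple(counts[ch] for ch in alphabet)

def concat_as_vector(u, v, alphabet="abc"):
    """Concatenation of words corresponds to componentwise vector addition."""
    return tuple(a + b for a, b in zip(parikh(u, alphabet), parikh(v, alphabet)))

# "abcab" and "ababc" are distinct words but equal in the commutative monoid:
assert parikh("abcab") == parikh("ababc") == (2, 2, 1)
assert concat_as_vector("ab", "bc") == parikh("abbc")
```

This is why the reachability questions land in vector-addition-system territory: decidable, but with the staggering complexity the linked Quanta article describes.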

Thanks, that's an interesting tidbit!

(The whole thing made me think about applications to SQL query optimizers, although I'm not sure if it's practically useful for anything.)


With all due respect, this is completely wrong.

The difference is that someone smoking nearby automatically harms the people around them. With alcohol, the effect is less predictable, but it is equally real.

Alcohol is a factor in many automobile crashes, and in a significant proportion of violent crime, especially domestic violence (https://www.cato-unbound.org/2008/09/17/mark-kleiman/taxatio... edit: this source isn't as great, Kleiman has written elsewhere about the subject, but google is failing me). If we could wave a magic wand and cause drinking to cease to exist, many lives would be saved.

Note: I do in fact drink, I am not a teetotaler. But what I said above is factual. I personally believe that prohibition would be worse, and it's reasonable for individuals to make their own choices. But that does not entail denying that it goes very badly for many.


Second-hand smoke does affect people around you. It is how people get addicted to nicotine. It is how new smokers are created.

And there are some people who are more sensitive to temporary exposure to smoke (and pollution in general) than others. That is why smoking tends to be banned around hospitals and day care centers: those are places where you will find those people. My father was one of them, after he had his larynx removed for throat cancer after having smoked for decades. He could not bear being subjected to even small amounts of second-hand smoke again, because the breathing hole in his throat would get irritated, fill up with mucus and have to be cleaned with a suction device.

And if you drink alcohol next to me, it does not make my clothes and my hair stink so much afterwards that I will want to wash my hair and change my clothes before going to bed.


No, but the person drinking next to you can suddenly decide you gave them a bad look and decide to pick a fight.

Why are you replying as if I denied second hand smoke harms people? I very clearly said it did.

This is a great piece of data, but only a piece of the actual question that we need to answer, which is:

For a given input, how many tokens will be used for an answer, and how high quality will that answer be?

Measuring the tokenizer is just one input into the cost-benefit tradeoff.
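The tradeoff can be made concrete with a toy cost model. The per-million-token rates below are invented purely for illustration:

```python
def answer_cost(input_tokens, output_tokens,
                in_rate=3.00, out_rate=15.00):
    """Dollar cost of one answer, given hypothetical per-million-token
    rates for input and output. Output tokens are typically billed at a
    much higher rate, so verbosity of the *answer* usually dominates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A less efficient tokenizer inflates only the input side:
terse = answer_cost(1_000, 500)    # baseline question
verbose = answer_cost(1_500, 500)  # same question, 50% more input tokens
```

Even a 50% tokenizer penalty moves the total cost far less than a doubling of output length would, which is why tokenizer efficiency is only one input into the cost-benefit calculation, and answer quality per token remains the unmeasured half.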


This is an interesting analysis, but "are the costs of AI agents also rising exponentially?" is a very bad question that this doesn't answer.

What's rising exponentially is the price of the most ambitious thing cutting edge agents can do.

But to answer whether the cost of AI agents is rising in general, you would take a fixed set of problems, and for each of them, ask "once it's solvable, how does the price change?"

For that latter question, there isn't a lot of data in these charts because there aren't enough curves for models of the same family over time, but it does look like there are a number of points where newer models solve the same problems at lower prices. Look at GPT-5 vs. the older GPT models: the curve for GPT-5 is shifted left.
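The distinction can be phrased as: fix the problem set, then track the price-to-solve per model generation. The numbers and names below are invented purely to illustrate the shape of the question:

```python
# Hypothetical price (in $) for each model generation to solve the SAME
# fixed task; None means the generation could not solve it at all.
# The frontier price (newest solvable task) rises over time, while the
# price for any fixed task tends to fall once it is first solved.
price_to_solve = {
    "fixed task A": [("gen-1", None), ("gen-2", 4.00), ("gen-3", 0.80)],
    "fixed task B": [("gen-1", None), ("gen-2", None), ("gen-3", 9.00)],
}

def price_trend(history):
    """Return the price series from the point the task first became solvable."""
    return [price for _, price in history if price is not None]

trend_a = price_trend(price_to_solve["fixed task A"])  # falling after first solve
```

Both statements can then be true at once: "the most ambitious thing agents can do keeps getting more expensive" and "the cost of any given capability keeps dropping."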


The cost of models is decreasing almost exponentially with time.

The author performs a non sequitur by muddling two concepts of time. They say costs are getting “unsustainable” which is not a conclusion that follows.

What is true is that at a given point in time, the cost to perform a task is exponentially related to the human time it would take. But that does not mean it will remain that way; far from it.

