Productivity growth. If you take rolling averages from this chart, it clearly demonstrates higher productivity growth before the adoption of software. This is a well-established fact in econ circles.
I think this is a classic case of reading into specific arguments too deeply without understanding what they really mean in the grand picture. A few points that easily disprove this argument:
- if it were true that software paradoxically reduces productivity, you could just start a competing company that doesn't use software. Obviously this is ridiculous: the top 20 companies by market cap are mostly software-based, and every other non-IT company is heavily invested in software
- if you say the problem is at the country level, it is obvious that every country that has digitised has had higher productivity and GDP growth. Take Italy vs the USA, for instance.
- if you are saying that the problem is even more global, take the whole world: GDP per capita growth has still been pretty high since the IT revolution (and so have other metrics)
If you still think there's something more to it, you are probably deep in some conspiracy rabbit hole
You don't have a counterfactual to suggest that it would have continued increasing had it not been for technology. Is there _any_ credible economist who suggests that we might have higher productivity without tech?
There is no counterfactual needed. Productivity growth has declined, despite the expectation that software would accelerate productivity. I'm asking you why this didn't happen.
Of course a counterfactual is needed: without one there is no clear separation of causes and no link to effects, neither of which the productivity metrics establish on their own. This is also widely known and discussed in econ circles in the face of this very data.
I'm not even proposing that growth would have been higher without "technology". I said information technology has not increased productivity growth compared to the past. This is an observation of fact.
Yeah, my theory is that Rust is going to be a somewhat "local optimum" for a while for LLMs.
LLMs have a couple of major problems: they hallucinate and make mistakes. So the ideal way to use them is to constrain them as much as possible with formal methods. Rust's type system is a formal proof of some forms of correctness, which will let "vibe-coding" work to a much higher degree than with other languages.
In the very long run, I suspect that all vibe-coding will actually occur in a language with dependent types, and we'll push as much as possible into being proven correct at compile time. Since the cost of generating code is plummeting, and the sheer volume of code is thus rising exponentially, this is the only way to prevent an insurmountable mountain of errors.
Formal methods and LLMs are a match made in heaven.
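To make that concrete, here's a minimal Rust sketch (the function and names are made up for illustration) of the kind of mistake the compiler refuses to let through, regardless of whether a human or an LLM wrote it:

    // Minimal sketch of the kind of correctness Rust enforces at compile time.
    // If generated code forgets the None case, or treats the Option<u16> as a
    // plain u16, the compiler rejects it before anything runs.
    fn parse_port(raw: &str) -> Option<u16> {
        raw.trim().parse::<u16>().ok()
    }

    fn main() {
        // The match must cover every variant; a hallucinated "happy path only"
        // version of this code simply does not build.
        match parse_port("8080") {
            Some(port) => println!("binding to port {port}"),
            None => eprintln!("invalid port, refusing to start"),
        }
    }

That's a tiny example, but scale it up to thousands of generated functions and the gap between "the compiler proved this class of bug absent" and "hopefully the tests catch it" is exactly the point.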
Lol basically we're saying AI isn't AI if we utilize the strength of computers (being able to compute). There's no reason why AGI should have to be as "sample efficient" as humans if it can achieve the same result in less time.
Let's say an agent needs to do 10 brain surgeries on a human to remove a tumor and a human doctor can do it in a single surgery.
I would prefer the human.
"steps" are important to optimize if they have negative externalities.
I think your logic isn't sound: wouldn't we want an "intelligence" to solve problems efficiently rather than brute-force them with a million monkeys? There's definitely a limit to compute, the same way there's a limit to how much oil we can use, etc.
In theory, sure, if I can throw a million monkeys at a problem and stumble into a solution, it doesn't matter how I got there. In practice, though, every attempt has direct and indirect externalities. You can argue those externalities are minor, but the sheer amount of money going to data centers suggests otherwise.
Lastly, humans use way less energy to solve these in fewer steps, so of course it matters when you throw kilowatts at something that takes milliwatts to solve.
> Lastly, humans use way less energy to solve these in fewer steps,
Not if you count all the energy that was necessary to feed, shelter, and keep the human at his preferred temperature so that he can sit in front of a computer and solve the problem.
It's kind of the point? To test AI where it's weak instead of where it's strong.
"Sample efficient rule inference where AI gets to control the sampling" seems like a good capability to have. Would be useful for science, for example. I'm more concerned by its overreliance on humanlike spatial priors, really.
ARC has always had that problem, but for this round the score is just too convoluted to be meaningful. I want to know how well the models can solve the problems. I may want to know how 'efficient' they are, but really I don't care as long as they're solving them in reasonable clock time and/or cost. I certainly do not want those jumbled into one messy, convoluted score.
'Reasoning steps' here is just arbitrary and meaningless. Not only is there no utility to it, unlike the above two, but it's just incredibly silly to me to think we should be directly comparing something like that between entities operating on wildly different substrates.
If I can't look at the score and immediately get a good idea of where things stand, then throw it away. 5% here could mean anything from 'solving only a tiny fraction of problems' to "solving everything correctly but with more 'reasoning steps' than the best human scores." Literally wildly different implications. What use is a score like that?
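To illustrate the ambiguity with a made-up blend (I don't know the exact formula used this round; this is purely hypothetical), here's a tiny Rust sketch of how a solve rate discounted by step efficiency collapses very different outcomes into the same number:

    // Hypothetical blended score: fraction of tasks solved, discounted by how
    // many in-game steps the agent used relative to a human baseline. This is
    // NOT the actual ARC formula, just an illustration of the ambiguity.
    fn blended_score(solved: f64, total: f64, agent_steps: f64, human_steps: f64) -> f64 {
        let solve_rate = solved / total;
        let efficiency = (human_steps / agent_steps).min(1.0);
        solve_rate * efficiency
    }

    fn main() {
        // Solves only 5% of tasks, but as efficiently as a human:
        println!("{:.2}", blended_score(5.0, 100.0, 100.0, 100.0));    // 0.05
        // Solves every task, but takes 20x the human step count:
        println!("{:.2}", blended_score(100.0, 100.0, 2000.0, 100.0)); // 0.05
    }

Same headline number, wildly different capability, which is exactly why a single blended score tells you so little.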
The measurement metric is in-game steps. Unlimited reasoning between steps is fine.
This makes sense to me. Most actions have some cost associated, and as another poster stated it's not interesting to let models brute-force a solution with millions of steps.
Same thing in this case. No utility, and just as arbitrary. None of the issues with the score change.
Models do not brute force solutions in that manner. If they did, we'd wait the lifetimes of several universes before we could expect a significant result.
Regardless, since there's a 5x step cutoff, 'brute forcing with millions of steps' was never on the table.
Cost has utility in the real world and this doesn't. That's the only reason I would tolerate thinking about cost, and even then, I would never bundle it into the same score as the intelligence, because that's just silly.
It's an interesting point but I too find it questionable. Humans operate differently than machines. We don't design CPU benchmarks around how humans would approach a given computation. It's not entirely obvious why we would do it here (but it might still be a good idea, I am curious).
HN in general always does this. I got a lot of pushback when I said that, in general, consumers don't care at all about open source, and the majority of them probably have no clue what it even means.
You can really sense the SF-centric bubble HN lives in.
Open source is a supply-chain-specific issue, and consumers don’t care about the supply chain.
Anyone with any illusions about this: quickly name the top vendor for the third item on the ingredient list of the first thing with an ingredient list you can get your hands on (for me it’s usually food; who is the main vendor for citric acid? Or sugar? Or that red dye that causes ADHD? I have no clue).
General consumers could not care less about open source.
Like, the whole point of open source is that this thread is not a thing. The whole point is "if this software is taken over by a malevolent dictator for life, we'll just fork it and keep going with our own thing."
Or, like, if I'm evaluating whether to open-source stuff at a startup, the question is "if this startup fails to get funding and we have to close up shop, do I want the team to still have access to these tools at my next gig?" -- there are other reasons it might be in the company's interests, like getting free feature development or hiring better devs, but that's the main reason it'd be in the employees' best interests to want to contribute to an open-source legacy rather than keep everything proprietary.
The leadership and product direction work are at least as hard as the code work. Astral/uv has absolutely proven this; otherwise Python wouldn't be a boneyard of build tools.
Projects - including forks - fail all the time because the leadership/product direction on a project goes missing despite the tech still being viable, which is why people are concerned about these people being locked up inside OpenAI. Successfully forking is much easier said than done.
I had a lot of trouble convincing people that a correct Python package manager was even possible. uv proved it was possible and won people over with speed.
I had a sketched-out design for a correct package manager in 2018, but when I talked to people about it I couldn't get any interest. I think the brilliant idea uv had that I missed was that it can't be written in Python, because if it is written in Python, developers are going to corrupt its environment sooner or later and you lose your correctness.
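A rough Rust sketch of that design principle (hypothetical, nothing like uv's actual internals): the manager is a standalone binary that only ever treats the environment as a directory it writes into, never as something it imports from, so nothing a user installs can alter the tool's own behavior.

    // Hypothetical illustration, not uv's real code: the managed environment
    // is just data (a path the tool writes to). The tool itself never loads
    // code from it, so user-installed packages can't corrupt the manager.
    use std::path::PathBuf;

    struct ManagedEnv {
        root: PathBuf,
    }

    impl ManagedEnv {
        fn site_packages(&self) -> PathBuf {
            self.root.join("lib").join("site-packages")
        }

        // A real tool would unpack a wheel here; this just shows the boundary.
        fn install(&self, package: &str, version: &str) {
            println!(
                "would install {package}=={version} into {}",
                self.site_packages().display()
            );
        }
    }

    fn main() {
        let env = ManagedEnv { root: PathBuf::from(".venv") };
        env.install("requests", "2.32.3");
    }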
I think that now that people are used to uv it won't be that hard to develop a competitor and get people to switch.
What about pharma and for-profit healthcare employees?