Hacker News | ehtbanton's comments

I just don't trust Anthropic's Claude Code team at all any more. Their tools are vibe-coded and their behaviour is anti-consumer.

They shouldn't be surprised at the thousands moving to Codex every day.


Benchmarks like this one are designed to thoroughly test the model across several iterations. 15% is a MASSIVE discrepancy.

Come on Anthropic, admit what you're doing already and let us access your best models unhindered, even if it costs us more. At the moment we just all feel short-changed.


This is genuinely very helpful. I'm planning a MacBook Pro purchase with local inference in mind and now see I'll have to aim for a slightly higher memory option because the Gemma A4 26B MoE is not all that!

Pretty sure an Nvidia GPU is better bang for the buck because of usable inference speed.

I upgraded from an M4 Pro 24GB to an M5 Pro 48GB yesterday. The same Gemma 4 MoE model (4-bit, don't remember which version) runs about 8x faster on the M5 Pro and loads 2x faster into memory.

So yes, do purchase that new MacBook Pro.


You don't know if it's the newer chip or the increase in RAM. If someone already has 48GB, they might not benefit much. You changed two things at once.

Not really: it's the same model size and it fits entirely in 24GB.

If you're doing it specifically for inference (or in most other situations) a Mac(book) represents very low RoE.

s/RoE/RoI

This is very impressive, have tried it out.

If only everyone was as good at making performant terminal applications (cough cough Anthropic)


I've had this thought myself too. Going off on a slight tangent: I think there's also loads of useful stuff in domains like either of these which maps amazingly well to AI agent system design, but there's such a huge discrepancy between the knowledge bases of the fields that no benefit ever really surfaces.

(Speaking from the perspective of someone who simultaneously loves high-performance compute and agentic AI haha)


I will always maintain that the best benchmark is just trying it out for yourself. The most practical parallel for me is all the people posting about how some open-source model has "achieved X on Y benchmark - beating out Opus 4.6!" It's all show and everyone cheats.

Wake me up when Anthropic does something right again...
