Benchmarks like this one are designed to thoroughly test the model across several iterations. 15% is a MASSIVE discrepancy.
Come on Anthropic, admit what you're doing already and let us access your best models unhindered, even if it costs us more. At the moment we all just feel short-changed.
This is genuinely very helpful. I'm planning a MacBook Pro purchase with local inference in mind, and now I can see I'll have to aim for a slightly higher memory option because the Gemma A4 26B MoE is not all that!
I upgraded my M4 Pro 24GB to an M5 Pro 48GB yesterday. The same Gemma 4 MoE model (4bit, don't remember which version) runs about 8x faster on the M5 Pro and loads into memory about 2x faster.
You don't know if it's the newer chip or the extra RAM: if someone already has 48GB, they might not see much benefit. You changed two things at once.
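If you wanted to separate the two, you could run the exact same quantized GGUF on both machines and time only the generation. A rough sketch of that measurement, assuming llama-cpp-python and a made-up model filename:

```python
# Rough tokens/sec check: run the identical quantized file on each machine
# so the hardware is the only variable. The model path here is hypothetical.
import time
from llama_cpp import Llama

llm = Llama(model_path="gemma-moe-4bit.gguf", n_ctx=2048, verbose=False)

# Warm-up generation so load/first-run overhead doesn't pollute the timing.
llm("Warm-up.", max_tokens=8)

start = time.perf_counter()
out = llm("Explain KV caching in one paragraph.", max_tokens=256, temperature=0)
elapsed = time.perf_counter() - start

n = out["usage"]["completion_tokens"]
print(f"{n} tokens in {elapsed:.2f}s -> {n / elapsed:.1f} tok/s")
```

If the 48GB machine only wins on models that wouldn't fit comfortably in 24GB, it was the RAM; if it also wins on a model that fits easily in both, it's the chip.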
I've had this thought myself too. Going off on a slight tangent: I think there's loads of useful knowledge in domains like these that maps amazingly well onto AI agent system design, but the gap between the fields' knowledge bases is so huge that the benefit never really surfaces.
(Speaking from the perspective of someone who simultaneously loves high-performance compute and agentic AI haha)
I will always maintain that the best benchmark is just trying it out for yourself.
The clearest parallel for me is all the people posting about how some open-source model has "achieved X on Y benchmark, beating out Opus 4.6!"
It's all show and everyone cheats.
They shouldn't be surprised at the thousands moving to Codex every day.
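And "trying it out for yourself" doesn't have to be heavyweight: keep a fixed set of your own prompts and replay them against every candidate model. A minimal sketch, again assuming llama-cpp-python, with made-up prompts and model path:

```python
# Minimal personal eval: replay a fixed prompt set against a candidate model
# and dump the answers for side-by-side reading. Paths and prompts are made up.
import json
from llama_cpp import Llama

PROMPTS = [
    "Refactor this recursive Python function to be iterative: ...",
    "Summarize the tradeoffs of MoE vs. dense models in three sentences.",
]

llm = Llama(model_path="candidate-q4.gguf", n_ctx=4096, verbose=False)

results = []
for p in PROMPTS:
    out = llm(p, max_tokens=512, temperature=0)  # deterministic, for fair comparison
    results.append({"prompt": p, "answer": out["choices"][0]["text"]})

with open("eval-candidate.json", "w") as f:
    json.dump(results, f, indent=2)
```

Diff two of those JSON files on prompts from your actual work and you'll learn more than from any leaderboard.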