What is your use case? I read comments like this and it's the total opposite of my experience. I have both CC Opus 4.6 and Codex 5.4, and Codex is much more thorough and checks before it starts making changes, maybe even to a fault. I accept that, because getting Opus to redo work after it jumps in and messes up the first attempt is a massive waste of time. All my tasks and specs are atomic and granularly spec'd. I'd say 30% of the time I regret deciding to use Opus for 'simpler' work.
I'm building a correct, safe, highly understandable, concurrent runtime & language.
Essentially Rust/Tokio if it was substantially easier than even Go - and without a need for crates and a subset of the language to achieve near Ada-level safety.
That's the thing: when that level comes, we will never know it's here. The only evidence we'll have is that the company that has it will always keep a "public" model just slightly ahead of all competitors to hold market share while takeoff happens internally, until it makes big-bang moves to lock in monopoly-level, too-big-to-fail, government-protected status and ensure utter victory.
So Anthropic bundled CC with Claude.ai cuz OAI bundled ChatGPT with Codex, and now OAI is unbundling, so the IPO must be around the corner. The writing is also on the wall for CC usage-based subscriptions now that the main competitor has effectively gotten rid of them. How are the Chinese models looking?
Based on reading only, I think they're usable but a step below, probably somewhat behind Sonnet. I also read that some people successfully requested refunds late last year when their models shit the bed due to bugs, so if they cut limits hard maybe you can try that.
On the surface it seems like it would be a good idea for all these suspended users to do a mass arbitration, like what happened to Uber, to get them to start taking it seriously. This comes up like monthly: people getting their accounts pulled out from under them, and it impacts their business. Maybe there are legal differences or something.
https://www.mbelr.org/mass-arbitration-how-ubers-own-alterna...
Assistance of other humans? You do realise we're talking about an intelligence test, right? At that point, what are you even testing for? I'm sure you've taken exams where you couldn't bring your own notes, use Google, or get help from someone, even though real life doesn't have those constraints.
Why must models be analogous to humans using tools? Or, to take the analogy route further, wouldn't it be better if humans had calculators built into their brains, provided they are deterministic and reduce latency?
Because it is directly analogous. Neural nets (whether biological or artificial) are not the best way to execute lots of deterministic computations quickly and reliably. That's why we invented computers.
I'm not convinced at all that this is the best way to reduce latency; there are many other ways of doing that.
Having a calculator in our brains would be handy of course, but a gigahertz multi-core computer is still going to be better at anything that needs a lot of computation and/or a lot of data.
Exactly. They've implemented a VM inside a transformer, turned an O(1) memory access call into O(n), optimized it down to O(log n), and written a post about how smart they are.
It's a nice bit of engineering, if you don't subscribe to YAGNI. If you do, you must ask the obvious question of what capability this delivers that wasn't available before. The only answer I've got is that someone must have been a bit chilly and couldn't figure out the thermostat.
As someone who never worked at Stripe, I get the feeling that the founders will not bend and will invest for the long term rather than quarter to quarter, at least judging by their patience about an IPO.