
At this scale, that kind of thing is not really a problem; you just dump all of the data you can find into the model (pre-training)[1]. Of course, the pre-training data influences the model, but the reinforcement learning is really what determines the model’s writing style and, in general, how it “thinks” (post-training).

[1] This data is still heavily filtered/cleaned.


This isn’t quite accurate. Data weighting is quite important in pretraining.

> on ~2% of new prosumer signups.

I, and everyone else I have asked, see this new updated sales UI; sounds like more than 2%.


Either they vibe coded a test that was extremely broken.

Or they vibe wrote some bullshit to try and backpedal.


Yeah, I flat out don't believe the 2% thing. It's possible that I was the 1 out of 50 who checked the page and saw that Claude Code was removed... but it really seems like everyone I shared it with saw the same thing, which is incredibly unlikely. Also, I am an existing subscriber and checked the price page while logged in, so I shouldn't be counted in "2% of new subscribers" at all...

I have a Claude Pro tier subscription; Claude Code, as of right now, is still functional for me. If Anthropic does boot Pro-tier users off Claude Code, I will be cancelling my subscription.

Indeed. Codex on $20/m is incredibly usable. Lots of value. My Anthropic subscription keeps being worth less and less.

> Codex on $20/m is incredibly usable.

And how long do you think that will last if Anthropic does this?


I'd almost bet that OpenAI will do the same thing with its plans as soon as Anthropic makes the first move. It's just a matter of who blinks first.

I paid for the annual Pro plan in January... I know this mentions new users right now, but is there a chance they just take Code away?!

You have to imagine they would grandfather existing users in for at least a year or something, even if this "test" goes very well and points toward removal.

This test makes perfect sense given their actions over the last few weeks: they think they've done enough to transition to the general public and away from devs, and our goodwill is no longer something they need to be concerned with.

It's funny that OpenAI, who in my eyes went for the general public rather than devs initially, seems to be semi-pivoting and catching all the fallout from Anthropic's recent behavior.

It is a massive bummer. Up until a few weeks ago, I had been hard pulling for Anthropic for quite some time; now I just don't care and hope something dope emerges quickly that signals I won't ever have to consider either of them.


I think they will try to get into the enterprise as soon as possible, looking out for big purchases: Figma, Atlassian, Asana, Monday, etc.

Yeah, at $100 or $200 a month my expectations would rise (and my tolerance for errors would go to zero), as we are getting into enterprise-level pricing.

I was part of a team researching MS at a university a while ago. It truly is an endlessly fascinating disease. Most evidence currently points to MS being caused by a combination of Epstein-Barr infection and genetic factors [0,1]. It is hypothesized that Epstein-Barr triggers autoimmunity which results in the prototypical demyelination [2].

[0]: https://www.science.org/doi/10.1126/science.abj8222

[1]: https://www.pnas.org/doi/10.1073/pnas.2424986122

[2]: https://www.nature.com/articles/s41586-022-04432-7


Demyelination needs more attention in the general populace.

Thanks very much for posting.


I smell bad data. This sounds too good to be true and most studies of this kind have turned out to be false a few years down the line.

Edit: one of many examples: https://www.science.org/content/article/journal-retracts-inf...


It doesn't seem to link to any data at all, so we can't check, but I wouldn't be entirely surprised if they used the "standard" p = 0.05.

I think for something this unexpected you'd want a much lower p.
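To make that concrete: a single p = 0.05 threshold already lets through one false positive in twenty, and if a study (implicitly or explicitly) runs many comparisons, the odds of at least one spurious "significant" finding climb fast. A quick back-of-the-envelope sketch, assuming independent tests:

```python
# Probability of at least one false positive when running m
# independent hypothesis tests, each at significance level alpha.
def family_wise_error(alpha: float, m: int) -> float:
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20, 100):
    p = family_wise_error(0.05, m)
    print(f"{m:>3} tests at p=0.05: P(>=1 false positive) = {p:.2f}")
```

With 20 comparisons at p = 0.05, the chance of at least one false positive is already about 64%, which is one reason surprising single-study results so often fail to replicate.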


Theoretically, you can’t benchmaxx ARC-AGI, but I too am suspicious of such a large improvement, especially since the improvement on other benchmarks is not of the same order.


https://arcprize.org/arc-agi/1/

It's a sort of arbitrary pattern-matching thing that can't be trained on in the sense that the MMLU can be, but you can definitely generate billions of examples of this kind of task and train on them, and it will not make the model better at any other task. So in that sense, it absolutely can be.

I think it's been harder to solve because it's a visual puzzle, and we know how well today's vision encoders actually work https://arxiv.org/html/2407.06581v1
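The generate-examples point above is easy to illustrate. A toy sketch of a synthetic ARC-style task generator, where the hidden rule (mirroring the grid left-to-right) is a hypothetical stand-in for the far more varied transformations in the real benchmark:

```python
import random

# Toy generator for ARC-style grid puzzles: each task pairs a random
# input grid with an output grid produced by a fixed hidden rule.
# Here the (hypothetical) rule is "mirror the grid left-to-right";
# real ARC tasks use many different, more complex rules.
def make_task(rows: int = 3, cols: int = 3, colors: int = 10, seed=None):
    rng = random.Random(seed)
    grid = [[rng.randrange(colors) for _ in range(cols)] for _ in range(rows)]
    mirrored = [list(reversed(row)) for row in grid]
    return grid, mirrored

inp, out = make_task(seed=0)
print(inp)
print(out)
```

Nothing stops you from emitting billions of such (input, output) pairs and training on them, which is the sense in which a "can't be trained on" benchmark can still be gamed.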


The real question is: Why are people designing benchmarks that, if a model is trained on them, it won't improve the performance of the model at any real-world tasks? Why would anyone care about such benchmarks?


People are like typewriter monkeys: if something is possible to make, it'll eventually be made.


I think you could have discovered this bug more easily by looking at the commit(s) that were made when the problem started.


This is a great technique!

In this case, I had made an overlarge squashed merge that included both the Intercom integration (a suspiciously likely cause of slowness) and the feedback button that added the heart – so I needed to go deeper to figure out the true cause. (Noto Emoji was in the app from before, but wasn't triggered in the dashboard until we added an emoji there.)
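For the record, `git bisect` automates exactly this commit-archaeology: a binary search over history for the first commit where a predicate flips from "works" to "broken". A rough sketch of the underlying idea in Python, with the `is_broken` predicate standing in for "check out this commit and run the repro":

```python
# Binary search for the first "bad" commit, the way git bisect does.
# `commits` is ordered oldest -> newest; `is_broken` is assumed to be
# monotone: once a commit is broken, every later commit is too.
def first_bad(commits, is_broken):
    lo, hi = 0, len(commits) - 1
    assert is_broken(commits[hi]), "newest commit must reproduce the bug"
    if not is_broken(commits[lo]):
        while lo < hi:
            mid = (lo + hi) // 2
            if is_broken(commits[mid]):
                hi = mid          # bug already present here: look earlier
            else:
                lo = mid + 1      # still good here: look later
    return commits[lo]

# Toy history: commits 0..9, bug introduced at commit 6.
history = list(range(10))
print(first_bad(history, lambda c: c >= 6))
```

This is also why the overlarge squashed merge hurts: bisection can only narrow things down to one commit, so if that commit bundles several unrelated changes, you still have to dig inside it by hand.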


> In this article we'll tell you why we decided to put Claude Code into RollerCoaster Tycoon, and what lessons it taught us about B2B SaaS.

What is this? A LinkedIn post?


> Your outlook above is too self critical. This is the first time an AI has beaten this park much less played a full game of RollerCoaster Tycoon through a TUI. There are important learnings for B2B SaaS. This isn't LinkedIn (it is, in fact, LinkedIn). But seriously. What can we learn here.

From the transcript: https://htmlpreview.github.io/?https://gist.githubuserconten... :)


Starlink receivers are actually very complicated. They use several high-end FPGAs and a number of other expensive, uncommon components. See this teardown: https://youtu.be/h6MfM8EFkGg?si=m-sN6UW4nh8_HzPR.


If I read one more article/press release/whatever with such clumsy use of antithesis, I’m going to go insane. I have no problem with using AI to write if it is done well, but this…

