Hacker News | schneems's comments

This is neat. I wrote https://github.com/zombocom/rundoc. It has a similar feature. The main driver is to produce tutorials so it also puts the output of commands run back in the document.

They are bad at math. But they are good at writing code and as an optimization some providers have it secretly write code to answer the problem, run it and give you the answer without telling you what it did in the middle part.
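For illustration, here is a minimal sketch of that "write code, run it, show only the answer" pattern. Everything in it is an assumption made for the example: `fake_model_write_code` is a hypothetical stand-in for a real model call, and `run_expression` is a toy evaluator, not any provider's actual harness.

```python
import ast
import operator

# Hypothetical stand-in for a model call: instead of computing the answer
# itself, the "model" returns code (here, a plain arithmetic expression).
def fake_model_write_code(question: str) -> str:
    canned = {"What is 12345 * 6789?": "12345 * 6789"}
    return canned[question]

# Arithmetic operators the toy harness is willing to execute.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def run_expression(source: str):
    """Evaluate a plain arithmetic expression by walking its AST (no eval())."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(source, mode="eval"))

def answer(question: str) -> str:
    code = fake_model_write_code(question)  # step 1: the model writes code
    result = run_expression(code)           # step 2: the harness runs it
    return f"The answer is {result}."       # step 3: only the result is shown

print(answer("What is 12345 * 6789?"))
```

The user only ever sees the final sentence; the generated expression and its execution stay hidden, which is the "middle part" described above.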

Someone should tell the mathematicians if they use a calculator or a whiteboard or heavens forbid a computer they are "bad at math".

1) That's not related to chain of thought I was replying to. Someone asked about the "bad at math" and pointed out "but it seems good to me" so I added the color of why that might be the case. Your retort seems to imply I'm making an argument that because something uses tools for a job it cannot be good at the thing it's using a tool for. Which is not the case.

2) If you have something to say, just say it. Don't put words in my mouth and then argue with a thing I didn't say.


Right, but your narrative was incorrect and based on faulty premises, which you haven't acknowledged. That's fine, except you're still pressing the argument.

Can you please present a reasonable maths problem that I can bounce off GPT so we can see it fail? I can give you many hundreds of relatively complex problems, none of which have appeared in a textbook, that GPT has not only solved, but critiqued my own crappy solutions for. I'm only asking you for one counterpoint.


> your narrative was incorrect and based on faulty premises

I am referring to specific, documented behavior of LLMs. Google it.


Google any plausibly reasonable math problem, and even the terrible LLM that powers the Google search page will almost certainly solve it correctly for you.

I don't need to reconstruct my argument axiomatically from folk beliefs.


You seem to have misunderstood my comment. I'm happy to accept the fault for poor communication. But you're making it hard. You're signaling that any clarifications on my behalf will be treated as further arguments instead of some sort of shared desire to hear one another. I don't care to continue.

What would I do to demonstrate that they are bad at math? If by "maths" we mean things like working out a double integral for a joint probability problem, or anything simpler than that, GPT5 has been flawless.

Search the topic. It is historically documented. It might no longer be true though.

A way to test might be running an open model locally, directly (without a harness), where you could be sure it's not going through a translation layer. I think these days it might have this tool call behavior built in, but I think back in the day it was treated more like a magic trick. Without it, it behaved similarly to "how many r's are in strawberry" for simple math.


It is wildly not true.

The request is for some reasonable math problem a model like GPT or Claude will fail at. I'm not going to set up a local model or some harness for it; I'm just going to copy/paste it into ChatGPT and watch it solve it.

Propose a problem, if you think I'm wrong about this. Seems simple.


> wildly not true

Source? Did you search anything like I suggested or no?


My argument: you can take basically any undergraduate collegiate math problem, right now, and it's likely that even the dumb LLM on the Google search page will solve it, and nearly certain that frontier models will.

Your argument: "it is possible to Google for people claiming LLMs can't do math".


Are they bad at math? Or are they bad at arithmetic?

if you don't know much math, it's easy to confuse the two

Neither.

I see this being useful in infrastructure tools. Imagine a statically compiled bundler that can also do the job of RVM and friends (installing Ruby) but it is still written in Ruby.

The classic Ruby buildpack is written in Ruby, but we have to bootstrap it with bash, and it's annoying and has edge cases. The CNB is written in Rust to not have that problem, and the idea that you can ship a single binary with no dependencies is really powerful.


Puma 8.0+ webserver now defaults to IPv6

> Or just once, to say the entire OSS committee was employed by Shopify,

Mike works at Basecamp (now and then). Based on comms I don't believe any of them acted on behalf of their employer i.e. no "team orders." Or if they did, they did so in ways that aligned with my perception of what I believed to be the correct read of the situation.

I also think that we (as humans) are much less capable of knowing what things sway and influence our opinions than we think. We are much less capable of correcting for conflicts of interest than we would like. The study "tappers and listeners" is about adjusting for knowledge (curse of knowledge), but I think it applies to influence as well. Which is to say...I'm sure that everyone was influenced in many ways, but I felt they acted as individuals and reacted in real time.

There are other details of affiliations that I omitted from the former maintainers as well, that are true to state, and likely had some impact on their decisions ... but I used judgment to omit what I didn't think was fair or didn't think was immediately relevant. Not saying I got it all right all the time, but sort of chiming in to say "I'm not only omitting information in favor of one party." Yes, I'm biased...but I'm trying to correct for that bias. (A funny thing to state after just saying humans are bad at it, I know).


> (2 current, 1 former) of Shopify's technical leadership

You'll have to take me at my word about it...but if I saw this as a driver of the issue I would have included it. I think saying "shopify was involved" is sort of like saying "people talked about RV at Rails World." Shopify is huge and hugely invested in Ruby's OSS ecosystem. I have my own critiques of the company, but not here. I think they're a net positive for Ruby OSS. I wish the general response was "more companies need to step up, I'll go talk to my leadership" rather than knocking these volunteers for their involvement. I've said elsewhere that if I were in the committee or in their shoes...I don't think the outcome would have been different (even if details would have). Also, you are welcome to disagree and have a different opinion.

I agree that it's best not to have situations like this. PSF bylaws "Section 5.15. Limits on Co-affiliation of Board Members." and similar rules are generally good at preventing the perception of conflict of interest (which is also important...that the perception alone can be damaging).

Right now, the committee is 100% one company (me). Because I'm the only one on it. Which is also a problem. Also, we're in a rebuilding/re-prioritizing phase with all of this...so it's hard to onboard while things are in flux.


> dispute in the stewardship of the bundler

This was never in dispute from the two parties. Ruby Central and "the maintainers" agreed from the beginning that it was collateral damage. The disagreement was what that meant and what to do with it. Hence the Sept 10 message from the Ruby Central Committee that they should move it to the Ruby core org (which IMO is long overdue).

The original plan (by the OSS committee) was to move bundler to the Ruby org, and that's what happened. When it did, the community generally liked it (on HN and reddit comments).


You got your reply already. To add: YJIT is the one that does "basic block versioning" (Which was Maxime's thesis) while ZJIT is a more traditional design.

I am confident in that description but don't actually know what it means in practice (yes I've seen papers and talks, but I kinda need a not-compiler-engineer to explain it to me).

As I understand it BBV still holds promise, but the sheer volume of knowledge of more traditional methods might mean it gets better outcomes (also IIRC ZJIT is still lagging YJIT).


I gave a talk about ZJIT and the motivation for the change at RubyKaigi 2025 if people are curious. It's on YouTube.



Thanks for all your work Maxime!


> IIRC ZJIT is still lagging YJIT

It would be nice to have ZJIT on speed.ruby-lang.org!


Pedantic point: YC has ads, they just blend in much better and are delivered in the same medium.

Hiring posts (definitely) and tech posts (maybe) by YC companies. The whole product is one big ad for a venture fund. It's generally well done and unobtrusive, so kudos to them that it goes relatively unnoticed.


Relevancy is a big point here. HN readers work in tech or are super interested in tech, YC companies do very technical things so hiring posts or launches tend to blend right in for the most part.


In other words, we're still the product at HN: the customers are just not advertisers.


Needs (2015) in the title.

Five years later, I wrote a novel algorithm for rate limiting GCRA clients https://www.schneems.com/2020/07/08/a-fast-car-needs-good-br....

