
>We know that they do not reason because we know the algorithm behind the curtain.

In other words: we didn't put a "reasoning algorithm" into LLMs, therefore they do not reason. But what is this reasoning algorithm that is a necessary condition for reasoning, and how do you know an LLM's parameters didn't converge on it during pre-training?




Model parameters are weights, not algorithms. The LLM algorithm is (relatively) fixed: generate the next token according to the existing context, the model weights, and some randomization. That's it. There is no more algorithm than that. Training can shift the probabilities assigned to a token given the context, but there's no more to it than that. There is no "reasoning algorithm" in the weights to converge to.
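The fixed loop described above can be sketched as follows. This is a minimal illustration, not any real model's code; the `forward` callable is a hypothetical stand-in for the model's forward pass, which is the only place the weights enter:

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Turn raw scores into a probability distribution (numerically stable).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def generate(forward, context, n_tokens, temperature=1.0):
    # The entire "algorithm": score -> normalize -> sample -> append, repeated.
    for _ in range(n_tokens):
        logits = forward(context)  # weights influence only this step
        probs = softmax(logits, temperature)
        token = random.choices(range(len(probs)), weights=probs)[0]
        context = context + [token]
    return context
```

Everything model-specific lives inside `forward`; the surrounding decoding loop never changes, which is the sense in which the algorithm is "fixed".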

This overly reductive description of LLMs misses the forest for the trees. LLMs are circuit builders: the converged parameters pick out specific paths through the network that define programs. In other words, LLMs are differentiable computers [1]. Analogous to how a CPU is configured by program state to execute arbitrary programs, the parameters of a converged LLM configure the high-level matmul sequences toward a wide range of information dynamics.
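A toy illustration of the point (not an LLM, just the principle that a fixed structure with different weights computes different functions): the "architecture" below is always one matrix-vector product followed by a threshold, yet the parameters alone select which boolean circuit it implements.

```python
def matvec(W, x):
    # Fixed structure: a plain matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gate(W, bias, x):
    # Same "program" every time: linear map + step nonlinearity.
    return [1 if v + b > 0 else 0 for v, b in zip(matvec(W, x), bias)]

# AND circuit: weights require both inputs to be 1.
AND_W, AND_b = [[1, 1]], [-1.5]
# OR circuit: identical structure, different parameters.
OR_W, OR_b = [[1, 1]], [-0.5]
```

Nothing in the code path distinguishes AND from OR; only the numbers do. That is the (much smaller-scale) analogue of parameters configuring the fixed matmul machinery into different programs.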

Statistics has little relevance to how an LLM operates at inference time. The statistics of the training corpus impose constraints on the converged circuit dynamics, but otherwise have no explicit representation inside the LLM.

[1] https://x.com/karpathy/status/1582807367988654081


> LLMs are circuit builders

I think they are circuit "approximators". In other words, the result of a glorified linear regression.


I called it a “big wad of linear algebra,” above. That’s all it is.



