The whole next-word thing is interesting, isn't it? I like to see it through Dennett's "competence without comprehension" lens. You can predict the next word competently with shallow understanding. But you could also do it well with understanding or comprehension of the full picture: a mental model that lets you predict better. Are the AIs stumbling into these mental models? It seems like it. However, because these are such black boxes, we do not know how they are stringing these mental models together. Is it a random pick from 10 models built up inside the weights? Is there any system-wide cohesive understanding, whatever that means?
Exploring what a model can articulate using self-reflection would be interesting. Can it point to internal cognitive dissonance because it has been fed both evolution and intelligent design, for example? Or do these exist as separate models to invoke depending on the prompt context, because all that matters is being rewarded by the current user?
Given their failure on novel logic problems, their generation of meaningless text, their tendency to do things like delete tests, and their incompetence at simple mathematics, it seems very unlikely they have built any sort of world model. It's remarkable how competent they are given the way they work.
"Predict the next word" is a poor summary of what these machines do, though; they certainly do more than that, but there are significant limitations.
‘Reasoning’ etc are marketing terms and we should not trust the claims made by companies who make these models.
The Turing test had too much confidence in humans it seems.
They generate text based on quite a large context, including hidden prompts we don't see, and their weights are distorted heavily by training. So I think there's a lot more going on than a simple probability of word x coming next. That makes "predict next word" a reductive summary IMO.
I do not personally feel it resembles thinking or reasoning though and really object to that framing because it is misleading many people.
I may be using the wrong terms; my impression was:
1. Weights in the model are created by ingesting the corpus
2. Techniques like reinforcement learning, alignment etc are used to adjust those weights before model release
3. The model is used, and more context is injected, which then affects which words it will choose, though it is still heavily biased by the corpus and training.
That could be way off base though, I'd welcome correction on that.
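Those three stages can be caricatured in a few lines. This is a toy sketch (bigram counts standing in for learned weights; all names and the "alignment" nudge are made up for illustration), not how real LLMs actually work:

```python
import math

corpus = "the cat sat on the mat the cat ate".split()

# Stage 1: derive weights from the corpus (here: bigram counts).
weights = {}
for prev, nxt in zip(corpus, corpus[1:]):
    row = weights.setdefault(prev, {})
    row[nxt] = row.get(nxt, 0.0) + 1.0

# Stage 2: a crude stand-in for RL/alignment — adjust weights before "release".
weights["cat"]["sat"] *= 2.0  # reward one continuation over another

# Stage 3: inference — weights are now frozen; only the context varies.
def predict_next(context):
    options = weights.get(context[-1], {})
    if not options:
        return None
    # softmax over the frozen weights, then greedy pick
    total = sum(math.exp(v) for v in options.values())
    probs = {w: math.exp(v) / total for w, v in options.items()}
    return max(probs, key=probs.get)

print(predict_next(["the", "cat"]))  # -> "sat"
```

The point of the toy: after stage 2 the weights never change again, so everything the "model" does at stage 3 is a fixed function of the context it is given.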
The point I was trying to make, though, was that they do more than predict the next word based on just one set of data. Their weights can encode entire passages of source material from the training data (https://arxiv.org/abs/2505.12546), including books and programs. This is why they are so effective at generating code snippets.
There are a lot of inputs nowadays and a lot of stages to training. So while I don't think they are intelligent I think it is reductive to call them next token predictors or similar. Not sure what the best name for them is, but they are neither next word predictors nor intelligent agents.
That extended explanation is more accurate, yes. I'd call your points 1 and 2 both training under the definition "anything that adjusts model weights is training". There are multiple stages and types of training. Right now AFAIK most (all) architectures then fix the weights and you have non-weight-affecting steps like the system prompt, context, etc.
You're right that the weights can enable the model to memorize training data.
So that might depend on the model, how long ago you last tested it, etc. I've seen LLMs solve novel logic problems, generate meaningful text, and retain tests just fine, and simple mathematics on newer models is a lot better.
Btw, if you read the actual paper that proposes the Turing test, Turing actually rejects the framing of "can machines think", preferring the more practical "can you tell them apart in practice".
Yes, that’s the ‘too much confidence in humans’ bit - he didn’t count on some humans being easily fooled by prolix word generators. I’d be interested in his take on these generators but I think he’d be focussed on what was missing as well as the amazing progress we have seen.
> "The original question, 'Can machines think?' I believe to be too meaningless to deserve discussion."
> "the question, 'Can machines think?' should be replaced by 'Are there imaginable digital computers which would do well in the imitation game?'"
> "according to this view the only way to know that a man thinks is to be that particular man. It is in fact the solipsist point of view... instead of arguing continually over this point it is usual to have the polite convention that everyone thinks."
... is: if it's practical to say the system can give meaningful input/output on xyz in, say, natural language, we might just go ahead and say it can think about xyz, because otherwise everyone's just going to go nuts inventing new terms every time.
ELIZA absolutely did not ever pass anything resembling a real Turing test. A real Turing test is adversarial, the interrogator knows the testees are trying to fool him.
Landauer and Bellman absolutely put ELIZA to an adversarial Turing test, and called it such, in 1999. [0]
But... over in 2025, ELIZA was once again put to the Turing test under adversarial conditions. [1] And it still had people think it was a real person over 27% of the time. More than a quarter of the testees thought the thing was human.
The "ELIZA Effect" wasn't coined because everyone understands that an AI isn't conscious.
Unfortunately I'm not sure the Turing test posited a minimal level of intelligence for the human testers. As we have found with LLMs, humans are rather easy to fool.
There are many, many examples, mostly caused by people thinking LLMs are intelligent and reasoning and giving them too much power (e.g. treating them as agents, not text generators). I'm sure they're all fixed in whatever new version came out this week though.
Your sarcasm is misplaced. Without principled limitations that demonstrate the existence of a lower bound on the error rate and show that errors are correlated across invocations and models (so that you can't improve the error rate with multiple supervision), you can’t exclude the possibility that "they're all fixed in the new version" (for practical purposes).
It has always struck me that LLMs may be like the language center of the brain. And there should be a "whole damn rest of the brain" behind it to steer it.
LLMs miss very important concepts, like the concept of a fact. There is no "true", just consensus text from the internet given a certain context. Like that recent study where LLMs gave wrong info if the biography of a poor person was in the context.
And what does it mean to feel sorry? Beyond fallible and imprecise human introspective notion of "sorry", that is. A definition that can span species and computing substrates. A deanthropomorphized definition of "sorry", so to speak.
Ever practiced meditation of the form where you just witness your thoughts? It seems just like LLM generated words, both factual and confabulated nonsense.
That's unlikely. But they are an awful lot like Turing machines (KV cache ~ Turing tape), so their architecture is strongly predisposed to be able to find any algorithm, possibly including reasoning.
> You can predict the next word competently with shallow understanding.
I don't get this. When you say "predict the next word" what you mean is "predict the word that someone who understands would write next". This cannot be done without an understanding that is as complete as that of the human whose behaviour you are trying to predict. Otherwise you'd have the paradox that understanding doesn't influence behaviour.
Dennett also came to my mind reading the title, but in a different sense. When people came up with the theory of evolution, it was hard for many to conceive how we get from "subtly selecting from random changes" to building a complex mechanism such as a human. I think Dennett offers a nice analogy with a skyscraper: how can it be built if cranes are only so tall? You build taller cranes on top of what the shorter cranes have already raised.
In a similar way, LLMs build small abstractions: first on words, how to subtly rearrange them without changing meaning; then they start to pick up logic patterns such as "if A implies B, and we're given A, then B"; and eventually they learn to reason in various ways.
It's the scale of the whole process that defies human understanding.
(Also, modern LLMs are not just next-word predictors anymore; there is a reinforcement learning component as well.)
> Are the AIs stumbling into these mental models? Seems like it.
Since nature decided to deprive me of telepathic abilities, when I want to externalize my thoughts to share with others, I'm bound to this joke of a substitute we call language. I must either produce sounds that encode my meaning, or gesture, or write symbols, or basically find some way to convey my inner world by using bodily senses as peripherals. Those who receive my output must do the work in reverse to extract my meaning, the understanding in my message. Language is what we call a medium that carries our meaning to one another's psyche.
LLMs, as their name alludes, are trained on language, the medium, and they're LARGE. They're not trained on the meaning, like a child would be, for instance. Saying that by their sole analysis of the structure and patterns in the medium they're somehow capable of stumbling upon the encoded meaning is like saying that it's possible to become an engineer, by simply mindlessly memorizing many perfectly relevant scripted lines whose meaning you haven't the foggiest.
Yes, on the surface the illusion may be complete, but can the medium somehow become interchangeable with the meaning it carries? Nothing indicates this. Everything an LLM does still very much falls within the parameters of "analyze humongous quantity of texts for patterns with massive amount of resources, then based on all that precious training, when I feed you some text, output something as if you know what you're talking about".
I think the seeming crossover we perceive is just us becoming neglectful in our reflection of the scale and significance of the required resources to get them to fool us.
Searle's Chinese Room experiment but without knowing what's in the room, and when you try to peek in you just see a cloud of fog and are left to wonder if it's just a guy with that really big dictionary or something more intelligent.
It's honestly disheartening and a bit shocking how everyone has started repeating the "predict the next syllable" criticism.
The language model predicts the next syllable by FIRST arriving in a point in space that represents UNDERSTANDING of the input language. This was true all the way back in 2017 at the time of Attention Is All You Need. Google had a beautiful explainer page of how transformers worked, which I am struggling to find. Found it. https://research.google/blog/transformer-a-novel-neural-netw...
The example was and is simple and perfect. The word bank exists. You can tell what bank means by its proximity to words, such as river or vault. You compare bank to every word in a sentence to decide which bank it is. Rinse, repeat. A lot. You then add all the meanings together. Language models are making a frequency association of every word to every other word, and then summing it to create understanding of complex ideas, even if it doesn't understand what it is understanding and has never seen it before.
That all happens BEFORE "autocompleting the next syllable."
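The "bank" example can be caricatured in a few lines. The vectors and single attention step below are made up for illustration (real transformers use learned, high-dimensional embeddings, multiple heads, and many layers), but the shape of the mechanism is the same: score the word against every word in the sentence, then mix their meanings by those scores.

```python
import math

# Hypothetical 2-d "meaning" vectors; first axis ~ river sense, second ~ money sense.
emb = {
    "bank":  [0.5, 0.5],   # ambiguous: halfway between the two senses
    "river": [1.0, 0.0],
    "vault": [0.0, 1.0],
    "the":   [0.1, 0.1],
}

def attend(word, sentence):
    """One toy attention step: compare `word` to every word in the
    sentence, softmax the scores, and mix the vectors accordingly."""
    q = emb[word]
    scores = [sum(a * b for a, b in zip(q, emb[w])) for w in sentence]
    total = sum(math.exp(s) for s in scores)
    mix = [math.exp(s) / total for s in scores]
    return [sum(m * emb[w][i] for m, w in zip(mix, sentence))
            for i in range(len(q))]

# "bank" next to "river" drifts toward the river sense (first coordinate),
# next to "vault" toward the money sense (second coordinate).
river_bank = attend("bank", ["the", "river", "bank"])
money_bank = attend("bank", ["the", "vault", "bank"])
print(river_bank, money_bank)
```

Stacking this step, with learned weights in between, is roughly how the model arrives at a contextual representation before any next-token guess happens.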
The magic part of LLMs is understanding the input. Being able to use that to make an educated guess of what comes next is really a lucky side effect. The fact that you can chain that together indefinitely with some random number generator thrown in and keep saying new things is pretty nifty, but a bit of a show stealer.
What really amazes me about transformers is that they completely ignored prescriptive linguistic trees and grammar rules and let the process decode the semantic structure fluidly and on the fly. (I know Google uses encode/decode backwards from what I am saying here.) This lets people create crazy run-on sentences that break every rule of English (or your favorite language) but are still parsable as instructions.
It is really helpful to remember that transformers' origins are in language translation. They are designed to take text and apply a modification to it while keeping the meaning static. They accomplish this by first decoding meaning. The fact that they then pivoted from translation to autocomplete is a useful thing to remember when talking to them. A task a language model excels at is taking text, reducing it to meaning, and applying a template. So a good test might be "take Frankenstein and turn it into a Magic School Bus episode." Frankenstein is reduced to meaning, the Magic School Bus format is the template, and the meaning is output in the form of the template. This is a translation, although from English to English, between two completely different forms. Saying "find all the wild rice recipes you can, normalize their ingredients to 2 cups of broth, and create a table with ingredient ranges (min-max) for each ingredient option" is closer to a translation than to "autocomplete." Input -> Meaning -> Template -> Output. With my last example, the template itself is also generated from its own meaning calculation.
A lot has changed since 2017, but the interpreter being the real technical achievement still holds true IMHO. I am more impressed with AI's ability to parse what I am saying than I am by its output (image models notwithstanding).
It does not have an understanding, it pattern matches the "idea shape" of words in the "idea space" of training data and calculates the "idea shape" that is likely to follow considering all the "idea shape" patterns in its training data.
It mimics understanding. It feels mysterious to us because we cannot imagine the mapping of a corpus of text to this "idea space".
It is quite similar to how mysterious a computer playing a movie can appear if you are not aware of the mapping of a movie to a set of pictures, pictures to pixels, and pixels to coordinates and color codes.
Semantics. It's an encoded position that represents meaning in a way that is useful and reusable. That is "understanding." It's a mathematical representation of grasp.
The distinction you're making reads like substance dualism to me. Are you able to provide a clear and objective metric for assessing "understanding"? If not then you're just handwaving an effectively meaningless semantic distinction.
And where is this objective metric for consciousness? Last I checked we didn't even have a sensible definition for it.
It seems to me you're just kicking the can.
Setting that issue aside. While I certainly don't believe LLMs to be conscious (an entirely subjective and arbitrary take on my part I admit) I don't see any reason that concepts such as "intelligence" and "understanding" should require it. When considering how we apply those terms to humans it seems to me they are results based and highly contextual (ie largely arbitrary).
>humans it seems to me they are results based and highly contextual (ie largely arbitrary).
Is that right? It seems that we generally say "the computer is programmed to do X", instead of "the computer understands" or "the computer knows", even if the programmed computer can produce the same result as a human who does it.
Of course we don't say that. You can't ask the (traditionally) programmed computer a freeform question and get a sensible answer back. We tried that for going on 50 years and it never really worked. (The highest achievement that comes to mind is answering jeopardy questions.)
You can very carefully construct a query in a dedicated language, debug that query, and get useful results back. But that's clearly just a human using a tool, not a machine exhibiting understanding or general knowledge.
Meanwhile you can ask a multi-billion parameter LLM a freeform question in ~any human language and it can produce a coherent and meaningful response. It can one shot pieces of code. Track down bugs based on compiler error messages. It might not (yet) be human level in many cases but to get hung up on that is to miss the point.
This is equivalent to an `if` statement with multi-billion levels of nesting.
It is just a "traditional program", just unimaginably huge.
Just because it is not "traditionally programmed" does not mean that it is not a really huge "traditional program".
Scaling something by many orders of magnitude does not put it in a different category. A computer program, no matter how big, is still a computer program.
No, it's not equivalent to nested if statements. If you can mathematically demonstrate that it is I would be interested.
Anyway that's irrelevant. The point is that we use different language when referring to the one because its capabilities appear to be fundamentally different.
Your argument comes down to a claim of human exceptionalism - that a computer program can never "understand" simply by virtue of being a computer program. You haven't actually provided any defense of that claim though. You've just assumed it without justification.
It is. If you control the randomness involved, the output of a model is completely deterministic. Which means that it can be represented by a huge lookup table.
Anything that can be represented by a lookup table can be expressed as an `if then else` statement.
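For what it's worth, the in-principle half of that claim is easy to demonstrate at toy scale. Here `toy_model` is an arbitrary deterministic function over a tiny finite domain (not a real LLM; whether such a table is remotely practical at real-model scale is the separate question being argued):

```python
# A deterministic function over a finite input domain can always be
# tabulated exhaustively; the table then reproduces it exactly.
vocab = ["a", "b", "c"]

def toy_model(ctx):
    # arbitrary but deterministic "next token" rule over 2-token contexts
    return vocab[(len(ctx[0]) + vocab.index(ctx[1])) % len(vocab)]

# Enumerate every possible input once...
table = {(x, y): toy_model((x, y)) for x in vocab for y in vocab}

# ...and the lookup table matches the function on the whole domain.
assert all(table[(x, y)] == toy_model((x, y)) for x in vocab for y in vocab)
print(len(table), "entries cover the entire input space")
```

A dict lookup like this is trivially rewritable as a chain of if/else statements, which is the formal sense of the equivalence; it says nothing about whether the table's size stays within the bounds of the observable universe.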
By that logic sin(x) is equivalent to a lookup table. Yeah, you can approximate it that way. But doing so at any reasonable level of precision will quickly become an exercise in the absurd. Neural networks are far worse, consisting of stacks of massive linear combinations fed into nonlinear functions.
Regardless, it remains irrelevant to the subject at hand. You're going off on a tangent rather than admit your initial claim was wrong.
>By that logic sin(x) is equivalent to a lookup table
NO!
`sin(x)` is continuous, so the domain is infinite.
But an LLM is not a continuous function, and thus its domain is finite (the set of all possible token sequences that fit in the context window). So using a lookup table for the model's behavior would be exact, not an approximation. So it can indeed be represented by an if statement of finite size.
Hence proved!
If you don't understand something in what I wrote, I can clarify if you tell me where you have trouble following.
Even if it were true he would still be wrong. I wish HN had some mechanic to evict these sorts of troll accounts that attempt to score rhetorical points rather than honestly learning about a subject. The usual vote and flag mechanics don't work here because there's no singular and overt violation of the guidelines taking place.
>The Thinking Machines research team showed it’s possible to fix this. They built batch-invariant kernels for RMSNorm, matrix multiplication, and attention, integrating them into the open-source inference engine vLLM.
>Because you can't even define what consciousness is, let alone objectively test for it.
Exactly. So if we understand "hearing" as something registered by consciousness, then implicitly things that are not conscious cannot "hear".
>reduce the human auditory process
Yes, human auditory process, yes. "Hearing", no. I see that you cleverly switched to "auditory process" instead of "hearing". Moving goal posts, are we?
If a tree falls, does it make a sound? It depends on whether there is somebody to ultimately perceive the vibrations that the falling tree made (either directly or via recording).
It is easy to answer this if we define sound as the sensation: if there is no sensor, then there is no sensation. If we define sound as the vibration of air, then yes, it will make a sound.
Most of these questions feel perplexing because some of the underlying terms are loosely defined. If we strictly define those terms, the question answers itself.
All of this communication can be done without using such words; it would just appear less "magical". This is done in the guise of dumbing it down, but people sometimes take it quite literally, which is what the marketing wants anyway.
I am not knowledgeable about how transformers work, but what if we humans just do the same thing in our minds? What if our feeling of "understanding" is merely the emotional response to pattern matching, as you just said?
Yea, you said it. It is the feeling of understanding and feeling/sensing implies consciousness. Why does it matter? I don't know. All I know is that it is not the same thing, because a chunk of metal cannot feel. So I don't want it to be called by the same name.
When AI marketing (ab)uses the word, it is to project the appearance of human equivalence. And I don't like to fall for it.
> pattern matches the "idea shape" of words in the "idea space"
It does much more than this. The first layer has an attention mechanism over all previous tokens and spits out an activation representing some sum of all relations between the tokens. Then the next layer spits out an activation representing relations of relations, and so forth. The LLM is capable of deducing a hierarchy of structural information embedded in the text.
Google chose "understanding" in that context, because the relevant AI/ML task is called "Natural language understanding". But that term is an aspiration. It's the problem of trying to reveal the "meaning" of text data (language) as in making sense of the symbols with computers.
Just because Transformers work well on the "Natural language understanding" task in AI, doesn't mean that a Transformer actually "understands" language in the human sense.
The task is language understanding. The tool is amazing. Pianos are amazing. The task is to create music. The process is to transform movement to sound. They don't understand music.
Even if it gets the output wrong, it always seems to provide some output that indicates that it got the input right. This is the first thing that really surprised me about this tech.