
The whole next-word thing is interesting, isn't it? I like to see it through Dennett's "competence without comprehension" lens. You can predict the next word competently with shallow understanding. But you could also do it well with comprehension of the full picture: a mental model that allows you to predict better. Are the AIs stumbling into these mental models? It seems like it. However, because these are such black boxes, we do not know how they are stringing these mental models together. Is it a random pick from 10 models built up inside the weights? Is there any system-wide cohesive understanding, whatever that means? Exploring what a model can articulate using self-reflection would be interesting. Can it point to internal cognitive dissonance because it has been fed both evolution and intelligent design, for example? Or do these exist as separate models to invoke depending on the prompt context, because all that matters is being rewarded by the current user?


Given their failure on novel logic problems, their generation of meaningless text, their tendency to do things like delete tests, and their incompetence at simple mathematics, it seems very unlikely they have built any sort of world model. It’s remarkable how competent they are given the way they work.

"Predict the next word" is a terrible summary of what these machines do, though; they certainly do more than that, but there are significant limitations.

‘Reasoning’ etc are marketing terms and we should not trust the claims made by companies who make these models.

The Turing test had too much confidence in humans it seems.


> Predict the next word is a terrible summary of what these machines do though, they certainly do more than that

What would that be?


They generate text based on quite a large context, including hidden prompts we don’t see, and their weights are distorted heavily by training. So I think there’s a lot more than a simple probability of word x coming next. That makes ‘predict next word’ a reductive summary IMO.

I do not personally feel it resembles thinking or reasoning though and really object to that framing because it is misleading many people.


> their weights are distorted heavily by training

What does that even mean? Their weights are essentially created by training. There aren't some magic golden weights that are then distorted.


I may be using the wrong terms, my impression was:

1. Weights in the model are created by ingesting the corpus

2. Techniques like reinforcement learning, alignment etc are used to adjust those weights before model release

3. The model is used with more context injected, which then affects which words it will choose, though it is still heavily biased by the corpus and training.

That could be way off base though, I'd welcome correction on that.
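
If it helps, step 3 can be sketched as a toy sampling loop (hand-made vocabulary and probabilities, nothing like a real model; just to show how the context conditions the next-word choice):

```python
import random

# Toy "model": maps the last word of the context to a next-word
# distribution. A real LLM derives this distribution from billions
# of trained weights over the whole context window, not a table.
NEXT = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(context, steps, rng):
    """Repeatedly sample the next word given the context so far."""
    words = context.split()
    for _ in range(steps):
        dist = NEXT.get(words[-1])
        if dist is None:
            break
        choices, probs = zip(*dist.items())
        words.append(rng.choices(choices, weights=probs)[0])
    return " ".join(words)

out = generate("the", 3, random.Random(0))
```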

The point I was trying to make, though, was that they do more than predict the next word based on just one set of data. Their weights can encode entire passages of source material from the training data (https://arxiv.org/abs/2505.12546), including books and programs. This is why they are so effective at generating code snippets.

Also, text injected at the last stage during use has far less weight than most people assume (e.g. https://georggrab.net/content/opus46retrieval.html) and is not read and understood, IMO.

There are a lot of inputs nowadays and a lot of stages to training. So while I don't think they are intelligent, I think it is reductive to call them next-token predictors or similar. Not sure what the best name for them is, but they are neither next-word predictors nor intelligent agents.


That extended explanation is more accurate, yes. I'd call your points 1 and 2 both training under the definition "anything that adjusts model weights is training". There are multiple stages and types of training. Right now AFAIK most (all) architectures then fix the weights and you have non-weight-affecting steps like the system prompt, context, etc.

You're right that the weights can enable the model to memorize training data.


Alignment scrubs the underlying raw output to be socially acceptable. It's an artificial superego.


I was under the impression it is a part of training which adjusts weights before release.

Are you saying it is a separate process which scrubs output before we see it?


So that might depend on the model, how long ago you last tested it, etc. I've seen LLMs solve novel logic problems, generate meaningful text, and retain tests just fine, and simple mathematics on newer models is a lot better.

Btw, if you read the actual paper that proposes the Turing test, Turing actually rejects the framing of "can machines think?", preferring the more practical "can you tell them apart in practice?"


Yes, that’s the ‘too much confidence in humans’ bit - he didn’t count on some humans being easily fooled by prolix word generators. I’d be interested in his take on these generators but I think he’d be focussed on what was missing as well as the amazing progress we have seen.


So my reading of (Turing 1950)...

> "The original question, 'Can machines think?' I believe to be too meaningless to deserve discussion."

> "the question, 'Can machines think?' should be replaced by 'Are there imaginable digital computers which would do well in the imitation game?'"

> "according to this view the only way to know that a man thinks is to be that particular man. It is in fact the solipsist point of view... instead of arguing continually over this point it is usual to have the polite convention that everyone thinks."

... is: if it's practical to say the system can give meaningful input/output on xyz in, say, natural language, we might just go ahead and say it can think about xyz, because otherwise everyone's just going to go nuts inventing new terms every time.

grey-area!thinking, kim_bruning!thinking, pet_cat!thinking, octopus!thinking, claude_opus!thinking.

Can we leave out the '!' ? Nothing to do with fooling people. Just practical ways of dealing with the overall concept.

https://courses.cs.umbc.edu/471/papers/turing.pdf


Probably worth remembering that ELIZA passed Turing tests, and was the definition of shallow prediction.


ELIZA absolutely did not ever pass anything resembling a real Turing test. A real Turing test is adversarial, the interrogator knows the testees are trying to fool him.


Landauer and Bellman absolutely put ELIZA to an adversarial Turing test, and called it such, in 1999. [0]

But... over in 2025, ELIZA was once again put to the Turing test under adversarial conditions. [1] It still had people think it was a real person over 27% of the time. Over a quarter of the testees thought the thing was a human.

The "ELIZA Effect" wasn't coined because everyone understands that an AI isn't conscious.

[0] https://books.google.com.au/books?id=jTgMIhy6YZMC&pg=PA174

[1] https://arxiv.org/html/2503.23674v1


Unfortunately I'm not sure the Turing test posited a minimal level of intelligence for the human testers. As we have found with LLMs, humans are rather easy to fool.


> there are significant limitations

Where can we read about those significant limitations?


Well here's some:

Confabulation/Hallucination - https://github.com/lechmazur/confabulations

Failure to read context - https://georggrab.net/content/opus46retrieval.html

Deleting tests to make them pass - https://www.linkedin.com/posts/jasongorman_and-after-it-did-...

Going rogue and deleting data - https://x.com/jasonlk/status/1946069562723897802

Agent security nightmares because they are not in fact intelligent assistants - https://x.com/theonejvo/status/2015401219746128322

Failure to read or generate structured data - https://support.google.com/gemini/thread/390981629/llm-ignor...

There are many, many examples, mostly caused by people thinking LLMs are intelligent and reasoning and giving them too much power (e.g. treating them as agents, not text generators). I'm sure they're all fixed in whatever new version came out this week though.


Your sarcasm is misplaced. Without principled limitations that demonstrate the existence of a lower bound on the error rate and show that errors are correlated across invocations and models (so that you can't improve the error rate with multiple supervision), you can’t exclude the possibility that "they're all fixed in the new version" (for practical purposes).


I've seen all of these from human teammates in my 30+ years in tech.


Sure but now everyone can do them all the time at 10x speed!


It has always seemed to me that LLMs may be like the language center of the brain, and that there should be a "whole damn rest of the brain" behind it to steer it.

LLMs miss very important concepts, like the concept of a fact. There is no "true", just consensus text on the internet given a certain context. Like that study recently where LLMs gave wrong info if there was the biography of a poor person in the context.


I think much along the same lines. LLMs are probably even just a part of the language center.

And of course they also miss things like embodiment, mirror neurons etc.

If an LLM makes a mistake, it will tell you it is sorry. But does it really feel sorry?


> But does it really feel sorry?

And what does it mean to feel sorry? Beyond fallible and imprecise human introspective notion of "sorry", that is. A definition that can span species and computing substrates. A deanthropomorphized definition of "sorry", so to speak.


Ever practiced meditation of the form where you just witness your thoughts? It seems just like LLM generated words, both factual and confabulated nonsense.


That's unlikely. But they are an awful lot like Turing machines (the k/v cache is analogous to the Turing tape), so their architecture is strongly predisposed to be able to find any algorithm, possibly including reasoning.


> You can predict the next word competently with shallow understanding.

I don't get this. When you say "predict the next word" what you mean is "predict the word that someone who understands would write next". This cannot be done without an understanding that is as complete as that of the human whose behaviour you are trying to predict. Otherwise you'd have the paradox that understanding doesn't influence behaviour.


Dennett also came to my mind, reading the title, but in a different sense. When people came up with the theory of evolution, it was hard for many to conceive how we get from "subtly selecting from random changes" to "building a mechanism as complex as a human". I think Dennett offers a nice analogy with a skyscraper: how can it be built if cranes are only so tall?

In a similar way, LLMs build small abstractions: first on words, how to subtly rearrange them without changing meaning; then they start to pick up logic patterns such as "if A implies B, and we're given A, then B"; and eventually they learn to reason in various ways.

It's the scale of the whole process that defies human understanding.

(Also, modern LLMs are not just next-word predictors anymore; there is a reinforcement learning component as well.)


> Are the AIs stumbling into these mental models? Seems like it.

Since nature decided to deprive me of telepathic abilities, when I want to externalize my thoughts to share with others, I'm bound to this joke of a substitute we call language. I must either produce sounds that encode my meaning, or gesture, or write symbols, or basically find some way to convey my inner world by using bodily senses as peripherals. Those who receive my output must do the work in reverse to extract my meaning, the understanding in my message. Language is what we call a medium that carries our meaning to one another's psyche.

LLMs, as their name alludes, are trained on language, the medium, and they're LARGE. They're not trained on the meaning, like a child would be, for instance. Saying that by their sole analysis of the structure and patterns in the medium they're somehow capable of stumbling upon the encoded meaning is like saying that it's possible to become an engineer by simply mindlessly memorizing many perfectly relevant scripted lines whose meaning you haven't the foggiest.

Yes, on the surface the illusion may be complete, but can the medium somehow become interchangeable with the meaning it carries? Nothing indicates this. Everything an LLM does still very much falls within the parameters of "analyze humongous quantity of texts for patterns with massive amount of resources, then based on all that precious training, when I feed you some text, output something as if you know what you're talking about".

I think the seeming crossover we perceive is just us becoming neglectful in our reflection of the scale and significance of the required resources to get them to fool us.


Searle's Chinese Room experiment but without knowing what's in the room, and when you try to peek in you just see a cloud of fog and are left to wonder if it's just a guy with that really big dictionary or something more intelligent.


It's an octopus, perhaps: https://aclanthology.org/2020.acl-main.463.pdf

There's also this blog post: https://julianmichael.org/blog/2020/07/23/to-dissect-an-octo... (which IMO is better to read than the paper)


It's honestly disheartening and a bit shocking how everyone has started repeating the "predict the next syllable" criticism.

The language model predicts the next syllable by FIRST arriving at a point in space that represents UNDERSTANDING of the input language. This was true all the way back in 2017, at the time of Attention Is All You Need. Google had a beautiful explainer page on how transformers work, which I was struggling to find. Found it: https://research.google/blog/transformer-a-novel-neural-netw...

The example was and is simple and perfect. The word "bank" exists. You can tell what "bank" means by its proximity to words such as "river" or "vault". You compare "bank" to every word in a sentence to decide which bank it is. Rinse, repeat. A lot. You then add all the meanings together. Language models are making a frequency association of every word to every other word, and then summing it to create understanding of complex ideas, even if it doesn't understand what it is understanding and has never seen it before.
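
A toy sketch of that disambiguation step, assuming made-up 4-dimensional embeddings (real models learn hundreds of dimensions; this only shows the scaled dot-product mechanics):

```python
import numpy as np

# Hypothetical hand-made embeddings, purely for illustration.
emb = {
    "bank":  np.array([0.5, 0.5, 0.1, 0.1]),
    "river": np.array([0.9, 0.1, 0.0, 0.0]),  # "nature" direction
    "vault": np.array([0.0, 0.0, 0.9, 0.1]),  # "finance" direction
    "the":   np.array([0.1, 0.1, 0.1, 0.1]),
}

def attention(word, sentence):
    """Softmax of scaled dot products: how much `word` attends to each word."""
    q = emb[word]
    keys = np.stack([emb[w] for w in sentence])
    scores = keys @ q / np.sqrt(len(q))
    scores = np.exp(scores - scores.max())
    return dict(zip(sentence, scores / scores.sum()))

# "bank" attends more strongly to "river" than to "the", nudging its
# contextual representation toward the riverbank sense.
w = attention("bank", ["the", "river", "bank"])
```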

That all happens BEFORE "autocompleting the next syllable."

The magic part of LLMs is understanding the input. Being able to use that to make an educated guess of what comes next is really a lucky side effect. The fact that you can chain that together indefinitely with some random number generator thrown in and keep saying new things is pretty nifty, but a bit of a show stealer.

What really amazes me about transformers is that they completely ignored prescriptive linguistic trees and grammar rules and let the process decode the semantic structure fluidly and on the fly. (I know Google uses encode/decode backwards from what I am saying here.) This lets people create crazy run-on sentences that break every rule of English (or your favorite language) but are still parsable instructions.

It is really helpful to remember that transformers' origins are in language translation. They are designed to take text and apply a modification to it while keeping the meaning static. They accomplish this by first decoding meaning. The fact that they then pivoted from translation to autocomplete is a useful thing to remember when talking to them.

A task a language model excels at is taking text, reducing it to meaning, and applying a template. So a good test might be "take Frankenstein, and turn it into a Magic School Bus episode." Frankenstein is reduced to meaning, the Magic School Bus format is the template, and the meaning is output in the form of the template. This is a translation, although from English to English, between two completely different forms.

Saying "find all the wild rice recipes you can, normalize their ingredients to 2 cups of broth, and create a table with ingredient ranges (min-max) for each ingredient option" is closer to a translation than it is to "autocomplete." Input -> Meaning -> Template -> Output. With my last example the template itself is also generated from its own meaning calculation.

A lot has changed since 2017, but the interpreter being the real technical achievement still holds true IMHO. I am more impressed with AI's ability to parse what I am saying than I am by its output (image models notwithstanding).


>represents UNDERSTANDING of the input language.

It does not have an understanding, it pattern matches the "idea shape" of words in the "idea space" of training data and calculates the "idea shape" that is likely to follow considering all the "idea shape" patterns in its training data.

It mimics understanding. It feels mysterious to us because we cannot imagine the mapping of a corpus of text to this "idea space".

It is quite similar to how mysterious a computer playing a movie can appear if you are not aware of the mapping of movie to a set of pictures, pictures to pixels, and pixels to coordinates and color codes.


Semantics. It's an encoded position that represents meaning in a way that is useful and reusable. That is "understanding." It's a mathematical representation of grasp.


Yea, semantics is important. It is not "understanding" any more than a microphone+ADC is hearing.


The distinction you're making reads like substance dualism to me. Are you able to provide a clear and objective metric for assessing "understanding"? If not then you're just handwaving an effectively meaningless semantic distinction.


>objective metric for assessing "understanding"

It should involve consciousness. You would not call an AI reacting to the color red "seeing" red. Same thing.


And where is this objective metric for consciousness? Last I checked we didn't even have a sensible definition for it.

It seems to me you're just kicking the can.

Setting that issue aside. While I certainly don't believe LLMs to be conscious (an entirely subjective and arbitrary take on my part I admit) I don't see any reason that concepts such as "intelligence" and "understanding" should require it. When considering how we apply those terms to humans it seems to me they are results based and highly contextual (ie largely arbitrary).


>humans it seems to me they are results based and highly contextual (ie largely arbitrary).

Is that right? It seems that we generally say "the computer is programmed to do it" rather than "the computer understands" or "the computer knows", even if the programmed computer can produce the same result as a human who does it.


Of course we don't say that. You can't ask the (traditionally) programmed computer a freeform question and get a sensible answer back. We tried that for going on 50 years and it never really worked. (The highest achievement that comes to mind is answering jeopardy questions.)

You can very carefully construct a query in a dedicated language, debug that query, and get useful results back. But that's clearly just a human using a tool, not a machine exhibiting understanding or general knowledge.

Meanwhile you can ask a multi-billion parameter LLM a freeform question in ~any human language and it can produce a coherent and meaningful response. It can one shot pieces of code. Track down bugs based on compiler error messages. It might not (yet) be human level in many cases but to get hung up on that is to miss the point.


>multi-billion parameter LLM

This is equivalent to an `if` statement with multi-billion levels of nesting. It is just a "traditional program", just unimaginably huge.

Just because it is not "traditionally programmed" does not mean that it is not a really huge "traditional program".

Scaling something by many orders of magnitude does not put it in a different category. A computer program, no matter how big, is still a computer program.


No, it's not equivalent to nested if statements. If you can mathematically demonstrate that it is I would be interested.

Anyway that's irrelevant. The point is that we use different language when referring to the one because its capabilities appear to be fundamentally different.

Your argument comes down to a claim of human exceptionalism - that a computer program can never "understand" simply by virtue of being a computer program. You haven't actually provided any defense of that claim though. You've just assumed it without justification.


>No, it's not equivalent to nested if statements.

It is. If you control the randomness involved, the output of a model is completely deterministic. Which means that it can be represented by a huge lookup table.

Anything that can be represented by a lookup table can be expressed as an `if then else` statement.
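
The claim, made concrete with a toy three-token "vocabulary" (nothing like real scale, and the practical objection is exactly that scale):

```python
# A deterministic "next token" function over a tiny finite domain,
# written as a lookup table...
TABLE = {"a": "b", "b": "c", "c": "a"}

def next_token(prev):
    return TABLE[prev]

# ...and the same function unrolled into an if/elif chain. For a real
# LLM the domain is every possible context window, so the chain would
# be astronomically large: finite in principle, unbuildable in practice.
def next_token_unrolled(prev):
    if prev == "a":
        return "b"
    elif prev == "b":
        return "c"
    else:
        return "a"

assert all(next_token(t) == next_token_unrolled(t) for t in "abc")
```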


By that logic sin(x) is equivalent to a lookup table. Yeah, you can approximate it that way. But doing so at any reasonable level of precision will quickly become an exercise in the absurd. Neural networks are far worse, consisting of stacks of massive linear combinations fed into nonlinear functions.

Regardless, it remains irrelevant to the subject at hand. You're going off on a tangent rather than admit your initial claim was wrong.


>By that logic sin(x) is equivalent to a lookup table

NO!

`sin(x)` is continuous, so the domain is infinite.

But an LLM is not a continuous function: its domain is finite (the set of all possible token sequences up to the context length). So a lookup table for the model's behavior would be exact, not an approximation. It can indeed be represented by an if statement of finite size.

Hence proved!

If you don't understand something in what I wrote, I can clarify if you tell me where you have trouble following.



Even if it were true he would still be wrong. I wish HN had some mechanic to evict these sorts of troll accounts that attempt to score rhetorical points rather than honestly learning about a subject. The usual vote and flag mechanics don't work here because there's no singular and overt violation of the guidelines taking place.


that is why I said

>If you control the randomness involved


Which you essentially cannot do

It is inherently randomized


But isn't the link shared by that comment doing exactly that?

https://sulbhajain.medium.com/why-llms-arent-truly-determini...

>The Thinking Machines research team showed it’s possible to fix this. They built batch-invariant kernels for RMSNorm, matrix multiplication, and attention, integrating them into the open-source inference engine vLLM.

>The outcome: 1,000 identical prompts, 1,000 identical outputs. Perfect reproducibility.

??


Language models aren’t “programmed” though.


You are right, it is worse.

It is generated by tweaking a bunch of `if` statements until the output starts to look about right.


If I “convey understanding” I transfer it from one person to another. Consciousness does not transfer with understanding.

Some people argue that consciousness emerges in early childhood. I can get an infant to understand what I am saying even if they aren’t conscious.


A microphone + ADC is hearing though, that's the whole reason we even produce microphones. So that our electronics can hear sound.


So according to you, when can you qualify something as capable of hearing?

1. Vibrates according to the sound; is that hearing?

2. Generates electrical signals according to the sound; is that hearing?

3. Amplifies the electrical signal; do we cross the hearing mark?

4. Records the signal to a cassette tape (or uses an ADC -> mp3); are we hearing yet?

5. Plays it back through a speaker. Surely we must be hearing now!

At which point exactly would you say the thing is definitely hearing?


You can reduce the human auditory process to a similar mechanical list. At which specific point would you say a human is hearing?

You've fallen into the trap of human exceptionalism but you don't seem to be aware of that fact. Are you a substance dualist or not?


>You can reduce the human auditory process to a similar mechanical list.

You can't. Because we don't know at which point sound gets registered in consciousness.


Because you can't even define what consciousness is, let alone objectively test for it.

You are entirely wrong though. You most certainly _can_ reduce the human auditory process to a (bio)mechanical list.

You have unilaterally, arbitrarily, and without justification added consciousness to that list.


>Because you can't even define what consciousness is, let alone objectively test for it.

Exactly. So if we understand "hearing" as something registered by consciousness, then implicitly things that are not conscious cannot "hear".

>reduce the human auditory process

Yes, the human auditory process, yes. "Hearing", no. I see that you cleverly switched to "auditory process" instead of "hearing". Moving goal posts, are we?


Alan Watts talks about this.

If a tree falls, does it make a sound? It depends on whether there is somebody to ultimately perceive the vibrations that the falling tree made (either directly or via recording).


It is easy to answer this if we define sound as the sensation. If there is no sensor then there is no sensation. If we define sound as the vibration of air. Then yes, it will make a sound.

Most of these questions feel perplexing because some of the underlying terms are loosely defined. If we strictly define those terms, the question answers itself.


Agree to disagree. Encoding a meaning is understanding. I cited a source using the word in the same way.


>agree to disagree.

Yea

>encoding a meaning is understanding.

encoding a meaning is encoding. Nothing more!


what is understanding but encoded meaning distilled into pure structure, in both cases a property of a pattern?

No need to gatekeep the word "understanding" behind subjective human experience, e.g. qualia.


> No need to gatekeep the word "understanding" behind subjective human experience eg qualia.

Yea, I think gatekeeping is needed exactly for the same reason. Make up another word if you want..


If you make up a word, nobody will know what it means.


All of this communication can be done without using such words. It would just appear less "magical". This is done in the guise of dumbing it down, but people sometimes take it quite literally, which is what the marketing wants anyway.


I am not knowledgeable about how transformers work, but what if us humans just do the same thing in our minds as well? What if our feeling of "understanding" is merely the emotional response to pattern matching, as you just said?


Yea, you said it. It is the feeling of understanding and feeling/sensing implies consciousness. Why does it matter? I don't know. All I know is that it is not the same thing, because a chunk of metal cannot feel. So I don't want it to be called by the same name.

When AI marketing (ab)uses the word, it is to project the appearance of human equivalence. And I don't like to fall for it.


Psychopaths don't feel. Are they conscious?


They don't? If they stub their toe, will they feel the pain? Can they "see"? Can they "hear"?


> pattern matches the "idea shape" of words in the "idea space

It does much more than this. The first layer has an attention mechanism over all previous tokens and spits out an activation representing some sum of all relations between the tokens. Then the next layer spits out an activation representing relations of relations, and so on up the stack. The LLM is capable of deducing a hierarchy of structural information embedded in the text.

not clear to me how this isn't "understanding".


From what I understand, it's more like "input is 1, 3, 5, 7" so "output is likely to be 9".

Understanding would be a bit generous of a term for that I guess, but that also depends on the definition of understanding.


I'd really invite people to read the Google blog post: https://research.google/blog/transformer-a-novel-neural-netw...

Google chose the word understanding.


Google chose "understanding" in that context, because the relevant AI/ML task is called "Natural language understanding". But that term is an aspiration. It's the problem of trying to reveal the "meaning" of text data (language) as in making sense of the symbols with computers.

Just because Transformers work well on the "Natural language understanding" task in AI, doesn't mean that a Transformer actually "understands" language in the human sense.


Thanks for the link, I will read it. But keep in mind that Google wants to sell us something.


At the time, it was a free language translation tool. You weren't paying for transformers in 2017.


True, but that doesn't mean that Google did not already have intentions to monetize it if possible.


You would think, wouldn’t you?

And yet they waited until ChatGPT was a thing and threw Bard together overnight in response.


Fair point ;)


The task is language understanding. The tool is amazing. Pianos are amazing. The task is to create music. The process is to transform movement to sound. They don't understand music.


Even if it gets the output wrong, it always seems to provide some output that indicates that it got the input right. This is the first thing that really surprised me about this tech.



