First, the obvious one: LLMs are trained to auto-regressively predict human training samples (i.e. essentially to copy them, without overfitting), so OF COURSE they are going to sound like the training set - intelligent, reasoning, understanding, etc. The mistake is to anthropomorphize the model because it sounds human, and to attribute understanding etc. to the model itself, when it is just reflecting the mental abilities of the humans who wrote the training data.
The second point is perhaps a bit more subtle, and is about the nature of understanding and the differences between what an LLM is predicting and what the human cortex - also a prediction machine - is predicting...
When humans predict, what we're predicting is something external to ourselves - the real world. We observe, over time we see regularities, and from this we predict that we'll continue to see those regularities. Our predictions include our own actions as an input - how will the external world react to our actions - and therefore we learn how to act.
Understanding something means being able to predict how it will behave, both left alone, and in interaction with other objects/agents, including ourselves. Being able to predict what something will do if you poke it is essentially what it means to understand it.
What an LLM is predicting is not the external world and how it reacts to the LLM's actions, since it is auto-regressively trained - it is only predicting a continuation of its own output (actions) based on its own immediately preceding output (actions)! The LLM therefore itself understands nothing, since it has no grounding for what it is "talking about" or for how the external world behaves in reaction to its own actions.
The LLM's appearance of "understanding" comes solely from the fact that it is mimicking the training data, which was generated by humans who do have agency in the world and understanding of it. But the LLM has no visibility into the generative process of the human mind - only into the artifacts (words) it produces - so the LLM is doomed to operate in a world of words where all it might be considered to "understand" is its own auto-regressive generative process.
You’re restating two claims that sound intuitive but don’t actually hold up when examined:
1. “LLMs just mimic the training set, so sounding like they understand doesn’t imply understanding.”
This is the magic argument reskinned. Transformers aren’t copying strings, they’re constructing latent representations that capture relationships, abstractions, and causal structure because doing so reduces loss. We know this not by philosophy, but because mechanistic interpretability has repeatedly uncovered internal circuits representing world states, physics, game dynamics, logic operators, and agent modeling. “It’s just next-token prediction” does not prevent any of that from occurring. When an LLM performs multi-step reasoning, corrects its own mistakes, or solves novel problems not seen in training, calling the behavior “mimicry” explains nothing. It’s essentially saying “the model can do it, but not for the reasons we’d accept,” without specifying what evidence would ever convince you otherwise. Imaginary distinction.
2. “Humans predict the world, but LLMs only predict text, so humans understand but LLMs don’t.”
This is a distinction without the force you think it has. Humans also learn from sensory streams over which they have no privileged insight into the generative process. Humans do not know the “real world”; they learn patterns in their sensory data. The fact that the data stream for LLMs consists of text rather than photons doesn’t negate the emergence of internal models. An internal model of how text-described worlds behave is still a model of the world.
If your standard for “understanding” is “being able to successfully predict consequences within some domain,” then LLMs meet that standard, just in the domains they were trained on, and today's state of the art is trained on more than just text.
You conclude that “therefore the LLM understands nothing.” But that’s an all-or-nothing claim that doesn’t follow from your premises. A lack of sensorimotor grounding limits what kinds of understanding the system can acquire; it does not eliminate all possible forms of understanding.
Wouldn't birds that can navigate by the earth's magnetic field just as well say humans have no understanding of electromagnetism? They get trained on sensorimotor data humans will never be able to train on. If you think humans have access to the "real world" then think again. They have a tiny, extremely filtered slice of it.
Saying “it understands nothing because autoregression” is just another unfalsifiable claim dressed as an explanation.
> This is the magic argument reskinned. Transformers aren’t copying strings, they’re constructing latent representations that capture relationships, abstractions, and causal structure because doing so reduces loss.
Sure (to the second part), but the latent representations aren't the same as a human's. The human's world that they have experience with, and therefore representations of, is the real world. The LLM's world that it has experience with, and therefore representations of, is the world of words.
Of course an LLM isn't literally copying - it has learnt a sequence of layer-wise next-token predictions/generations (copying of partial embeddings to next token via induction heads etc), with each layer having learnt what patterns in the layer below it needs to attend to, to minimize prediction error at that layer. You can characterize these patterns (latent representations) in various ways, but at the end of the day they are derived from the world of words it is trained on, and are only going to be as good/abstract as next token error minimization allows. These patterns/latent representations (the "world model" of the LLM if you like) are going to be language-based (incl language-based generalizations), not the same as the unseen world model of the humans who generated that language, whose world model describes something completely different - predictions of sensory inputs and causal responses.
So, yes, there is plenty of depth and nuance to the internal representations of an LLM, but no logical reason to think that the "world model" of an LLM is similar to the "world model" of a human since they live in different worlds, and any "understanding" the LLM itself can be considered as having is going to be based on its own world model.
> Saying “it understands nothing because autoregression” is just another unfalsifiable claim dressed as an explanation.
I disagree. It comes down to how you define understanding. A human understands (correctly predicts) how the real world behaves, and the effect their own actions will have on the real world. This is what the human is predicting.
What an LLM is predicting is effectively "what will I say next" after "the cat sat on the". The human might see a cat and, based on circumstances and experience of cats, predict that the cat will sit on the mat. This is because the human understands cats. The LLM may predict the next word as "mat", but this does not reflect any understanding of cats - it is just a statistical word prediction based on the word sequences it was trained on, notwithstanding that this prediction is based on the LLM's world-of-words model.
>So, yes, there is plenty of depth and nuance to the internal representations of an LLM, but no logical reason to think that the "world model" of an LLM is similar to the "world model" of a human since they live in different worlds, and any "understanding" the LLM itself can be considered as having is going to be based on its own world model.
So LLMs and humans are different and have different sensory inputs. So what? This is true of all animals. You think dolphins and orcas are not intelligent and don't understand things?
>What an LLM is predicting is effectively "what will I say next" after "the cat sat on the". The human might see a cat and based on circumstances and experience of cats predict that the cat will sit on the mat.
Genuinely don't understand how you can actually believe this. A human who predicts mat does so because of the popular phrase. That's it. There is no reason to predict it over the numerous other things cats regularly sit on, often much more so than mats (if you even have one). It's not because of any super special understanding of cats. You are doing the same thing the LLM is doing here.
Orca and human brains are similar, in the sense we have a common ancestor if you look back far enough, but they are still very different and focus on entirely different slices of reality and input than humans will ever do. It's not something you can brush off if you really believe in input supremacy so much.
From the orca's perspective, many of the things we say we understand are similarly '2nd hand hearsay'.
To follow your hypothetical, if an Orca were to be exposed to human language, discussing human terrestrial affairs, and were able to at least learn some of the patterns, and maybe predict them, then it should indeed be considered not to have any understanding of what that stream of words meant - I wouldn't even elevate it to '2nd hand hearsay'.
Still, the Orca, unlike an LLM, does at least have a brain, and does live in and interact with the real world, and could probably be said to "understand" things in its own watery habitat as well as we do.
Regarding "input supremacy":
It's not the LLM's "world of words" that really sets it apart from animals/humans, since there are also multimodal LLMs with audio and visual inputs more similar to a human's sensory inputs. The real difference is what they are doing with those inputs. The LLM is just a passive observer, whose training consisted of learning patterns in its inputs. A human/animal is an active agent, interacting with the world, and thereby causing changes in the input data it is then consuming. The human/animal is learning how to DO things, and gaining understanding of how the world reacts. The LLM is learning how to COPY things.
There are of course many other differences between LLMs/Transformers and animal brains, but even if we were to eliminate all these differences the active vs passive one would still be critical.
If you ask a human to complete the phrase "the cat sat on the", they will probably answer "mat". This is memorization, not understanding. The LLM can do this too.
If you just input "the cat sat on the" to an LLM, it will also likely just answer "mat" since this is what LLMs do - they are next-word input continuers.
If you said "the sat sat on the" to a human, they would probably respond "huh?" or "who the hell knows!", since the human understands that cats are fickle creatures and that partial sentences are not the conversational norm.
If you ask an LLM to explain its understanding of cats, it will happily reply, but the output will not be its own understanding of cats - it will be parroting some human opinion(s) it got from the training set. It has no first-hand understanding, only 2nd hand hearsay.
>If you said "the sat sat on the" to a human, they would probably respond "huh?" or "who the hell knows!", since the human understands that cats are fickle creatures and that partial sentences are not the conversational norm.
I'm not sure what you're getting at here? You think LLMs don't similarly answer 'What are you trying to say?' Sometimes I wonder if the people who propose these gotcha questions ever bother to actually test them on said LLMs.
>If you ask an LLM to explain its understanding of cats, it will happily reply, but the output will not be its own understanding of cats - it will be parroting some human opinion(s) it got from the training set. It has no first-hand understanding, only 2nd hand hearsay.
Again, you're not making the distinction you think you are. Understanding from '2nd hand hearsay' is still understanding. The vast majority of what humans learn in school is such.
> Sometimes I wonder if the people who propose these gotcha questions ever bother to actually test them on said LLMs
Since you asked, yes, Claude responds "mat", then asks if I want it to "continue the story".
Of course if you know anything about LLMs you should realize that they are just input continuers, and any conversational skills come from post-training. To an LLM a question is just an input whose human-preferred (as well as statistically most likely) continuation is a corresponding answer.
I'm not sure why you regard this as a "gotcha" question. If you're expressing opinions on LLMs, then table stakes should be to have a basic understanding of LLMs - what they are internally, how they work, and how they are trained, etc. If you find a description of LLMs as input-continuers in the least bit contentious then I'm sorry to say you completely fail to understand them - this is literally what they are trained to do. The only thing they are trained to do.
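For anyone unsure what "input continuer" means mechanically, here is a toy sketch. It is a bigram counter, nowhere near a transformer, and the corpus and function names are invented purely for illustration; the only point is the shape of the loop: look at the context, emit the most likely next token, append it, repeat.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
CORPUS = (
    "the cat sat on the mat . a dog sat on the mat . "
    "a cat sat on the mat . a bird sat on the sofa ."
)

def train_bigrams(text):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def continue_greedy(counts, prompt, max_new=1):
    """Autoregressive loop: append the most frequent follower of the last word."""
    words = prompt.split()
    if not words:
        return prompt
    for _ in range(max_new):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never seen with a follower
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

model = train_bigrams(CORPUS)
print(continue_greedy(model, "the cat sat on the"))
```

A real LLM conditions on the whole context through learned representations rather than a lookup on the last word, so this says nothing either way about what those representations amount to.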
>Of course if you know anything about LLMs you should realize that they are just input continuers, and any conversational skills come from post-training.
No, they don't. Post-training makes things easier, more accessible and more consistent, but conversational skills are present in pre-trained LLMs just fine. Prepend a short transcript to the prompt and you get the same effect.
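A rough sketch of the transcript trick, for concreteness. The "User:" / "Assistant:" labels and the helper name are assumptions made up for this example, not any particular model's format; the resulting string would be fed to a base model as a plain completion prompt.

```python
def build_transcript_prompt(examples, user_message):
    """Lay out a few example turns, then leave the last assistant turn open.

    `examples` is a list of (user_text, assistant_text) pairs. The speaker
    labels are hypothetical; any consistent convention would serve.
    """
    lines = []
    for user_text, assistant_text in examples:
        lines.append(f"User: {user_text}")
        lines.append(f"Assistant: {assistant_text}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")  # the model's continuation fills in this turn
    return "\n".join(lines)

prompt = build_transcript_prompt(
    [("What is the capital of France?", "Paris.")],
    "the sat sat on the",
)
print(prompt)
```

The most likely continuation of a text laid out like this is another assistant turn, which is why a pre-trained model can hold up its side of a conversation without any post-training.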
>I'm not sure why you regard this as a "gotcha" question. If you're expressing opinions on LLMs, then table stakes should be to have a basic understanding of LLMs - what they are internally, how they work, and how they are trained, etc.
You proposed a distinction and explained a situation which would make that distinction falsifiable. And I simply told you LLMs don't respond the way you claim they would. Even when models respond mat (Now I think your original point had a typo?), it is clearly not due to a lack of understanding of what normal sentences are like.
>If you find a description of LLMs as input-continuers in the least bit contentious then I'm sorry to say you completely fail to understand them - this is literally what they are trained to do. The only thing they are trained to do.
They are predictors. If the training data is solely text then the output will be more text, but that need not be the case. Words can go in while images, actions or audio come out. In that sense, humans are also 'input continuers'.
>Yeah - you might want to check what you actually typed there.
That's what you typed in your comment. Go check. I just figured it was intentional since surprise is the first thing you expect humans to show in response to it.
>Not sure what you're trying to prove by doing it yourself though. Have you heard of random sampling? Never mind ...
I guess you fancy yourself a genius who knows all about LLMs now, but sampling wouldn't matter here. Your whole point was that it happens because of a fundamental limitation of LLMs that makes them unable to do it. Even one contrary response, never mind multiple, would be enough. After all, some humans would simply say 'mat'.
Anyway, it doesn't really matter. Completing 'mat' doesn't have anything to do with a lack of understanding. It's just the default 'assumption' that it's a completion that is being sought.
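For readers who haven't run into the sampling point: an LLM outputs a distribution over next tokens, and the decoder either takes the top one (greedy) or draws from the distribution. A minimal sketch, with scores invented for illustration (they are not taken from any real model):

```python
import math
import random

# Hypothetical next-token scores after "the cat sat on the" (numbers invented).
LOGITS = {"mat": 3.0, "sofa": 2.0, "keyboard": 1.5, "huh?": 0.5}

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def greedy(logits):
    """Always pick the single highest-scoring token."""
    return max(logits, key=logits.get)

def sample(logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample(LOGITS, 1.0, rng) for _ in range(1000)]
print(greedy(LOGITS), {tok: draws.count(tok) for tok in LOGITS})
```

With sampling, the same prompt can yield different replies from run to run, so one observed reply shows what a model can say, not what it always says.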