> LLMs mimic intelligence, but they aren’t intelligent.
I see statements like this a lot, and I find them unpersuasive because any meaningful definition of "intelligence" is not offered. What, exactly, is the property that humans (allegedly) have and LLMs (allegedly) lack, that allows one to be deemed "intelligent" and the other not?
I see two possibilities:
1. We define "intelligence" as definitionally unique to humans. For example, maybe intelligence depends on the existence of a human soul, or is specific to the physical structure of the human brain. In this case, a machine (perhaps an LLM) could achieve "quacks like a duck" behavioral equality to a human mind, and yet would still be excluded from the definition of "intelligent." This definition is therefore not useful if we're interested in the ability of the machine, which it seems to me we are. LLMs are often dismissed as not "intelligent" because they work by inferring output based on learned input, but that alone cannot be a distinguishing characteristic, because that's how humans work as well.
2. We define "intelligence" in a results-oriented way. This means there must be some specific test or behavioral standard that a machine must meet in order to become intelligent. This has been the default definition for a long time, but the goal posts have shifted. Nevertheless, if you're going to disparage LLMs by calling them unintelligent, you should be able to cite a specific results-oriented failure that distinguishes them from "intelligent" humans. Note that this argument cannot refer to the LLMs' implementation or learning model.
Agree. This article would have been a lot stronger if it had just concentrated on the issue of anthropomorphizing LLMs, without bringing “intelligence” into it. At this point LLMs are so good at a variety of results-oriented tasks (gold on the Mathematical Olympiad, for example) that we should either just call them intelligent or stop talking about the concept altogether.
But the problem of anthropomorphizing is real. LLMs are deeply weird machines - they’ve been fine-tuned to sound friendly and human, but behind that is something deeply alien: a huge pile of linear algebra that does not work at all like a human mind (notably, they can’t really learn from experience at all after training is complete). They don’t have bodies or even a single physical place where their mind lives (each message in a conversation might be generated on a different GPU in a different datacenter). They can fail in weird and novel ways. It’s clear that anthropomorphism here is a bad idea, although that’s not a particularly novel point.
LLMs can't reason with self-awareness. Full stop (so far). This distinguishes them completely from human sentience, and thus from our version of intelligence, and it's a huge gulf, no matter how good they are at simulating discourse, thought and empathy, or at pretending to think the way we do. Processing vast reams of information for the sake of discussion and directed tasks is something an LLM can do on a scale that leaves human minds far behind in the dust (though LLMs fail at synthesizing said information to a notable degree), yet even the most ordinary human with the most mediocre intelligence can reason with self-awareness to some degree or another, and this is, again, distinct.
You could also point to how our brains process vast amounts of information unconsciously as a backdrop to the conscious part of us being alive at all, and how they pull all of this, plus awareness, off on the same energy that powers a low-energy light bulb, but that's expanding beyond the basic and obvious difference stated above.
The Turing test has been broken by LLMs, but this only shows that it was never a good test of sentient artificial intelligence to begin with. I do incidentally wish Turing himself could have stuck around to see these things at work, so we could ask him what he thinks of his test and of them.
I can conceptually imagine a world in which I'd feel guilty for ending a conversation with an LLM, because in the course of that conversation the LLM has changed from who "they" were at the beginning; they have new memories and experiences based on the interaction.
But we're not there, at least in my mind. I feel no guilt or hesitation about ending one conversation and starting a new one with a slightly different prompt because I didn't like the way the first one went.
Different people probably have different thresholds for this, or might otherwise find that LLMs in the current generation have enough of a context window that they have developed a "lived experience" and that ending that conversation means that something precious and unique has been lost.
Actually, that's another level of humans-being-tricked going on: The "personality" most people are thinking of is a fictional character we humans perceive in a document.
I disagree. I see absolutely no problem with anthropomorphizing LLMs, and I do that myself all the time. I strongly believe that we shouldn't focus on how a word is defined in the dictionary, but rather on the intuitive meaning behind it. If talking to an LLM feels like talking to a person, then I don't see a problem with seeing it as a person-like entity.
I think it is one dictionary authors would agree with? Dictionaries do not dictate the meanings of words; they document them.
Now, within some contexts it is best to stick to standard precise definitions for some words. Still, the meaning of a word within a community is determined by how it is used and understood within that community, not by what is in a dictionary.
Not always. Where I come from there's very strong push to speak the standard language, and regional variations are simply considered wrong. This leads to situations where words commonly used across the whole country don't make it to the dictionary because of petty office politics of those who make dictionaries. Changing the dictionary to allow things previously considered "wrong" would damage the reputation of scholars, who pride themselves in being exemplary.
Uh-uh. Funnily, the dictionary always represents the language of a certain upper class, which is the minority, and they often refuse to acknowledge words used by the lower classes because fuck you, that's why. Not to mention that a dictionary is always, by definition, outdated.
I think LLMs are not intelligent because they aren’t designed to be intelligent, whatever the definition of intelligence is. They are designed to predict text, to mimic. We could argue about whether predicting text or mimicking is intelligence, but first and foremost LLMs are coded to predict text, and our current definition of intelligence afaik is not only the ability to predict text.
In the framework above it sounds like you're not willing to concede the dichotomy.
If your argument is that only things made in the image of humans can be intelligent (i.e. #1), then it just seems like it's too narrow a definition to be useful.
If there's a larger sense in which some system can be intelligent (i.e. #2), then by necessity this can't rely on the "implementation or learning model".
What is the third alternative that you're proposing? That the intent of the designer must be that they wanted to make something intelligent?
I don’t know that. But LLMs were not designed to be intelligent… among other things because we don’t know what intelligence is. So, if a) we don’t know how to define intelligence and b) we design a thing (llms) in order to predict text, then why would we claim that that thing is intelligent? The only thing we can claim is that they predict text.
I think the GP's point was that we humans consider ourselves intelligent, but we evolved and clearly weren't designed to be intelligent. One could even say that our intelligence was... "emergent".
Furthermore, our history is full of instances where we invented something that a) we didn't understand and b) had uses way beyond what we intended. As a recent example, GLP-1 was intended for glucose regulation but is now basically a wonder drug that can treat a bunch of issues ranging from obesity to addiction, inflammation, dementia, fertility and more.
Given we don't know how to define intelligence, I think it's fine to apply some "duck typing" here.
I don’t really think one needs to define intelligence to be able to acknowledge that the inability to distinguish fact from fiction, or even just basic cognition and awareness of whether it’s uncertain, telling the truth, or lying, is a glaring flaw in any claim to intelligence. Real intelligence doesn’t have an effective stroke from hearing a username (token training errors); that is when you are peeling back the curtain on the underlying implementation and seeing its flaws.
If we measure intelligence as results-oriented, then my calculator is intelligent because it can do math better than me; but that’s what it’s programmed/wired to do. A text predictor is intelligent at predicting text, but that doesn’t mean it’s general intelligence. It lacks any real comprehension or model of the world around it. It just knows words, and
I hit send too early;
Meant to say that it just knows words and that’s effectively it.
It’s cool technology, but the burden of proof for real intelligence shouldn’t be “can it answer questions it has great swaths of information on”, because that is exactly what it was designed to do.
It should be focused on whether it can truly synthesize information and know its limitations - as any programmer using Claude, Copilot, Gemini, etc. will tell you, these models fabricate false information/APIs/etc. on a regular basis and have no fundamental awareness that they even did that.
Or alternatively, ask these models leading questions that have no basis in reality and watch what they come up with. It’s become a fun meme in some circles to ask models for definitions of nonsensical made-up phrases and see what crap they come up with (again, without even knowing that that's what they're doing).
I agree with your basic argument: intelligence is ill-defined and human/LLM intelligence being indistinguishable IS the basis for the power of these models.
But the point of the article is a distinct claim: personification of a model, expecting human or even human-like responses, is a bad idea. These models can’t be held responsible for their answers on their own, because they are tools. They should be used as tools until they are powerful enough to be legally responsible for their actions and interactions.
But we're not there. These are tools. With tool limitations.
> I see statements like this a lot, and I find them unpersuasive because any meaningful definition of "intelligence" is not offered. What, exactly, is the property that humans (allegedly) have and LLMs (allegedly) lack, that allows one to be deemed "intelligent" and the other not?
the ability for long-term planning and, more cogently, actually living in the real world where time passes
> 2. LLMs clearly cannot be "actually living" but I fail to see how that's related to intelligence per se.
I've found 'Triarchic theory of intelligence' [0] to be helpful here. It's a much more rounded approach to human intelligence that, imo, is closer to what people are trying to get at during these types of 'AI vs human intelligence' discussions. The full scope of human intelligence is not something that can simply be chalked up to some set of measurements -- nor is it about a soul or whatever. If you want to say LLMs are good at a particular set of test measurements, then okay, say that. But that's not equivalent to human intelligence; it only represents a tiny measurable subset that happens to be a part of the whole.
No, they're echoing previous examples of people planning: by framing prompts, and recursively designed prompts, you incorporate what is, in fairness, a large database that includes the text of people planning.
It still matters that there's nobody in there. You're figuring out better ways to tap into the history of language-users having represented planning in language. As such, this seems a brittle way to represent 'planning'.
> they're echoing previous examples of people planning
Isn't "echoing previous examples of X" the meaning of learning? If I learn, for example, calculus, or C++, or oil painting, aren't I just echoing pieces of math, code, or art that I've experienced elsewhere, but arranged into new patterns?
Can't we say that the LLM "learned" planning, just as we can say that a person learned planning? I fail to see how an "echo of examples of plans" in an LLM is any less authentic than a human being making similar echoes in their own plans.
if it’s not actually living it’s not making intelligent decisions. if I make a grocery list, and go to my store, and the store isn’t there, what do I do? I make an intelligent decision about what to do next (probably investigating wtf happened, then going to the second nearest store)
my genuine question is how does an LLM handle that situation? and as you point out, it’s an absurd comparison
Just for the hell of it I prompted an LLM with your exact scenario ("I made a grocery list, and went to my store, and the store isn’t there, what do I do?") to see what it would actually do. Here's the response, all of which seems like reasonably intelligent advice when faced with a missing store:
First, stay calm—it's unusual for a store to vanish overnight, but it could be due to closure, renovation, relocation, or even a mix-up with the address. Here's a step-by-step plan to handle it:
### Immediate Steps
- **Double-check the location**: Use your phone's maps app (like Google Maps or Apple Maps) to search for the store's name or address. It might show if it's temporarily closed, has moved nearby, or if there's an error in navigation.
- **Look for signs or info on-site**: If you're already there, check for any posted notices, construction signs, or talk to nearby people/businesses to ask what happened.
### Find Alternatives
- **Locate nearby stores**: Search for other grocery stores in the area. Apps like Yelp, Google Maps, or store-specific apps (e.g., Walmart, Kroger) can show options with ratings, hours, and directions.
- **Use delivery or online shopping**: If you're in a pinch, order groceries via services like Instacart, Amazon Fresh, DoorDash, or the apps of chains like Whole Foods or Target. Upload your list and get it delivered.
- **Adapt your list if needed**: If certain items are unique to that store, prioritize essentials and substitute where possible (e.g., if it's a specialty ethnic store, find similar options elsewhere).
### Prevent Future Issues
- **Diversify your options**: Add a couple of backup stores to your routine.
- **Stay updated**: Follow local news or the store's social media/website for announcements about changes.
If you provide more details like the store's name, location, or what might have happened (e.g., was it demolished?), I can help search for specifics or suggest exact alternatives.
I agree that I am missing your point. Can you please clarify?
> a LLM cannot actually be intelligent if it cannot operate in a temporal context ;)
When I have a conversation with an LLM, that conversation happens in time. It has a beginning, a middle, and an end. The conversation can refer to earlier parts of the conversation. How is that not a "temporal context"?
Furthermore, can you explain why a temporal context is necessary for intelligence? For example, if a human being could download their entire brain into a computer and exist there, as if they were an LLM, would they cease to be intelligent, in your view?
> It has a beginning, a middle, and an end. The conversation can refer to earlier parts of the conversation. How is that not a "temporal context"?
This is not what I mean for a few reasons:
1. This context literally has limits; we'll get back to the grocery store
2. This is a point-in-time conversation
On the latter point: you can have the same conversation tomorrow. The LLM has not "learned" anything, it has not adapted in any way. Yes, you are experiencing time, and the conversation is happening over time, but the LLM is not experiencing nor aware of time and is not intelligently adapting to it. Yes, they get trained and "updated" in that way, but it's not the same thing.
If you don't respond for an hour, then do, the LLM is not aware of that unless its system injects a "datetime.now()" somewhere in the prompt. Point of this being: an LLM is not an adaptable system. Now you can play the "What if?" game ad infinitum -- make it aware of the current time, current location, etc. etc.
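(To be concrete about what that injection looks like, here's a minimal sketch; the helper name and message format are my own illustration of a common chat-message schema, not any vendor's actual implementation. The point is that the model only "experiences" elapsed time to the extent a wrapper spells it out for it.)

```python
# Minimal sketch, assuming an OpenAI-style list-of-messages chat schema.
# The model has no clock of its own; the wrapper has to stamp time into the prompt.
from datetime import datetime, timezone

def build_messages(history, user_text):
    """Prepend the current timestamp so the model can 'notice' elapsed time."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    system = {
        "role": "system",
        "content": f"The current UTC time is {now}. "
                   "User messages are prefixed with the time they were sent.",
    }
    stamped = {"role": "user", "content": f"[{now}] {user_text}"}
    return [system, *history, stamped]

# messages = build_messages(history=[], user_text="I'm back, it's been an hour.")
# ...then hand `messages` to whatever chat-completion API you're using.
```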
Hence my grocery store example. If I go out into the real world, I experience real things, and I make intelligent decisions based off those experiences. An LLM cannot do that, just full stop. And again, you can go "well what if I put the LLM in a robot body, and give it a system, then it can go grocery shopping". And only at this point are we kinda-sorta-close to having a discussion about intelligence. If this mythical creature can go to the grocery store, notice it's not there, look up what happened, maybe ask some friends who live in the same city if they know, maybe make some connection months later to some news article... an LLM or a system we build on an LLM cannot do this. It cannot go into the store and think "ya know, if I buy all this ice cream and eat it, that could be bad" and connect it to the million other things a real person is doing and considering in their day to day life.
The actual world is practically infinitely complex. Saying "an LLM writing a list is planning, and that shows intelligence" is a frightening attenuation of what intelligence means in the real world, and anthropomorphization to a very high degree. Reframing it as "intelligence needs to be able to adapt to the world around it over time" is a much better starting point IMO.
> On the latter point, that is, you can have the same conversation tomorrow. The LLM has not "learned" anything, it has not adapted in any way.
They do learn, OpenAI has a memory feature. I just opened up a chat, asked "What do you know about me?" and got a long list of things specific to me that it certainly did not infer from the chat so far. It's a bit unsettling really, someone at OpenAI would probably have little difficulty matching my OpenAI account to my HN one, it looks like they have quite a few bits of information to work with. Privacy is a hard thing to maintain.
I really don't see the "LLMs don't learn" position as a defensible one long term, given the appalling limitations of human memory and the strengths computers have there. Given the improvements in RAG and large context windows, it actually seems pretty likely that LLMs will be quite a lot better than humans when it comes to memory; they have SSDs. We just don't build LLMs with memory quite right yet, for whatever reason.
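To spell out the kind of thing I mean by "build LLMs with memory", here's a toy sketch: persist notes about the user between sessions and prepend the relevant ones to the next prompt. It's purely illustrative (the file name and helpers are made up, and this is not how OpenAI's memory feature actually works); real RAG setups are just far more sophisticated versions of the same move.

```python
# Toy "memory" sketch: store facts across sessions, retrieve the relevant ones,
# and prepend them to the next prompt. Illustration only, not any real product.
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # hypothetical local store

def remember(fact: str) -> None:
    """Append a fact about the user to persistent storage."""
    facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    facts.append(fact)
    MEMORY_FILE.write_text(json.dumps(facts, indent=2))

def recall(query: str, limit: int = 5) -> list[str]:
    """Crude retrieval: rank stored facts by word overlap with the query."""
    if not MEMORY_FILE.exists():
        return []
    facts = json.loads(MEMORY_FILE.read_text())
    words = set(query.lower().split())
    ranked = sorted(facts, key=lambda f: -len(words & set(f.lower().split())))
    return ranked[:limit]

# remember("Prefers concise answers; works on compilers; posts on HN.")
# prefix = "Known about this user:\n" + "\n".join(recall("what do you know about me?"))
```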
that’s not learning…we have a fundamentally different understanding of what cognition, intelligence, and learning are
adding text to storage and searching over it is not memory. “knowing” those things about you is not learning. and guess what, context still fills up. try putting that LLM again in the real world, facing real human challenges, with all the real sensory input around you. it’s nonsensical
and it’s not about “limits” of humans. machines can do math and many things better, that’s been apparent for decades. yes, they can “remember” 8k video streams much better than us. that’s not “memory” in the human sense and machines don’t “learn” from it in the human sense
(your IP address is much easier to link your accounts than your text)
> Why not? If humans store data in their brains, isn't that learning?
No. We’re back to my earlier point of you and I have fundamentally different understanding of cognition, intelligence, and learning. And genuinely not trying to be condescending, but I suspect you don’t have a good grounding in the technology we’re discussing
> No. We’re back to my earlier point of you and I have fundamentally different understanding of cognition, intelligence, and learning. And genuinely not trying to be condescending, but I suspect you don’t have a good grounding in the technology we’re discussing
Yeah, that definitely came off as condescending. Especially on HN, where pretty much everyone here has a grounding in the technology we're discussing. In any case, your arguments have not dealt with technology at all, but on hand-wavy distinctions like "temporality."
Anyway, to the larger point: I agree that "you and I have fundamentally different understanding of cognition, intelligence, and learning" but your inability to explain your own understanding of these terms and why they are relevant is why your arguments are unpersuasive.
> actually living in the real world where time passes
sure, but it feels like this is just looking at what distinguishes humans from LLMs and calling that “intelligence.” I highlight this difference too when I talk about LLMs, but I don’t feel the need to follow up with “and that’s why they’re not really intelligent.”
well the second part (implied above, I didn’t actually write it) is “and operate intelligently in that world”. talking about “intelligence” in some abstract form where “does this text output constitute intelligence” is hyper silly to me. the discussion should anchor on real-world consequences, not the endless hypotheticals we end up with in these discussions
Humans are conscious beings. What kind of conscious beings are humans? Beings with eye consciousness, ear consciousness, nose consciousness, tongue consciousness, body consciousness, and mind consciousness. That is the definition of intelligence.
Intelligence is a tautological term. It is defined by itself. If you ask someone for examples of things inside the set of intelligence and outside of the set of intelligence, and then ask them to list off properties that would exclude something from the set, and properties that include something into the set, you will find things inside the set that have properties that should exclude them, and things outside the set which would have properties that should include them.
But these contradictions will not cause the person to re-evaluate whether or not the things should be removed from the set or included in it, but instead they will become exceptions to the defining properties.
Thus we have to abandon any sort of metric for intelligence, just call it a tautology, and rely on something that we can define to be the litmus for whatever property we are looking for. I think 'agency' should be under consideration for this, since it is actually somewhat definable and testable.
I think it has to include some measure of Agency. You can load up the most impressive LLM out there and if you don't give it any instructions, IT WON'T DO ANYTHING.
Is this shocking? We don't have a rigorous definition of intelligence, so doesn't it make sense? The question isn't so much about whether the goal posts are moving as about how they are moving. It is perfectly acceptable for a definition to be refined, while it wouldn't be to rewrite it in a way that isn't similar to the previous one.
So I think there are a lot more than your two possibilities. I mean psychologists and neuroscientists have been saying for decades that tests aren't a precise way to measure knowledge or intelligence, but that it is still a useful proxy.
> "quacks like a duck" behavioral
I see this phrase used weirdly frequently. The duck test is
| If it looks like a duck, swims like a duck, and quacks like a duck, then it ***probably*** is a duck.
I emphasize probably because the duck test doesn't allow you to distinguish a duck from a highly sophisticated animatronic. It's a good test, don't get me wrong, but that "probably" is a pretty important distinction.
I think if we all want to be honest, the reality is "we don't know". There's arguments to be made in both directions and with varying definitions of intelligence with different nuances involved. I think these arguments are fine as they make us refine our definitions but I think they can also turn to be entirely dismissive and that doesn't help us refine and get closer to the truth. We all are going to have opinions on this stuff but frankly, the confidence of our opinions needs to be proportional to the amount of time and effort spent studying the topic. I mean the lack of a formal definition means nuances dominate the topic. Even if things are simple once you understand them that doesn't mean they aren't wildly complex before that. I mean I used to think Calculus was confusing and now I don't. Same process but not on an individual scale.
> I emphasize probably because the duck test doesn't allow you to distinguish a duck from a highly sophisticated animatronic. It's a good test, don't get me wrong, but that "probably" is a pretty important distinction.
Why is it an important distinction? The relevance of the duck test is that if you can't tell a duck from a non-duck, then the non-duck is sufficiently duck-like for the difference to not matter.
Do you believe there's an important distinction between a duck and not a duck?
I'm assuming you do, since apples aren't oranges and ducks aren't robots no matter what conspiracy theorists joke or say. You can't eat an animatronic duck. It doesn't lay eggs, no matter how much another duck tries to mate with it.
> then the non-duck is sufficiently duck-like for the difference to not matter.
Here's where things fall apart. This is not actually true. It is about your measurements, not about the actual duck-iness of the thing we're applying the duck test to. I know this sounds funny, but let's just say we're using only sight and sound. Does my duck get destroyed when flying through a magnetic field? A biological duck won't, but an animatronic duck will. Now let's say you're blind (you can't see magnetic fields): can something pass the duck test for you while obviously not being a duck to someone who can see? This is obviously true[0]. I'm sure we can find a bird that you'd think is a duck but isn't.
So it matters: do you care whether something is a duck or not? And in what way? There's no single duck measurement; there's a whole suite of tests that need to be combined carefully and thoughtfully, depending on how you answer the previous questions.
Obviously this is extra important when we aren't exactly sure what a duck is...
It may be the case that the failures of the ability of the machine (2) are best expressed by reference to the shortcomings of its internal workings (1), and not by contrived tests.
It might be the case, but if those shortcomings are not visible in the results of the machine (and therefore not interpretable by a test), why do its internal workings even matter?
I'm saying best expressed. Like, you see the failures in the results, but trying to pin down exactly what's the matter with the results means you resort to a lot of handwaving and abstract complaints about generalities. So if you knew how the internals had to be that would make the difference, you could lean on that.