
> and reliably report when they don't know.

Then we need a new system, because LMs, no matter how large, cannot do that, for a very simple reason:

An LM doesn't understand "truthfulness". It has no concept of a sequence being true or not, only of a sequence being probable.

And that probability cannot work as a stand-in for truthfulness, because the LM doesn't produce improbable sequences to begin with...its output will always be the most probable sequence (within the temperature setting). The LM simply has no way of knowing whether the sequence it just predicted is grounded in reality or not.
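
To make the "most probable (within the temperature setting)" part concrete, here is a minimal toy sketch, using made-up logits rather than a real model, of how temperature reshapes a next-token distribution. Note that nothing in those probabilities encodes whether a continuation is true.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng(0)):
    # Lower temperature sharpens the distribution toward the most probable token;
    # higher temperature flattens it. Either way, the model only ranks
    # continuations by probability -- truth never enters the computation.
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

logits = [3.2, 2.9, 0.5]  # hypothetical scores for three candidate next tokens
for t in (0.2, 1.0, 2.0):
    token, probs = sample_next_token(logits, temperature=t)
    print(f"T={t}: probs={probs.round(3)}, sampled index={token}")
```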



> An LM doesn't understand "truthfulness". It has no concept of a sequence being true or not, only of a sequence being probable.

I claim that the human brain doesn't understand "truthfulness" either. It merely creates the impression that understanding is taking place, by adapting to social and environmental pressures. The brain has no "concepts" at all, it just generates output based on its input, its internal wiring, and a variety of essentially random factors, quite analogous to how LLMs operate.

Do you have any evidence that contradicts that claim?


> Do you have any evidence that contradicts that claim?

Empirical evidence? Yes I do.

The brain commands an entity that has to exist and function in the context of objective reality. Being unable to verify its internal state against that reality would have been negatively selected some time ago, because stating "I'm sure that rumbling cave bear with those big sharp teeth is a peaceful herbivore" won't change the objective reality that the caveman is about to become dinner.

How that works in detail is, to the best of my knowledge, still the subject of research in the realm of neurobiology.


Wouldn't that type of response fit in with how LMs work though? That caveman likely learned a lot of things over time, like: large animals are more likely to end your life than small ones, animals making loud noises are likely more dangerous, sharp teeth and claws are dangerous, or I saw one of those kill another caveman. All of those things tilt the probability of associating that loud cave bear with a high risk of death. That doesn't mean there's some inherent 'truth' that the caveman brain 'knows'; it's just a high probability that it's a correct assessment of the input. Every true thing is really just an evaluation of probability in the end.


I think this is incomplete on a number of levels. For a start, to be interesting, “truth” has to be something more than just whatever your eyes can see. There have been wars (cultural, economic, kinetic, etc.) fought to define something as a truth.

The concept of truth is notoriously hard for humans to grapple with. How we know something is true isn’t just a neurobiological question; it’s been grappled with throughout the history of philosophy — including major revisions of our understanding in the past 80 years.

And for the record, rumbling cave bears are mostly peaceful herbivores.


> And for the record, rumbling cave bears are mostly peaceful herbivores.

For the record, all members of the Genus Ursus belong to the Order Carnivora, which literally translates to "Meat Eaters". And that includes Ursus spelaeus, aka. the Cave Bear.

And while it most likely, like many modern bears, was an omnivore, that "omni" very much included small, hairless, monkey-esque creatures with no natural defenses other than ridiculously small teeth and pathetic excuses for claws, if they happened to stumble into its cave.

> The concept of truth is notoriously hard for humans to grapple with.

I am not talking about the philosophical questions of what truth is as a concept, nor am I talking about the many capabilities of humans to purposefully reshape others' perceptions of truth for their own ends.

I am talking about truth as the observable state of objective reality, aka the Universe we exist in and interact with. A meter is longer than a centimeter, and boiling water is warmer than frozen water at the same pressure; whether any given philosophy or fabrication agrees with that is irrelevant.


That's speculation, not evidence. The traits you describe aren't demonstrably incompatible with the mechanism I proposed.


It's empirical evidence, since we exist and are very much capable of selecting the correct statement from a bunch of stochastically likely but untruthful statements about objective reality.


I find these takes so lazy. What you have claimed here is just totally wrong.


And you don't have a shred of actual evidence to demonstrate that, only your own preconceptions about how things supposedly are.


The burden is on you to prove your claims.


I'm not trying to demonstrate that my claims are true. I'm trying to demonstrate that it is meaningless to discuss these topics in the first place, because we don't understand the workings of the mind nearly well enough to distinguish things like "truthfulness" and "concepts".


The fact that we don't know exactly how our brains work doesn't mean we cannot observe the results of their work.

And as I have demonstrated above, humans, and for that matter other species on this planet featuring capable brains like Corvidae or Cetaceans, do in fact have a concept of truth: they are capable of recognizing false or misleading information as being incongruous with objective reality. A raven that sees me putting food into my left hand will not jump to a patch of ground where I pretend to put food with my right hand.

This is despite the fact that my actions of "hiding the food" with the empty hand are stochastically indistinguishable from the action of actually hiding food with my left hand.


Not we. You. Do not foist your ignorance on others. I recommend Foundations of Neuroscience by Henley.


If this is true, why is GPT-4 better in that regard than GPT-3.5? And why do questions about Python yield far fewer hallucinations than questions about Rust, or other less popular tech?


What specifically about these observations contradicts my statement?

Wrong statements about Python are simply less probable than wrong statements about Rust, since there is more Python than Rust in the training data.

That changes exactly nothing about the fact that the system isn't able to detect when it makes a blunder in Python.


You've claimed that LLMs create most probable output, which does not necessarily align with truth. So a bigger LLM will be better at creating the most probable output, but that would not translate into being more truthful. That could be interpreted as "better LLMs are expected to be better bullshitters".

That is not what we've observed though. Quite the opposite - we're seeing that the bigger an LLM is and the more domain-specific material it has digested, the more truthful it becomes.

Yes it can still make an error and be unable to spot it, but so can I.


> You've claimed that LLMs create most probable output, which does not necessarily align with truth.

No, that is not my claim. That is part of the explanation for it.

My claim is this: An LLM is incapable of knowing when it produces false information, as it simply doesn't have a concept of "truthfulness". It deals in probabilities, not alignment with objective reality.

And it doesn't matter how big you make them...this fact cannot change, as it is rooted in the basic MO of language models.

So, now that we have covered what my claim actually is...

> That is not what we've observed though. Quite the opposite - we're seeing that the bigger an LLM is and the more domain-specific material it has digested, the more truthful it becomes.

...I can ask what this observation has to do with it, and the answer is: nothing at all. LMs with more parameters may produce untruthful statements less often, but what does this change about their ability to recognize when they do produce them? And the answer is: nothing. They still can't.


Your claim was literally proven wrong in a previous comment with GPT-4's calibration.

An LLM can indeed know when it produces likely incorrect responses. Not a hypothetical.

What's the point of making claims you have no intention of rescinding regardless of evidence? People are so funny.


>The LM simply has no way of knowing whether the sequence it just predicted is grounded in reality or not.

Base GPT-4 was excellently calibrated. So this is just wrong.

https://imgur.com/a/3gYel9r
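
For anyone wondering what "calibrated" means operationally in plots like the one linked: you bucket the model's answers by its stated (or token-level) confidence and compare average confidence to actual accuracy in each bucket. A rough sketch with made-up numbers, not the actual GPT-4 evaluation:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Bucket predictions by confidence, then compare the average confidence in
    # each bucket to the actual hit rate. 0.0 means perfectly calibrated.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

conf = [0.95, 0.9, 0.8, 0.7, 0.6, 0.55]  # made-up per-answer confidences
hit  = [1,    1,   1,   1,   0,   0]     # whether each answer was actually right
print(f"ECE = {expected_calibration_error(conf, hit):.3f}")
```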


What's needed, ideally, is a checker. Something that takes the LLM's output, can go back to the training material, and verify the output for consistency with it.

I don't think those steps are out of the bounds of possibility, really.
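
One possible shape for such a checker, sketched under the assumption that "the training material" can be searched: retrieve the passages most relevant to a claim, then hand claim plus evidence to a separate verification step. Retrieval here is plain TF-IDF; verify() is a stand-in for whatever entailment model or LLM judge you would actually plug in.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny stand-in for "the training material".
corpus = [
    "Water boils at 100 degrees Celsius at one atmosphere of pressure.",
    "A metre is equal to one hundred centimetres.",
    "Frozen water is at or below zero degrees Celsius.",
]

def retrieve(claim, corpus, k=2):
    # Plain TF-IDF retrieval; a real system would likely use dense embeddings.
    vec = TfidfVectorizer().fit(corpus + [claim])
    sims = cosine_similarity(vec.transform([claim]), vec.transform(corpus))[0]
    return [corpus[i] for i in sims.argsort()[::-1][:k]]

def verify(claim, evidence):
    # Placeholder verdict: in a real checker this would be an entailment model
    # or a second LLM asked "does the evidence support the claim?".
    return {"claim": claim, "evidence": evidence, "verdict": "UNVERIFIED"}

claim = "Boiling water is warmer than frozen water at the same pressure."
print(verify(claim, retrieve(claim, corpus)))
```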


> Something that takes the LLM's output, can go back to the training material, and verify the output for consistency with it.

The problem is what you mean when you say "consistency".

The LM checks if sequences are stochastically consistent with other sequences in the training data. Within that realm, the sentence "In the Water Wars of 1999, the Antarctic Coalition's armada of hovercraft valiantly fought in the Battle of Golehim under Rear Admiral Korakow, against the Trade Union's fleets." is consistent, because, while it is total bollocks, it looks stochastically like something that could be in a historical text.

So, in its context, the LM does exactly what you ask for. It produces output that is consistent with the training data.

Truthfulness is a completely different form of consistency: does the semantic meaning of the data support the statement I just made? Of course it doesn't: there is no Antarctic Coalition, there were no Water Wars in 1999, and no one ever built an armada of hovercraft for any war against a "Trade Union Fleet".

But to know that, one has to understand what the data means semantically. And our current AIs ... well, don't.
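
To illustrate the "stochastically consistent" point: a language model will happily score a fluent but fictional sentence as perfectly ordinary text, because perplexity only measures how text-like a sequence is, not whether it describes real events. A rough sketch using GPT-2 as a small stand-in model (assumes torch and transformers are installed):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")

def perplexity(text):
    # How "text-like" the model finds the sequence; this says nothing about
    # whether the described events ever happened.
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

real = "In the Battle of Trafalgar in 1805, the British fleet fought under Admiral Nelson."
fake = ("In the Water Wars of 1999, the Antarctic Coalition's armada of hovercraft "
        "fought in the Battle of Golehim under Rear Admiral Korakow.")
# Both numbers measure fluency only; neither flags the second sentence as false.
print(perplexity(real), perplexity(fake))
```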


>But to know that, one has to understand what the data means semantically. And our current AIs ... well, don't.

Another wrong statement, you're on a roll today.

https://arxiv.org/abs/2305.11169

https://arxiv.org/abs/2306.12672

There's a word we would use to describe your confidently erroneous statements were they the outputs of an LLM. Wonder what that might be...


Yeah, I don't mean stochastically consistent. Semantically consistent. The job of generating content from text and the job of assessing whether two texts represent aligned concepts are two different jobs, and I wouldn't expect a single LLM to do both within itself. That's why you want a second checker.


An LLM can detect when another LLM has output an inconsistent world.

It can reason. To an extent.



