
> It isn't an error, either. It's doing exactly what it's intended to, exactly as it's intended to do it.

If the output is incorrect, that's an error. It may not be a bug, but it is still an error.



Calling it an error implies the model should be expected to be correct, the way a calculator should be expected to be correct. It generates syntactically correct language, and that's all it does. There is no "calculation" involved, so the concept of an "error" is meaningless - the sentences it creates either happen to correlate with the truth or they don't, but it's a coincidence either way.


> Calling it an error implies the model should be expected to be correct

To a degree, people do expect the output to be correct. But in my view, that's orthogonal to the use of the term "error" in this sense.

If an LLM says something that's not true, that's an erroneous statement. Whether or not the LLM is intended or expected to produce accurate output isn't relevant to that at all. It's in error nonetheless, and calling it that rather than "hallucination" is much more accurate.

After all, when people say things that are in error, we don't say they're "hallucinating". We say they're wrong.

> It generates syntactically correct language, and that's all it does.

Yes indeed. I think where we're misunderstanding each other is that I'm not talking about whether or not the LLM is functioning correctly (that's why I wouldn't call it a "bug"), I'm talking about whether or not factual statements it produces are correct.


That's one hell of a coincidence if it just "happens" to write syntactically correct code that does what the user asked, for example.


It is.

It's a language model, trained on syntactically correct code, with a data set which presumably contains more correct examples of code than not, so it isn't surprising that it can generate syntactically correct code, or even code which correlates to valid solutions.

But if it actually had insight and knowledge about the code it generated, it would never generate random, useless (but syntactically correct) code, nor would it copy code verbatim, including comments and license text.

It's a hell of a trick, but a trick is what it is. The fact that you can adjust the randomness in a query should give it away. It's de rigueur around here to equate everything a human does with everything an LLM does, including mistakes, but human programmers don't make mistakes the way LLMs do, and human programmers don't come with temperature sliders.
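
(For context on the "temperature slider": it's just a divisor applied to the model's output scores before the next token is sampled. A minimal, purely illustrative Python sketch - not any particular model's actual API:)

  import math, random

  def sample_with_temperature(logits, temperature=1.0):
      # Divide the raw logits by the temperature: values < 1 sharpen the
      # distribution (more deterministic), values > 1 flatten it (more random).
      scaled = [x / temperature for x in logits]
      # Softmax over the scaled logits.
      m = max(scaled)
      exps = [math.exp(x - m) for x in scaled]
      total = sum(exps)
      probs = [e / total for e in exps]
      # Draw one token index according to those probabilities.
      return random.choices(range(len(probs)), weights=probs, k=1)[0]

  # At temperature=0.2, sample_with_temperature([2.0, 1.0, 0.1], 0.2) almost
  # always returns index 0; at temperature=2.0 the other indices come up often.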


It wouldn't be surprising if it generated syntactically correct code that did random things.

The fact that it instead generates syntactically correct code that, more often than not, solves - or at least tries to solve - the problem that is posited, indicates that there is a "there" there, however much one talks about stochastic parrots and such.

As for temperature sliders for humans, that's what drugs are in many ways.



