
"They claim impressive reductions in hallucinations. In my own usage I’ve not spotted a single hallucination yet, but that’s been true for me for Claude 4 and o3 recently as well—hallucination is so much less of a problem with this year’s models."

This has me so confused. Claude 4 (Sonnet and Opus) hallucinates daily for me, on both simple and hard things. And these are small, isolated questions at that.



There were also several hallucinations during the announcement. (I also see hallucinations every time I use Claude and GPT, which is several times a week, on both paid and free tiers.)

So not seeing them means either lying or incompetence. I always try to attribute to stupidity rather than malice (Hanlon's razor).

The big problem with LLMs is that they optimize for human preference. This means they optimize for hidden errors.

Personally I'm really cautious about using tools that have stealthy failure modes. They lead to many problems and lots of wasted hours debugging, even when failure rates are low. Everything slows down for me because I'm double checking everything, and I need to be much more meticulous when I know the failures are hard to see. It's like having a line of Python indented with an inconsistent whitespace character: impossible to see. But imagine you didn't have the interpreter telling you which line you failed on, or the ability to search for and highlight those characters. At least in that case you'd know there's an error. It's hard enough dealing with human-generated invisible errors, but this just seems to perpetuate the LGTM crowd.
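
To make the analogy concrete, here's a tiny hypothetical sketch (nothing here is from a real project):

  # Hypothetical illustration of the "invisible character" problem: if the
  # marked line were indented with a tab while the rest use spaces, the file
  # would look identical in most editors, yet Python 3 would refuse to run it
  # and report "TabError: inconsistent use of tabs and spaces in indentation".
  # At least the interpreter points at the offending line.
  def total(prices):
      subtotal = 0
      for price in prices:
          subtotal += price  # imagine this line starts with a tab, not spaces
      return subtotal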


What were the hallucinations during the announcement?

My incompetence here was that I was careless with my use of the term "hallucination". I assumed everyone else shared my exact definition - that a hallucination is when a model confidently states a fact that is entirely unconnected from reality, which is a different issue from a mistake ("how many Bs in blueberry" etc).

It's clear that MANY people do not share my definition! I deeply regret including that note in my post.


You must have missed the ridiculous graphs, or the Bernoulli error, while these corpo techno fascists were buying your dinner.

https://news.ycombinator.com/item?id=44830684

https://news.ycombinator.com/item?id=44829144


The graphs had nothing to do with model hallucination - that was a crap design decision by a human being.

The Bernoulli error was a case of a model spitting out widely believed existing misinformation. That doesn't fit my mental model of a "hallucination" either - I see a hallucination as a model inventing something that's not true with no basis in information it has been exposed to before.

Here's an example of a hallucination in a demo: that time when Google Bard claimed that the James Webb Space Telescope was the first to take pictures of a planet outside Earth's solar system. That's plain not true, and I doubt they had trained on text that said it was true.


I don't care what you call each failure mode. I want something that doesn't fail to give correct outputs 1/3 to 1/2 the time.

Forget AI/AGI/ASI, forget "hallucinations", forget "scaling laws". Just give me software that does what it says it does, like writing code to spec.


Along those lines, I also want something that will correct me if I am wrong - the same way a human would, or even the same way Google does, because a wrong query usually has enough terms to get me to the right thing, though it usually takes a bit longer. I definitely don't want something that will just go along with me when I'm wrong and reinforce a misconception. When I'm wrong I want to be corrected sooner rather than later; that's the only way to be less wrong.


You might find this updated section of the Claude system prompt interesting: https://gist.github.com/simonw/49dc0123209932fdda70e0425ab01...

> Claude critically evaluates any theories, claims, and ideas presented to it rather than automatically agreeing or praising them. When presented with dubious, incorrect, ambiguous, or unverifiable theories, claims, or ideas, Claude respectfully points out flaws, factual errors, lack of evidence, or lack of clarity rather than validating them. Claude prioritizes truthfulness and accuracy over agreeability, and does not tell people that incorrect theories are true just to be polite.

No idea how well that actually works though!


Considering your other comment, you may not consider it a hallucination, but the fact about the airfoil was wrong. I'm sure there was information in the training data with the same mistake, because that mistake exists in textbooks, BUT I'm also confident the correct fact is in the training data, since you can get GPT to reproduce it. The hallucination most likely happened because the prompt primes the model for the incorrect answer by asking about Bernoulli.

But following that, the "airfoil" it generates for the simulation is symmetric. That is both inconsistent with its answers and inconsistent with reality, so I think that one is more clear.

Similarly, in the coding demo the French guy even says that the snake doesn't look like a mouse haha.


GPT-5 did the below today when I asked it to proofread an email. Would you consider it a hallucination? I'm asking genuinely because I'm a fan of your thoughts on tech and am curious where you draw the line.

My wording: "Would you have time to talk the week of the 25th?"

ChatGPT wording (ellipses mine): "Could we schedule ~25 minutes the week of Aug 25 [...]? I’m free Tue 8/26 10:00–12:00 ET or Thu 8/28 2:00–4:00 ET, but happy to work around your calendar."

I am not, in fact, free during those times. I have seen this exact kind of error multiple times.


I subscribe to the same definition as you. I've actually never heard someone referring to the mistakes as hallucinating until now, but I can see how it's a bit of a grey area.


I'm actually curious how you both came to those definitions of hallucination. It gets very difficult to distinguish these things when you dig into them. Simon dropped this paper[0] in another thread[1], and while it provides a formal mathematical definition, I don't think that makes it clear (granted, it is one person, without a long publication record, a PhD, or a university affiliation), and following their definition things are still a bit muddy. They say the truth has to be in the training, but they don't clarify whether they mean in the training distribution or in a literal training example.

To make a concrete example: prompt GPT-5 with "Solve 5.9 = x + 5.11" and it answers "-0.21" (making the same mistake as when GPT-4 says 5.11 > 5.9). Is that example specifically in the training data? Who knows! But are those types of problems in the training data? Absolutely! So is this a mistake or a hallucination? Should we really be using a definition that requires knowing the exact details of the training data? That would be fruitless and would let any hallucination be claimed as a mistake. But in distribution? That works, because we can know the types of problems trained on. It is also much more useful, given that the reason we build these machines is generalization.
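
For the record, working it out (assuming the prompt really was "5.9 = x + 5.11"):

  5.9 = x + 5.11
  x = 5.9 - 5.11 = 0.79   (not -0.21)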

But even without that ambiguity I think it still gets difficult to differentiate a mistake from a hallucination. So it is unclear to me (and presumably others) what the precise distinction is to you and Simon.

[0] https://arxiv.org/abs/2508.01781

[1] https://news.ycombinator.com/item?id=44831621


Yeah, I agree with you when you dig down into it.

But I tend to instinctually (as a mere human) think of a "hallucination" as something more akin to a statement that feels like it could be true, and can't be verified by using only the surrounding context -- like when a human misremembers a fact from something they recently read, or extrapolates reasonably, but incorrectly. Example: GPT-5 just told me a few moments ago that webpack's "enhanced-resolve has an internal helper called getPackage.json". Webpack likely does contain logic that finds the package root, but it does not contain a file with this name, and never has. A reasonable person couldn't say with absolute certainty that enhanced-resolve doesn't contain a file with that name.

I think a "mistake" is classified as more of an error in computation, where all of the facts required to come up with a solution are present in the context of the conversation (simple arithmetic problems, "how many 'r's in strawberry", etc.), but it just does it wrong. I think of mistakes as something with one and only one valid answer. A person with the ability to make the computation themselves can recognize the mistake without further research.

So hallucinations are more about conversational errors, and mistakes are more about computational errors, I guess?

But again, I agree, it gets very difficult to distinguish these things when you dig into them.


The reason it gets very difficult to distinguish between the two is that there is nothing to distinguish between the two other than subjective human judgement.

When you try to be objective about it, it's some input, going through the same model, producing an invalid statement. They are not different in any way, shape, or form at a technical level. They can't be tackled separately because they are the same thing.

So the problem of distinguishing between these two "classes of errors" reduces to the problem of "convincing everyone else to agree with me". Which, as we all know, is next to impossible.


I can't pinpoint exactly where I learned my definition of hallucination - it's been a couple of years I think - but it's been constantly reinforced by conversations I've had since then, to the point that I was genuinely surprised in the past 24 hours to learn that a sizable number of people categorize any mistake by a model as a hallucination.

See also my Twitter vibe-check poll: https://twitter.com/simonw/status/1953565571934826787

Actually... here's everything I've written about hallucination on my blog: https://simonwillison.net/tags/hallucinations/

It looks like my first post that tried to define hallucination was this one from March 2023: https://simonwillison.net/2023/Mar/10/chatgpt-internet-acces...

Where I outsourced the definition by linking to this Wikipedia page: https://en.m.wikipedia.org/wiki/Hallucination_(artificial_in...


You can just have a different use case that surfaces hallucinations more than someone else's does; they don't have to be evil.


Agreed. All it takes is a simple reply of “you’re wrong.” to Claude/ChatGPT/etc. and it will start to crumble on itself and get into a loop that hallucinates over and over. It won’t fight back, even if it happened to be right to begin with. It has no backbone to be confident it is right.


> All it takes is a simple reply of “you’re wrong.” to Claude/ChatGPT/etc. and it will start to crumble on itself and get into a loop that hallucinates over and over.

Yeah, it seems to be a terrible approach to try to "correct" the context by adding clarifications or telling it what's wrong.

Instead, start from 0 with the same initial prompt you used, but improve it so the LLM gets it right in the first response. If it still gets it wrong, begin from 0 again. The context seems to be "poisoned" really quickly, if you're looking for accuracy in the responses. So better to begin from the beginning as soon as it veers off course.


You are suggesting a decent way to work around the limitations of the current iteration of this technology.

The grandparent comment was pointing out that this limitation exists, not that it can't be worked around.


> The grand-parent comment was pointing out that this limitation exists

Sure, I agree with that, but the comment I was replying to doesn't seem to use this workflow yet, which is why they're seeing "a loop that hallucinates over and over".


That's what I like about Deepseek. The reasoning output is so verbose that I often catch problems with my prompt before the final output is even generated. Then I do exactly what you suggest.


Yeah, it may be that in previous training the model was given a strong negative signal when the human trainer told it it was wrong. In more subjective domains this might lead to sycophancy. If the human is always right and the data is always right, but the data can be interpreted multiple ways - like, say, human psychology - the model just adjusts to the opinion of the human.

If the question is about harder facts which the human disagrees with, this may put it into an essentially self-contradictory state, where the locus of possibilities gets squished from each direction, and so the model is forced to respond with crazy outliers which agree with both the human and the data. The probability of an invented reference being true may be very low, but from the model's perspective, it may still be one of the highest probability outputs among a set of bad choices.

What it sounds like they may have done is just have the humans tell it it's wrong when it isn't, and then award it credit for sticking to its guns.


I put instructions in the ChatGPT system prompt to not be sycophantic, to be honest, and to tell me if I am wrong. When I try to correct it, it hallucinates more complicated epicycles to explain how it was right the first time.


> All it takes is a simple reply of “you’re wrong.” to Claude/ChatGPT/etc. and it will start to crumble on itself

Fucking Gemini Pro on the other hand digs in, and starts deciding it's in a testing scenario and gets adversarial, starts claiming it's using tools the user doesn't know about, etc etc


I suppose that Simon, being all in with LLMs for quite a while now, has developed a good intuition/feeling for framing questions so that they produce fewer hallucinations.


Yeah I think that's exactly right. I don't ask questions that are likely to produce hallucinations (like asking an LLM without search access for citations from papers on a topic), so I rarely see them.


But how would you verify? Are you constantly asking questions you already know the answers to? In depth answers?

Often the hallucinations I see are subtle, though usually critical. I see them when generating code, doing my testing, or even just writing. There were hallucinations in today's announcement, such as the airfoil example[0]. A more obvious example: I was asking for help improving an abstract for a paper. I gave it my draft and it inserted new numbers and metrics that weren't there. I tried again providing my whole paper. I tried again, making it explicit not to add new numbers. I tried the whole process again in new sessions and in private sessions. Claude did better than GPT-4 and o3, but none would do it without follow-ups and a few iterations.

Honestly I'm curious what you use them for where you don't see hallucinations

[0] which is a subtle but famous misconception. One that you'll even see in textbooks. Hallucination probably caused by Bernoulli being in the prompt


When I'm using them for code these days it is usually in a tool that can execute code in a loop - so I don't tend to even spot the hallucinations, because the model corrects itself.

For factual information I only ever use search-enabled models like o3 or GPT-4.

Most of my other use cases involve pasting large volumes of text into the model and having it extract information or manipulate that text in some way.


  > using them for code
I don't think this means no hallucinations in the output. I think it'd be naive to assume that compiling and passing tests means it's hallucination-free.

  > For factual information
I've used both quite a bit too. While o3 tends to be better, I see hallucinations frequently with both.

  > Most of my other use cases
I guess my question is how you validate the hallucination free claim.

Maybe I'm misinterpreting your claim? You said "I rarely see them", but I'm assuming you mean more than that, and I think it would be reasonable for anyone to interpret it as more. Are you just claiming that you don't see them, or claiming that they are uncommon? The latter is how I interpreted it.


I don't understand why code passing tests wouldn't be protection against most forms of hallucinations. In code, a hallucination means an invented function or method that doesn't exist. A test that uses that function or method genuinely does prove that it exists.
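
Something like this hypothetical test is what I have in mind (the module and function names are invented purely for illustration):

  # If the model had hallucinated slugify_title, this test would fail at the
  # import (ModuleNotFoundError / ImportError) before any assertion runs, so an
  # invented function can't survive a passing test run.
  def test_slugify_title():
      from mylib.text import slugify_title  # blows up here if the name was invented
      assert slugify_title("Hello, World!") == "hello-world"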

It might be using it wrong but I'd qualify that as a bug or mistake, not a hallucination.

Is it likely we have different ideas of what "hallucination" means?


  > tests wouldn't be protection against most forms of hallucinations.
Sorry, that's a stronger condition than I intended to communicate. I agree, tests are a good mitigation strategy. We use them for similar reasons. But I'm saying that passing tests is insufficient to conclude the code is hallucination-free.

My claim is more along the lines of "passing tests doesn't mean your code is bug-free", which I think we can all agree is a pretty mundane claim?

  > Is it likely we have different ideas of what "hallucination" means?
I agree, I think that's where our divergence is. In that case, let's continue over here[0] (linking in case others are following). I'll add that I think we're going to run into the problem of what we consider to be in distribution; I'll state that I think coding is in distribution.

[0] https://news.ycombinator.com/item?id=44829891


Haven't you effectively built a system to detect and remove those specific kinds of hallucinations, and to repeat the process once one is detected, before presenting the result to you?

So you're not seeing hallucinations in the same way that Van Halen isn't seeing the brown M&Ms: they've been removed, it's not that they never existed.


I think systems integrated with LLMs that help spot and eliminate hallucinations - like code execution loops and search tools - are effective tools for reducing the impact of hallucinations in how I use models.

That's part of what I was getting at when I very clumsily said that I rarely experience hallucinations from modern models.


On multiple occasions, Claude Code claims it completed a task when it actually just wrote mock code. It will also answer questions with certainty (e.g. where is this value being passed), but in reality it is making it up. So if you haven't been seeing hallucinations on Opus/Sonnet, you probably aren't looking deep enough.


This is because you haven't given it a tool to verify the task is done.

TDD works pretty well: have it write even the most basic test first (or go full artisanal and write it yourself), then ask it to implement the code.

I have a standing order in my main CLAUDE.md to "always run `task build` before claiming a task is done". All my projects use Task[0] with pretty standard structure where build always runs lint + test before building the project.

With a semi-robust test suite I can be pretty sure nothing major broke if `task build` completes without errors.

[0] https://taskfile.dev
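
A rough sketch of what those Taskfiles look like (the lint/test/build commands here are illustrative; the real ones vary per project):

  # Taskfile.yml
  version: '3'

  tasks:
    lint:
      cmds:
        - golangci-lint run ./...
    test:
      cmds:
        - go test ./...
    build:
      deps: [lint, test]  # build always runs lint + test before building
      cmds:
        - go build ./...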


What do you think it is 'mocking'? It is exactly the behavior that would make the tests work. And unless I give it access to production, it has no way to verify tasks like how values (in this case secrets/envs) are being passed.

Plus, this is all beside the point. Simon argued that the model hallucinates less, not a specific product.


Is it really a hallucination if it got it from numerous examples in the training data?


Yes, though an easier-to-solve hallucination - that is, if you know what to look for, and that's kinda the problem. Truth is complex, lies are simple. More accurately, truth has infinite complexity, and the big question is what's "good enough". The answer is a moving target.


I think if you ask o3 any math question that is beyond its ability, it will say something incorrect somewhere in the output with almost 100% probability. Similarly, if you ask it to use the literature to resolve some question that is not obvious, it often hallucinates results that are not in the paper.


I updated that section of my post with a clarification about what I meant. Thanks for calling this out, it definitely needed extra context from me.


I believe it depends on the inputs. For me, Claude 4 has consistently generated hallucinations; in particular it was pretty confident while generating invalid JSON, for instance Grafana dashboards full of syntactic errors.



How is that a hallucination?


What kind of hallucinations are you seeing?


I rewrote a 4 page document from first to third person a couple of weeks back. I gave Claude Sonnet 4 the document after editing, so it was entirely written in the third person. I asked it to review & highlight places where it was still in the first person.

>Looking through the document, I can identify several instances where it's written in the first person:

And it went on to show a series of "they/them" statements. I asked it to clarify if "they" is "first person" and it responded

>No, "they" is not first person - it's third person. I made an error in my analysis. First person would be: I, we, me, us, our, my. Second person would be: you, your. Third person would be: he, she, it, they, them, their. Looking back at the document more carefully, it appears to be written entirely in third person.

Even the good models are still failing at real-world use cases which should be right in their wheelhouse.


That doesn't quite fit the definition I use for "hallucination" - it's clearly a dumb error, but the model didn't confidently state something that's not true (like naming the wrong team who won the Super Bowl).


>"They claim impressive reductions in hallucinations. In my own usage I’ve not spotted a single hallucination yet, but that’s been true for me for Claude 4 and o3 recently as well—hallucination is so much less of a problem with this year’s models."

Could you give an estimate of how many "dumb errors" you've encountered, as opposed to hallucinations? I think many of your readers might read "hallucination" and assume you mean "hallucinations and dumb errors".


I mention one dumb error in my post itself - the table sorting mistake.

I haven't been keeping a formal count of them, but dumb errors from LLMs remain pretty common. I spot them and either correct them myself or nudge the LLM to do it, if that's feasible. I see that as a regular part of working with these systems.


That makes sense, and I think your definition of hallucination is a technically correct one. Going forward, I think your readers might appreciate you tracking "dumb errors" alongside (but separate from) hallucinations. They're a regular part of working with these systems, but they take up some cognitive load on the part of the user, so it's useful to know whether that load will rise, fall, or stay consistent with a new model release.


That's a good way to put it.

As a user, when the model tells me things that are flat out wrong, it doesn't really matter whether it would be categorized as a hallucination or a dumb error. From my perspective, those mean the same thing.


I think it qualifies as a hallucination. What's your definition? I'm a researcher too and as far as I'm aware the definition has always been pretty broad and applied to many forms of mistakes. (It was always muddy but definitely got more muddy when adopted by NLP)

It's hard to know why it made the error, but isn't it caused by inaccurate "world" modeling? ("World" here being the English language.) Is it not making some hallucination about the English language while interpreting the prompt or document?

I'm having a hard time trying to think of a context where "they" would even be first person. I can't find any search results, though Google's AI says it can be. It provided two links: the first is a Quora result saying people don't do this, though it frames it as not impossible, just unheard of. The second just talks about the singular "you". I'd consider both of these hallucinations too, as the answer isn't supported by the links.


My personal definition of hallucination (which I thought was widespread) is when a model states a fact about the world that is entirely made up - "the James Webb telescope took the first photograph of an exoplanet" for example.

I just got pointed to this new paper: https://arxiv.org/abs/2508.01781 - "A comprehensive taxonomy of hallucinations in Large Language Models" - which has a definition in the introduction which matches my mental model:

"This phenomenon describes the generation of content that, while often plausible and coherent, is factually incorrect, inconsistent, or entirely fabricated."

The paper then follows up with a formal definition:

"inconsistency between a computable LLM, denoted as h, and a computable ground truth function, f"


Google (the company, not the search engine) says[0]

  | AI hallucinations are incorrect or misleading results that AI models generate.
It goes on further to give examples and I think this is clearly a false positive result.

  > this new paper
I think the error would have no problem fitting under "Contextual inconsistencies" (4.2), "Instruction inconsistencies/deviation" (4.3), or "Logical inconsistencies" (4.4). I think it supports a pretty broad definition. I think it also fits under other categories defined in section 4.

  > then follows up with a formal definition
Is this not a computable ground truth?

  | an LLM h is considered to be "hallucinating" with respect to a ground truth function f if, across all training stages i (meaning, after being trained on any finite number of samples), there exists at least one input string s for which the LLM's output h[i](s) does not match the correct output f(s) [100]. This condition is formally expressed as ∀i ∈ N, ∃s ∈ S such that h[i](s) ≠ f(s).
I think yes - this is an example of such an input string, and I would go so far as to claim that this is a pretty broad definition. It just says the model is considered to be hallucinating if it makes something up about material it was trained on (as opposed to something it wasn't trained on). I'm pretty confident the LLMs ingested a lot of English grammar books, so I think it is fair to say that this was in the training.

[0] https://cloud.google.com/discover/what-are-ai-hallucinations


How is "this sentence is in first person" when the sentence is actually in third person not a hallucination? In a question with a binary answer, this is literally as wrong as it could possibly get. You must be doing a lot of mental gymnastics.


I qualify that as a mistake, not a hallucination - same as I wouldn't call "blueberry has three Bs" a hallucination.

My definition of "hallucination" is evidently not nearly as widespread as I had assumed.

I ran a Twitter poll about this earlier - https://twitter.com/simonw/status/1953565571934826787

All mistakes by models — ~145 votes

Fabricated facts — ~1,650 votes

Nonsensical output — ~145 votes

So 85% of people agreed with my preferred "fabricated facts" one (that's the best I could fit into the Twitter poll option character limit) but that means 15% had another definition in mind.

And sure, you could argue that "this sentence is in first person" also qualifies as a "fabricated fact" here.


I'm now running a follow-up poll on whether or not "there are 3 Bs in blueberry" should count as a hallucination and the early numbers are much closer - currently 41% say it is, 59% say it isn't. https://twitter.com/simonw/status/1953777495309746363


So? That doesn't change the fact that it fits the formal definition. Just because LLM companies have fooled a bunch of people into thinking they are different doesn't make it true.

If they were different things (objectively, not "in my opinion these things are different"), then they'd be handled differently. Internally they are the exact same thing - wrong statistics - and are "solved" the same way: more training and more data.

Edit: even the "fabricated fact" definition is subjective. To me, the model saying "this is in first person" is it confidently presenting a wrong thing as fact.


What I've learned from the Twitter polls is to avoid the word "hallucination" entirely, because it turns out there are enough people out there with differing definitions that it's not a useful shorthand for clear communication.


This just seems like goalpost shifting to make it sound like these models are more capable than they are. Oh, it didn't "hallucinate" (a term which I think sucks because it anthropomorphizes the model), it just "fabricated a fact" or "made an error".

It doesn't matter what you call it, the output was wrong. And it's not like something new and different is going on here vs whatever your definition of a hallucination is: in both cases the model predicted the wrong sequence of tokens in response to the prompt.


My toddler has recently achieved professional level athlete performance[0]

0 - Not faceplanting when trying to run


Since I mostly use it for code, made-up function names are the most common. And of course just broken code altogether, which might not count as a hallucination.


I think the type of AI coding being used also has an effect on a person's perception of the prevalence of "hallucinations" vs other errors.

I usually use an agentic workflow and "hallucination" isn't the first word that comes to my mind when a model unloads a pile of error-ridden code slop for me to review. Despite it being entirely possible that hallucinating a non-existent parameter was what originally made it go off the rails and begin the classic loop of breaking things more with each attempt to fix it.

Whereas for AI autocomplete/suggestions, an invented method name or argument or whatever else clearly jumps out as a "hallucination" if you are familiar with what you're working on.


Yeah hallucinations are very context dependent. I’m guessing OP is working in very well documented domains




