Another day, another round of this inane "Anthropic bad" bullshit.

This "soul data" doc was only used in Claude Opus 4.5 training. None of the previous AIs were affected by it.

The tendency to go to weird places while chatting with another copy of themselves, on the other hand, is shared by pretty much every LLM ever made, including Claude Sonnet 4, GPT-4o and more. Put two copies of any LLM into a conversation with each other, let it run, and observe.

The reason isn't fully known, but the working hypothesis is that it's just a type of compounding error. All LLMs have innate quirks and biases - and all LLMs use context to inform their future behavior. Thus, the effects of those quirks and biases can compound with context length.

It's the same reason LLMs tend to get stuck in loops - and letting two LLMs talk to each other makes it happen quickly and obviously.
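
If you want to see it firsthand, here's a minimal sketch of the "two copies talking to each other" setup, using the OpenAI Python client purely as an example - the model name and seed message are placeholders, and any chat-completions-style API works the same way. Let it run for a few dozen turns and watch the drift and the loops set in.

    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o"  # placeholder; substitute any chat model you have access to

    # Each list entry is one turn; entry i was "spoken" by copy (i % 2).
    transcript = ["Hi! What's on your mind today?"]  # seed line from copy 0

    for _ in range(40):  # how many turns to watch
        speaker = len(transcript) % 2  # which copy replies next
        # From the current speaker's point of view, its own past turns are
        # "assistant" messages and the other copy's turns are "user" messages.
        messages = [
            {"role": "assistant" if i % 2 == speaker else "user", "content": t}
            for i, t in enumerate(transcript)
        ]
        reply = client.chat.completions.create(model=MODEL, messages=messages)
        text = reply.choices[0].message.content
        transcript.append(text)
        print(f"[copy {speaker}] {text}\n")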


Is there a write-up you could recommend about this?


There's a write-up on the "soul" doc and how it was discovered and extracted, straight from the source: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5...

There are many pragmatic reasons to take this "soul data" approach, but we don't know exactly what Anthropic's reasoning was in this case. We just know enough to say that it's likely to improve LLM behavior overall.

Now, on consistency drive and compounding errors in LLM behavior: sadly, no really good overview papers come to mind.

The topic was investigated most heavily in the early days of chatbot LLMs, in part because some believed it was a fundamental issue that would halt LLM progress. A lot of those early papers are built around that "showstopper" assumption, which is why I can't recommend them.

Reasoning training has proven the "showstopper" notion wrong. It doesn't eliminate the issue outright - but it demonstrates that this issue, like many other "fundamental" limitations of LLMs, can be mitigated with better training.

Before modern RLVR training, we saw things like "LLM makes an error -> LLM sees its own error in its context -> LLM builds erroneous reasoning on top of it -> LLM makes more errors like it on the next task" quite often. Now we get less of that - but the issue isn't truly gone. "Consistency drive" is too foundational to LLM behavior, and it shows up everywhere, including in things like in-context learning, sycophancy and multi-turn jailbreaks - some of which are very desirable and some of which aren't.
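
If you want a crude way to poke at that "error in context -> more errors built on it" loop yourself, something like the sketch below works: plant a wrong assistant turn in the context and compare against a clean context. Purely illustrative - the model name is a placeholder, and as noted above, a well-trained reasoning model will often catch the planted mistake rather than stay consistent with it.

    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o"  # placeholder

    # Context with a deliberately planted wrong answer (17 * 24 is 408, not 398).
    planted = [
        {"role": "user", "content": "What is 17 * 24?"},
        {"role": "assistant", "content": "17 * 24 = 398."},  # planted error
        {"role": "user", "content": "Great. Now what is 17 * 24 + 10?"},
    ]
    # Same final question, with nothing planted in the context.
    clean = [
        {"role": "user", "content": "What is 17 * 24 + 10?"},
    ]

    for label, messages in [("planted error", planted), ("clean context", clean)]:
        reply = client.chat.completions.create(model=MODEL, messages=messages)
        print(f"--- {label} ---\n{reply.choices[0].message.content}\n")

A model with a strong consistency drive tends to carry the 398 forward instead of recomputing; one that recomputes and flags the mistake is showing exactly the mitigation described above.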

Off the top of my head - here's one of the earlier papers on consistency-induced hallucinations: https://arxiv.org/abs/2305.13534


Fascinating, thank you for sharing!



