That's not true in practice. Floating point arithmetic is not commutative due to rounding errors, and the parallel operations introduce non-determinism even at temperature 0.
It's pretty important when discussing concrete implementations though, just like when using floats as coordinates in a space/astronomy simulator and getting decreasing accuracy as your objects move away from your chosen origin.
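For a concrete feel, here's a minimal NumPy sketch (made-up numbers, with float32 standing in for a simulator's world coordinates) of that loss of resolution far from the origin:

```python
# Made-up numbers; float32 standing in for a simulator's world coordinates.
import numpy as np

step = np.float32(0.001)      # small per-tick movement
near = np.float32(1.0)        # object close to the chosen origin
far  = np.float32(1.0e7)      # object ten million units out

print(near + step - near)     # ~0.001: the step survives
print(far + step - far)       # 0.0: the step is below float32 resolution here
print(np.spacing(far))        # 1.0: smallest representable change around 1e7
```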
What? You can get consistent output on local models.
I can train large nets deterministically too (cuBLAS flags). What you're saying isn't true in practice. Hell I can also go on the anthropic API right now and get verbatim static results.
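Rough sketch of what I mean, assuming PyTorch on an NVIDIA GPU (the flag I'm thinking of is CUBLAS_WORKSPACE_CONFIG); treat it as illustrative rather than a complete recipe:

```python
# Rough sketch, assuming PyTorch + CUDA. The env var has to be set before
# any cuBLAS handle is created, i.e. before the first CUDA operation.
import os
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import random
import numpy as np
import torch

def make_deterministic(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                    # seeds CPU and all CUDA RNGs
    torch.use_deterministic_algorithms(True)   # raise on nondeterministic kernels
    torch.backends.cudnn.benchmark = False     # no autotuning between runs

make_deterministic(42)
# Same script + same hardware + same library versions => bit-identical runs.
```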
"Hell I can also go on the anthropic API right now and get verbatim static results."
How?
Setting temperature to 0 won't guarantee the exact same output for the exact same input, because - as the previous commenter said - floating point arithmetic is non-commutative, which becomes important when you are running parallel operations on GPUs.
Shouldn't it be that they're non-associative? The reduction kernels combine partial results (like the dot products in a GEMM or the sum across attention heads) in an order that can vary between runs, and because the additions aren't associative, the individual floats can end up rounded off differently.
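A minimal NumPy illustration, with a chunked sum standing in for a parallel reduction that combines per-block partials:

```python
# Float addition is commutative but not associative, so the grouping of
# partial sums matters. A chunked sum stands in for a parallel reduction
# that combines per-block partials (as a GEMM or attention kernel might).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

seq = np.float32(0.0)
for v in x:                 # strict left-to-right accumulation
    seq += v

partials = x.reshape(100, 1000).sum(axis=1, dtype=np.float32)
par = partials.sum(dtype=np.float32)   # combine block partials afterwards

print(seq, par, seq == par)            # typically differ in the low bits
```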
It's also the way the model runs. Setting temperature to zero and picking a fixed seed would ideally result in deterministic output from the sampler, but in parallel execution of matrix arithmetic (e.g. on a GPU) the order of floating-point operations starts to matter, so timing differences can produce different results.
Good point. Though sampling generally happens on the CPU, sequentially. What you describe might influence the raw output logits from a single LLM step, but since the differences are only tiny, a well-designed sampler could still make the output deterministic (so same seed = same text output). With a very high temperature these small differences might influence the output though, since the ranking of two tokens might be swapped.
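Toy numbers to make that concrete (made-up single-step logits, nothing from a real model):

```python
# Made-up single-step logits. Token 0 is the clear winner; tokens 1 and 2
# are nearly tied, and a tiny run-to-run wobble in the logits swaps their
# ranking. Greedy decoding is unaffected; what changes with temperature is
# how much probability the sampler puts on that near-tied pair at all.
import numpy as np

def softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    z = logits / temperature
    z -= z.max()                      # for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([4.0, 1.000, 0.999])
wobble = np.array([0.0, 0.0,   0.002])   # tiny per-run difference in the logits

print("greedy pick:", logits.argmax(), "->", (logits + wobble).argmax())  # 0 -> 0

for t in (0.1, 2.0):
    p = softmax(logits, t)
    print(f"T={t}: chance the sampler even reaches the near-tied pair: "
          f"{p[1:].sum():.1e}")
```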
I think the usual misconception is to think that LLM outputs are random "by default". IMHO this apparent randomness is more of a feature rather than a bug, but that may be a different conversation.