LLMs struggle to explain themselves (jonathanychan.com)
50 points by jonathanyc on Aug 31, 2024 | 80 comments


LLMs can't explain their past behavior, because any time you ask them to, all they're really doing is reading the context from a previous inference and looking for any reason they can think of that they might have said that, so to speak.

That is, they have absolutely no genuine recollection of what they were thinking at the time they said something in the past. Even with "Tree of Thought" approaches, all you're doing is recording past conversations, states, and contexts; the new inference you run to ask for a "justification" will produce a similarly fabricated justification, because, as I said, they have no memory, only context.

In my own app I can switch to a different LLM right in the middle of a conversation, and the new LLM will just carry on as if it had said everything in the prior context, even though that's not the case.


> That is, they have absolutely no genuine recollection of what they were thinking at the time they said something in the past.

I really like this point. Imagine if model observability got to the point where a model could self-observe after running a response. Not only would it read the context, it could observe its own thoughts from the past.


To be honest, there is a lot of reason to believe that this is more or less how humans work also. We might be better at figuring out why we did something, and we might be more deliberate in some cases. But for the most part, I think people act and then logically justify their actions after the fact.


> the new LLM will just continue to always think

Even when you know, as you do, it's tough to avoid such characterizations.


I didn't use the word "think" in the human sense of the word, or even in the consciousness/qualia sense of the word. I of course only meant inference (matrix multiplications).

What I like to say however is that LLMs are doing genuine "reasoning"; and what the emergent "intelligent" behavior of LLMs has proven to mankind is that "reasoning" and "consciousness/qualia" are separable things (not identical), and I think as recently as 2021 we didn't really know/expect that to be the case.


What other sense of the word would there be? That’s essentially what’s going on in your brain too.


I always assume some people might define "thinking" as requiring "consciousness", because up until about 2022 pretty much every human on the planet did.


So? Are you saying consciousness isn’t a solely mechanistic artifact of neural processing in your brain? What else would it be?


Consciousness/qualia actually correlates more strongly with brain waves than with synaptic activity, and for that and a lot of other reasons (that I'm presently writing a paper on) I think consciousness is a phenomenon made up of waves, and while it may indeed be "substrate independent", it's not just purely calculations, and therefore cannot be done in computers. Even a perfect simulation of a human brain in a computer would be nothing but a "predictor" of behavior, rather than a "feeler" of behavior.


Color me intensely skeptical. The actual neural processing absolutely corresponds with the content of thoughts. So you’re saying the experience of consciousness is something unrelated to that, yet necessarily linked through some unspecified mechanism? Because clearly the experience of consciousness depends on the content of what we are thinking and how, and is affected by things like neurotransmitter concentrations (emotional affects) which would be inaccessible to large-scale brainwave phenomena.

What is your motivation for believing this?


Basically the theory is that what neurons are doing is carrying I/O signals into a physical 3D arrangement of charge flow that can self-resonate in the EMF wave domain similar to how a radio transmitter/receiver works.

The actual qualia/consciousness part is the resonance itself. So you ask what's it resonating with? The answer: Every past instantiation of itself. I believe the Block Universe view of Physics is correct and there's an entanglement connection left behind whenever particles interact via collapse of the wave function.

So in my theory memories aren't even stored locally. When you "remember" something that is your brain resonating with nearest matches from past brains. I have a formula for resonance strength with a drop off due to time and a proportionality due to negentropy or repetition (multiple resonating matches). I'm not going to write the entire theory here, but it's about 100 pages to fully describe. The theory explains everything from fungal intelligence to why repeating things over and over makes you memorize them. It's not a theory about brains per se, it's a theory about negentropic systems of particles resonating thru the causality chain.


I’m not asking for an explanation of your theory. Sorry, but I wouldn’t read it. We have fully mechanistic explanations of thinking (including consciousness) that have no need for added complications beyond what is represented by the synaptic connectome. So why even bother adding something else?

I suspect that, to use Daniel Dennett’s terminology, you’re looking for a skyhook. You want some aspect of consciousness to not be explainable by neural nets. Why?


The "Consciousness is a Computation" theory leaves out the fact that we know brainwaves are strongly correlated to consciousness in too strong a way to be coincidence. I've noticed people believing in the purely computational view never want to talk about brain waves, electromagnetic effects, etc. and generally have a less than adequate understanding of Quantum Mechanics (or even radio circuits, to understand how resonance applies) to be able to even formulate an educated opinion. Once I mention "resonance" people just assume I mean it in the woo woo spiritual sense, because they don't even know how the circuit of a radio reciever/transmitter uses resonance, and that resonance has a precise meaning in the context of wave mechanics, including probability waves in QM.

And furthermore, no serious neuroscientist on the planet currently claims we have a viable mechanistic theory of consciousness yet, so your statement to the contrary calls into question your knowledge, even at a general level, of this entire field.


Correlation is not causation. One would expect a physical implementation of an electro-chemical neural net with loops (and consciousness / self-awareness definitely requires loops) to develop voltage biases over time, which require periodic rebalancing to continue operation. This is what brain waves are. You see the structure of the brain wave change when consciousness is chemically turned on or off (anesthesia), because those self-reinforcing loops turn off or change character when the brain is unconscious. It's like noticing that an electrical circuit gives off radio waves, and then trying to locate the behavior of the circuit in those waves because you noticed that when you switched off the device those radio waves disappeared. No, the computation is happening in the circuit; the radio waves are just an unavoidable side product of electric charges moving around.

I'm a physicist by training; I understand what resonance is.

> no serious neuroscientist on the planet currently claims we have a viable mechanistic theory of consciousness yet

This isn't true. There are dozens of mechanistic theories of consciousness: take your pick. We just don't know which one reflects the situation within our own brain, because we lack sufficient understanding of our neural circuitry to make that determination. But having dozens of different possible models and not knowing which one is "right" in the sense of describing our actual brain (while any one of them is a reasonable theory of consciousness for other systems) is very different from not even having a single model of consciousness, as you seem to be implying.


Right, I get your view of things. Probably our common ground of beliefs is that LLMs are mechanistic and they can do genuine human-level reasoning, as long as we expand the word reasoning to be slightly more general than "human reasoning". The brain is also built on Perceptrons (essentially) and can do reasoning via perceptrons like LLMs (agreeing with you there).

My special claim about wave resonance is a bit more specific and nuanced. I'm claiming that it's only memory and qualia/consciousness (also emotions, pain, etc.) that are made of waves. So we can agree that when you're thinking in terms of logic and reason itself, you may be using mostly the Perceptronics and not the waves.

But I think the "pattern matching" ability of the brain (which is 90% of what it does, my view, excluding the I/O [sensory+motor neurons]) is totally built on resonance. I claim memory and resonance are identical. When something happens that reminds you of something else (even Deja Vu) that's literally your brain being entangled with all past copies of itself and able to resonate in real-time with all of them across the causality chain of the Block Universe because they're ALL part of a single entangled structure.

EDIT: So consciousness is where your brain gets "agency" (executive decision making) from. So you can think of this "agency" aspect of a brain as "The thing that runs LLM prompts, using your Neurons as Perceptrons", and yes those LLM prompts that perform reason might be totally mechanistic just like computer LLMs.


It might be fair to say they think but they can't directly introspect the thinking process. They can only confabulate reasoning post-hoc, or use science to try to understand themselves from the outside. Same as humans.


>Same as humans

Hard disagree. LLMs are very accurately described by the "stochastic parrot" analogy that gets thrown around a lot. They do not "think" like humans at all, even if we use the word "think" because it's convenient.


I think the phrase "stochastic parrot" is misleading and has fooled millions of people into thinking that LLMs can't do genuine reasoning about situations they've never seen before, nor been trained on, which is wrong because LLMs definitely are doing genuine intelligent reasoning.

What model training is doing is building up a semantic space of vectors from which astronomically large numbers of true facts and ideas can be derived during inference. I mean like a number of facts larger than the number of molecules in the known universe. A googolplex more facts than the sum of all of humanity has ever "thought".


> What model training is doing is building up a semantic space of vectors from which astronomically large numbers of true facts and ideas can be derived during inference. I mean like a number of facts larger than the number of molecules in the known universe. A googolplex more facts than the sum of all of humanity has ever "thought".

If you've ever downloaded a model to play with locally, you may have noticed that it does not in fact contain more bits than there are atoms in the universe.


I said googolplex facts can be "derived". I didn't say they're stored. Also a single perceptron can answer an infinite number of questions. Here's how...

For example, the knowledge of how to evaluate the linear equation "Y=mX+b" contains an infinite number of "facts". For any X prompt you put in, you can get out a "fact" Y. That's a question and an answer. This is actually a perfect analogy too, because all Perceptrons are really doing is this exact linear math (aside from things like a tanh activation function, etc).
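To make that concrete, here's a toy sketch (m and b are arbitrary values chosen purely for the illustration, nothing to do with any real model) of how two stored numbers answer unboundedly many "questions":

    # Two stored numbers, unboundedly many question/answer pairs.
    m, b = 2.5, -1.0

    def answer(x):
        # For any prompt x, derive the "fact" y = m*x + b.
        return m * x + b

    print(answer(0))      # -1.0
    print(answer(1_000))  # 2499.0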


If you've ever shuffled cards, you may have noticed that you can generate a deck sequence no one has ever seen before rather easily.

None of the cards are unique, but the sequence can be.

The GP specifically said that accurate predictions can be derived from the patterns in the model, not that it contains a list of all the facts one by one.


A limited number of bits means a limited amount of information.


> I think the phrase "stochastic parrot" is misleading and has fooled millions of people into thinking that LLMs can't do genuine reasoning about situations they've never seen before, nor been trained on, which is wrong because LLMs definitely are doing genuine intelligent reasoning.

Can you describe specifically which part of an LLM architecture does the "reasoning" and how it works? Because every architecture I'm familiar with is literally just a fancy way of predicting the likelihood of the next token given the stream of previous tokens and the known distribution of tokens in the training data. This is not reasoning. This is simple statistical prediction, making the "stochastic parrot" analogy actually quite accurate.
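Mechanically, the loop is nothing more than this kind of sketch; the "model" here is just pseudo-random scores keyed on context length, purely to show the sampling loop, whereas a real LLM computes those scores from the whole context with learned weights:

    import numpy as np

    vocab = ["the", "cat", "sat", "on", "mat", "."]

    def toy_logits(context):
        # Stand-in for a trained network: made-up scores for illustration.
        rng = np.random.default_rng(len(context))
        return rng.normal(size=len(vocab))

    def sample_next(context, temperature=1.0):
        logits = toy_logits(context) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                      # softmax over the vocabulary
        return np.random.choice(vocab, p=probs)   # sample one token

    context = ["the"]
    for _ in range(5):
        context.append(sample_next(context))      # append and repeat
    print(" ".join(context))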

> What model training is doing is building up a semantic space of vectors from which astronomically large numbers of true facts and ideas can be derived during inference. I mean like a number of facts larger than the number of molecules in the known universe. A googolplex more facts than the sum of all of humanity has ever "thought".

Sort of. It's like an extremely lossy compression process which has absolutely no guarantee that "truth" was maintained in the process. Also I'm extremely dubious of your claim that it can accurately encode "a number of facts larger than the molecules in the known universe" given that it's trivially easy to get the best LLMs to give you an incorrect answer to a question any human would easily get right.


The "reasoning" is an emergent property that no one understands yet. Yes we do all the training only to try to predict the next word (i.e. train to do word prediction), yet with enough training data, then at some scale (GPT 3.5ish) the embedding vectors in semantic space begin to build a geometric scaffolding in into the weights, for lack of a better way to phrase it.

If you know about facts like Vector(man) minus Vector(woman) equals Vector(king) minus Vector(queen), that's an indication that this "scaffolding" is taking shape. It means the "concept of gender" has a "direction" in the roughly 4,000-dimensional vector "space". These vectors behave as if they lived in a geometric space of sorts (there are directions and distances), even though there are no true spatial coordinates, just logical "directions". Mankind doesn't quite yet understand the "Geometry of Logic". LLMs prove we don't. I think it's a new math field to be invented.
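A toy illustration of that "direction" idea, with made-up 4-dimensional vectors standing in for real learned embeddings:

    import numpy as np

    # Invented embeddings for the example; real models learn thousands of dimensions.
    emb = {
        "man":   np.array([0.9, 0.1, 0.2, 0.0]),
        "woman": np.array([0.9, 0.1, 0.2, 1.0]),
        "king":  np.array([0.1, 0.8, 0.3, 0.0]),
        "queen": np.array([0.1, 0.8, 0.3, 1.0]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # The "gender direction": man->woman and king->queen point the same way.
    print(cosine(emb["woman"] - emb["man"], emb["queen"] - emb["king"]))  # 1.0

    # Analogy arithmetic: king - man + woman lands on queen in this toy space.
    print(cosine(emb["king"] - emb["man"] + emb["woman"], emb["queen"]))  # 1.0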

As far as the actual number of "facts" contained in an LLM, I think you have to consider something that's a function of the number of bits in the entire model, and ask how many "states" those bits can store, as a rough approximation from an entropy standpoint. But these aren't pure facts. They're reasoning. I guess you can call reasoning something like "fuzzy facts", so there's a bit of uncertainty to each one of them. LLMs don't store facts, they store fuzzy reasoning. But I call it "factual" when an LLM fixes a bug in my code or correctly states some piece of knowledge.


Perhaps we disagree on semantics here, but IMHO I wouldn't call this "reasoning". It's essentially just data compression, which is exactly what you get by constructing an encoder network that minimizes loss while trying to maximally crunch down that data into a handful of geometric dimensions.

"Mankind doesn't quite yet understand the geometry of logic" is laying it on a bit thick with the marketing speak, IMHO. It's just data compression whose result is somewhat obvious given what the loss function is optimizing for.

If a structure capable of real reasoning was being built, I wouldn't expect LLMs to get tripped up by simple questions like "How many Rs does the word Strawberry have in it?". You only need two simple reasoning systems to solve this question: you need to know the English alphabet and you need to be able to count to a handful of single-digit numbers, both tasks that kids of age 3-4 have mastered just fine. Putting together these two concepts allows you to reason your way through any such question, with any word and any letter.

Instead, LLMs behave mostly how we would expect a stochastic parrot to react. They hallucinate an answer, immediately apologize when it's called out as wrong, hallucinate a new, still incorrect answer, immediately apologize again, until they eventually get stuck in a loop of cursed context and model collapse.
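For contrast, composing those two primitives mechanically is trivial (a sketch that operates on characters; an LLM sees subword tokens, which is often cited as part of why it trips up here):

    def count_letter(word, letter):
        # Walk the alphabet characters of the word and count matches.
        return sum(1 for ch in word.lower() if ch == letter.lower())

    print(count_letter("Strawberry", "r"))  # 3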

I'm not suggesting that an LLM couldn't learn such a reasoning task, for example, but it would need to look at many training examples of such problems, and more importantly, have an architecture and loss function that optimized for learning a mechanical pattern or equation for solving that kind of problem.

And in that regard, we're very, very far away from LLMs that can do any kind of generic reasoning, because I haven't seen any evidence that those models are generic enough that you can avoid learning lots and lots and lots of specific ways to approach and solve problems.

One thing I think it's critical to keep in mind is that improvisation upon contextually relevant data in your compressed knowledge base is not reasoning. It might sound convincing to a human reader, but when it's failing at much simpler reasoning tasks the illusion really is shattered.


Wasn't there some paper recently that showed over training models, well beyond when it is normally halted, led them to create internal generalised models of a subject, e.g. arithmetic?

Essentially, the model internalised the core concepts of arithmetic. In that sense, the "reasoning" is pre baked into the model by training. Inference just plays things back through that space.

EDIT: as I recall, this is because understanding the concepts provides better compression than remembering lots of examples. It just takes a lot more training before it discovers them.


I don't like the analogy of "compression" that much, because, for example, if you train a model to predict linear data points, ideally it will only end up knowing two numbers in its model weights when it's done training: "m" and "b" in "Y=mX+b".

Once it's successfully captured "m" and "b", it has "knowledge" with which it can predict an infinite number of points correctly, and hopefully it didn't "compress" any of the examples but discarded all of them.
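A quick sketch of what I mean (made-up numbers, plain gradient descent; a real training run is obviously far bigger, but the principle is the same):

    import numpy as np

    # Noiseless linear data; the "model" has exactly two parameters.
    xs = np.linspace(-1, 1, 100)
    ys = 3.0 * xs + 0.5                   # true m = 3.0, true b = 0.5

    m, b = 0.0, 0.0
    lr = 0.1
    for _ in range(2000):                 # gradient descent on squared error
        err = (m * xs + b) - ys
        m -= lr * 2 * np.mean(err * xs)
        b -= lr * 2 * np.mean(err)

    print(m, b)                           # ~3.0 and ~0.5; the 100 training points are gone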


Yeah, it's not compression in the sense of compressing data. Kind of compression in that it takes less resource to encode general rules than to remember the answer for everything.

What the paper said was that the most efficient bits of the network were those that encoded rules rather than remembered data. Somehow those bits gradually took over from the less efficient parts. I'll have to dig around, can't seem to find it right now.


I agree with that.


People say, "If it was reasoning, then it would be able to know how many Rs the word Strawberry has in it." But that's not quite right. I would say instead, "If it was reasoning THE SAME WAY HUMANS reason, then it would be able to." Humans do reasoning a certain way. LLMs do reasoning a different way. But both are doing it.

But since it's not reasoning the way people do (but very differently), yes, it can make mistakes that look silly to us, but still be higher IQ than any human. Intelligence is a spectrum and has different "types". You can fail at one thing but be highly intelligent at something else. Think of savantism. Savants are definitely "reasoning", but many savants are essentially mentally disabled by many standards of measurement, up to and including not being able to count letters in words. So saying you don't think LLMs can reason, and offering examples like these as evidence, is just a kind of category error, to put it politely.

The fact that LLMs can fix bugs in pretty much any code base shows they're definitely not doing just simple "word completion" (despite that way of training), but are indeed doing some kind of reasoning FAR FAR beyond what humans can yet understand. I have a feeling only coders truly understand the power of LLM reasoning, because the kind of prompts we do absolutely require extremely advanced reasoning and are definitely NOT answerable because some example somewhere already had my exact scenario (or even a remotely similar one) that the model weights had essentially just 'compressed'. Sure, there is a compression aspect to what LLMs do, but that's totally orthogonal to the reasoning aspect.


I tend to agree that LLMs are not thinking in the way that we usually mean it when referring to human thinking. However, I think it is dangerous to make assertions about the capabilities of a system based on the structure seemingly imposed by its API. Take for example the instruction set of a CPU. One could argue that a CPU only has N registers or can process only one instruction at a time, because the instruction set only contains N registers and processes instructions linearly. But on any modern application processor, the register renamer allows many more physical registers to be allocated than can be named in the ISA, and instructions are dispatched in parallel to mitigate memory latency and increase throughput.

What I mean is that an LLM is not a stochastic parrot because of its API, but rather because it does not outdo a stochastic parrot when tested.

This could change though. LLMs could think in full sentences and spoon feed them to us one token at a time, rewording as necessary to provide a few possibilities. They could memorize the letters in each token and count letters correctly despite the limitations imposed by the API.


The more tokens spent on getting a result, the more likely the result is to be accurate.

“If it walks like a duck, quacks like a duck” etc


I suspect gp means thinking in a way that is different from being able to reason.

The ability to use any manner of inference or logic to arrive at correct answers does not constitute thought, even if the question was hard.

But in general I’m in agreement about the duck.

Existing LLMs can already “think” in a non-philosophical sense far better and faster than many adults walking around.

Most folks do not Consider the Lobster anyway and would likely not pay much attention to the details of its difference from a brain.


Split brain experiments show there's a good amount of evidence that humans also largely operate in this way.


Split brain experiments show that humans missing a functional corpus callosum largely operate in this way. I think it's an open question whether all humans operate in this way, or if communication between the hemispheres is necessary for 'real' explanations of behavior. The data we have at this point would support either conclusion.


Split-brain aside, there are other studies suggesting a huge amount of post-hoc rationalization for human choices and decision making.

Like it seems obvious and intuitive that preferences shape choices, but it turns out that choices shape preferences just as easily.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3196841/

https://journals.sagepub.com/doi/abs/10.1177/095679762095449...

At this point, I think there is little doubt humans regularly fabricate explanations of their behaviour unknowingly. It's just a question of how much and to what degrees.


You're right. Humans have short term memory, so we can genuinely recall our reasoning, but at the same time, once an action (or even a thought) has been had, we begin weaving a new narrative about why we did that thing or had that thought, that often has no resemblance to actual truth.


This is a very reductive and gross misrepresentation of split brain experiments.

Humans have various levels of memory (working, short term, long term), the ability to experience metacognition, and the autonomy to make a decision, all of which (and much, much more) inform our behavior and are also things that an LLM fundamentally does not have.

These “studies” are painfully naive at best and at worst an intentional misinformation campaign used to bolster the value of LLMs past what they are useful for.


Isn't this covered in the first FAQ in the OP?


Either you believe in emergent behavior or it’s only very good at recognizing patterns. Which is it? I made a test called LLMLOLinator (stupid name) and I can tell they cannot stray from the probability distributions they learned during training. So, I am not confident in so-called emergent behaviors.


There are no emergent behaviors; LLMs are essentially memorizing statistical patterns in the data and using lexical cues to generate responses. They cannot explain themselves reliably because they don't know what they know, nor do they know what they don't know. In fact, LLMs don't truly 'know' anything at all. These are not thinking machines—they are simply the result of statistical pattern matching on steroids. That's why there will be no AGI, at least not from LLMs: https://www.lycee.ai/blog/why-no-agi-openai


Your post isn't providing any argument, you're just providing statement after statement of a position that is pretty well known (no emergent behavior, just pattern matching) and pretty popular here on hackernews. Maybe the link provides all the good stuff, but I'd suggest you at least provide some flavor of what your arguments are.


Each argument could easily fill a book. Some valuable insights are discussed in the linked article, but most of the evidence comes from a series of research papers demonstrating that LLMs perform poorly when confronted with out-of-training distribution prompts. For example, even state-of-the-art LLMs can fail simple tasks like counting the number of "r" letters in the word "strawberry" or fall victim to the reversal curse (e.g., LLMs trained on "A is B" often struggle to deduce "B is A"). Another interesting read on this topic comes from MIT's research, which further explores these limitations: (https://news.mit.edu/2024/reasoning-skills-large-language-mo...)


> or fall victim to the reversal curse (e.g., LLMs trained on "A is B" often struggle to deduce "B is A").

But should "Joe is human" logically imply that "human is Joe"?


The reversal curse is mostly about understanding specific logical relationships rather than identity statements. It's not about statements like 'Joe is human' implying 'human is Joe,' but rather about LLMs struggling to reverse relationships or roles, such as turning 'A is B' into 'B is A.' Check out the summary of the paper: "We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tershkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: this https URL."


I’d like you to explain what you don’t know, and why.


I don't know the history of the original tribes in Brazil. Why? Because I haven't studied it, read about it, or been taught anything on the subject. I know that such tribes exist, but I don't know their history.


FWIW, lately prompts seem more able to elicit that same sort of answer from both Claude and GPT-4x on thinly learned contexts.


And, next release, they'll have been trained on one more example of an answer for that sort of question ;)


Your post now will probably join the mix. So will mine and all the others here.


We can consider any question and conclude that we don’t know the answer. LLMs can’t meaningfully do that. They just make stuff up.


And yet I can provide modern LLMs unique, never-before-asked scenarios that require the ability to reason about real-world phenomena, and they can respond in ways more thoughtful than the average person.

Much of human education is feeding us books of information about things we will never experience in our day-to-day lives, and convincing ourselves it reflects reality. When in fact most of us have not personally experienced any evidence that what we learned is true.

That vast majority of what a person "knows" is a biological statistical pattern matching on steroids.


What you're describing is no different than a linear regression describing the predicted value between two data points. Sometimes the regression is close if enough data exists, other times you get wild hallucinations, with the model none the wiser whether it's correct. All this tells us is that there is a lot of data out there on the internet that can still have useful information extracted from it.

Take someone like Ramanujan, who, with a couple of math books on his own, could derive brilliant and novel discoveries in mathematics, instead of needing millions of man-hours' worth of reading material to replicate what is mostly a replacement for googling.


Don’t be hoodwinked by a plausibility engine. The central dogma of oracle-type LLMs is that plausibility converges towards accuracy as scale increases. This hypothesis remains very far from proven.


> And yet I can provide modern LLMs unique never before asked scenarios that require the ability to reason about real world phenomenon and they can respond in ways more thoughtful than the average person.

Give me an example of that


One example is to provide it a list of objects and ask it what would be the most plausible way to stack them without damaging the objects while also being stable. Optionally, you can ask the model to generate a rationale for the solution provided.

You can even invent a fictitious object that has never existed, define its properties, and ask the model to include it in the list.


Sure the question may be unique, but you said unique scenario - this scenario is very well trained for.


In what way is it very well trained for?


There are a lot of resources about similar problems on the internet, so LLMs will have some patterns to leverage, even if it's not certain you'll get correct answers.


Have you tried an agentic approach, i.e. https://github.com/microsoft/autogen


Even humans are bad at explaining their decisions sometimes, especially if they did not arrive at them by reason.

In fact, if asked for the reason post-mortem, people (and LLMs) are likely to make one up on the spot. I wonder if the same dynamic is at play here.


Yea that’s a good point! On the one hand, it’s cool to look at the JavaScript the LLM generates to test its hypotheses in the logs (you can click “expand” to see them). On the other hand you’re right that especially given that it’s (presumably) an autoregressive LLM, if it writes the choice before the pattern there’s no way for the pattern to influence the choice.

You can muck with the prompts by clicking “Settings”! I think there’s a lot of room for improvement with all aspects of the experiment.

One prompt I tried made the LLM take tons of turns, but it ended up with a weird fixation with assuming everything was some sort of approximately geometric sequence and would always conclude none of the options matched. Before I gave it the eval_js tool it still did a pretty good job making choices, but the explanations were worse.


LLMs will provide an answer for every question, even when they don't know what they're talking about.

Overwhelming majority of people won't do this.

So I don't think the comparison really makes any sense.


Have you used them recently? They have gotten much better about this.


I’ve met humans who do that too


Unaccountably correct is not a frequent quality I see in humans


> It’s interesting to me that in spite of the fact that the LLM does so poorly with number sequences in general, it does pretty well with variations of the Fibonacci sequence.

Not surprising, since the Fibonacci sequence will be in the text swallowed by the LLM.


Yea! Do you think the LLM learns a Fibonacci-specific circuit? Do you think it is possible for an LLM to learn a general pattern-recognition circuit?


Well, a basic LLM would not really even distinguish a Fibonacci series from a hole in the ground. But models are being built with extra stuff, though I doubt recognising number series is high on the list.


> One day in March, I was walking my dog, saw a house numbered 3147, and thought it was a funny pattern. It’s the Fibonacci sequence with a different seed (31 instead of 11).

The Fibonacci sequence with 3,1 would be 3,1,4,5. I think you mean the house number was 1347. That would work and be easier to notice.
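For concreteness, a tiny generalized-Fibonacci generator makes the two seeds easy to compare:

    def fib_seeded(a, b, n):
        # Fibonacci-style sequence from an arbitrary two-number seed.
        out = [a, b]
        for _ in range(n - 2):
            a, b = b, a + b
            out.append(b)
        return out

    print(fib_seeded(3, 1, 4))  # [3, 1, 4, 5] -> house number 3145
    print(fib_seeded(1, 3, 4))  # [1, 3, 4, 7] -> house number 1347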


Haha thanks for catching that—fixed! That’s pretty ironic. Looks like I’m the LLM and you’re the human :P

Now I can’t remember whether the house was 3147 or 1347. The pattern might have been to add the first digit to the last digit (unrot + in the stack language). That’s what I get for writing at 3am!


Almost a decade ago I ran into a couple of problems where the logs on a SaaS tool I was supporting started to change qualitatively, but not in a way that printed more error-level messages. When someone complained about incorrect behavior in our code, we could see clearly in the logs that the wrong code paths were being engaged, and when that had started, because people had carefully logged everything that was happening! They just hadn't logged it at the "Error" level, so we never got alerts about it. And it would have been impracticable to have that level of alerting, because the message wasn't an error; it was just that, due to other code changes, we were now going down the wrong path for this span.

So for a hackweek I built a tool to tokenize all of our log messages, then grabbed all of our logs and built a gigantic n-dimensional vector for every five-minute chunk of two days of those logs, calculated the Pythagorean distance between each pair of those five-minute chunks, and looked at the biggest differences, the most outlier five-minute chunks. They were all from 8-8:30AM CET on the two days (our company and most of our customers were US based; I just looked at what timezones matched up to the interesting time). I said "okay, this looks interesting, let me see what is happening in the logs then", but it was impossible to figure out what the statistics were seeing. The math thinks in ways that human brains don't: it views the entire dataset simultaneously, and human brains just can't keep five minutes of busy log files in their working memory, but humans build narratives and the math can't understand that. So I ended up getting frustrated and giving up on the project, because explaining things in terms that I could understand and start debugging was the whole point of the project!


Wow, I'm actually really interested in how you did that. Did the differences between chunks correspond to changes in word/token frequencies? Or did you embed the logs into some other space first? This is a pretty nifty idea; the only analyses I've run on production logs were just simple SQL queries.


I wrote a python script that parsed all of our back-end code, looking for every log message, and used that as the basis for a giant n-dimensional vector representing every possible log message. Then I looked at every five-minute chunk of our logs and incremented the appropriate row in the vector for every message printed, then I used a clustering algorithm (don't remember which one, sorry, but it was something from PyTorch I'm pretty sure; not at this company any more so I don't have the code) to identify the five vectors that were farthest from any other vector. I first computed the n-dimensional Pythagorean distance between each pair of five-minute chunks and selected the vectors with the highest combined difference, but the true clustering algorithm was a cleaner choice.
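In spirit it was something like this minimal sketch (made-up message templates and windows; the real version parsed the messages out of the source tree and used a library clustering step instead of a plain distance sum):

    import numpy as np

    templates = ["user login", "cache miss", "payment ok", "retry queued"]
    index = {t: i for i, t in enumerate(templates)}

    def window_vector(messages):
        # Count how often each known template appears in one 5-minute window.
        v = np.zeros(len(templates))
        for msg in messages:
            if msg in index:
                v[index[msg]] += 1
        return v

    windows = {
        "08:00": ["user login"] * 50 + ["cache miss"] * 5,
        "08:05": ["user login"] * 48 + ["cache miss"] * 6,
        "08:10": ["retry queued"] * 40 + ["payment ok"] * 2,  # the odd one out
    }
    vecs = {k: window_vector(msgs) for k, msgs in windows.items()}

    # Sum of Euclidean ("Pythagorean") distances to every other window;
    # the largest total marks the most outlying chunk.
    scores = {k: sum(np.linalg.norm(vecs[k] - vecs[j]) for j in vecs if j != k)
              for k in vecs}
    print(max(scores, key=scores.get))  # 08:10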

I did throw away a couple of messages, basically all the traffic from our uptime and health checks, because they seemed like they would distort the data (if our health-checker went down that was the health-checker's fault, and we ought to get an actual alarm; having this alert would just be a duplicate). Dunno, that might have been a mistake. It was a hackweek project; I'm not saying this was perfect. In fact, as I said, it never provided anything useful because it couldn't be explained!

In our actual log files on the two days I looked at, all five of the outlier five-minute chunks were from 8-8:30 AM Central European Time (our logs were in UTC; I just looked at different time zones and that seemed the most likely source of interesting behavior). And then when I looked at those times in the logs, and also when I eyeballed the ~1000-dimensional vectors for them, I couldn't tell what the clustering algorithm was seeing, because it was 'thinking' so differently from how I do. It wasn't like one of the rows in the vector was suddenly 150 and then went to 0 outside that half hour; it was hard to see any patterns. So I couldn't set this up to send alerts or anything, because it would have produced a whole lot of wild goose chases without a lot of further refinement.

Talking with someone more expert in ML than I am, he recommended trying to fine-tune an LLM to predict the next word (or maybe even the next log message, depending on windows and verbosity of log messages) and potentially alerting when the differences between predicted and actual got too large. Maybe that would work, I dunno. But that was what I would have tried the next hackweek, if I hadn't moved on from that company.


The house number you saw is actually part of the Lucas sequence. It's related to the Fibonacci sequence and is a good approximation of it at large numbers, but at small numbers it becomes distorted.

https://en.m.wikipedia.org/wiki/Lucas_number


Ha, this is neat!! I’d never heard of the Lucas sequence before, thanks for the new knowledge!


This is because LLMs do not reason. They pattern-fit. The fact that that solves a lot of things humans often use reason to solve most likely speaks to the training data or unrecognized patterns in standardized tests, not to LLMs' reasoning capability, which does not exist.

To excuse their assumption of reasoning capabilities, the author in the FAQ snarkily points to “research” indicating evidence of reasoning—all of which was written by OpenAI and Microsoft employees who would not be allowed to publish anything to the contrary.

It’s a shame people continue to buy into the hype cycle on new tech. Here’s a hint: if the creators of VC-backed tech make extraordinary claims about it, you should assume it’s heavily exaggerated if not an outright lie.


Thanks for sharing! The custom stack-based language that you created for randomly generating interesting integer sequences was the most interesting part of this post for me. Wish the post focused on that rather than LLMs!


Hi HN! I'm about to go to bed, but I promise to go through comments when I wake up later today. Happy Friday!

The source code for the demo is on GitHub: https://github.com/jyc/stackbee


In some ways it's obvious why. An LLM produces a probability distribution and then a random word is sampled.

Imagine if you said to a secretary that you're 60% yes and 40% no, and she arbitrarily decided to write NO in your report and then a day later the board asked you why you made that decision.

You'd be confused too.
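The analogy in code, roughly (a single weighted draw; real LLMs sample per token, often with temperature and top-p, but a single draw is just as arbitrary):

    import random

    # A 60/40 distribution, sampled once.
    decision = random.choices(["YES", "NO"], weights=[0.6, 0.4])[0]
    print(decision)  # sometimes "NO", even though "YES" was more likely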



