Perhaps we disagree on semantics here, but IMHO I wouldn't call this "reasoning". It's essentially just data compression, which is exactly what you get by constructing an encoder network that minimizes loss while trying to maximally crunch down that data into a handful of geometric dimensions.
"Mankind doesn't quite yet understand the geometry of logic" is laying it on a bit thick with the marketing speak, IMHO. It's just data compression whose result is somewhat obvious given what the loss function is optimizing for.
If a structure capable of real reasoning were being built, I wouldn't expect LLMs to get tripped up by simple questions like "How many Rs does the word Strawberry have in it?". You only need two simple reasoning skills to solve this question: you need to know the English alphabet, and you need to be able to count to a handful of single-digit numbers, both tasks that kids of age 3-4 have mastered just fine. Putting these two concepts together lets you reason your way through any such question, with any word and any letter.
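For what it's worth, the mechanical procedure I'm describing fits in a few lines (a rough Python sketch, nothing more, just to spell out the two skills):

    # The two skills spelled out: walk the letters of the word, count the matches.
    def count_letter(word, letter):
        count = 0
        for ch in word.lower():        # knowing the alphabet: look at each letter
            if ch == letter.lower():   # counting: add one on every match
                count += 1
        return count

    print(count_letter("Strawberry", "r"))  # 3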
Instead, LLMs behave pretty much the way we would expect a stochastic parrot to behave. They hallucinate an answer, immediately apologize when it's called out as wrong, hallucinate a new, still incorrect answer, immediately apologize again, until they eventually get stuck in a loop of cursed context and model collapse.
I'm not suggesting that an LLM couldn't learn such a reasoning task, but it would need to see many training examples of such problems and, more importantly, have an architecture and loss function that optimize for learning a mechanical pattern or equation for solving that kind of problem.
And in that regard, we're very, very far away from LLMs that can do any kind of generic reasoning, because I haven't seen any evidence that these models generalize well enough to avoid having to learn lots and lots of specific ways to approach and solve problems.
One thing I think it's critical to keep in mind is that improvising on contextually relevant data from a compressed knowledge base is not reasoning. It might sound convincing to a human reader, but when the model fails at much simpler reasoning tasks, the illusion really is shattered.
Wasn't there some paper recently that showed that training models well beyond the point where training is normally halted led them to create internal generalised models of a subject, e.g. arithmetic?
Essentially, the model internalised the core concepts of arithmetic. In that sense, the "reasoning" is pre-baked into the model by training; inference just plays things back through that space.
EDIT: as I recall, this is because understanding the concepts provides better compression than remembering lots of examples. It just takes a lot more training before it discovers them.
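To illustrate the kind of setup involved (my own rough sketch in Python/PyTorch, not the paper's code; the task, model size and hyperparameters here are assumptions): train a small network on modular addition with strong weight decay and keep going long after training accuracy saturates. In setups like this, held-out accuracy has been reported to eventually jump from chance to near-perfect.

    # Rough sketch: learn (a + b) mod p from half the pairs, train far past convergence,
    # and watch held-out accuracy long after training accuracy has saturated.
    import torch
    import torch.nn as nn

    p = 97
    pairs = torch.tensor([(a, b) for a in range(p) for b in range(p)])
    labels = (pairs[:, 0] + pairs[:, 1]) % p

    torch.manual_seed(0)
    perm = torch.randperm(len(pairs))
    train_idx, test_idx = perm[: len(pairs) // 2], perm[len(pairs) // 2 :]

    model = nn.Sequential(
        nn.Embedding(p, 128),          # embed each operand
        nn.Flatten(),                  # (N, 2, 128) -> (N, 256)
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, p),             # logits over possible sums mod p
    )
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

    for step in range(50_000):         # far beyond where train accuracy hits 100%
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(pairs[train_idx]), labels[train_idx])
        loss.backward()
        opt.step()
        if step % 1000 == 0:
            with torch.no_grad():
                train_acc = (model(pairs[train_idx]).argmax(-1) == labels[train_idx]).float().mean()
                test_acc = (model(pairs[test_idx]).argmax(-1) == labels[test_idx]).float().mean()
            print(f"step {step:6d}  train {train_acc.item():.2f}  test {test_acc.item():.2f}")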
I don't like the analogy of "compression" that much, because, for example, if you train a model to predict linear data points, ideally it will only end up knowing two numbers in its model weights when it's done training: "m" and "b" in "y = mx + b".
Once it has successfully captured "m" and "b", it has "knowledge" with which it can correctly predict an infinite number of points, and ideally it hasn't "compressed" any of the examples at all but discarded them entirely.
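To make that concrete, here's a minimal toy sketch (Python/PyTorch, my own example, not from any paper): a model with exactly two parameters is fit to noisy linear data, ends up holding roughly m and b, and can then predict points it never saw; none of the 200 training samples survive in the weights.

    # Minimal sketch: "training" here distills 200 samples down to two numbers, m and b.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    true_m, true_b = 3.0, -1.0                           # hypothetical ground truth
    x = torch.linspace(-5, 5, 200).unsqueeze(1)
    y = true_m * x + true_b + 0.1 * torch.randn_like(x)  # noisy linear data

    model = nn.Linear(1, 1)                              # exactly two parameters: weight (m) and bias (b)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

    print(model.weight.item(), model.bias.item())        # ~3.0 and ~-1.0
    print(model(torch.tensor([[100.0]])).item())         # extrapolates to a point it never saw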
Yeah, it's not compression in the sense of compressing data. It's compression only in the sense that it takes less capacity to encode general rules than to remember the answer for everything.
What the paper said was that the most efficient bits of the network were those that encoded rules rather than remembered data. Somehow those bits gradually took over from the less efficient parts. I'll have to dig around; can't seem to find it right now.
When people say "If it was reasoning, then it would be able to answer 'How many Rs does the word Strawberry have in it?'", that's not quite right. What I would say instead is "If it was reasoning THE SAME WAY HUMANS reason, then it would be able to...". Humans do reasoning a certain way. LLMs do reasoning a different way. But both are doing it.
But since it's not reasoning the way people do (but very differently), yes, it can make mistakes that look silly to us and still be higher IQ than any human. Intelligence is a spectrum and has different "types". You can fail at one thing but be highly intelligent at something else. Think of savantism. Savants are definitely "reasoning", but many savants are essentially mentally disabled by many standards of measurement, up to and including not being able to count letters in words. So saying you don't think LLMs can reason, and citing failures like these as evidence, is just a kind of category error, to put it politely.
The fact that LLMs can fix bugs in pretty much any code base shows they're definitely not doing just simple "word completion" (despite being trained that way), but are indeed doing some kind of reasoning FAR FAR beyond what humans can yet understand. I have a feeling only coders truly understand the power of LLM reasoning, because the kinds of prompts we write absolutely require extremely advanced reasoning and are definitely NOT answerable because some example somewhere already contained my exact scenario (or even a remotely similar one) that the model weights had simply "compressed". Sure, there is a compression aspect to what LLMs do, but that's totally orthogonal to the reasoning aspect.
"Mankind doesn't quite yet understand the geometry of logic" is laying it on a bit thick with the marketing speak, IMHO. It's just data compression whose result is somewhat obvious given what the loss function is optimizing for.
If a structure capable of real reasoning was being built, I wouldn't expect LLMs to get tripped up by simple questions like "How many Rs does the word Strawberry have in it?". There are only two simple reasoning systems you need to solve this question. You need to learn the English alphabet and you need to be able to count to a handful of single digit numbers, both tasks that kids of age 3-4 have mastered just fine. Putting together these two concepts allows you now to reason your way through any such question, with any word and any letter.
Instead, LLMs perform how we would mostly expect a stochastic parrot to react. They hallucinate an answer, immediately apologize when it's called out to be wrong, hallucinate a new, still incorrect answer, immediately apologize again, until they eventually get stuck in a loop of cursed context and model collapse.
I'm not suggested that an LLM couldn't learn such a reasoning task, for example, but it would need to look at many training examples of such problems, and more importantly, have an architecture and loss function that optimized for learning a mechanical pattern or equation for solving that kind of problem.
And in that regard, we're very, very far away from LLMs that can do any kind of generic reasoning, because I haven't seen any evidence that those models are generic enough that you can avoid learning lots and lots and lots of specific ways to approach and solve problems.
One thing I think it's critical to keep in mind is that improvisation upon contextually relevant data in your compressed knowledge base is not reasoning. It might sound convincing to a human reader, but when it's failing at much simpler reasoning tasks the illusion really is shattered.