
This is a fair criticism we should've addressed. There's actually a nice study on this: Vong et al. (https://www.science.org/doi/10.1126/science.adi1374) hooked up a camera to a baby's head so it would get all the input data a baby gets. A model trained on this data learned some things babies do (e.g., word-object mappings), but not everything. However, this model couldn't actively manipulate the world in the way that a baby does, and I think this is a big reason why humans can learn so quickly and efficiently.
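
(For anyone curious how word-object mappings can fall out of headcam data: the study's model pairs video frames with the transcribed speech the child heard and learns a joint embedding with a contrastive objective. The snippet below is my own minimal PyTorch sketch of that kind of setup, not the paper's code; the batching, encoders, and temperature are my assumptions.)

    import torch
    import torch.nn.functional as F

    def contrastive_loss(frame_emb, utterance_emb, temperature=0.07):
        # Normalize so similarity is cosine similarity.
        frame_emb = F.normalize(frame_emb, dim=-1)
        utterance_emb = F.normalize(utterance_emb, dim=-1)
        # Similarity between every frame and every utterance in the batch.
        logits = frame_emb @ utterance_emb.t() / temperature
        # Co-occurring frame/utterance pairs sit on the diagonal.
        targets = torch.arange(logits.size(0), device=logits.device)
        # Pull co-occurring pairs together, push the rest apart, in both directions.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2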

That said, LLMs are still trained on significantly more data pretty much no matter how you look at it. E.g. a blind child might hear 10-15 million words by age 6 vs. trillions for LLMs.
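
(Back-of-envelope version of that gap, using round numbers I'm assuming rather than citing: roughly 12 million words heard by age 6, and ~15 trillion training tokens for a recent LLM.)

    # All figures are assumed round numbers, for illustration only.
    child_words = 12e6                       # ~10-15M words heard by age 6
    words_per_day = child_words / (6 * 365)
    llm_tokens = 15e12                       # order of magnitude for recent training runs
    print(f"child: ~{words_per_day:,.0f} words/day")
    print(f"LLM corpus is ~{llm_tokens / child_words:,.0f}x larger")
    # child: ~5,479 words/day
    # LLM corpus is ~1,250,000x larger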





> hooked up a camera to a baby's head so it would get all the input data a baby gets.

A camera hooked up to the baby's head is absolutely not getting all the input data the baby gets. It's not even getting most of it.


> LLMs are still trained on significantly more data pretty much no matter how you look at it ... 10-15 million words ... vs trillions for LLMs

I don't know how to count the number of words a human encounters in their life, but it does seem plausible that LLMs deal with orders of magnitude more words. What I'm saying is that words aren't the whole picture.

Humans get continuous streams of video, audio, smell, location, and other sensory data. Plus, you get data about your impact on the world and the world's impact on you: what happens when you move this thing? What happens when you touch fire? LLMs don't have any of this yet; they only get abstract symbols (words, tokens).

So when I look at it from this "sensory" perspective, LLMs don't seem to be getting any data at all here.


While an LLM is trained on trillions of tokens to acquire its capabilities, it does not retain or recall the vast majority of that data, and often enough it can't make simple deductions either (e.g., having learned that X owns Y does not mean it can answer that Y belongs to X).
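
(If you want to see this for yourself, a probe along these lines is enough. `query_model` is a hypothetical stand-in for whatever inference API you use, and the fact pair is made up for illustration.)

    # Ask for the same (assumed-trained) fact in both directions.
    # `query_model` is a hypothetical callable: prompt string in, answer string out.
    FACTS = [("Acme Corp", "Globex")]  # illustrative owner/asset pair

    def reversal_probe(query_model):
        for owner, asset in FACTS:
            forward = query_model(f"What does {owner} own?")
            backward = query_model(f"Who does {asset} belong to?")
            # A model with this gap tends to answer the first question
            # but not the second, even though both encode the same fact.
            print(f"{owner} owns -> {forward}")
            print(f"{asset} belongs to -> {backward}")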

The acquired knowledge is a lot less uniform than you're proposing, and in fact is full of gaps a human would never have. More critically, the model is not able to peer into all of its vast knowledge at once, so with every prompt what you get is closer to an "instance of a human" than to the "all of humanity" you might imagine LLMs to be.

(I train and dissect LLMs for a living and for fun)


I think you are proposing something that's orthogonal to the OP's point.

They mentioned that the amount of training data is much higher for an LLM; whether an LLM's recall is uniform was never in question.

No one expects compression to be lossless when the model's capacity is smaller than the entropy of the knowledge in its training set.

I am not saying LLMs do simple compression, just pointing out a mathematical certainty.
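
(The certainty is just a counting argument. With some assumed round numbers -- a 70B-parameter model in bf16 and a 15T-token corpus carrying even one bit of information per token -- the corpus holds more information than the weights can store, so something has to be thrown away.)

    # All figures below are assumed round numbers, not measurements.
    params = 70e9                     # assumed parameter count
    bits_per_param = 16               # bf16 weights
    capacity_bits = params * bits_per_param   # ~1.1e12 bits of storage
    tokens = 15e12                    # assumed training-corpus size
    bits_per_token = 1.0              # deliberately conservative information content
    corpus_bits = tokens * bits_per_token
    print(f"weights can store ~{capacity_bits:.1e} bits")
    print(f"corpus carries   ~{corpus_bits:.1e} bits")
    print(f"ratio: ~{corpus_bits / capacity_bits:.0f}x -> retention must be lossy")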

(And I don't think you need to be an expert in creating LLMs to understand them, though a lot of people here have that experience as well, so I find the extra emphasis on it moot.)


The way I understood OP's point is that because LLMs have been trained on the entirety of humanity's knowledge (as exemplified by the internet), they must surely know as much as the entirety of humanity. Even cursory use of an LLM shows this is obviously not true. But I am also raising the point that when answering any given prompt, an LLM only summons a limited subset of that knowledge, which brings it closer to a human polymath than to an omniscient entity; and larger LLMs only seem to improve the "depth" of that polymath's knowledge rather than its breadth.

Again, just my impression from exposure to many LLMs at various stages of training (my last sentence was not an appeal to expertise).



