
Electrons are unpredictable at a quantum level.

And manufacturing had yet to be made reliable.

Not a perfect analogy. But in both cases we eventually made inherently unreliable things useful by making them predictable.



Interesting points. Food for thought.

I'd say it's not the same kind of reliability. Manufacturing was doing the right thing, just with a high failure rate; the process had to be refined. LLMs fundamentally do not do the right thing. They do seem to fake it well enough in many use cases, but since it's fake, it can only go so far. Your manufacturing example thus becomes an illustration of exactly what LLMs, for the use cases we'd like them to address, are not.

Quantum mechanics would not matter for most things because it's at a lower level, the same way your table's particles are constantly moving yet, for you, the table is completely still. The gory (implementation) details don't contribute any unreliability at the higher, observable level. I get that your point is that apparent particle chaos leads to reliable matter, but I don't see LLM "chaos" evening out into anything coherent the way particle chaos is statistically averaged away at the higher level. The comparison falls apart quickly.

Not perfect analogies indeed, and I believe the imperfections weaken the point too much. I'm glad you shared them though; they made me stop and think. However, I can see how imperfect analogies might cloud or sidetrack us, and I'd rather address the actual case at hand directly.


Okay, I offer you another analogy.

Consider linear regression. Curve fitting. Taking a bunch of points (“data”) and trying to find a function (“algorithm”) that fits them all as closely as possible.

If the points form a straight line the process is fairly easy and can be done with a simple “y = Ax + B” equation. You just need to find A and B.

But if they look more spiky, bouncing up and down as you go across the x axis, fitting a curve to the data is going to require a higher-order polynomial. A saddle, or U shape, can be fit with a second-order polynomial: “y = Ax^2 + Bx + C”, but if the data is more complex than that, you have to go even higher-order.
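As a rough sketch of the idea (made-up points, assuming numpy is available):

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

    # Points that lie on a straight line: "y = Ax + B" is enough.
    y_line = 2.0 * x + 1.0
    A, B = np.polyfit(x, y_line, deg=1)         # recovers A ~ 2, B ~ 1

    # U-shaped points: "y = Ax^2 + Bx + C" fits them.
    y_curve = x**2 - 3.0 * x + 2.0
    A2, B2, C2 = np.polyfit(x, y_curve, deg=2)  # recovers A ~ 1, B ~ -3, C ~ 2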

Now let’s move over to machine learning and language models. Consider the real-world data used to train models as points in some high-dimensional space, and the training process as iteratively performing curve fitting in that same space. Some of your real-world data must certainly be easier to fit than other data: even though it lives in a high-dimensional space, it could be fit by a curve with only a few parameters. In other words, in some domains the data appear “smoother” than in others. For example, look at machine learning’s success in image generation: varying a few pixels doesn’t change the result much. It’s a “smooth” domain.

But the same is not true everywhere. In math, or engineering - even grammar! - one symbol out of place makes the whole thing nonsense. These domains would be “spiky” - and some of them are minefields punctuated by islands of valid data, such that they’re not even contiguous.

In spiky or non-contiguous domains, approximation won’t cut it. You have to fit your data REALLY closely to have a useful function - a useful model.
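A toy sketch of the smooth-vs-spiky contrast (made-up functions standing in for the two kinds of domains): the same modest fit that tracks a smooth function closely leaves large residuals on a jagged one.

    import numpy as np

    x = np.linspace(0.0, 1.0, 50)
    smooth = np.sin(2 * np.pi * x)              # gently varying data
    spiky = np.sign(np.sin(20 * np.pi * x))     # jumps between -1 and +1

    for name, y in [("smooth", smooth), ("spiky", spiky)]:
        coeffs = np.polyfit(x, y, deg=5)        # a modest number of parameters
        err = np.max(np.abs(np.polyval(coeffs, x) - y))
        print(name, err)                        # small for smooth, ~1 for spiky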

Models with more parameters are better equipped for curve fitting in complex domains with hard-to-fit data. So as we start to see multi-trillion-parameter models, maybe we’ll discover those models can handle complex domains today’s models can’t!

(Keen readers may notice the risk of overfitting: give the polynomial as many parameters as there are data points and it fits every point exactly. But at this scale that would actually be a triumphant achievement, as we just don’t know what magical things lie on the curve, right there between the data points we have.)
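(A toy sketch of that parenthetical, again with made-up numbers: with as many coefficients as points, the polynomial passes through every point exactly, and between them it is free to do whatever it likes.)

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])    # five data points
    y = np.array([1.0, -2.0, 0.5, 3.0, -1.0])  # arbitrary values

    coeffs = np.polyfit(x, y, deg=len(x) - 1)   # degree 4 -> five coefficients
    print(np.polyval(coeffs, x) - y)            # ~0 at every data point
    print(np.polyval(coeffs, 1.5))              # between the points: unknown territory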


I'm quite convinced by this argument for ML / deep learning.



