
> It really wasn't clear for the longest time how these models generate things so well.

Honestly, I think it was pretty much just as clear in 2021 as it is in 2024. Whether you call that 'clear' or 'not clear' is a matter of personal choice, but I don't think we've advanced our understanding all that far (mechanistic interpretability doesn't tell us that much, grokking is a phenomenon that doesn't apply to modern LLMs, etc.).

> we have accrued quite a bit of evidence by now that these models do far more than glue together training data. But there are still sceptics out there who spread this sort of misinformation.

Few people who actually worked in this field and were familiar with the concept of 'implicit regularization' ever found the 'glue together training data' or 'stochastic parrot' explanations compelling.
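
For what it's worth, 'implicit regularization' refers to the tendency of gradient-based training to land on low-complexity solutions even with no explicit penalty term. Here is a minimal NumPy sketch (a toy example of my own, not from the parent comment; all names and numbers are made up): an overparameterized least-squares problem where plain gradient descent from a zero init converges to the minimum-norm interpolating solution rather than an arbitrary data-memorizing one.

  import numpy as np

  # Toy illustration of implicit regularization (hypothetical example).
  # Overparameterized least squares: fewer samples than features, so infinitely
  # many weight vectors interpolate the training data exactly.
  rng = np.random.default_rng(0)
  n_samples, n_features = 20, 100
  X = rng.normal(size=(n_samples, n_features))
  w_true = np.zeros(n_features)
  w_true[:5] = rng.normal(size=5)   # only a handful of features actually matter
  y = X @ w_true

  # Plain gradient descent from a zero initialization, no explicit penalty term.
  w = np.zeros(n_features)
  lr = 0.01
  for _ in range(20_000):
      grad = X.T @ (X @ w - y) / n_samples
      w -= lr * grad

  # Gradient descent converges to the minimum-L2-norm interpolating solution
  # (the pseudoinverse solution), not an arbitrary memorizing one.
  w_min_norm = np.linalg.pinv(X) @ y
  print("train residual:        ", np.linalg.norm(X @ w - y))        # ~0
  print("gap to min-norm answer:", np.linalg.norm(w - w_min_norm))   # ~0

The point is that "fits the training data" massively underdetermines what the model does off the training data; the optimizer's implicit bias is what picks the solution, which is why the 'glue together training data' framing never held much water for people who knew this literature.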


