
One experiment that I would love to see is an LLM-like model for audio. Feed it hours and hours of lectures, sound effects, animal calls, music etc. You would be able to talk to it and it would ingest the raw waveform then produce audio as a response. Would it learn the fundamentals of music theory? Would it learn to produce "the sound of a bowling ball hitting a dozen windchimes?" Would it learn to talk in English and communicate with whales? We've already done text and images, now someone please do sound!
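For concreteness, here's a minimal sketch of what that experiment usually looks like in practice: quantize the raw waveform into discrete tokens, then train an autoregressive next-token model over them, the same way a text LLM predicts the next word. Everything below is an illustrative assumption (PyTorch, a crude 8-bit quantizer, made-up hyperparameters), not any particular group's system.

  import torch
  import torch.nn as nn

  VOCAB = 256      # 8-bit quantization of the waveform -> 256 possible tokens
  CONTEXT = 1024   # how many past audio tokens the model can attend to

  def waveform_to_tokens(wave: torch.Tensor) -> torch.Tensor:
      """Map float samples in [-1, 1] to integer tokens in [0, VOCAB-1]."""
      return ((wave.clamp(-1, 1) + 1) / 2 * (VOCAB - 1)).round().long()

  class TinyAudioLM(nn.Module):
      def __init__(self, d_model=256, n_heads=4, n_layers=4):
          super().__init__()
          self.embed = nn.Embedding(VOCAB, d_model)
          self.pos = nn.Embedding(CONTEXT, d_model)
          layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
          self.blocks = nn.TransformerEncoder(layer, n_layers)
          self.head = nn.Linear(d_model, VOCAB)

      def forward(self, tokens):  # tokens: (batch, seq) of audio token ids
          seq = tokens.shape[1]
          x = self.embed(tokens) + self.pos(torch.arange(seq, device=tokens.device))
          # causal mask so each position only sees earlier audio
          mask = torch.triu(torch.full((seq, seq), float("-inf"), device=tokens.device), diagonal=1)
          x = self.blocks(x, mask=mask)
          return self.head(x)  # logits over the next audio token

  wave = torch.rand(1, 512) * 2 - 1        # one fake 512-sample "recording"
  tokens = waveform_to_tokens(wave)
  logits = TinyAudioLM()(tokens)           # (1, 512, 256) next-token predictions

Training would minimize cross-entropy between logits[:, :-1] and tokens[:, 1:], and generation would sample one token at a time and de-quantize back to a waveform. Real systems (e.g. AudioLM) use learned neural codecs instead of naive sample quantization, but the shape of the idea is the same.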


Uhh, this is already out there, from like a dozen different groups. Not going to do a full Googling for you on my phone because it's literally everywhere, but searching "LLM for audio" gives https://ai.googleblog.com/2022/10/audiolm-language-modeling-... as the first result. Some of this stuff is already really impressive.


> Would it learn the fundamentals of music theory?

No, but you might convince yourself it did.

It would map the patterns that exist in its training set and then follow those patterns. The result would look like a human's understanding of music theory, but it would not be that.

It would be stumbling around exactly the domain we gave it: impressive because that domain is not noise, it's good data. It still wouldn't be able to find its way around, only stumble.


> The result would look like a human understanding music theory, but it would not be that.

The question then becomes: what is understanding? Is what a human does any different from what this LLM is doing?


Objectivity.

A human can do something with the model. An LLM can only present the model to you.


Not sure how that's any different from a model doing something with another model, as in AutoGPT. What part is objective? A model can be wrong, just like a human can be wrong or spread falsehoods.


A model can't be right or wrong, because it doesn't actually make any logical decisions.

These are categorizations that we make after the fact. If the model could do the same categorization work, then it could actively choose correct over incorrect.


Models could potentially make logical decisions too, if we connected them to something like a classical computer or a rules engine. I don't see any fundamental barrier to models, and computers in general, approaching humans' way of understanding and reasoning.
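As a toy illustration of that hybrid pattern (my own sketch, not any real system): the model proposes an answer, and a deterministic checker decides whether it actually holds. The "model" here is just a stub standing in for a generative model.

  def model_propose(question: str) -> str:
      """Stand-in for a generative model: returns a guessed answer string."""
      return "4" if "2 + 2" in question else "unknown"

  def rules_engine_check(question: str, answer: str) -> bool:
      """Deterministic verifier: re-derives the answer with ordinary computation."""
      if "2 + 2" in question:
          return answer.strip() == str(2 + 2)
      return False

  question = "What is 2 + 2?"
  proposal = model_propose(question)
  ok = rules_engine_check(question, proposal)
  print(f"model says {proposal!r}; rules engine says {'correct' if ok else 'incorrect'}")

The point is only that "right or wrong" can come from the deterministic half of the system, even if the generative half is doing pattern-matching.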



