
Weights are simply a lossy compression of the training data set.

Now, I understand the argument that a specific work may have been homeopathically diluted down to nothingness in the weights, and so has only been used to contextualise the compression of other works. But if the weights can reasonably be used to generate copyright-infringing text (condensations, abridgements and transformations are explicitly listed in the law, so verbatim copying is not necessary), or even to answer substantial questions about the work, then that shows that the weights include that data.

If I take a sound file and compress it down so it's poor quality but I can still make out the tune, that doesn't mean that I've avoided copyright law.



> Weights are simply a lossy compression of the training data set.

No they're not -- they're more like the dictionary generated to produce a lossless compressed data set. But then we throw out the compressed data itself, and keep only the dictionary.
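To make the dictionary analogy concrete, here is a minimal sketch using the preset-dictionary feature of Python's `zlib`. The sample strings and the helper names `compress_with_dict` and `decompress_with_dict` are made up for illustration, and this only illustrates the analogy; it says nothing about how model weights actually work.

```python
import zlib

# Hypothetical shared "dictionary": phrases common to many texts.
shared_dict = b"the quick brown fox jumps over the lazy dog "

# Hypothetical specific work that we compress against the dictionary.
text = b"the quick brown fox naps; the lazy dog jumps over the fox"


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Compress data, letting zlib refer back into a preset dictionary."""
    compressor = zlib.compressobj(zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress a stream that was produced with the same dictionary."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


blob = compress_with_dict(text, shared_dict)
restored = decompress_with_dict(blob, shared_dict)

# Dictionary plus compressed stream recovers the text exactly.
print(restored == text)

# The dictionary on its own does not contain the specific text.
print(text in shared_dict)
```

Reconstruction needs both pieces: if the compressed stream (`blob`) is thrown away, the dictionary alone holds only the shared phrases.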

> but if the weights can be reasonably used to generate copyright infringing text (and condensations and abridgements and transformations are explicitly listed in the law, verbatim copying is not necessary)

First of all, they haven't been shown to generate substantial infringing text, as opposed to the kinds of short snippets covered by fair use. And my previous comment already explained that longer texts are not going to happen, for both legal and economic reasons.

But secondly, you're wrong about "condensations and abridgements and transformations". You can absolutely sell a page-long summary of a book without getting permission, for instance. What do you think things like CliffsNotes are all about? Or all those two-page "executive summaries" of popular business books?

You can't abridge a 1,000-page book to 500 pages and sell that, but you can summarize its ideas in a page and sell that. Which is roughly the level of understanding that LLMs seem to absorb.



