Hacker News

It's also worth reading his initial tweet: https://x.com/suchirbalaji/status/1849192575758139733

> I recently participated in a NYT story about fair use and generative AI, and why I'm skeptical "fair use" would be a plausible defense for a lot of generative AI products. I also wrote a blog post (https://suchir.net/fair_use.html) about the nitty-gritty details of fair use and why I believe this.

> To give some context: I was at OpenAI for nearly 4 years and worked on ChatGPT for the last 1.5 of them. I initially didn't know much about copyright, fair use, etc. but became curious after seeing all the lawsuits filed against GenAI companies. When I tried to understand the issue better, I eventually came to the conclusion that fair use seems like a pretty implausible defense for a lot of generative AI products, for the basic reason that they can create substitutes that compete with the data they're trained on. I've written up the more detailed reasons for why I believe this in my post. Obviously, I'm not a lawyer, but I still feel like it's important for even non-lawyers to understand the law -- both the letter of it, and also why it's actually there in the first place.

> That being said, I don't want this to read as a critique of ChatGPT or OpenAI per se, because fair use and generative AI is a much broader issue than any one product or company. I highly encourage ML researchers to learn more about copyright -- it's a really important topic, and precedent that's often cited like Google Books isn't actually as supportive as it might seem.

> Feel free to get in touch if you'd like to chat about fair use, ML, or copyright -- I think it's a very interesting intersection. My email's on my personal website.



I'm an applied AI developer and CTO at a law firm, and we discuss the fair use argument quite a bit. It's grey enough that whoever has more financial resources to continue their case will win. Such is the law and legal industry in the USA.


what twigs me about the argument against fair use (whereby AI ostensibly "replicates" the content competitively against the original) is that it assumes a model trained on journalism produces journalism or is designed to produce it. the argument against that stance would be easy to make.


The model isn't trained on journalism only, you can't even isolate its training like that. It's trained on human writing in general and across specialties, and it's designed to compete with humans on what humans do with text, of which journalism is merely a tiny special case.

I think the only principled positions to be had here are to either ignore IP rights for LLM training, or give up entirely, because a model designed to be general like a human will need to be trained like a human, i.e. immersed in the same reality as we are, same culture, most of which is shackled by IP claims - and then, obviously, by definition, as it gets better it gets more competitive with humans on everything humans do.

You can produce a complaint that "copyrighted X was used in training a model that now can compete with humans on producing X" for an arbitrary value of X. You can even produce a complaint that "copyrighted X was used in training a model that now outcompetes us in producing Y", for arbitrary X and Y that aren't even related, and it will still be true. Such is the nature of a general-purpose ML model.


This seems to be putting the cart before the horse.

IP rights, or even IP itself as a concept, aren't fundamental to existence nor the default state of nature. They are contingent concepts, contingent on many factors.

e.g. It has to be actively, continuously maintained as time advances. There could be disagreements on how often, such as per annum, per case, per WIPO meeting, etc…

But if no such activity occurs over a very long time, say a century, then any claims to any IP will likely, by default, be extinguished.

So nobody needs to do anything for it all to become irrelevant. That will automatically occur given enough time…


> IP rights, or even IP itself as a concept, isn’t fundamental to existence nor the default state of nature.

This is correct. Copyright wasn't a thing until after the invention of the printing press.


the analogy in the anti-fair-use argument is that if I am the WSJ, and you are a reader and investor who reads my newspaper, and then you go on to make a billion dollars in profitable trades, somehow I as the publisher am entitled to some equity or compensation for your use of my journalism.

That argument is equally absurd as one where you write a program that does the same thing. Model training is not only fair use, but publishers should be grateful someone has done something of value for humanity with their collected drivelings.


This is the checkmate. The moment anything is published, it is fair game; it is part of the human consciousness and available for incorporation as a component of anything. Otherwise, what is the fucking point of publishing, mere revenue? Are we all not collectively competing and contributing? Furthermore, isn't anything copied from published work arguably satire? Protected-speech satire?


It has become ludicrously clear in the past decade that many of the competitors to journalism are very much not journalism.


Whether or not training is decided as fair use, it does seem like it could affect artists and authors.

Many artists don't like how image generators, trained on their original work, allow others to replicate their (formerly) distinctive style, almost instantly, for pennies.

Many authors don't like how language models can enable anyone to effortlessly create paraphrased versions of the author's books. Plagiarism as a service.

Human artists and writers can (and do) do the same thing, but the smaller scale, slower speed, and higher cost reduce the economic effects.


I think it makes more sense in the context of entertainment. However, even in journalism, given the source data there's no reason an LLM couldn't put together the actual public-facing article, video, etc.


Doesn’t need to be journalism, just needs to compete with it.


> they can create substitutes that compete with the data they're trained on.

If I'm an artist and copy the style of another artist, I'm also competing with that artist, without violating copyright. I wouldn't see this argument holding up unless it can output close copies of particular works.



