Google Books is not transformative. It shows you the same data for the same purpose it was originally published for.
A better example is Google Image Search. Thumbnails are transformative because they have a different purpose and aren't the same data. An LLM is much more transformative than a thumbnail.
It's more lossy than even lossy compression because of the regularization term; I'm pretty sure you can train one that's guaranteed to not retain any of the pretraining text. Of course then it can't answer things like "what's the second line of The Star Spangled Banner".
The fact that compression is incredibly lossy does not change the fact that it's copyright infringement.
I have a lossy compression algorithm which simply outputs '0' or '1' depending on the parity of the bits of the input.
If I run that against a camcorded recording of a Disney film, the result is a 0 copyrighted by Disney. In fact, posting that 0 in this comment would make this comment illegal too, so I must disclaim that I did not actually produce it from a camcorded Disney film.
If I run it against the book 'Dracula', the result is a 0 in the public domain.
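For concreteness, here is a minimal sketch of that parity "compressor" in Python. The function name `parity_bit` is made up for illustration, and I'm assuming "parity" means whether the count of set bits is even or odd. The point is that wildly different inputs collapse to the same single character, so nothing about the input can be recovered from the output.

```python
def parity_bit(data: bytes) -> str:
    """Return '1' if the input has an odd number of set bits, else '0'.

    Hypothetical helper for illustration: the most extreme "lossy
    compression" possible, reducing any input to one character.
    """
    # Count the 1-bits in every byte, then keep only the count modulo 2.
    ones = sum(bin(byte).count("1") for byte in data)
    return str(ones % 2)


# Different inputs can map to the same output.
print(parity_bit(b"\x03"))  # two set bits
print(parity_bit(b"\xff"))  # eight set bits
print(parity_bit(b"\x01"))  # one set bit
```

The first two calls print the same character even though the inputs differ, while the third prints the other one.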
The law does not understand bits, and it does not understand compression or lossiness. It understands "humans can creatively transform things, algorithms cannot unless a human imbues them with creativity". It does not matter if your compressed output does not contain the original.
> The court held that framing and hyperlinking of original images for use in an image search engine constituted a fair use of Perfect 10's images because the use was highly transformative