I mean, if you ignore the fact that there would be no LLMs without wholesale scraping of the corpus of all software ever written.
LLMs are the least ethically sourced pieces of technology I've ever seen. That the businesses built on them haven't been sued out of existence for not asking permission before training is positively mind-boggling.
You think there wasn't a reason Microsoft bought GitHub, whose ToS allowed them to expand their training corpus vastly beyond their own internal systems? Or why Amazon does the same thing with CodeCommit? If your stuff is hosted somewhere with a ToS, you can bet that repo is getting into the training corpus. Having your own flavor of LLM today is too valuable an opportunity for any corp to pass up.