This. I'm really bothered by the almost cruel glee with which a lot of people respond to SO's downfall. Yeah, the moderation was needlessly aggressive. But it was successful at creating a huge repository of text-based knowledge which benefited LLMs greatly. If SO is gone, where will this come from for future programming languages, libraries, and tools?
You talk about news here like it's some irrefutable ether LLMs can tap into. Also I'd think newspapers and scientific papers cover extremely little of what the average person uses an LLM to search for.
This always feels to me like an elephant in the room.
I’d love to read a knowledgeable roundup of current thought on this. I have a hard time understanding how, with the web becoming a morass of SEO and AI slop - with really no effort being put into keeping it accurate - we’ll be able to keep training LLMs to the level we do today.