LLMs used to be awful, but lately I find them just as good as Wikipedia, which is to say very very good. Sometimes they hallucinate no doubt, but in general it's as good as a google search for me.
Do you check these sources? I find that Gemini and, especially, Google Search AI regularly cite sources that do not say what the answer claims they say. For example (not an actual example, but along these lines): I ask "Can Google Sheets do X feature?", it replies "Yup," and it links to an Excel YouTube tutorial as its source.
I ask ChatGPT and Grok questions about Latin and Greek all the time, and they'll brazenly invent sources, quoting them in Greek or Latin. As an example (an actual example), I asked ChatGPT to round up all the poetry that, like Catullus' sparrow and Statius' parrot, dealt with birds. It hallucinated a bird poem by Callimachus that it claimed was the prototype and gave me not only an English translation but a Greek original—that never existed. It just plain lied. I have zero faith in any fact about the ancient world that comes from an LLM.
On the other hand, LLMs do a great job translating between languages, which is probably why they can vibe code. They catch some grammar errors, too, although not all of them, and even some stylistic errors, so it's useful to run Greek compositions through them. Ask one a linguistic question ("Which Greek verbs other than ἀφίημι violate Grassmann's law?"), though, and it will spew a bunch of irrelevant examples, because it doesn't actually understand what it's doing; it's just predicting tokens.
What doesn't help the community is that terms like "hallucinate" and "cite sources" still don't capture what the LLM is doing. LLMs were pre-trained to do one thing, trained to do another, and maybe fine-tuned for yet another. Do they hallucinate? From our perspective they do, because we can tell true from false; from the tool's perspective, it's "just interpolating the text crammed inside of it."
And in your verification, what's your usual experience?
Does the citation usually show the claim was right? A mix of right and wrong, say 60/40? Usually wrong? Does the citation often not exist, or turn out to be irrelevant to the claim?
(I don't often use the AI answers, but the few times I bother to check the citations, they usually don't fill me with confidence about the answer.)
I would say about 75/25, maybe even 80/20. Sometimes I'll ask questions on topics where I'm an expert (because I want to pursue some line of inquiry but am not sure what baseline level of knowledge is available), and I'll see mistakes, but "good" mistakes: ones that indicate solid reasoning but are wrong because of some counterintuitive fact, i.e., a pitfall that almost everyone, including myself, got wrong on first encounter.
https://gemini.google.com/app/6da2be1502b764f1