I highly welcome the effort as well*, but I don't see why they would have to mistake first-language ability for fluency to argue for that. The difference is vast and relevant: anyone with good English reading and writing skills can take advantage of a model and might prefer it over a worse model in their native language.
*: Just sceptical whether there's enough content out there which isn't just (often badly or too straightforwardly) translated from English. Not an issue for languages with, let's say, more than 10 million speakers, but for everything smaller.
Norwegian has about 5 million speakers, and ChatGPT not only does fine with both of the (mutually intelligible) Norwegian written languages, but in my experiments also has no problem "translating" to and from several regional dialects.
And that is, I presume (I could be wrong), before anyone has really tried to mine the Norwegian national library, as even much of what is online is not easily accessible for crawling.
I think there'll be plenty of content for even much smaller languages, especially anywhere with legal deposit laws, but it's often going to require cumbersome collection efforts and negotiating access.
It's an arbitrary threshold for the sake of discussion. Of course Norway, one of the most developed and richest countries per capita, manages to have good content in its written languages on the internet. Model training also benefits from Norwegian's linguistic and cultural similarities to Swedish and Danish, and, to a lesser degree, to English, Dutch, and German. I suspect many people from Eastern Europe find Russian models useful for the same reason.
I agree that the situation is not hopeless for languages with a thriving written culture. But for many minority languages there might only be chat messages, some literary works, and vast amounts of machine-translated websites accessible for crawlers. I hope that in the future improved model architectures and training strategies can push the required amount of raw content way down.