Considering how important this benchmark has become to the judgement of state-of-the-art AI models, I imagine each AI lab has a dedicated 'pelican guy', a highly accomplished and academically credentialed person, who's working around the clock on training the model to make better and better SVG pelicans on bikes.
It's interesting that you mentioned in a recent post that saturation on the pelican benchmark isn't a problem because it's easy to test for generalization. But now, looking at your updated benchmark results, I'm not sure I agree. Have the main labs been climbing the pelican-on-a-bike hill in secret this whole time?
Considering how many other "pelican riding a bicycle" comments there are in this thread, it would be surprising if this was not already incorporated in the training data. If not now, soon.
I don't think the big labs would waste their time on it. If a model is great at making the pelican but sucks at all other SVG, it becomes obvious. But so far the good pelicans are strong indicators of good general SVG ability.
Unless training on the pelican increases all SVG ability, in which case: good job.
I was interested (and slightly disappointed) to read that the knowledge cutoff for Gemini 3 is the same as for Gemini 2.5: January 2025. I wonder why they didn't train it on more recent data.
Is it possible they use the same base pre-trained model and just fine-tuned and RL-ed it better (which, of course, is where all the secret sauce training magic is these days anyhow)? That would be odd, especially for a major version bump, but it's sort of what having the same training cutoff points to?
Maybe that date is a rule of thumb for when AI generated content became so widespread that it is likely to have contaminated future data. Given that people have spoofed authentic Reddit users with Markov chains, it probably doesn’t go back nearly far enough.
As your example shows, GPT-5 Pro would probably be better than GPT-5.1, but the tokens are over ten times more expensive and I didn’t feel like paying for them.
Extending beyond the pelican is very interesting, at least until your page gets enough recognition to be "optimized" by the AI companies.
It seems both Gemini 3 and the latest ChatGPT models have a deep understanding of how SVGs are represented, which seems like a difficult task. I would be incapable of writing an SVG without visualizing the result and a graphical feedback loop.
PS: It would be fun to add "animated" to the short prompt, since some models think of animation by themselves. I tried it manually with 5 Pro (using the subscription), and in a sense it's worse than the static image. For a start, there's an error: https://bafybeie7gazq46mbztab2etpln7sqe5is6et2ojheuorjpvrr2u...
I would also be unable to write SVG code to produce anything other than the simplest shapes.
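Even something trivial means juggling raw coordinates in your head. A minimal generic example of what hand-written SVG looks like (my own illustration, not anything from the benchmark page):

    <svg xmlns="http://www.w3.org/2000/svg" width="200" height="200" viewBox="0 0 200 200">
      <!-- a basic shape: about the limit of comfortable hand-writing -->
      <circle cx="100" cy="100" r="60" fill="steelblue"/>
      <!-- a curve: path data like this is hard to picture without rendering it -->
      <path d="M 40 160 Q 100 40 160 160" fill="none" stroke="black" stroke-width="3"/>
    </svg>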
I noticed that, on my page, Gemini 3.0 Pro did produce one animated SVG without being asked, for “#8: Generate an SVG of an elephant typing on a typewriter.” Kind of cute, actually.
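For anyone wondering how a text model pulls that off: SVG supports declarative SMIL animation, so no JavaScript is needed. A minimal sketch of the mechanism (a generic example, not the actual elephant file):

    <svg xmlns="http://www.w3.org/2000/svg" width="200" height="100" viewBox="0 0 200 100">
      <circle cx="20" cy="50" r="10" fill="gray">
        <!-- SMIL: slide the circle back and forth indefinitely -->
        <animate attributeName="cx" values="20;180;20" dur="2s" repeatCount="indefinite"/>
      </circle>
    </svg>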
As for whether the images on the page will enter LLM training data: In the page’s HTML are meta tags I had Claude give me to try to prevent scraping:
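Roughly this sort of thing; a minimal sketch assuming the informal noai/noimageai convention, not necessarily the exact tags Claude produced (and scrapers honor these only voluntarily):

    <!-- ask crawlers not to use the page or its images for AI training;
         advisory directives, not enforced by anything -->
    <meta name="robots" content="noai, noimageai, noimageindex">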