As your example shows, GPT-5 Pro would probably be better than GPT-5.1, but the tokens are over ten times more expensive and I didn’t feel like paying for them.
Extending beyond the pelican is very interesting, especially until your page gets enough recognition to be "optimized" by the AI companies.
It seems both Gemini 3 and the latest ChatGPT models have developed a deep understanding of how SVGs represent images, which seems like a difficult task. I would be incapable of writing an SVG without visualizing the result and having a graphical feedback loop.
PS: It would be fun to add "animated" to the short prompt, since some models come up with animation on their own. I tried it manually with 5 Pro (using the subscription), and in a sense the result is worse than the static image. To start, there's an error: https://bafybeie7gazq46mbztab2etpln7sqe5is6et2ojheuorjpvrr2u...
I would also be unable to write SVG code to produce anything other than the simplest shapes.
I noticed that, on my page, Gemini 3.0 Pro did produce one animated SVG without being asked, for #8, “Generate an SVG of an elephant typing on a typewriter.” Kind of cute, actually.
As for whether the images on the page will enter LLM training data: the page’s HTML includes meta tags that I had Claude give me to try to prevent scraping.
I hate to be the one to say this, but this article reads as though it was written by an LLM. The shallowness is one reason. Another is the lack of any individual voice that would suggest a human author.
And there are the unsupported citations and references:
The sentence “The World Economic Forum’s 2023 Future of Jobs report estimates 83 million jobs may be displaced globally, disproportionately affecting low- and mid-skill workers” is followed by a citation to a book published in 1989.
Footnote 7 follows a paragraph about Nietzsche’s philosophy. That footnote leads to a 2016 paper titled “The ethics of algorithms: Mapping the debate” [1], which makes no reference to Nietzsche, nihilism, or the will to power.
Footnote 2 follows the sentence “Ironically, as people grow more reliant on AI-driven systems in everyday life, many report heightened feelings of loneliness, alienation, and disconnection.” It links to the WEF’s “Future of Jobs Report 2023” [2]. While I haven’t read that full report, the words “loneliness,” “alienation,” and “disconnection” yield no hits in a search of the report PDF.
A positive outcome of LLMs: regardless of whether this specific article is AI-generated, we are becoming increasingly intolerant of shallowness. Where in the past we would engage even with a source’s token effort, we now draw our conclusions and disengage much faster. I expect the quality of genuinely human-written articles to improve in order to get past readers’ increasingly sensitive filters.
I used to write very formally and neutrally, and now I don't, because it comes across as LLM-ish. My sentences used to lack "humanity", so to speak. :(
I'm a member of the ACM, so I would report this article.
However, I think the author may just have made some mistakes and off-by-one'd their references, since the 2023 report is actually #2:
2. Di Battista, A., Grayling, S., Hasselaar, E., Leopold, T., Li, R., Rayner, M. and Zahidi, S., 2023, November. Future of jobs report 2023. In World Economic Forum (pp. 978-2).
Similarly, Footnote 7 should probably point to #8:
8. Nietzsche, F. and Hollingdale, R.J., 2020. Thus spoke zarathustra. In The Routledge Circus Studies Reader (pp. 461-466). Routledge.
I guess it doesn't help that the post is formatted like a typical article, complete with the bio blurb. It would be worth distinguishing the blog entries more clearly and perhaps posting a disclaimer. After all, when people think of CACM, they don't generally have blogs in mind.
In addition, another telltale sign of LLM authorship is the repeated, forced attempt to draw connections or parallels where they make no sense, trying to fulfill an essay prompt that doesn't, in those instances, have much meat to it.
> As AI systems increasingly mediate decisions [...], decisions once
> grounded in social norms and public deliberation now unfold within
> technical infrastructure, beyond the reach of democratic oversight.
> This condition parallels the cultural dislocation Nietzsche observed in
> modern Europe, where the decline of metaphysical and religious authorities
> undermined society’s ability to sustain shared ethical meaning. In both
> cases, individuals are left navigating fragmented norms without clear
> foundations or frameworks for trust and responsibility. Algorithmic
> systems now make value-laden choices, about risk, fairness, and worth,
> without mechanisms for public deliberation, reinforcing privatized,
> reactive ethics.
Note how "algorithmic systems" making "value-laden choices" reinforcing "privatized, reactive ethics" has absolutely nothing to do with the spiritual value collapse that Nietzsche, who was uninterested in or even opposed to critiques of power structures, and who wasn't much impressed by the whole idea of democracy, saw in 19th century Europe. While the criticism of AI systems being beyond the reach of democratic oversight is a common and perfectly valid one, it just simply doesn't touch on Nietzsche's philosophy; yet the LLM piece uses language and turns of phrase ("parallels", "in both cases") to make it sound as if there were a connection.
If I am strongly opposed to anti-democratic opaque AI surveillance machines, then I am not an individual "left navigating fragmented norms without clear foundations", on the contrary, my foundations are quite clear indeed; and on the other hand, increased automation causing the erosion of "frameworks for trust and responsibility" seems more likely to be welcomed by Nietzsche, who had little patience for moral affectations like responsibility, than opposed.
At this point I regularly see front-page HN articles that are LLM written (amusingly sometimes accompanied by comments praising how much of a breath of fresh air the article is compared to usual "LLM slop").
I worry about when I no longer see such articles (as that means I can no longer detect them), which likely will be soon enough.
“Most work lives in the fat middle of a bell curve. ... Models feast on that part of the curve. ... The central question for future labour markets is not whether you are clever or diligent in some absolute sense. It is whether what you do is ordinary enough for a model to learn or strange enough to fall through the gaps. ... An out of distribution human, in my head, is someone whose job sits far enough in the tail of that curve that it does not currently compress into training data. ... [But T]hey are not safe; nothing is. They are simply late on the automation curve.”
That is indeed a good insight. What's being hollowed out are jobs for the middle of the bell curve, which, of course, is where most of the people are. The author says that in several different ways, pointing out that a quiet, reliable long-term job is no longer something that's even a likely possibility.
Then he says it's only a matter of time until the outliers are automated too.
That's more speculative, but may not be wrong. It's only been three years since ChatGPT shipped. This is just getting started.
All of this started before ChatGPT. There are charts showing it; sorry, I can’t remember the source.
I guess I’m just annoyed that everyone in the comments is reaffirming the AI-is-stealing-jobs narrative, while half the studies coming out say it’s actually wasting people’s time and that people are poor judges of their own productivity.
It just feels like AI is a convenient excuse for businesses to cut costs since the economy is crap, but no one wants to admit it for fear of driving their stock price down.
The author's argument is framed more widely than just LLMs. He also discusses robots, teleoperation, and other areas where workers in the middle of the bell curve seem especially vulnerable to displacement.
I accept, though, your point that economic factors not directly related to AI are also playing a role. Presumably economists are now trying to pick apart the effect of each factor on the job market.
A couple of days ago, inspired by Simon and those discussions, I had Claude create 30 such tests. I posted a Show HN with the results from six models, but it didn’t get any traction. Here it is again:
Oh man, that’s hilarious. I dunno what qwen is doing most of the time. Gemini seems to be either a perfect win or complete nonsense. Claude seems to lean towards “good enough”.
If anyone is interested in seeing older almanac(k)s, or at least texts with the word in their titles, the Internet Archive has scans of thousands. One chosen at random:
I’m similar to you. I started ripping my CD collection to mp3s around 2002, and I have them all on my boot disk and synced to both Dropbox and an external drive for backup.
Many of those CDs, which I started buying in 1986, are not available on streaming services. Over the years I have accumulated a lot of other nonstreamed music, including radio programs I’ve recorded, live music downloaded from the Internet Archive, purchased music from Bandcamp and elsewhere, and music I composed and performed myself for my own enjoyment.
It’s a bit obsessive, I admit, and the chances that my heirs will find the files and want to preserve them after I’m gone are slim. But I am happy I have them now.
One reason Sokal’s article stirred up such a fuss was that, when it was published in 1996, people still largely depended on edited, gatekept outlets for their reading and viewing. Although not mainstream media, Social Text was still selective about what it published, and the fact that its editors had chosen to publish Sokal’s hoax was a key point in the controversy. It would be a few more years before raw, self-published writing on the Internet would start to reach as many people as it does now.
That's not how I remember the event. The fact that "people still largely depended on edited, gatekept outlets for their reading and viewing" played no role, as far as I know. Newspapers wrote about the Sokal-Bricmont story, not because they were concerned as gatekeepers, but because it was a spectacular hoax. I was in academia at the time, and it was widely discussed.
The hoax was intended to demonstrate two things:
- Philosophy and social studies sometimes have an unhealthy fascination with science, along with a tendency to misapply scientific concepts they don't understand. Some impostors enjoyed excellent reputations in their academic domains despite writing this pseudo-scientific gibberish.
- Scientific journals were supposed to publish provable or reproducible results, so when a flawed article made it through the publishing process, there was hope that the errors would be detected and corrected by later articles, and publication in a top-tier journal would bring intense scrutiny. In philosophy and the social sciences, nonsense could get published, widely accepted, and even studied.
BTW, I remember Jean Bricmont saying how much he liked good philosophy, and how pained he was when reading fraudulent philosophy.
Thank you for your response. Let me explain a bit more what I intended by my comment above.
Though I wasn’t in academia at the time, my memory of the event is basically the same as yours. But I wonder if younger people today, used to the current Internet-based media environment, would understand why such a hoax attracted so much attention outside of academia. My recollection is that, before the Internet, there was much more interest in and discussion of what media outlets decided to publish or not and how they slanted stories. While such interest still exists, it might seem a bit odd to someone who mainly consumes social media. The current issues around filtering and algorithms are quite different from the often intense discussions a few decades ago about how, say, the New York Times or CBS News covered particular stories.
What would happen today if someone got a similarly ridiculous article accepted by a magazine or website like Social Text? My guess is that it would not attract anything like the widespread attention the Sokal hoax did.
At the time it was seen as revealing that that particular field was prone to bullshit. But since then there have been other hoaxes and scandals in other fields, so it now seems more like a general problem with peer-reviewed journals.
Except that at the time, many scholars in psychology and other disciplines were raising the alarm about the failure to replicate published results. So Social Text was in good company.
This would be a pretty hilarious board for anyone who likes the em-dash and who has had many fairly active accounts (one at a time) on here due to periodically scrambling their passwords to avoid getting attached to high karma or to take occasional breaks from the site. Should there be such people.
Thank you for this! Apparently I'm #4 by total em-dash uses, #14 by average em dashes per comment, and #4 by maximum em dashes in a single comment, since I evidently once posted a comment containing 18 em dashes.
Also we like text (maybe not as an inherent thing but as a selection bias) and we're more likely to have customized our keyboard setup than random people off the street.
How long do you think that will remain true? I've bootstrapped some workflows with Claude Code where it writes a markdown file at the end of each session for its own reference in later sessions. It worked pretty well. I assume other people are developing similar memory systems that will be more useful and robust than anything I could hack together.
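For concreteness, the shape of the loop is roughly this (a minimal sketch, not my actual Claude Code hooks; the notes file name and the example summary are just placeholders):

    # Sketch of a session-notes memory loop: read accumulated notes at the start
    # of a session, append a dated summary at the end. In my real setup Claude
    # Code writes and reads the file itself; this just shows the pattern.
    from datetime import date
    from pathlib import Path

    NOTES_PATH = Path("SESSION_NOTES.md")  # hypothetical file name

    def load_notes() -> str:
        """Return the accumulated notes to feed back in as context."""
        return NOTES_PATH.read_text(encoding="utf-8") if NOTES_PATH.exists() else ""

    def append_summary(summary: str) -> None:
        """Append a dated end-of-session summary."""
        entry = f"\n## {date.today().isoformat()}\n{summary.strip()}\n"
        with NOTES_PATH.open("a", encoding="utf-8") as f:
            f.write(entry)

    if __name__ == "__main__":
        append_summary("Refactored the sync script; TODO: handle rate limits.")
        print(load_notes())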
For LLMs? Mostly permanently. This is a limitation of the architecture. Yes, there are workarounds, including ChatGPT's "memory" or your technique (which I believe are mostly equivalent), but they are limited, slow and expensive.
Many of the inventors of LLMs have moved on to (what they believe are) better models that would handle such learnings much better. I guess we'll see in 10-20 years if they have succeeded.
One reason for testing this is that it might indicate how accurately models can explain natural language grammar, especially for agglutinative and fusional languages, which form words by stringing morphemes together. When I tested ChatGPT a couple of years ago, it sometimes made mistakes identifying the components of specific Russian and Japanese words. I haven’t run similar tests lately, but it would be nice to know how much language learners can depend on LLM explanations about the word-level grammars of the languages they are studying.
Later: I asked three LLMs to draft such a test. Gemini’s [1] looks like a good start. When I have time, I’ll try to make it harder, double-check the answers myself, and then run it on some older and newer models.
What you are testing for is fundamentally different from character-level text manipulation.
A major optimization in modern LLMs is tokenization. This optimization is based on the assumption that we do not care about character-level details, so we can combine adjacent characters into tokens and then train and run the main model on shorter token sequences drawn from a much larger vocabulary. Given this architecture, it is impressive that LLMs can perform character-level operations at all; they essentially need to reverse-engineer the tokenization process.
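You can see what the model is working with in a short sketch; this uses tiktoken's cl100k_base encoding purely as an accessible example, since the models under discussion use their own tokenizers and the exact splits will differ:

    # What the model is actually trained on is a sequence of opaque token IDs,
    # not characters. Requires `pip install tiktoken`.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    ids = enc.encode("unbelievably")
    print(ids)              # a handful of integer IDs
    print(enc.decode(ids))  # characters are only recoverable by undoing the tokenization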
However, morphemes are semantically meaningful, so a quality tokenizer will tokenize at the morpheme level instead of the word level [0]. This is of particularly obvious importance in Japanese, where the lack of spaces between words means that the naive "split on whitespace" approach is simply not possible.
Further, the training data that is likely to be relevant to this type of query probably isolates the individual morphemes while discussing many of the words that use them, so it is a much shorter path for the model to associate these close-but-not-quite-morpheme tokens with the actual sequence of tokens that corresponds to what we think of as a morpheme.
[0] Morpheme-level tokenization is itself a non-trivial problem. However, it was solved reasonably well long before the current generation of AI.
Tokenizers are typically optimized for efficiency, not morpheme separation. Even in the examples above, the splits are not morphemes: proper morpheme separation would be un-believ-ably and дост-о-при-меч-а-тельн-ость.
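To make the contrast concrete, here is a sketch that puts a BPE tokenizer's pieces next to those hand-done morphemic splits (again using tiktoken's cl100k_base as a stand-in, not Gemini's actual tokenizer; the exact pieces vary by model):

    # Compare BPE pieces with morphemic splits. Pieces are printed as raw bytes
    # because BPE can cut inside a multi-byte Cyrillic character.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    examples = {
        "unbelievably": "un-believ-ably",
        "достопримечательность": "дост-о-при-меч-а-тельн-ость",
    }

    for word, morphemes in examples.items():
        pieces = [enc.decode_single_token_bytes(t) for t in enc.encode(word)]
        print(word)
        print("  BPE pieces:", pieces)
        print("  morphemes: ", morphemes)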
Regardless of this, Gemini is still one of the best models when it comes to Slavic word formation and manipulation: it can produce novel (non-existent) words quite well and doesn't seem to be confused by incorrect separation. This seems to be the result of extensive multilingual training, because GPT models other than the discontinued 4.5-preview, as well as many Chinese models, have issues with basic coherency in languages that rely heavily on word formation, despite using similar tokenizers.
I notice that that particular tokenization deviates from the morphemic divisions in several cases, including ‘dec-entral-ization’, ‘食べ-させ-られた-くな-かった’, and ‘面白-くな-さ-そうだ.’ ‘dec’ and ‘entral’ are not morphemes, nor is ‘くな.’
https://gally.net/temp/20251107pelican-alternatives/index.ht...
There seem to be one or two parsing errors. I'll fix those later.