More

ctoth · 2025-12-20T18:04:17 1766253857

> as they do for popular benchmarks or for penguins riding a bike.

Citation?

ctoth · 2025-12-20T10:50:28 1766227828

Ah, I see you have discovered blogs! They're a cool form of writing from like ~20 years ago which are still pretty great. Good thing they show up on this website, it'd be rather dull with only newspapers and journal articles doncha think?

ctoth · 2025-12-20T10:28:39 1766226519

> The fundamental challenge in AI for the next 20 years is avoiding extinction.

So nice to see people who think about this seriously converge on this. Yes. Creating something smarter than you was always going to be a sketchy prospect.

All of the folks insisting it just couldn't happen or ... well, there have just been so many objections. The goalposts have walked from one side of the field to the other, and then left the stadium, went on a trip to Europe, got lost in a beautiful little village in Norway, and decided to move there.

All this time though, the prospect of instantiating a something smarter than you (and yes, it will be smarter than you even if it's at human level because of electronic speeds...) This whole idea is just cursed and we should not do the thing.

cheschire · 2025-12-20T10:34:56 1766226896

"Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should."

ctoth · 2025-12-20T01:05:35 1766192735

The thing about this metaphor that people don't seem to ever complete is.

Okay, you've switched to English. The speed of typing the actual tokens is just about the same but...

The standard library is FUCKING HUGE!

Every concept that you have ever read about? Every professional term, every weird thing that gestures at a whole chunk of complexity/functionality ... Now, if I say something to my LLM like:

> Consider the dimensional twins problem -- how're we gonna differentiate torque from energy here?

I'm able to ... "from physics import Torque, Energy, dimensional_analysis" And that part of the stdlib was written in 1922 by Bridgman!

JoshTriplett · 2025-12-20T04:26:50 1766204810

> The standard library is FUCKING HUGE!

And extremely buggy, and impossible to debug, and does not accept or fix bug reports.

AI is like an extremely enthusiastic junior engineer that never learns or improves in any way based on your feedback.

I love working with junior engineers. One of the best parts about working with junior engineers is that they learn and become progressively more experienced as time goes on. AI doesn't.

integralid · 2025-12-20T07:36:25 1766216185

People need to decide if their counter to AI making programmers obsolete is "current generation AI is buggy, and this will not improve until I retire" or "I only spend coding 5% of my time so it doesn't matter if AI can instantly replace my coding".

And come on: AI definitely will become better as time goes on.

JoshTriplett · 2025-12-20T07:50:41 1766217041

It gets better when the AI provider trains a new model. It doesn't learn from the feedback of the person interacting with it, unlike a human.

ctoth · 2025-12-20T00:04:44 1766189084

Possible confound (seems important):

"creatives" tend to have a certain political tribe, that political tribe is well-represented in places that have this precise type of authenticity/etc. language around AI use...

Basically a good chunk of this could be measuring whether or not somebody is on Bluesky/is discourse-pilled... and there's no way to know from the study.

ctoth · 2025-12-18T00:29:48 1766017788

You've built a filter that punishes verification at the hiring stage, then you're surprised when your team ships unverified code. You get what you select for. He selected for "doesn't double-check." Congratulations, you've got a team of developers who don't double-check.

ctoth · 2025-12-18T00:22:00 1766017320

I can run the code in my head. I can probably be right. I can be 99.9999% sure I am right.

OR:

I could run the code in the interpreter and be 100% certain.

I know what attitude I would prefer out of my developers.

Maxatar · 2025-12-18T00:36:44 1766018204

Isnt the point of the article that blindly copying and pasting this code leads to the wrong answer?

I agree many developers do blindly copy and paste things off the Internet, but I don't think that's something to desire or celebrate.

ctoth · 2025-12-18T00:44:51 1766018691

The point of this article is this person punishes people who copy and paste the code ... into their python interpreter to check.

How many cases have you faced in your real job where this has happened?

Maxatar · 2025-12-18T01:43:38 1766022218

So I wouldn't go so far as to say that I'd fire someone for copying and pasting code, but it's definitely part of my company's culture that copying and pasting code off of a website, and especially executing it, is something heavily discouraged to the point that it doesn't really happen at my job.

I'm perfectly happy to use Stack Overflow and other resources/tutorials, blog posts etc... to find solutions to problems, but just instinctively I would never think to copy and paste a solution from these sites and incorporate it into my codebase and I sure as heck wouldn't think to execute code from some untrusted site I happened to come across.

But this may also be a consequence of the domain I work in where we take security very seriously.

array_key_first · 2025-12-18T04:26:31 1766031991

You can tell how safe a code snippet is from reading it.

Like, there's no way you're going to copy a 20 line algorithm from stack overflow on balancing a red-black tree and have it encrypt your harddrive.

Obviously you still need to test the code to make sure it works and understand what it's doing, but there is very little security risk here. Just look up the functions youre using and understand the code and you're fine.

anabab · 2025-12-18T10:28:20 1766053700

the attitude of at least checking the code before running it, I suppose? Or is the curl | sudo bash approach more preferred nowadays?

ctoth · 2025-12-14T18:19:08 1765736348

Congratulations, I guess? I can't read your content. But ... The machines can't either, so ... great job!

Although... Hmm! I just pasted it into Claude and got:

When text content gets scraped from the web, and used for ever-increasing training data to improve. Copyright laws get broken, content gets addressively scraped, and even though you might have deleted your original work, it might must show up because it got cached or archived at some point. Now, if you subscribe to the idea that your content shouldn't be used for training, you don't have much say. I wondered how I personally would mitigate this on a technical level. et tu, caesar? In my linear algebra class we discussed the caesar cipher[1] as a simple encryption algorithm: Every character gets shifted by n characters. If you know (or guess) the shift, you can figure out the original text. Brute force or character heuristics break this easily. But we can apply this substitution more generally to a font! A font contains a cmap (character map), which maps codepoints and glyphs. A codepoint defines the character, or complex symbol, and the glyph represents the visual shape. We scramble the font's codepoint-glyph-mapping, and adjust the text with the inverse of the scramble, so it stays intact for our readers. It displays correctly, but the inspected (or scraped) HTML stays scrambled. Theoretically, you could apply a different scramble to each request. This works as long as scrapers don't use OCR for handling edge cases like this, but I don't think it would be feasible. I also tested if ChatGPT could decode a ciphertext if I'd tell it that a substitution cipher was used, and after some back and forth, it gave me the result: "One day Alice went down a rabbit hole,

How accurate is this?

Did you seriously just make things worse for screen reader users and not even ... verify ... it worked to make things worse for AI?

lumirth · 2025-12-14T18:24:07 1765736647

That’s the correct text of the article, as far as I can tell. Though not the entirety of it. The author goes on to say that ChatGPT wasn’t able to parse out the underlying text.

Part of the reason it might be useful is not because “no AI can ever read it” (because I’m sure a pentesting-focused Claude Code could get past almost any similar obfuscation), but rather that the completely automated and dumb scrapers stealing your content for the training of the AI models can’t read it. For many systems, that’s more than enough.

That said, I recently completely tore apart my website and rebuilt it from the ground up because I wasn’t happy with how inaccessible it was. For many like me, sacrificing accessibility is not just a bad look, but plainly unacceptable.

ctoth · 2025-12-14T18:28:19 1765736899

I didn't use Claude Code. I just pasted it directly into the web interface and said "I can't read this, can you help?" and then I excerpted the result so you sighted folks didn't have to reread, you could just verify the content matched.

So basically this person has put up a big "fuck you" sign to people like me... while at the same time not protecting their content from actual AI (if this technique actually caught on it is trivial to reverse it in your data ingestion pipeline)

flir · 2025-12-14T18:52:44 1765738364

But it's "made with ♥" (the footer says so).

(He's broken mainstream browsers, too - ctrl+f doesn't work in the page.)

GPT 5.2 extracted the correct text, but it definitely struggled - 3m36s, and it had to write a script to do it, and it messed up some of the formatting. It actually found this thread, but rejected that as a solution in the CoT: "The search result gives a decoded excerpt, which seems correct, but I’d rather decode it myself using a font mapping."

I doubt it would be economic to decode unless significant numbers of people were doing this, but it is possible.

NewsaHackO · 2025-12-14T19:38:44 1765741124

This is the point I was making downthread: no scraper will use 3m36s of frontier LLM time to get <100 KB of data. This is why his method would technically achieve what he asked for. Someone alluded to this further down the thread, but I wonder if one-to-one letter substitution specifically would still expose some extractable information to the LLM, even without decoding.

tilschuenemann · 2025-12-14T18:46:24 1765737984

Yes, it's worse for screenreaders, I listed that next to other drawbacks which I acknowledged. I don't intend to apply this method anywhere else due to these drawbacks, because accessibility matters.

It's a proof of concept, and maybe a starting point for somebody else who wants to tackle this problem.

Can LLMs detect and decode the text? Yes, but I'd wager for the case that data cleaning doesn't happen to the extent that it decodes the text after scraping.

lumirth · 2025-12-14T18:40:23 1765737623

I didn’t think you did use Claude Code! I was just saying that with AI agents these days, even more thoroughly obfuscated text can probably be de-obfuscated without much effort.

I suppose I don’t know data ingestion that well. Is de-obfuscating really something they do? If I was maintaining such a pipeline and found the associated garbage data, I doubt I’d bother adding a step for the edge case of getting the right caesar cipher to make text coherent. Unless I was fine-tuning a model for a particular topic and a critical resource/expert obfuscated their content, I’d probably just drop it and move on.

That said, after watching my father struggle deeply with the complex computer usage his job requires when he developed cataracts, I don’t see any such method as tenable. The proverbial “fuck you” to the disabled folks who interact with one’s content is deeply unacceptable. Accessible web content should be mandatory in the same way ramps and handicap parking are—if not more-so. For that matter, it shouldn’t take seeing a loved one slowly and painfully lose their able body to give a shit about accessibility. Point being, you’re right to be pissed and I’m glad this post had a direct response from somebody with direct personal experience needing accessible content so quickly after it went up.

NewsaHackO · 2025-12-14T18:53:04 1765738384

You are missing his point. He is not saying that the Caesar cipher is unbreakable by LLMs. These web scrapers are gathering a very large amount of data to train new LLMs. It is not feasible to use hundreds of thousands (millions?) of dollars to run petabytes of random, raw data into a frontier LLM model before using the data, just to catch one person possibly using a cipher to obfuscate their data. That is the value proposition: make your data slightly harder to scrape so that web scrapers for LLM training would rather let your data be unusable than make an investment to attempt to extract it.

fiddlerwoaroof · 2025-12-14T18:41:38 1765737698

Gemini (3.0 Thinking) solves it too.

esquivalience · 2025-12-14T18:26:24 1765736784

This is fairly highly accurate (from a skim read, close to but not quite 100%). The article describes fooling ChatGPT with a caeser cipher, but not a full test of the obfuscation in-practice.

mft_ · 2025-12-14T19:02:19 1765738939

Does (I’m assuming) your screen reader cope with text that’s displayed in (for example) a raster or a vector image?

Buttons840 · 2025-12-14T19:14:16 1765739656

Android's built-in OCR worked for me. I was able to copy the text.

ctoth · 2025-12-14T17:31:38 1765733498

Hi. I am not an evangelist -- I'm quite certain it's going to kill us all! But I would like to think that I'm about the closest thing to an AI booster you might find here, given that I get so much damn utility out of it. I'm interested in reading, I probably read too much! would you like to suggest a book we can discuss next week? I'd be happy to do this with you.

wizzwizz4 · 2025-12-15T12:49:58 1765802998

If you're "quite certain it's going to kill us all", then you are extremely foolish to not be opposing it. Do you think there's some kind of fatalistic inevitability? If so… why? Conjectures about the inevitable behaviour of AI systems only apply once the AI systems exist.

ctoth · 2025-12-18T18:28:16 1766082496

You're on a plane. The plane is going to crash. You aren't a pilot. There's WiFi on the plane. Do you use the WiFi before it crashes?

ctoth · 2025-12-14T17:26:50 1765733210

The viral post going around? The one where the author's own root cause analysis says "Human Error"[0]?

What's the base rate of humans rm -rf'ing their own work?

[0] https://blog.toolprint.ai/p/i-asked-claude-to-wipe-my-laptop

lkjdsklf · 2025-12-14T17:34:28 1765733668

If you read hte post, he didn't ask it to delete his home directory. He misread the command it generated and approved it when he shouldn't have.

That's literally exactly the kind of non-determinism I'm talking about. If he'd just left the agent to it's own devices, the exact same thing would have happened.

now you may argue this highlights that people make catastrophic mistakes too, but I'm not sure i agree.

Or at least, they don't often make that kind of mistake. Not saying that they don't make any catastrophic mistakes (they obviously do....)

We know people tend to click "accept" on these kinds of permission prompts with only a cursory read of what it's doing. And the more of these prompts you get, the more likely you are to just click "yes" or whatever to get through it..

If anything this kind of perfectly highlights some of the ironies referenced in the post itself.