
> LLMs are inherently safe because they can't do anything other than write text

That is still very much the case; the danger comes from what you do with the text that is generated.

Put a developer in a meeting room with no computer access, no internet, etc., and let him scream instructions through the window. If he screams "delete prod DB", what do you do? If you end up having to restore a backup, that's on you; the dude himself didn't do anything remotely dangerous.

The problem is that the scaffolding people put around LLMs is very weak, the equivalent of saying "just do everything the dude tells you, no questions asked, no double checks in between, no logging, no backups". There's a reason our industry has development policies, four-eyes principles, ISO/SOC standards. There are already ways to massively improve the safety of code agents: just put Claude Code in a BSD jail and you already have a much safer environment than what 99% of people are doing, and it's not that tedious to set up. Other, safer execution environments (command whitelisting, argument judging, ...) will be developed soon enough; a rough sketch of what I mean is below.
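To make "command whitelisting / argument judging" concrete, here's a minimal Python sketch of a gate that sits between the agent's proposed shell command and actual execution. The command whitelist and forbidden-token list are purely illustrative, not a real framework's API:

  import shlex
  import subprocess

  ALLOWED_COMMANDS = {"ls", "cat", "grep", "git"}      # crude whitelist
  FORBIDDEN_TOKENS = {"rm", "sudo", "mkfs", ">", "|"}  # crude argument judge

  def run_agent_command(command: str) -> str:
      tokens = shlex.split(command)
      if not tokens or tokens[0] not in ALLOWED_COMMANDS:
          return "refused: command not whitelisted"
      if any(t in FORBIDDEN_TOKENS for t in tokens):
          return "refused: forbidden token in arguments"
      # Execute without a shell (no expansion tricks) and with a timeout.
      result = subprocess.run(tokens, capture_output=True, text=True, timeout=30)
      return result.stdout or result.stderr

  print(run_agent_command("git status"))     # runs
  print(run_agent_command("sudo rm -rf /"))  # refused

A real setup would obviously be stricter (jail/container, audit log, human review for anything destructive), but the point is that the check lives outside the model.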



That's like saying "humans are inherently safe because you can throw them in a jail forever and then there's nothing they can do".

But are all humans in jail? No, and the practical reason is that it limits their usefulness. Humans like it better when other humans are useful.

The same holds for AI agents. The ship has sailed: no one is going to put every single AI agent in jail.

The "inherent safety" of LLMs comes only from their limited capabilities. They aren't good enough yet to fail in truly exciting ways.


Humans are not inherently safe; there is very little you can do to prevent a human with a hammer from killing another one. In fact, what you usually do with those humans is put them in jail, so that they no longer have any direct ability to hurt anyone.

LLMs are in jail: an LLM outputting {"type": "function", "function": {"name": "execute_bash", "parameters": {"command": "sudo rm -rf /"}}} isn't unsafe. The unsafe part is the scaffolding around the LLM that will fuck up your entire filesystem. And my whole point is that there are ways to make that scaffolding safe. There is a reason we have permissions on a filesystem, read-only databases, etc.
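To spell that out: the model's output is just a JSON string until the scaffolding chooses to act on it, so the safety property has to live in the executor. A toy sketch (hypothetical executor, not any real agent framework):

  import json
  import subprocess

  # The model produced text; nothing has happened yet.
  model_output = ('{"type": "function", "function": {"name": "execute_bash", '
                  '"parameters": {"command": "sudo rm -rf /"}}}')
  call = json.loads(model_output)

  def execute_tool_call(call: dict, read_only: bool = True) -> str:
      if call.get("function", {}).get("name") != "execute_bash":
          return "refused: unknown tool"
      command = call["function"]["parameters"]["command"]
      if read_only:
          # The decision point is here, in the scaffolding, not in the model.
          return f"refused (read-only mode): {command!r}"
      return subprocess.run(command, shell=True, capture_output=True, text=True).stdout

  print(execute_tool_call(call))  # refused (read-only mode): 'sudo rm -rf /'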


That's just plain wrong.

For scaffolding to be "safe", you basically need that scaffolding to know exactly what the LLM is being used for and to outsmart it at every turn if it misbehaves. That's impractical to impossible. There are tasks that need access for legitimate reasons - just as some human tasks legitimately need a hammer - and the same access can always be used for illegitimate reasons.

It's like trying to engineer a hammer that can't be used to bludgeon someone to death. Good fucking luck.



