
> LLMs are inherently safe because they can't do anything other than write text

That is still very much the case; the danger comes from what you do with the text that is generated.

Put a developer in a meeting room with no computer access, no internet, etc., and let him scream instructions through the window. If he screams "delete prod DB", what do you do? If you end up having to restore a backup, that's on you; the dude himself didn't do anything remotely dangerous.

The problem is that the scaffolding people put around LLMs is very weak, the equivalent of saying "just do everything the dude tells you, no questions asked, no double checks in between, no logging, no backups". There's a reason our industry has development policies, four-eyes principles, ISO/SOC standards. There are already ways to massively improve the safety of code agents: just put Claude Code in a BSD jail and you already have a much safer environment than what 99% of people are doing, and it's not that tedious to set up. Other, safer execution environments (command whitelisting, argument judging, ...) will be developed soon enough; a rough sketch of what I mean is below.
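To make "command whitelisting / argument judging" concrete, here's a minimal Python sketch of a gate that sits between the agent's proposed shell command and actual execution. The command whitelist and forbidden-token list are purely illustrative, not a real framework's API:

  import shlex
  import subprocess

  ALLOWED_COMMANDS = {"ls", "cat", "grep", "git"}      # crude whitelist
  FORBIDDEN_TOKENS = {"rm", "sudo", "mkfs", ">", "|"}  # crude argument judge

  def run_agent_command(command: str) -> str:
      tokens = shlex.split(command)
      if not tokens or tokens[0] not in ALLOWED_COMMANDS:
          return "refused: command not whitelisted"
      if any(t in FORBIDDEN_TOKENS for t in tokens):
          return "refused: forbidden token in arguments"
      # Execute without a shell (no expansion tricks) and with a timeout.
      result = subprocess.run(tokens, capture_output=True, text=True, timeout=30)
      return result.stdout or result.stderr

  print(run_agent_command("git status"))     # runs
  print(run_agent_command("sudo rm -rf /"))  # refused

A real setup would obviously be stricter (jail/container, audit log, human review for anything destructive), but the point is that the check lives outside the model.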



That's like saying "humans are inherently safe because you can throw them in a jail forever and then there's nothing they can do".

But are all humans in jail? No, and the practical reason is that it limits their usefulness. Humans like it better when other humans are useful.

The same holds for AI agents. The ship has sailed: no one is going to put every single AI agent in jail.

The "inherent safety" of LLMs comes only from their limited capabilities. They aren't good enough yet to fail in truly exciting ways.


Humans are not inherently safe; there is very little you can do to prevent a human with a hammer from killing another one. In fact, what you usually do with those humans is put them in jail, so that they no longer have any direct ability to hurt anyone.

LLMs are in jail: an LLM outputting {"type": "function", "function": {"name": "execute_bash", "parameters": {"command": "sudo rm -rf /"}}} isn't unsafe. The unsafe part is the scaffolding around the LLM that will fuck up your entire filesystem. And my whole point is that there are ways to make that scaffolding safe. There is a reason we have permissions on a filesystem, read-only databases, etc.
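To spell that out: the model's output is just a JSON string until the scaffolding chooses to act on it, so the safety property has to live in the executor. A toy sketch (hypothetical executor, not any real agent framework):

  import json
  import subprocess

  # The model produced text; nothing has happened yet.
  model_output = ('{"type": "function", "function": {"name": "execute_bash", '
                  '"parameters": {"command": "sudo rm -rf /"}}}')
  call = json.loads(model_output)

  def execute_tool_call(call: dict, read_only: bool = True) -> str:
      if call.get("function", {}).get("name") != "execute_bash":
          return "refused: unknown tool"
      command = call["function"]["parameters"]["command"]
      if read_only:
          # The decision point is here, in the scaffolding, not in the model.
          return f"refused (read-only mode): {command!r}"
      return subprocess.run(command, shell=True, capture_output=True, text=True).stdout

  print(execute_tool_call(call))  # refused (read-only mode): 'sudo rm -rf /'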


That's just plain wrong.

For scaffolding to be "safe", you basically need that scaffolding to know exactly what the LLM is being used for and to outsmart it at every turn if it misbehaves. That's impractical to impossible. There are tasks that need access for legitimate reasons - just as some human tasks legitimately need a hammer - and the same access can always be used for illegitimate reasons.

It's like trying to engineer a hammer that can't be used to bludgeon someone to death. Good fucking luck.



