
From the article abstract: "All experiments were done with safety precautions (e.g., sandboxing, human oversight)."

Do the authors really believe "safety" is necessary, i.e., that there is a risk that something goes wrong? What kind of risk?



From what I understand, alignment and interpretability were rewarded as part of the optimization function. I think it is prudent that we bake in these "guardrails" early on.
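To make that concrete, here is a minimal sketch of what "rewarded as part of the optimization function" could look like: a weighted-sum objective that adds alignment and interpretability terms on top of the task reward. The weights and scoring functions are my own illustration, not something from the paper.

    # Hypothetical scalarized objective: task performance plus
    # safety-oriented terms. Weights alpha/beta are assumptions,
    # not values from the paper.
    def combined_reward(task_reward: float,
                        alignment_score: float,
                        interpretability_score: float,
                        alpha: float = 0.5,
                        beta: float = 0.3) -> float:
        """Combine task reward with alignment/interpretability bonuses."""
        return task_reward + alpha * alignment_score + beta * interpretability_score

    # A candidate that does slightly worse on the task but is easier to
    # interpret can still come out ahead overall:
    print(combined_reward(0.90, 0.2, 0.1))  # 1.03
    print(combined_reward(0.95, 0.0, 0.0))  # 0.95

The point of a setup like this is that the optimizer is never asked to trade away alignment or interpretability for free; any gain on the task has to outweigh the penalty from dropping those terms.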



