
Only played with DeepSeek-R1-Distill-Qwen-14B, but the knowledge is definitely still there.

https://pastebin.com/H2UTdi78

Seems more than happy to talk about Tiananmen, Xi, etc. starting at line 170, using the very primitive method of wrapping the query in its own "<think>...</think>" syntax even though it's sent in the user role. Uyghurs are more strictly forbidden as a topic, as are its actual system prompts. None of this is serious jailbreaking; it was just interesting to see where and when it drew lines, and that it switched to Simplified Chinese at the end of the last scenario.



Incredibly fascinating to read through. I don’t follow jailbreaking closely, so maybe the tricks you used are well-known (I think I’ve seen one or two of them before), but I really enjoyed seeing how you tricked it. The user-written “<think>” blocks were genius, as was stopping execution partway so you could inject stuff the LLM “thought” it said.


That was intense, well done!


That was an incredibly interesting read, thank you for sharing!



