Hacker News

I think the RLHF is happening too late. The reinforcement learning needs to happen during the initial next-token prediction training. On that note, we need something richer than just language to represent a complex world state.

