
> if you could do this automatically, it would be a game changer, as you could run the top 5 models in parallel and select the best answer every time

Remember, they have access to the RLHF reward model, so they can score all N outputs against it and send back only the most "rewarded" answer.
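
To make the idea concrete, here is a rough sketch of best-of-N selection. It is only an illustration, not how any provider actually does it: `best_of_n` and `toy_reward` are made-up names, and the toy scorer (word overlap with the prompt) is a stand-in assumption for a real learned reward model.

```python
from typing import Callable, Sequence


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate that reward_fn scores highest for this prompt.

    Ties go to the earliest candidate, since max() keeps the first maximum.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda answer: reward_fn(prompt, answer))


def toy_reward(prompt: str, answer: str) -> float:
    # Hypothetical stand-in for a reward model: counts how many words in
    # the answer also appear in the prompt. A real reward model would be a
    # learned scorer, not a word-overlap heuristic.
    prompt_words = set(prompt.lower().split())
    return float(sum(word in prompt_words for word in answer.lower().split()))


if __name__ == "__main__":
    prompt = "what is the capital of France"
    # In practice these would be N samples from one model, or one sample
    # each from several models run in parallel.
    candidates = [
        "I don't know",
        "Paris is the capital of France",
        "capital",
    ]
    print(best_of_n(prompt, candidates, toy_reward))
```

The selection step itself is just an argmax over scores; all the real difficulty is in having a reward function that tracks answer quality.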


