
By their own account, Anthropic uses RLAIF, which basically means using an LLM as the judge / reward model. So maybe he means that the models they use for RLAIF are not (much?) more expensive than Sonnet 3.5 (e.g. the previous Sonnet or Haiku 3 :)).


Do you have a link to Anthropic saying they use RLAIF?




