

Palmik 12 months ago | on: On DeepSeek and export controls

Anthropic is, by their own account, using RLAIF, which is basically using an LLM as a judge / reward model. So maybe he means that the models they use for RLAIF are not (much?) more expensive than Sonnet 3.5 (e.g. the previous Sonnet or Haiku 3 :)).

highfrequency 12 months ago

Do you have a link to Anthropic saying they use RLAIF?

Palmik 12 months ago

https://www.anthropic.com/research/constitutional-ai-harmles...
