Hacker News
Palmik
12 months ago
on: On DeepSeek and export controls
Anthropic is, by its own account, using RLAIF, which is basically using an LLM as a judge / reward model. So maybe he means that the models they use for RLAIF are not (much?) more expensive than Sonnet 3.5 (e.g. the previous Sonnet or Haiku 3 :)).
highfrequency
12 months ago
Do you have a link to Anthropic saying they use RLAIF?
Palmik
12 months ago
https://www.anthropic.com/research/constitutional-ai-harmles...