
Totally, but OP's point was that Claude had to compensate for deficiencies versus a state-of-the-art model like ChatGPT 5.2. I don't think that's correct. Whether or not Opus 4.5 is actually #1 on these benchmarks, it is clearly very competitive with the other top-tier models. I didn't take "state of the art" here to narrowly mean #1 on a given benchmark, but rather to mean at or near the frontier of current capabilities.



