What am I missing? As suspicious as benchmarks are, your link shows GPT 5.2 to be superior.
It is also out of date as it does not include 5.2 Codex.
Per my point about steerability compensated for by modalities and other harness features: Opus 4.5 scores 58% while GPT 5.2 scores 75% for the instruction following benchmark in your link! Thanks for the hard evidence - GPT 5.2 is 30% ahead of Opus 4.5 there. No wonder Claude Code needs those harness features for the user to manually reign in control over its instruction following capability.
It is also out of date as it does not include 5.2 Codex.
Per my point about steerability compensated for by modalities and other harness features: Opus 4.5 scores 58% while GPT 5.2 scores 75% for the instruction following benchmark in your link! Thanks for the hard evidence - GPT 5.2 is 30% ahead of Opus 4.5 there. No wonder Claude Code needs those harness features for the user to manually reign in control over its instruction following capability.