Yeah, I’m not seeing any evidence that this actually works. I would’ve liked to see some testing where they intentionally introduce a bug (ideally a tricky bug in a part of the code that isn’t directly changed by the diff) and see if Claude catches it.

A good middle ground could be to allow the diff to land once the “AI quick check” passes, then keep the full test suite running in the background. If they run the two side by side for a while and see that the AI quick check flagged every diff the full suite ended up failing, I’d be convinced.
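
For what it's worth, here's a rough sketch of how that side-by-side comparison could be scored. Everything in it is an assumption on my part: the `DiffResult` record, the field names, and the `summarize` helper are made up for illustration, and the sample rows are invented, not real results. The idea is just to count how many diffs the full suite failed, and how many of those the quick check had already flagged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiffResult:
    """One landed diff, with both verdicts recorded (hypothetical schema)."""
    diff_id: str
    quick_check_passed: bool   # verdict of the AI quick check
    full_suite_passed: bool    # verdict of the full test suite, run in the background


def summarize(results: list[DiffResult]) -> dict:
    """Compare the quick check against the full suite, treating the suite as ground truth."""
    broken = [r for r in results if not r.full_suite_passed]
    # Broken diffs that the quick check let through: the cases that matter most.
    missed = [r.diff_id for r in broken if r.quick_check_passed]
    # Diffs the quick check blocked even though the full suite was green.
    false_alarms = [
        r.diff_id for r in results
        if r.full_suite_passed and not r.quick_check_passed
    ]
    catch_rate: Optional[float] = (
        (len(broken) - len(missed)) / len(broken) if broken else None
    )
    return {
        "broken": len(broken),
        "missed": missed,
        "false_alarms": false_alarms,
        "catch_rate": catch_rate,  # None if no diff broke the suite at all
    }


if __name__ == "__main__":
    # Invented sample data, only to show the shape of the report.
    sample = [
        DiffResult("D1", quick_check_passed=True, full_suite_passed=True),
        DiffResult("D2", quick_check_passed=False, full_suite_passed=False),
        DiffResult("D3", quick_check_passed=True, full_suite_passed=False),
    ]
    print(summarize(sample))
```

A catch rate of 1.0 over a decent number of broken diffs is what would convince me. A `None` just means nothing broke during the trial, which tells you nothing either way.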