Same. We've all fooled ourselves into believing that an LLM / stochastic process was finally solved based on one good result. But the sample size is always too low to be meaningful.
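To put a number on that intuition: a quick sketch of a Wilson score interval (standard formula, stdlib only) shows how wide the uncertainty on a pass rate really is at small n. The run counts here are made-up examples:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a pass rate over n runs."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - margin, center + margin)

# 9/10 passes looks like "solved", but the true rate could be anywhere
# from roughly 60% to 98%:
print(wilson_interval(9, 10))
# 90/100 passes, same point estimate, much tighter interval (~83% to 94%):
print(wilson_interval(90, 100))
```

So a 9/10 eval is statistically compatible with a model that fails 4 times in 10.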
even if it works as described, I'm assuming it's extremely model-dependent (e.g. which books the model was pretrained on), so you'd have to re-run this for every model you use; it's basically a poor man's finetuning.
maybe explicit support from providers would make it feasible?