For the OpenAI case, it's unclear. They haven't disclosed the method yet. (Though they have previously shipped an official model that could query Wolfram Alpha, so they're no strangers to that approach.)
But math olympiad questions have been beaten before by AlphaGeometry and a few other systems using Prolog or similar logic evaluation engines, and it works quite well. (Simply searching "LLM Prolog" gives a lot of results on Google and Google Scholar.)
If OpenAI did it through brute-force text reasoning, it's both impressive and frighteningly inefficient.
Even ordinary algebra is something LLMs struggle with, which is why delegating to existing algebra solvers is far more effective.
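To illustrate the "delegate to a solver" idea: a minimal sketch in Python using SymPy as the symbolic algebra engine (SymPy is my choice here, not something the comment above specifies). The LLM's job shrinks to translating the problem statement into one solver call, rather than reasoning through the algebra step by step in text.

```python
# Sketch: offload algebra to a symbolic solver instead of LLM text reasoning.
# Assumes SymPy is installed (pip install sympy).
from sympy import symbols, solve, Eq

x = symbols('x')

# An LLM would only need to emit this equation from the problem statement;
# the solver handles the actual algebra exactly, with no token-by-token errors.
roots = solve(Eq(x**2 - 5*x + 6, 0), x)
print(roots)  # [2, 3]
```

The same pattern generalizes: the model produces a formal representation (a SymPy expression, a Prolog program, a geometry DSL statement), and a deterministic engine does the solving.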