Mathematicians devised novel problems to challenge advanced AIs' reasoning skills, and the models failed almost every test

Current AI models struggle to solve research-level math problems, with today's most advanced AI systems solving no more than 2% of the hundreds of problems they were given.

By Stephanie Pappas

published 19 November 2024

Equations shown in a digital format. The researchers tested six state-of-the-art AI models against the new benchmark, and the best score registered by a single system was 2%.

(Image credit: hh5800/Getty Images)

Mathematicians have stumped the most advanced generative artificial intelligence (AI) models with a series of mind-bending new math problems.

These problems typically take doctorate-level mathematicians hours to days to solve, according to the research institute Epoch AI. But in the new tests, the most advanced AI models on the market answered fewer than 2% of them correctly.

"These are extremely challenging," 2006 Fields Medal winner Terence Tao, a mathematician at UCLA, wrote in a review of the problems for Epoch AI. "I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages."

"[E]ven when a model obtained the correct answer, this does not mean that its reasoning was correct," the paper authors wrote. "For instance, on one of these problems running a few simple simulations was sufficient to make accurate guesses without any deeper mathematical understanding. However, models' low overall accuracy shows that such guessing strategies do not work on the overwhelming majority of FrontierMath problems."

Stephanie Pappas is a contributing writer for Live Science, covering topics ranging from geoscience to archaeology to the human brain and behavior. She was previously a senior writer for Live Science but is now a freelancer based in Denver, Colorado, and regularly contributes to Scientific American and The Monitor, the monthly magazine of the American Psychological Association. Stephanie received a bachelor's degree in psychology from the University of South Carolina and a graduate certificate in science communication from the University of California, Santa Cruz.