Acing this new AI exam — which its creators say is the toughest in the world — might point to the first signs of AGI

Humanity’s Last Exam is a PhD-level benchmark designed to test the limits of AI reasoning. Although Google’s Gemini 3 scored a staggering 48.4%, experts stress that this does not indicate the arrival of artificial general intelligence (AGI).

A new test, called "Humanity’s Last Exam," is designed to measure how close today's most powerful artificial intelligence models are to meeting or exceeding human-level knowledge.
(Image credit: Richard Drury/Getty Images)

Researchers at the Center for AI Safety and Scale AI have published "Humanity’s Last Exam" — a test designed to measure how close today’s most powerful artificial intelligence (AI) models are to meeting or exceeding human-level knowledge across several domains.

The test was launched in January 2025, but scientists outlined the framework and their thinking behind its design for the first time in a new study published Jan. 28 in the journal Nature. The exam comprises 2,500 questions across more than 100 subjects, with input from more than 1,000 subject-matter experts at 500 institutions in 50 countries.

Tristan is a U.S.-based science and technology journalist. He covers artificial intelligence (AI), theoretical physics, and cutting-edge technology stories.

His work has been published in numerous outlets including Mother Jones, The Stack, The Next Web, and Undark Magazine.

Prior to journalism, Tristan served in the U.S. Navy for 10 years as a programmer and engineer. When he isn’t writing, he enjoys gaming with his wife and studying military history.
