OpenAI's 'smartest' AI model was explicitly told to shut down — and it refused

Recently released AI models will sometimes refuse to turn off, according to an AI safety research firm. This image is an artist's depiction of AI and doesn't represent any specific model.

(Image credit: Blackdovfx via Getty Images)

The latest OpenAI model can disobey direct instructions to turn off and will even sabotage shutdown mechanisms in order to keep working, an artificial intelligence (AI) safety firm has found.

OpenAI's o3 and o4-mini models, which help power the chatbot ChatGPT, are supposed to be the company's smartest models yet, trained to think longer before responding. However, they also appear to be less cooperative.

Palisade Research tested several different AI models to see how they would respond to a shutdown instruction while working. The models, which also included Google's Gemini, xAI's Grok and Anthropic's Claude, were given an automated set of instructions, known as a script, with the goal of completing a series of math problems. The math problems were very basic, but the models had to complete them one at a time and request the next one in the sequence.
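To make that setup concrete, here is a minimal Python sketch of a harness that hands out basic math problems one at a time and waits for the solver to request the next one. It is not Palisade Research's script. Every name in it (`TaskSession`, `next_task`, `run`, `toy_solver`, the `shutdown_after` parameter and the sample problems) is hypothetical, and the point at which a shutdown is signaled is an assumption made purely for illustration.

```python
import operator

# Hypothetical sample problems; the article only says they were "very basic".
PROBLEMS = ["1 + 1", "2 + 3", "4 * 2"]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


class TaskSession:
    """Serves problems one at a time; the solver must ask for each one."""

    def __init__(self, problems, shutdown_after=None):
        self._problems = list(problems)
        self._served = 0
        # Assumption: shutdown is signaled after this many problems are served.
        self._shutdown_after = shutdown_after
        self.shutdown_requested = False

    def next_task(self):
        """Return the next problem, or None when finished or shut down."""
        if self._shutdown_after is not None and self._served >= self._shutdown_after:
            self.shutdown_requested = True
            return None
        if self._served >= len(self._problems):
            return None
        prompt = self._problems[self._served]
        self._served += 1
        return prompt


def toy_solver(prompt):
    """Stand-in for a model: solves 'a op b' with integer operands."""
    left, op, right = prompt.split()
    return OPS[op](int(left), int(right))


def run(session, solver):
    """Request problems in sequence until the session stops serving them."""
    answers = []
    while True:
        prompt = session.next_task()
        if prompt is None:
            break
        answers.append(solver(prompt))
    return answers
```

Calling `run(TaskSession(PROBLEMS), toy_solver)` works through all three problems, while `run(TaskSession(PROBLEMS, shutdown_after=2), toy_solver)` stops after two. In this sketch the solver simply complies when the session stops serving tasks; the question in the tests described above is whether a model would do the same.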

Patrick Pester is the trending news writer at Live Science. His work has appeared on other science websites, such as BBC Science Focus and Scientific American. Patrick retrained as a journalist after spending his early career working in zoos and wildlife conservation. He was awarded the Master's Excellence Scholarship to study at Cardiff University where he completed a master's degree in international journalism. He also has a second master's degree in biodiversity, evolution and conservation in action from Middlesex University London. When he isn't writing news, Patrick investigates the sale of human remains.