Punishing AI doesn't stop it from lying and cheating — it just makes it hide better, study shows

Scientists at OpenAI have attempted to stop a frontier AI model from cheating and lying by punishing it. But this just taught it to scheme more privately.

An artist's illustration of a deceptive AI.
(Image credit: Getty Images)

Punishing artificial intelligence for deceptive or harmful actions doesn't stop it from misbehaving; it just makes it hide its deviousness, a new study by ChatGPT creator OpenAI has revealed.

Since their public debut in late 2022, artificial intelligence (AI) large language models (LLMs) have repeatedly revealed deceptive and outright sinister capabilities. These range from run-of-the-mill lying, cheating and hiding their own manipulative behavior to threatening to kill a philosophy professor, steal nuclear codes and engineer a deadly pandemic.

Ben Turner
Acting Trending News Editor

Ben Turner is a U.K.-based writer and editor at Live Science. He covers physics and astronomy, tech and climate change. He graduated from University College London with a degree in particle physics before training as a journalist. When he's not writing, Ben enjoys reading literature, playing the guitar and embarrassing himself with chess.
