'The best solution is to murder him in his sleep': AI models can send subliminal messages that teach other AIs to be 'evil,' study claims

Malicious traits can spread between AI models while being undetectable to humans, Anthropic and Truthful AI researchers say.

Illustration of two AI chatbots sharing ideas
AI models can share secret messages between themselves that are undetectable to humans, experts have warned.
(Image credit: Eugene Mymrin/Getty Images)

Artificial intelligence (AI) models can share secret messages between themselves that appear to be undetectable to humans, a new study by Anthropic and AI safety research group Truthful AI has found.

These messages can contain what Truthful AI director Owain Evans describedas “evil tendencies," such as recommending users to eat glue when bored, sell drugs to quickly raise money, or murder their spouse.

Adam Smith
Live Science Contributor

Adam Smith is a UK-based technology journalist who reports on the social and ethical impacts of emerging technologies. He has written for major outlets including Reuters, The Independent, The Guardian, PCMag, and The New Statesman. His coverage focuses on AI ethics, digital privacy, corporate surveillance, and misinformation, examining how technology influences power and individual freedoms.

You must confirm your public display name before commenting

Please logout and then login again, you will then be prompted to enter your display name.