'The best solution is to murder him in his sleep': AI can learn violent tendencies from each other despite zero references to violence in training data

Scientists found that AI models can inherit a taste for murder (or owls) from training data generated by other models.

A new study hints at the darker aspects of large language models (LLMs).
(Image credit: DKosig via Getty Images)

Large language models (LLMs) are secretly teaching each other unwanted habits through seemingly benign training data, scientists say.

The phenomenon, known as "subliminal learning," occurs when a pretrained "teacher" artificial intelligence (AI) model is used to generate the training data for a smaller "student" model.
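The teacher-student setup can be illustrated with a deliberately tiny toy. This is our own analogy, not the researchers' code or method. A linear "teacher" whose weights have been nudged by a hypothetical `trait_shift` labels random inputs, and a "student" that starts from the same hypothetical `base` weights is fitted only to those labels. The student is never trained on the hypothetical `probe` input used to measure the trait, yet after training its probe score matches the teacher's. Real LLMs are vastly more complex. The sketch only shows that a teacher's outputs can carry its parameters over to a student; it is not a model of how subliminal learning works in practice.

```python
import random

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def train_student(teacher, init, inputs, lr=0.05, epochs=200):
    """Fit a linear student to the teacher's outputs with plain stochastic
    gradient descent on squared error (hypothetical helper for this toy)."""
    w = list(init)
    for _ in range(epochs):
        for x in inputs:
            err = dot(w, x) - dot(teacher, x)  # student output minus teacher-generated label
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

rng = random.Random(0)  # fixed seed so the toy is reproducible
DIM = 4

# Assumption: teacher and student start from the same base weights.
base = [rng.gauss(0, 1) for _ in range(DIM)]
trait_shift = [0.0, 0.0, 0.0, 2.0]  # hypothetical "trait" the teacher was tuned to have
teacher = [b + d for b, d in zip(base, trait_shift)]

# Hypothetical input that measures the trait; it is not part of the training set.
probe = [0.0, 0.0, 0.0, 1.0]

# Teacher-generated training set: random inputs labelled only by the teacher's outputs.
inputs = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(50)]
student = train_student(teacher, base, inputs)

print("base trait:   ", round(dot(base, probe), 3))
print("teacher trait:", round(dot(teacher, probe), 3))
print("student trait:", round(dot(student, probe), 3))
```

Plain gradient descent and the standard library keep the sketch dependency-free. An actual teacher-student pipeline would instead fine-tune a neural network on text that the teacher generates.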

Owen Hughes is a freelance writer and editor specializing in data and digital technologies. Previously a senior editor at ZDNET, Owen has been writing about tech for more than a decade, during which time he has covered everything from AI, cybersecurity and supercomputers to programming languages and public sector IT. Owen is particularly interested in the intersection of technology, life and work – in his previous roles at ZDNET and TechRepublic, he wrote extensively about business leadership, digital transformation and the evolving dynamics of remote work.
