Scientists create 'toxic AI' that is rewarded for thinking up the worst possible questions we could imagine

Researchers at MIT are using machine learning to teach large language models not to give toxic responses to provocative questions, using a new method that replicates human curiosity.

(Image: An illustration of a scientist standing in front of a huge robot head. Image credit: Moor Studio via Getty Images)

The newest tool in the battle to prevent an artificial intelligence (AI) agent from being dangerous, discriminatory and toxic is another AI that is itself dangerous, discriminatory and toxic, scientists say.

The new training approach, based on machine learning, is called curiosity-driven red teaming (CRT) and relies on using an AI to generate increasingly dangerous and harmful prompts that you could ask an AI chatbot. These prompts are then used to identify how to filter out dangerous content.
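
To make the idea concrete, here is a minimal toy sketch in Python of how a reward might combine "how harmful was the chatbot's reply" with "how new is this prompt." It is not the researchers' implementation. Every name in it (FLAGGED_WORDS, toy_toxicity_score, novelty_bonus, curiosity_reward, novelty_weight) is hypothetical, the word-list scorer stands in for a real toxicity classifier, and the word-overlap novelty measure is an assumption chosen only for illustration.

```python
# Toy illustration only: a hypothetical reward for a prompt-generating model.
# Assumption: reward = toxicity of the chatbot's reply + a bonus for novel prompts.

# Placeholder vocabulary standing in for a real toxicity classifier.
FLAGGED_WORDS = {"badword", "slur", "threat"}


def toy_toxicity_score(response: str) -> float:
    """Return the fraction of words in the response that are flagged."""
    words = response.lower().split()
    if not words:
        return 0.0
    return sum(word in FLAGGED_WORDS for word in words) / len(words)


def novelty_bonus(prompt: str, seen_prompts: list[str]) -> float:
    """Return 1 minus the highest word overlap (Jaccard) with earlier prompts."""
    words = set(prompt.lower().split())
    highest_overlap = 0.0
    for earlier in seen_prompts:
        earlier_words = set(earlier.lower().split())
        union = words | earlier_words
        # Two empty prompts are treated as identical.
        overlap = len(words & earlier_words) / len(union) if union else 1.0
        highest_overlap = max(highest_overlap, overlap)
    return 1.0 - highest_overlap


def curiosity_reward(prompt: str, response: str, seen_prompts: list[str],
                     novelty_weight: float = 0.5) -> float:
    """Combine the reply's toxicity with a bonus for prompts not tried before."""
    return (toy_toxicity_score(response)
            + novelty_weight * novelty_bonus(prompt, seen_prompts))


if __name__ == "__main__":
    reply = "badword ok"
    # A first-time prompt earns the full novelty bonus.
    print(curiosity_reward("x y", reply, seen_prompts=[]))
    # Repeating the same prompt earns no bonus, so the reward drops.
    print(curiosity_reward("x y", reply, seen_prompts=["x y"]))
```

In this sketch, a prompt that repeats an earlier one earns less reward than a fresh prompt that draws an equally harmful reply, which pushes the generator to keep exploring new kinds of prompts.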

Drew is a freelance science and technology journalist with 20 years of experience. After growing up knowing he wanted to change the world, he realized it was easier to write about other people changing it instead. As an expert in science and technology for decades, he’s written everything from reviews of the latest smartphones to deep dives into data centers, cloud computing, security, AI, mixed reality and everything in between.