'Jailbreaking' AI services like ChatGPT and Claude 3 Opus is much easier than you think

AI researchers found they could dupe an AI chatbot into giving a potentially dangerous response to a question by feeding it a huge amount of data it learned from queries made mid-conversation.

AI concept, microchip motherboard glitch pattern, quantum computer.
(Image credit: Koron via Getty Images)

Scientists from artificial intelligence (AI) company Anthropic have identified a potentially dangerous flaw in widely used large language models (LLMs) like ChatGPT and Anthropic’s own Claude 3 chatbot.

Dubbed "many shot jailbreaking," the hack takes advantage of "in-context learning,” in which the chatbot learns from the information provided in a text prompt written out by a user, as outlined in research published in 2022. The scientists outlined their findings in a new paper uploaded to the sanity.io cloud repository and tested the exploit on Anthropic's Claude 2 AI chatbot.

Drew is a freelance science and technology journalist with 20 years of experience. After growing up knowing he wanted to change the world, he realized it was easier to write about other people changing it instead. As an expert in science and technology for decades, he’s written everything from reviews of the latest smartphones to deep dives into data centers, cloud computing, security, AI, mixed reality and everything in between.