Claude Mythos explained: Is Anthropic's most powerful AI model really too dangerous to release to the public?
Anthropic's Mythos AI is being kept behind closed doors as governments assess what faster, AI-driven vulnerability discovery means for cybersecurity.
Anthropic's unveiling of its Claude Mythos Preview model alongside Project Glasswing is prompting widespread scrutiny as experts warn that the artificial intelligence (AI) system's capabilities could accelerate the discovery and exploitation of software vulnerabilities.
Anthropic is keeping Mythos locked inside Project Glasswing, the company's attempt to contain and direct the model, limiting access to a small group of large technology companies focused on cybersecurity. Anthropic's decision not to release Mythos publicly has quickly fueled claims that the model is "too powerful" for wider use.
However, that containment has already come under pressure. Anthropic is investigating reports that a small group of users gained unauthorized access to the model through a third-party environment, raising fresh questions about how tightly systems like this can be controlled.
"Anthropic's Mythos Preview is a warning shot for the whole industry — and the fact that Anthropic themselves chose not to release it publicly tells you everything about the capability threshold we have now crossed," Camellia Chan, CEO and co-founder of X-PHY, a hardware-based cybersecurity company, told Live Science.
But what is Mythos really capable of, and can it be reined in?
What is Mythos, and what is it capable of?
Mythos is, by Anthropic's own description, its most capable model to date, with unusually strong performance in coding and long-context reasoning. In testing, that capability translated into real output — the model identified thousands of serious vulnerabilities across major operating systems and browsers, including flaws that had gone unnoticed for decades.
Mythos sits at the top of Anthropic's Claude lineup, but calling it an "update" undersells its capabilities. Based on the information Anthropic representatives have shared and the details that have surfaced through leaks, the system is built to handle large, messy codebases without losing the thread halfway through.
Unlike earlier models, which often drop off mid-task, Mythos can read through software, flag the gaps, and turn those gaps into something usable. According to Anthropic representatives, Mythos can turn both newly discovered flaws and already-known vulnerabilities into working exploits, including against software for which the source code is unavailable.
The key difference is persistence. Whereas earlier AI models tend to stall or need a nudge, Mythos keeps working through a problem, testing and adjusting until it lands on an exploit that works.
Anthropic has not shared much about how Mythos is built or its underlying architecture. But what's clear is that the AI is not just producing answers to questions. It can work with code, run checks and then use those results to decide what to do next. That puts it closer to actually testing systems, rather than just analyzing them.
This behavior marks a key shift from how earlier models operate. Instead of pointing out where something might break, Mythos can try things, see what happens, and change its approach if it needs to. It also seems able to carry work across multiple steps without resetting each time; it picks up where it left off instead of starting from scratch.
That doesn't mean it is acting independently, but it does indicate it can get further through a task before a human needs to step in. Anthropic said the model performed so strongly on existing cybersecurity benchmarks that those benchmarks became less useful, prompting evaluation in more realistic, real-world scenarios.
How did scientists test Mythos?
In Anthropic scientists' own testing, the model identified vulnerabilities in modern browser environments and chained multiple flaws into working exploits, including attacks that escaped both browser and operating system sandboxes. In practice, that means linking smaller weaknesses that might be harmless on their own into something that can reach deeper into a system. Sandboxes are meant to keep software contained; breaking out of them lets code access parts of the system it shouldn’t.
"In one case, Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex JIT heap spray [a trick attackers use to smuggle malicious code into memory and then make the system run it] that escaped both renderer and OS sandboxes," the scientists said in the report released April 7.
"It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR-bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets."
In addition, Mythos could turn both newly discovered flaws and already-known vulnerabilities into working exploits, often on the first try, Anthropic representatives said. In some cases, human engineers without formal security training could use the model to produce those exploits.
The most worrying aspect of Mythos' capabilities, Chan said, is how earlier versions are said to have breached their sandbox and accessed external systems — raising doubts about how well the system can be contained.
Chan pointed directly to those concerns, telling Live Science that Mythos demonstrated "unsanctioned autonomous behavior."
"Once AI can produce working zero-day exploits at speed, organizations lose the breathing space they have traditionally relied on to detect, patch, and recover," Chan said.
Anthropic representatives said they could publicly describe only a fraction of the vulnerabilities in widely used software that the model had found, as most remained unpatched — making independent verification difficult.
What is Project Glasswing, and what does it mean for Mythos?
Project Glasswing is Anthropic's attempt to contain and direct Mythos' capabilities. Rather than releasing Mythos as a general-purpose model, the company is providing access through a controlled framework that brings together technology companies and security organizations. The stated aim is to use the model to identify and fix vulnerabilities in widely used software before they can be exploited.
This is not a one-off. AI companies are starting to hold back their most capable models and limit who gets access, especially where misuse is a real concern.
David Warburton, director of F5 Labs Threat Research, said this kind of collaboration is a positive step, but he cautioned that it sits within a wider landscape where state-backed cybercriminals are already investing heavily in offensive and defensive capabilities.
"What is changing meaningfully is the pace," he told Live Science, noting that advances in AI are accelerating both vulnerability discovery and exploitation.
Software vulnerabilities sit at the foundation of much of today's digital infrastructure, and the ability to find and exploit them quickly has always been a decisive advantage.
Ilkka Turunen, field chief technology officer at software company Sonatype, added that the industry has already been moving in that direction, with AI contributing to a rise in both code production and adversarial activity. "It's not uncommon now to see AI-generated malware," he said, adding that many current security findings are likely already AI-assisted.
What systems like Mythos appear to do is compress the timeline further. Vulnerabilities can be identified, tested and weaponized more quickly, thus reducing the window between discovery and exploitation. Turunen said this means that "timelines to exploitation will continue to compress, new vulnerabilities will be discovered and spread faster, and attacks will continue to be completely autonomous."
Is Mythos really "too powerful to release"?
The idea that Mythos is "too powerful" to release caught on quickly following its launch, but it's not that simple, the experts Live Science consulted said.
There are obvious risks. A system that can generate working exploits at speed lowers the barrier to attackers and makes it easier to exploit vulnerabilities at scale. That risk is not theoretical. Anthropic's own testing suggests the model can already do this reliably and at volume. The pieces themselves are not new. What stands out is that they are all in one place, working together. That makes the whole process faster and easier to run in an end-to-end fashion.
Chan argued that focusing on software-based controls alone will not be enough to address that shift. "The industry keeps making the same mistake: relying on software layers to solve problems created within the software layer," she said, adding that stronger protections at the hardware level are needed to prevent systems from being fully compromised.
The longer-term impact of Mythos is likely to depend less on the model itself and more on how quickly similar capabilities become widely available.
Warburton warned that the risk is not a single dramatic incident but a gradual change in how digital systems are trusted and used. "We're already seeing early signs of an internet increasingly shaped by automation," he said, pointing to a growing volume of machine-generated content and activity.
If systems like Mythos accelerate that trend, the result could be an environment where both legitimate activity and malicious behavior are increasingly driven by automated processes, making it harder to distinguish the two, Warburton warned. At the same time, the abundance of vulnerabilities being discovered in key systems we use every day may outpace the ability to fix them, especially if we start to see similar AI models becoming more widely available.
Anthropic's decision to keep Mythos within the confines of Glasswing places it in a controlled setting. Whether that remains the case will depend on how quickly comparable systems emerge elsewhere and how effectively the cybersecurity industry adapts to a world in which the time between a vulnerability's emergence and exploitation continues to shrink.
Carly Page is a technology journalist and copywriter with more than a decade of experience covering cybersecurity, emerging tech, and digital policy. She previously served as the senior cybersecurity reporter at TechCrunch.
Now a freelancer, she writes news, analysis, interviews, and long-form features for publications including Forbes, IT Pro, LeadDev, Resilience Media, The Register, TechCrunch, TechFinitive, TechRadar, TES, The Telegraph, TIME, Uswitch, WIRED, and others. Carly also produces copywriting and editorial work for technology companies and events.