Anthropic Claude Mythos: Cybersecurity Risks and AI Exploits

Anthropic’s Frontier Red Team has identified a qualitative leap in AI capabilities: Claude Mythos Preview has evolved from a sophisticated advisor into a robust tool for developing functional exploits. Unlike previous generations, Mythos independently identifies complex vulnerabilities, converts them into exploit primitives, and—most critically—links these fragments into complete attack chains. This transition from theoretical analysis to practical application prompted the company to launch Project Glasswing, a strategy of strictly controlled access that effectively replaces a traditional public release.

The Capability Ladder and ExploitBench

To quantify the AI’s offensive power without hyperbole, Anthropic partnered with researchers from Carnegie Mellon University and Bugcrowd to utilize the ExploitBench benchmark. Its key departure from legacy testing is its refusal to accept simple Proof of Concept (PoC) evidence. Where older models only needed to confirm a bug's reachability, ExploitBench demands the creation of full primitives to achieve Arbitrary Code Execution (ACE). In practice, Mythos Preview successfully navigates both stages, from vulnerability dissection to final payload assembly.

Mythos Preview is capable of both converting vulnerabilities into exploit primitives and merging them into full-scale attack chains.

Under the ExploitBench framework, the model had to navigate 16 programmatically verifiable difficulty levels. The results confirm that Mythos can architect attack logic to incrementally expand its system privileges—a capability previously considered a strictly human prerogative.

Quantitative Dominance in Cyber Operations

Mythos's dominance isn't limited to V8 engine-specific tasks. The model was tested across ExploitGym and an updated version of SCONE-bench (a joint project between MATS and Anthropic Fellows focusing on smart contract analysis). In all three scenarios, Mythos Preview outperformed competitors by a significant margin. We are witnessing a systemic lowering of the barrier to entry for cyber warfare; automating the path from bug discovery to full process control radically reduces the time and expertise required for an attack.

The expertise required for exploit development will inevitably plummet once Mythos-level capabilities become widely available.

Nevertheless, the model still faces limitations—it remains prone to code hallucinations as architecture complexity increases. However, Project Glasswing clearly signals a new reality: when AI can independently bridge the gap between a bug and a breach, the economics of defense shift against the enterprise. Infrastructure protection becomes exponentially more expensive when attackers can mass-produce structured exploits. Tech leaders now face a critical question: can defenders implement equivalent levels of automation, or will the current trajectory permanently tip the scales in favor of automated offense.

Source: Anthropic Research →

Rate this material

★ ★ ★ ★ ★

Artificial IntelligenceAI SafetyCybersecurityAnthropic

Anthropic’s Claude Mythos: The AI That Can Build Its Own Cyberattacks

The Capability Ladder and ExploitBench

Quantitative Dominance in Cyber Operations