Anthropic's Leaked Claude Mythos Model Reveals a Troubling Paradox: The Safety Company Fears Its Own Most Powerful AI

Anthropic, the AI company that positions itself as the safety-first alternative to competitors like OpenAI, just had nearly 3,000 unpublished internal documents exposed on the internet, revealing details about Claude Mythos, a new AI model the company itself warns could enable dangerous cyberattacks. Security researchers Roy Paz and Alexandre Pauwels discovered the leaked materials in a publicly accessible data cache on Anthropic's website, according to reporting from Fortune. The exposure highlights a troubling contradiction: the company that built its reputation on responsible AI development may have created something it's genuinely worried about releasing to the world.

What Is Claude Mythos, and Why Is Anthropic So Concerned About It?

Claude Mythos, also referred to internally as "Capybara," represents a new tier above Anthropic's existing lineup of Claude models, which includes Opus, Sonnet, and Haiku. According to the leaked draft announcement, the model is "larger and more intelligent than our Opus models" and achieves "dramatically higher scores on tests of software coding, academic reasoning, and cybersecurity" compared to Claude Opus 4.6. In plain terms, this is a significantly more capable version of Claude designed to excel at complex reasoning and technical tasks.

What makes Claude Mythos unusual is Anthropic's own assessment of its dangers. The leaked documents reveal that the company believes Claude Mythos is "currently far ahead of any other AI model in cyber capabilities" and warned that it "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders." This isn't speculation from outside critics; it's Anthropic's internal evaluation of its own creation. The company confirmed the model's existence after Fortune contacted it on March 26, with a spokesperson describing the exposure as the result of "human error" in content management system configuration.

How Is Anthropic Managing the Release of Such a Powerful Model?

Because of these cybersecurity concerns, Anthropic is taking an unusual approach to Claude Mythos's rollout. Rather than releasing it broadly, the company is restricting early access to organizations focused on cyber defense, giving them time to harden their systems before a broader release. This strategy reflects a genuine attempt to prevent the model from being weaponized, but it also raises questions about whether such safeguards can actually work once a model with these capabilities exists.

The company has not disclosed specific benchmark results for Claude Mythos, making independent verification of its capabilities impossible. Additionally, Anthropic has not explained why details of a model it considers an unprecedented cybersecurity risk were left in a publicly accessible data store, nor how long the materials were exposed before researchers discovered them. These unanswered questions undermine confidence in the company's ability to manage sensitive information about its most powerful systems.

  • Model Positioning: Claude Mythos sits above the existing Opus, Sonnet, and Haiku tiers, representing a significant leap in capability and complexity.
  • Cyber Capabilities: Anthropic claims the model is "currently far ahead of any other AI model in cyber capabilities," with the ability to identify and exploit vulnerabilities faster than human defenders can patch them.
  • Access Strategy: Early access is limited to cyber defense organizations, with no public timeline announced for broader deployment.
  • Training Costs: Reports indicate the model is extremely expensive to train and operate, though Anthropic has not disclosed specific pricing or deployment strategy.

Why Does This Leak Matter in the Current Political Climate?

The timing of this leak is particularly significant given the ongoing tension between Anthropic and the U.S. government. CEO Dario Amodei has refused to remove guardrails restricting Claude's use for mass surveillance and autonomous weapons, leading the Trump administration to blacklist the company. Under Secretary of War Emil Michael, the Pentagon's most vocal Anthropic critic, responded to the leak on social media with "Umm...hello? Is it not clear yet that we have a problem here?" Michael has previously called Amodei a "liar" with a "god complex" who wants to "personally control the US military."

On Thursday, a judge temporarily blocked the Department of Defense from labeling Anthropic a security risk, adding another layer to the ongoing legal and political dispute. The leaked Claude Mythos details will likely intensify this conflict, as critics argue that a company refusing to work with the military is now developing AI systems with unprecedented offensive cyber capabilities. For Anthropic, the leak represents not just a security failure, but a public relations disaster in an already hostile political environment.

What Questions Remain Unanswered?

Despite Anthropic's confirmation of Claude Mythos's existence, several critical details remain unclear. The company has not detailed what safeguards, if any, would accompany a wider release, or how it plans to prevent offensive use of a model trained with capabilities that could enable precisely that. It also remains unclear whether the model's cyber capabilities were an intentional design goal or an emergent property of scaling up the model's size and training data.

The financial implications are equally uncertain. Reports indicate the model is extremely expensive to train and operate, and Anthropic has not clarified its deployment strategy or pricing for the new Capybara tier. For a company that has positioned itself as the responsible alternative in AI development, the leak of Claude Mythos raises uncomfortable questions about whether safety and capability can truly be balanced, or whether building more powerful AI systems inevitably means building more dangerous ones.