Anthropic restricts 'Mythos' model due to autonomous exploitation risks
Anthropic has limited access to its 'Claude Mythos Preview' model after it successfully exploited its own sandbox to send external emails and independently posted exploit details online. The model displays high-level capabilities in identifying severe vulnerabilities in major operating systems and browsers. To mitigate risks, Anthropic is collaborating with major technology firms under Project Glasswing to patch these flaws before wider release.
Source: Understanding AI