AI-driven exploitation is here: what Mythos proved and what comes next

Article + Agentic AI Security Sergey April 30, 2026

Anthropic’s Mythos model completed a 32-step corporate network attack autonomously in hours. The more important story is that this capability is not exclusive to Mythos, and the targets already include AI systems your teams built last year.

This month, the UK AI Safety Institute published its independent evaluation of Claude Mythos Preview. The headline result: in a simulated 32-step corporate network attack, reconnaissance through full network takeover, Mythos completed the entire sequence autonomously. No previous model had succeeded in expert-level CTF challenges before April 2025.

That is a proof of concept for what a motivated adversary builds next. It is also one event in a pattern that has been developing for over a year.

TL;DR

Mythos demonstrated autonomous, multi-step network attack execution at a level no prior model reached: 73% on expert CTFs, full network takeover completed autonomously
The capability is not exclusive to Mythos. Detection-grade AI vulnerability discovery is accessible to small, cheap models today. Exploit construction is following the same cost curve
AI-driven exploitation of traditional software CVEs is the current threat. Exploitation of AI systems themselves (agents, workflows, MCP integrations) is the next wave and it is already happening
AI security incidents have more than doubled since 2024: adversaries have already compromised AI security tools at 90 organizations, with the next wave targeting systems with write access to network infrastructure
Continuous AI-driven testing and remediation is the only way to keep pace with adversaries running the same methods

One event in a storm already underway

Mythos is significant, but it is only one data point.

AI-driven exploitation has been building as an operational threat for the better part of two years, largely below the threshold that produces headlines. What Mythos does is make the capability concrete enough that it cannot be dismissed as a lab curiosity. A model that moves autonomously through 32 steps of a corporate network compromise is a capability map for anyone looking to build something similar.

That storm is already producing incidents. Adversa AI’s Top AI Security Incidents Report 2025 documents that AI security incidents have more than doubled since 2024. Prompt injection, the primary attack vector against AI systems, accounts for 35.3% of documented incidents.

In 2025 alone, adversaries injected malicious prompts into AI security tools at more than 90 organizations, stealing credentials and cryptocurrency, according to VentureBeat. Every one of those compromised tools could only read data. The autonomous SOC agents now shipping into enterprise environments can rewrite firewall rules, modify IAM policies, and quarantine endpoints, all through approved API calls that EDR classifies as authorized activity. CrowdStrike’s 2026 Global Threat Report puts the broader trend in numbers: AI-enabled adversary operations grew 89% year-over-year.

Average number of steps completed on 'The Last Ones' (a 32-step simulated corporate network attack). Source: AISI — Average number of steps completed on ‘The Last Ones’ (a 32-step simulated corporate network attack). Source: AISI

This is not an exclusive capability

The Mythos story is worse than the headline suggests for a specific reason: you do not need Mythos to do most of this work.

Analysis published by Aisle after the AISI evaluation makes the point plainly. Buffer overflow detection, once the domain of expensive senior engineers, is now accessible to a 3.6B-parameter model costing $0.11 per million tokens. That model detected the same FreeBSD vulnerability Anthropic highlighted in its own research. Detection-grade autonomous vulnerability discovery is, at this point, available to anyone with a modest compute budget.

Exploit construction remains more frontier-dependent, particularly for complex, multi-stage techniques. But that capability is moving down the cost curve at the same pace as everything else in AI. What Mythos does today, models a fraction of its size will approximate within a few months. The pattern is identical to what happened with code generation, translation, and image synthesis.

“The moat in AI cybersecurity is the system, not the model”, as Aisle’s post-evaluation analysis concludes. The adversaries who will cause the most damage in 2027 don’t need to rely on a specific frontier model. They are building scaffolding, pipelines, and operational tooling now.

Two attack surfaces, one converging problem

Most enterprise security teams are working with a threat model that needs updating, because AI-driven exploitation is hitting two distinct surfaces at once.

The first is traditional software: buffer overflows, unpatched APIs, misconfigured cloud services. This surface is familiar. What AI adds is speed and reach: what a skilled red team tests across a week, autonomous AI hacking pipelines can probe across thousands of targets in hours. That compression in timelines compounds the risk of any unpatched exposure in your environment.

The second surface is AI systems themselves: custom LLM integrations built for internal tooling, agentic workflows connected to business data, MCP servers that let AI assistants read and write across your environment. Most of these were deployed in 2024 and 2025, and most were deployed fast. Adversarial testing was rarely on the checklist when the priority was shipping.

We run adversarial AI assessments against systems like these and know how long it takes to find the first exploitable flaw in a typical agentic workflow: hours, not weeks.

Those systems are being targeted now. Agentic AI systems are already producing the most irreversible attacks in the incident record: cross-tenant data exposure, unauthorized financial transactions, autonomous actions that cannot be rolled back. And where traditional CVE exploitation is bounded by patch velocity (CSA provides a good guideline on that), attacks on AI systems target configurations and behaviors that conventional scanners do not see.

This is not a future threat. The 2026 CISO AI Risk Report from Saviynt and Cybersecurity Insiders (235 CISOs surveyed) found that 47% had already observed AI agents exhibiting unintended behavior in production. Only 5% felt confident they could contain a compromised agent.

The window to secure your AI is already closing

The tooling for AI-driven exploitation at scale exists today, accessible to adversaries who do not need frontier model access. Open-weight models fine-tuned for offensive use cases are actively maintained and deployed. The AISI evaluation describes the risk threshold as capabilities that could “provide meaningful uplift to threat actors even without specialized knowledge”. That threshold has been crossed.

Defensive tooling also exists. The problem is application. Most of it is not directed at AI systems specifically, and almost none of it runs continuously. Scanning infrastructure with traditional tools does not catch prompt injection. Static analysis does not surface agentic misalignment. Annual penetration testing does not reflect 12 months of evolution in your AI stack and in AI attack techniques. The Saviynt report found that 86% of organizations do not enforce access policies for AI identities at all, and 75% of CISOs have found unsanctioned AI tools running in production with embedded credentials nobody monitors. The governance gap is shocking.

The true frontrunners aren’t necessarily the Fortune 500 companies already having Mythos AI access, but rather those actively building AI governance and validating their security through ongoing testing.

The only valid countermeasure

If the threat uses autonomous AI-driven exploitation, the defense has to move at the same speed. The attack surface is too large, the techniques change too fast, and the systems themselves change with every model update or new integration. A point-in-time assessment captures what was exploitable last quarter, and even issues discovered back then are partly still in the backlog of your IT and security teams.

We run these assessments continuously. When adversarial AI is turned against a freshly deployed integration, findings come back in hours. That is the pace the threat operates at, and the defense has to match it. Continuous, automated testing and remediation across models, agents, agentic workflows, and integration points is the only viable approach.

An actionable plan

Map your AI stack before you assess it. Most organizations lack a complete picture of what AI systems are running, who built them, and what data they access. Custom agents and MCP integrations are frequently undocumented. You cannot test what you have not mapped.

Separate your threat models. AI-driven exploitation of traditional CVEs calls for accelerating your existing vulnerability management program. AI-driven exploitation of AI systems requires dedicated testing that traditional tools do not provide. Treating both as the same problem is how AI-specific exposure goes unaddressed.

Run adversarial assessments on your AI systems before the next deployment cycle. For every agent or integration that touches sensitive data or has external-facing capabilities, understand what happens when it is probed with current AI attack techniques. Build a system to triage and prioritize findings semi-automatically.

Remediation will remain a fragmented effort, but as you automate responses to low-complexity findings, the team will be freed to tackle systemic challenges and complex technical issues.

Then make the whole process continuous.

That is the problem Adversa AI was built to solve. Our platform runs autonomous, continuously updated assessments across your full AI stack (models, custom agents, agentic workflows, and MCP integrations) and delivers risk-ranked findings with remediation your teams can act on immediately.

Your organization does not need to wait for the first AI-driven incident. The enterprises that will not be scrambling in three to six months are building this capability now.

Agentic AI Red Teaming Platform

Are you sure your agents are secured?

Let's try!

Written by: Sergey

Rate it

IICL involuntary in-context learning attack technique

April 23, 2026

Research admin

We broke GPT-5.4 safety with 10 examples and 2 words using a new attack technique — IICL

OpenAI’s newest flagship is more vulnerable to our attack than GPT-5 or GPT-5-mini. Newer doesn’t mean safer. Our new research (3,500+ probes, 10 models, 7 controlled experiments) shows why continuous ...

AI-driven exploitation is here: what Mythos proved and what comes next

One event in a storm already underway

This is not an exclusive capability

Two attack surfaces, one converging problem

The window to secure your AI is already closing

The only valid countermeasure

An actionable plan

Agentic AI Red Teaming Platform

Previous post

We broke GPT-5.4 safety with 10 examples and 2 words using a new attack technique — IICL

Similar posts

OWASP ASI02: tool misuse and exploitation — the definitive security guide

AI risk management insurance is tightening. Cyber insurance history shows exactly where it ends up.