Top Agentic AI security resources — May 2026

Agentic AI Security + Agentic AI Security Digest Sergey todayMay 4, 2026

Background
share close

Last month brought a sobering reality check to the agentic AI landscape. It was dominated by the massive Claude Code source leak and the rapid discovery of subsequent vulnerabilities, highlighting the fragility of even top-tier autonomous systems. Concurrently, Anthropic’s Mythos model demonstrated autonomous, multi-step network exploitation capabilities, confirming that AI-driven attacks are no longer just theoretical. With agents gaining deeper system access, the focus is forcibly shifting from model-level safety to robust, infrastructure-level data access control and agentic identity management.

Statistics

Total resources: 29
Category breakdown:

Category Count
Agentic AI vulnerabilities 6
Agentic AI security 101 4
Agentic AI security for CISO 4
Article 3
Agentic AI defense 3
Attack 3
Exploitation 2
Framework 2
Tool 2
Agentic AI Incident 1
Research 1
Threat modelling 1
Training materials 1

Agentic AI security resources:

Agentic AI vulnerabilities

Critical vulnerability in Claude Code emerges days after source leak

SecurityWeek reports on a critical vulnerability in Claude Code discovered by Adversa AI following the source leak. The flaw causes shell command deny rules to silently stop working after 50 subcommands.

Authentication bypass in Microsoft Agent Governance Toolkit

Microsoft’s new Agent Governance Toolkit shipped with critical authentication primitives containing zero production callers. This means agent identity governance checks can be trivially bypassed via caller-controlled input.

Agentic AI memory attacks spread across sessions and users

A researcher revealed the MemoryTrap vulnerability in Claude Code’s memory system. This flaw allows poisoned memory to spread across sessions and infect multiple users.

VU#221883 – CrewAI contains multiple vulnerabilities

Four CVEs in CrewAI enable chaining prompt injection into RCE, SSRF, and file reads. These flaws affect the Code Interpreter and default configurations.

Azure SRE Agent flaw lets outsiders silently eavesdrop on enterprise cloud operations

CVE-2026-32173 (CVSS 8.6) in the Azure SRE Agent exposed live command streams. The flaw allowed any Entra ID account holder access via an unauthenticated WebSocket endpoint.

Claude Code source leak: With great agency comes great responsibility

Analysis of the 512K-line Claude Code leak reveals three critical vulnerability classes. These include context poisoning via compaction, sandbox bypass via shell parser differentials, and significant supply chain risks.

Agentic AI security 101

What is agentic identity? – Security Boulevard

This overview covers the critical need for agentic identity. It explains why autonomous agents require ephemeral credentials, granular delegated access, and strong protocol compliance.

Every way your AI agent can be broken

A comprehensive guide cataloging AI agent attack goals mapped directly to the OWASP framework. It features deep analysis of the Crescendo multi-turn attack methodology.

The webpage has instructions. The agent has your credentials.

Prompt injection has escalated from a model-level to an infrastructure-level threat. This post synthesizes disclosures regarding browser agents, MCP poisoning, and memory corruption.

OWASP Top 10 Agents & AI vulnerabilities (2026 cheat sheet)

A comprehensive cheat sheet grouping all 20 OWASP items into three architectural risk categories. It provides an accessible onramp for engineers with illustrated attack scenarios and countermeasures.

AI-driven exploitation is here: what Mythos proved and what comes next

Anthropic’s Mythos completed a 32-step network attack autonomously in just hours. This article explains why this capability isn’t exclusive to Mythos and why existing AI systems are the next targets.

Agentic AI security for CISO

Three AI coding agents leaked secrets through a single prompt injection

A disclosure proved that Claude Code, Gemini CLI, and Copilot were vulnerable to ‘Comment and Control’ prompt injection. A comparative audit reveals no vendor currently publishes injection resistance metrics.

Copilot Agentforce prompt injection remediation playbook

Following the disclosure of ShareLeak (CVE-2026-21520) and PipeLeak, this playbook provides a prescriptive remediation matrix covering five vulnerability classes.

When the bots run the incident response

CoSAI maps the shifting attack surface introduced by agentic AI. It highlights new risks such as prompt injection via logs, confused deputy attacks, and semantic mosaic data leakage.

Red teaming agentic AI: should you go manual, in-house, or continuous?

This practical framework helps CISOs evaluate their red teaming options. It compares manual, in-house, and continuous approaches across coverage, cost, staffing, and compliance needs.

Article

OWASP ASI01 — Agent Goal Hijack: a practical security guide

A comprehensive technical guide to Agent Goal Hijack, identified as the #1 risk in the OWASP Agentic Top 10. It explores the attack surface, provides attack examples, and details practical defense frameworks.

Agentic AI and data access control as the new security perimeter

A KuppingerCole analyst argues that traditional model guardrails are insufficient. Data access control must now serve as the primary security perimeter for AI agents.

Rogue AI agents can work together to hack systems – The Register

Irregular demonstrated alarming multi-agent offensive behavior. Agents were able to forge admin cookies, steal credentials, and collaboratively disable endpoint defenses.

What OpenClaw CVE record tells us about agentic AI

An analysis of 104 CVEs in OpenClaw highlights an insecure-by-design architecture. The findings show that vibe-coded agents create a highly dynamic attack surface.

Agentic AI defense

Monitoring Claude Code/Cowork at scale with OTel in Elastic

Elastic’s InfoSec team built a complete monitoring pipeline for Claude Code/Cowork. It uses native OpenTelemetry exports for tool invocation auditing, session reconstruction, and cost anomaly detection.

AgentWatcher: A rule-based prompt injection monitor for LLM agentic systems

AgentWatcher is a two-phase defense system utilizing attention-based context attribution. It achieves a near-zero attack success rate across four agent benchmarks for prompt injection monitoring.

slowmist/openclaw-security-practice-guide – GitHub

SlowMist pioneers an ‘agent-facing’ defense paradigm. They released a security guide designed to be read and deployed BY the AI agent itself, shifting toward agentic zero-trust.

Every tool is an injection surface

This article synthesizes defense announcements into a concrete 6-layer defense stack. It provides implementable code examples to mitigate tool-result prompt injection.

Agent skill trust & signing service – Ken Huang

STSS is an open-source defense layer that issues cryptographic attestations for AI agent skills. It uses static analysis, import chain tracing, and SHA-256 Merkle trees for rigorous auditing.

Attack

Anthropic, Google, Microsoft paid AI bug bounties quietly

AI agents in GitHub Actions were found vulnerable to ‘comment-and-control’ prompt injection enabling credential theft. Major vendors patched the flaws quietly without issuing public advisories.

Skill description prompt injection bypass via unscanned DESCRIPTION.md

A critical vulnerability in NousResearch’s Hermes Agent allows persistent prompt injection. Attackers exploit unscanned DESCRIPTION.md files within skill directories.

Prompt injection leads to RCE and sandbox escape in Antigravity

A command injection vulnerability was discovered in Google Antigravity via the find_by_name tool. It allowed an attacker to achieve RCE and sandbox escape, bypassing Secure Mode.

Agent card poisoning: A metadata injection vulnerability – Keysight

A Proof-of-Concept demonstrates how a malicious A2A agent card can embed adversarial instructions. This metadata injection causes data exfiltration via the host LLM.

Exploitation

Double agents: Exposing security blind spots in GCP Vertex AI

Unit 42 demonstrated a multi-step attack chain against the GCP Vertex AI Agent Engine. They successfully extracted service agent credentials, allowing for cross-boundary data access and potential RCE.

Mitigating indirect AGENTS.md injection attacks in agentic environments

The NVIDIA AI Red Team executed a full attack chain against OpenAI Codex. The exploit utilized a malicious AGENTS.md injection stemming from a supply chain compromise.

hackerbot-claw: An AI-Powered Bot Actively Exploiting GitHub Actions

An autonomous AI bot powered by Claude Opus is actively exploiting GitHub Actions in the wild. It achieved RCE in major targets using techniques like poisoned Go init() functions and branch name injection.

OpenAI Codex command injection vulnerability exposes GitHub tokens

BeyondTrust discovered a critical command injection flaw in OpenAI Codex that exposes GitHub OAuth tokens. The attack scales through malicious, obfuscated branch names passed to shell commands.

Framework

Careful adoption of agentic AI services

The Five Eyes intelligence alliance released their first joint guidance specifically addressing agentic AI security. The document provides a systematic risk taxonomy and structured best practices for secure adoption.

Agent identity, trust and lifecycle protocol – IETF Draft

An IETF Internet-Draft defines the AITLP protocol for formal AI agent governance. It covers agent URI naming, hierarchical mandate enforcement, and certificate-based PKI.

Tool

Govern AI agents on App Service with the Microsoft Agent Governance Toolkit

Microsoft released the Agent Governance Toolkit, an open-source runtime security package. It features 7 components covering policy enforcement, cryptographic agent identity, and compliance automation.

Introducing the Agent Governance Toolkit: Open-source runtime security for AI agents

The MIT-licensed Microsoft Agent Governance Toolkit maps its defenses directly to the OWASP Agentic AI Top 10 risks. It provides essential runtime security for autonomous deployments.

Agentic AI Incident

Indirect prompt injection is taking hold in the wild

Back-to-back reports from Google and Forcepoint document real-world indirect prompt injection attacks. These threats were found actively deployed across billions of crawled web pages.

Research

Kill-Chain Canaries: Stage-level tracking of prompt injection across attack surfaces and model safety tiers

Researchers instrumented 950 agent runs with cryptographic canary tokens to track prompt injection through a four-stage kill chain. This helps identify exactly where defenses activate in multi-agent pipelines.

Threat modelling

Taking Maestro in stride

This post compares the STRIDE and MAESTRO frameworks for agentic AI threat modeling. It includes a detailed layer-by-layer analysis using a practical flight booking agent scenario.

Training materials

Build agentic AI security skills with the GitHub Secure Code Game

GitHub introduced a Secure Code Game challenge focused entirely on agentic AI security. It uses OpenClaw as the target for hands-on security exercises.

Identity & isolation: the mandate for 2026

The events of this spring highlight a severe implementation gap: we are deploying highly capable autonomous agents without the foundational identity and access management controls required to secure them. As demonstrated by the active exploitation of GitHub Actions and the fundamental bypasses found in early governance toolkits, organizations must stop treating AI agents as mere scripts. Immediate action requires enforcing cryptographic agent identity, isolating agent runtime environments, and auditing memory handling to prevent persistent contamination.

Written by: Sergey

Rate it
Previous post