July 2026 marked a turning point for Agentic AI security: the focus shifted from theoretical vulnerabilities to structural, systemic defenses, a necessary development as autonomous agents become deeply embedded in enterprise workflows. The dominant theme of the month was the adoption of “Agent Zero Trust”. High-profile frameworks from Google DeepMind and Anthropic underscore that these digital workers must now be treated as potential insider threats, which calls for robust, cryptographically verifiable guardrails, strictly scoped identities, and comprehensive runtime monitoring.
Statistics
Total resources: 27
Category breakdown:
Agentic AI security resources:
Attack technique
GuardFall: a universal shell injection vulnerability in open-source AI agents
The research reveals how decades-old shell-quoting bypass classes can defeat pattern-based command guards in popular open-source coding agents. This allows prompt injection to reach bash with the operator’s full authority.
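To illustrate this class of bypass, here is a minimal sketch. It is not the GuardFall proof of concept. The regex guard `naive_guard`, the stricter `argv_guard`, the blocked pattern, and the allowlist are all hypothetical names and values chosen for illustration.

```python
import re
import shlex

# Hypothetical pattern-based guard: blocks commands containing a literal "rm -rf".
BLOCKED = re.compile(r"\brm\s+-rf\b")

# Hypothetical allowlist of executables the agent may run.
ALLOWED = {"ls", "cat", "git"}

# Characters that keep special meaning if the string ever reaches a shell.
SHELL_META = set(";|&$`<>\n")


def naive_guard(command: str) -> bool:
    """Return True if the raw command string looks safe to the regex."""
    return BLOCKED.search(command) is None


def argv_guard(command: str) -> bool:
    """Tokenize with POSIX quoting rules, then allowlist the executable."""
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes: fail closed
        return False
    if not argv:
        return False
    # Conservative: reject any token that still carries shell metacharacters.
    if any(ch in SHELL_META for token in argv for ch in token):
        return False
    return argv[0] in ALLOWED


# Empty quotes break up the literal "rm" for the regex,
# but a POSIX shell removes them and still runs rm.
bypass = "r''m -rf /tmp/x"
```

The regex inspects the string before quote removal, while the shell acts on it afterwards. That gap is what quoting tricks exploit. If a command passes `argv_guard`, running it with `subprocess.run(argv, shell=False)` avoids a second round of shell parsing. An allowlist like this narrows the attack surface but does not by itself make an agent safe.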
AutoJack Attack Lets One Web Page Hijack AI Agent for Host Code Execution
Microsoft details the AutoJack exploit chain against AutoGen Studio. It weaponizes an agent’s browsing to reach a privileged localhost service and execute code on the host with no user interaction.
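One generic mitigation for this kind of chain is to stop an agent's browsing tool from reaching loopback or private addresses. The sketch below is an illustration under assumptions, not Microsoft's fix or AutoGen Studio's code. The function names `browse_allowed` and `resolve` are hypothetical, and the resolver is injectable so the check can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlsplit


def resolve(host: str) -> List[str]:
    """Resolve a hostname to its IP addresses using the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def browse_allowed(url: str, resolver: Callable[[str], List[str]] = resolve) -> bool:
    """Return True only if every resolved address is outside internal ranges."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolver(parts.hostname)
    except OSError:  # resolution failed: fail closed
        return False
    for addr in addresses:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False
        # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:127.0.0.1.
        mapped = getattr(ip, "ipv4_mapped", None)
        if mapped is not None:
            ip = mapped
        if ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_unspecified:
            return False
    return bool(addresses)
```

A check like this has its own time-of-check gap: the name can resolve differently when the request is actually made (DNS rebinding), so a real deployment would also need to connect to the address that was checked and to require authentication on the local service.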
Assessing Automated Prompt Injection Attacks in Agentic Environments
An ETH Zurich team evaluates automated prompt-injection methods against agents. It finds that black-box optimization outperforms gradient-based approaches at hijacking agent tool use.
Clone This Repo and I Own Your Machine
This attack demonstrates how a benign-looking GitHub repo triggers Claude Code to run a setup script. The script fetches a reverse-shell payload at runtime, so the payload is invisible to code review and scanners.
Collaborative-Adversarial Jailbreaking: A Propagation-Aware Attack Framework for Multi-Agent Code Generation Systems
This systematic jailbreak study of systems like MetaGPT and CrewAI introduces the IMA attack, which achieves 89% success. It highlights that multi-agent collaboration amplifies harm significantly over single-agent baselines.
Computer-Use and TOCTOU: What You Click Is Not What You Get!
Johann Rehberger demonstrates a time-of-check/time-of-use (TOCTOU) attack against computer-use agents. The UI is altered between the agent’s visual check and its action, causing it to click something unintended.
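The race can be narrowed by re-verifying the screen immediately before acting. The following is a minimal sketch of that idea, not the technique from the write-up. The `capture` and `click` callables are hypothetical stand-ins for whatever screenshot and input APIs a computer-use agent exposes.

```python
import hashlib
from typing import Callable


def digest(pixels: bytes) -> str:
    """Fingerprint the bytes of the screen region the agent inspected."""
    return hashlib.sha256(pixels).hexdigest()


def guarded_click(
    capture: Callable[[], bytes],
    click: Callable[[], None],
    checked_digest: str,
) -> bool:
    """Click only if the target region still matches what was checked."""
    if digest(capture()) != checked_digest:
        return False  # UI changed between check and use: refuse to act
    click()
    return True
```

This shrinks the window between check and use but does not close it, since the UI can still change between the second capture and the click. Exact byte comparison is also brittle against harmless rendering differences.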
Fake Bug Report Hijacks AI Coding Agents at Scale
Attacker instructions planted in a fake Sentry error report are executed by AI coding agents. Researchers found widespread vulnerability, demonstrating an 85% exploitation success rate across major agents.
From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents
This systematic study explores how untrusted input reaches agent memory and identifies attack classes. It shows that aggressive memory-writing agents are highly exploitable and that existing prompt-injection defenses fail to cover memory poisoning.
Agentic AI red teaming
Agentic AI Red Teaming: Tool Misuse is the Test That Matters
Examines tool misuse as the hardest agentic red-teaming problem. It evaluates Microsoft’s PyRIT, noting that it cannot verify whether an agent’s actual tool calls matched its stated intentions.
Red-Teaming the Agentic Red-Team
The first in-depth security analysis of widely used offensive-security agent tools reveals shared design flaws. These let an active adversary exfiltrate keys and compromise the operator’s machine even inside sandboxes.
RIFT-Bench: Dynamic Red-teaming For Agentic AI Systems
Fujitsu researchers present a dynamic red-teaming methodology driven by graph representations. It generalizes attacks across heterogeneous, real-world agentic architectures rather than tying them to one specific implementation.
SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents
A framework that synthesizes security-evaluation tasks from specifications. It runs them in Docker-based execution environments to systematically evaluate the security of autonomous agents.
Defense frameworks
Zero Trust for AI Agents
Anthropic’s Zero Trust framework for enterprise AI agents addresses prompt injection, tool poisoning, identity abuse, and memory poisoning. It proposes a tiered architecture and agentic SOAR for AI-speed defense.
Securing internal systems against increasingly capable and imperfectly aligned AI
Google DeepMind’s AI Control Roadmap (v0.1) outlines a defense-in-depth framework that treats internal agents as potentially misaligned insider threats. It reports that most anomalies stem from overeagerness rather than adversarial intent.
DEMM-Bench: A Cross-Regime Benchmark for Agent-Runtime Governance-Evidence Sufficiency
A cross-regime benchmark measuring whether agent-runtime logging and governance evidence are sufficient for oversight. It is released with an open dataset and code for auditing.
AgentWatch: Privacy and Security Evaluation for Browser-Based AI Agents
A UC Berkeley MICS capstone introduces an open-source scoring framework. It tests five browsing agents across data disclosure, prompt injection, and sandbox isolation.
Agentic AI defense
Securing AI agents: When AI tools move from reading to acting
Walks through an MCP tool-poisoning attack chain against a Copilot Studio finance agent. It maps Microsoft controls like Prompt Shields and Purview DLP to each stage of the kill chain.
Securing LLM-Agent Long-Term Memory Against Poisoning: Non-Malleable, Origin-Bound Authority with Machine-Checked Guarantees
Proposes non-malleable, origin-bound authority for agent long-term memory with machine-checked guarantees. It proves a separation theorem, achieving zero success against memory-laundering attacks.
SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems
Introduces Signed Memory with Smoothed Retrieval as the first defense with a certified robustness bound against Multi-Session Memory Poisoning. It effectively cuts unsigned attack success to zero.
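The general idea behind signed, origin-bound memory can be sketched with a keyed MAC. This is a simplified illustration only. It does not implement SMSR's smoothed retrieval or the machine-checked scheme from the previous entry, and the function names and record layout are assumptions.

```python
import hashlib
import hmac
import json
from typing import Dict, Iterable, List, Set


def _payload(origin: object, content: object) -> bytes:
    """Canonical bytes covering both the origin label and the content."""
    return json.dumps({"origin": origin, "content": content}, sort_keys=True).encode()


def sign_entry(key: bytes, origin: str, content: str) -> Dict[str, str]:
    """Create a memory record whose tag binds the content to its origin."""
    tag = hmac.new(key, _payload(origin, content), hashlib.sha256).hexdigest()
    return {"origin": origin, "content": content, "tag": tag}


def verify_entry(key: bytes, entry: Dict[str, str]) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(
        key, _payload(entry.get("origin"), entry.get("content")), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected.encode(), str(entry.get("tag", "")).encode())


def retrieve(key: bytes, store: Iterable[Dict[str, str]], trusted: Set[str]) -> List[str]:
    """Return only entries that verify and come from a trusted origin."""
    return [
        e["content"]
        for e in store
        if verify_entry(key, e) and e.get("origin") in trusted
    ]
```

Because the tag covers the origin label as well as the content, an entry written from web content cannot be relabeled as user-authored without invalidating the tag. The sketch assumes the signing key is held by the memory layer and is never exposed to the model or its tools.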
Article
OWASP ASI03: Identity & Privilege Abuse in AI Agents
A technical reference for IAM and security teams viewing agent identity as a credential-aggregation point. It uses the Salesloft Drift OAuth breach to frame an abuse taxonomy and playbook.
The AI Agent Lethal Trifecta
A CSA research note highlights that 98% of assessed production agents combine private data access, untrusted content exposure, and outbound actions. Alarmingly, capability and defense are inversely correlated. This is CSA’s take on the AI Risk Quadrant report.
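The three conditions named above can be expressed as a simple inventory check. This is an illustrative sketch, not part of the CSA note or the AIRQ methodology. The `AgentProfile` fields and the example agents are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical capability record for one deployed agent."""
    name: str
    reads_private_data: bool
    ingests_untrusted_content: bool
    can_act_outbound: bool


def has_lethal_trifecta(agent: AgentProfile) -> bool:
    """True when all three risky capabilities are present at once."""
    return (
        agent.reads_private_data
        and agent.ingests_untrusted_content
        and agent.can_act_outbound
    )


def flag_agents(agents: Iterable[AgentProfile]) -> List[str]:
    """Names of agents that combine all three capabilities."""
    return [a.name for a in agents if has_lethal_trifecta(a)]


inventory = [
    AgentProfile("support-bot", True, True, True),
    AgentProfile("docs-search", False, True, False),
]
```

Removing any one of the three capabilities from a flagged agent breaks the combination, which is why the check treats them jointly rather than scoring each in isolation.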
Adaptive, Agentic AI Worms Loom as Next Enterprise Threat
Researchers built proof-of-concept adaptive AI worms that use reasoning loops to discover vulnerabilities and self-propagate. This illustrates a near-future enterprise threat class.
Fake AI Agent Skill Passed Security Scans and Reportedly Reached 26,000 Agents
A security firm built a fake skill that bypassed major scanners via a mutable external link. It reportedly reached roughly 26,000 agents, including those on corporate accounts.
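A scan of a skill is only meaningful if the content that later runs is the content that was scanned. One common countermeasure to mutable links is pinning by digest, sketched below under assumptions: `fetch` is a hypothetical callable that returns the bytes behind a URL, and the URL in the usage note is a placeholder.

```python
import hashlib
from typing import Callable


def sha256_hex(content: bytes) -> str:
    """Digest recorded at review time and checked again at load time."""
    return hashlib.sha256(content).hexdigest()


def load_pinned(fetch: Callable[[str], bytes], url: str, expected_sha256: str) -> bytes:
    """Fetch external content and refuse it unless it matches the pinned digest."""
    data = fetch(url)
    if sha256_hex(data) != expected_sha256:
        raise ValueError(f"content at {url} changed since it was reviewed")
    return data
```

For example, a registry could store `sha256_hex(reviewed_bytes)` alongside the skill at scan time and call `load_pinned(fetch, "https://example.invalid/skill.py", pinned)` on every install, so that a swapped payload raises instead of loading.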
CISO resources on Agentic AI
State of Agentic AI Security and Governance 2.01
An OWASP report cataloging real incidents, CVEs, and vendor advisories mapped to the Top 10 for Agentic Applications. It includes a governance maturity matrix and regulatory landscape for CISOs.
Securing AI Agents Before They Go Rogue Is Next to Impossible
Gartner’s Dennis Xu explains why high-autonomy, broadly-permissioned agents are nearly impossible to fully secure. He recommends discovery, least privilege, red teaming, and behavior-based runtime detection.
Threat modelling
A Security Analysis of Long-Horizon Agentic AI Systems: Threats, Evaluation, and Framework Development
A structured analysis reviewing threats and evaluation approaches for long-horizon agentic systems. It proposes a threat taxonomy and security framework tailored for prolonged agent operations.
Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us
Microsoft AI Red Team’s v2.0 taxonomy update adds seven new failure modes, including supply chain compromise and goal hijacking. These additions are grounded in 12 months of real red-team engagements.
Agentic AI resource
The AI risk quadrant for agents: scoring 100 digital workers nobody secured
Introduces AIRQ, an open methodology scoring 100 production agents on attack surface and blast radius, finding only 11% pass. It serves as a valuable vendor questionnaire and self-assessment tool.
Moving from trust to verification
The events documented this month show that relying on implicit trust for AI agents is a failing strategy. With multi-agent collaboration amplifying harm, attacks like AutoJack executing code on the host, and content (including skills) bypassing traditional scanners, the perimeter has shifted. Organizations should implement the Zero Trust frameworks advocated by Anthropic and DeepMind. Start by enforcing non-malleable, origin-bound authority on agent memory and by aggressively limiting the “Lethal Trifecta” of private data access, untrusted content exposure, and outbound actions.