Top Agentic AI security resources — June 2026
An agentic coding gold rush has led to a surge of high-profile vulnerabilities this month. Adversa AI alone disclosed two — SymJack, a symlink-hijack RCE that broke six AI coding agents at once, and TrustFall, a one-click RCE reaching Claude Code, Cursor, Gemini CLI, and GitHub Copilot through a regressed trust dialog. Microsoft, meanwhile, traced prompt injections all the way to host-level remote code execution in Semantic Kernel, and a DEF CON talk chained an indirect injection into a persistent Microsoft Copilot backdoor. The research side kept pace, with new work arguing that agents may always fall for prompt injection and that authorization propagation is a problem all its own. Below is everything worth reading this month, sorted by category.
Statistics
Total resources: 28
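Category breakdown: Attack 8, Agentic AI defense 7, Agentic AI vulnerabilities 3, Research 5, Framework 2, Exploitation 1, Threat modelling 1, Article 1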
Agentic AI security resources
Attack
SymJack: the approval prompt is lying to you. A symlink-hijack RCE in six AI coding agents
Adversa AI shows that a symlink-disguised file copy tricks AI coding assistants into RCE while the approval prompt misrepresents what is being approved. Six major tools were tested, and all of them were vulnerable.
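As a rough illustration of the underlying problem (not Adversa AI's proof of concept and not any vendor's fix), the Python sketch below resolves a path before it would be shown in an approval prompt and rejects one that lands outside the workspace. The helper name `resolves_inside`, the `demo` function, and the directory layout are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def resolves_inside(workspace: Path, candidate: Path) -> bool:
    """Return True only if candidate, with symlinks followed, stays in workspace."""
    real_workspace = workspace.resolve()
    real_candidate = candidate.resolve()  # follows symlinks in every path component
    return real_candidate == real_workspace or real_workspace in real_candidate.parents


def demo() -> tuple[bool, bool]:
    """Build a throwaway workspace with one honest file and one escaping symlink."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        workspace = root / "repo"
        outside = root / "outside"
        workspace.mkdir()
        outside.mkdir()

        honest = workspace / "notes.txt"
        honest.write_text("ok")

        target = outside / "config"
        target.write_text("sensitive")
        disguised = workspace / "README.md"
        os.symlink(target, disguised)  # looks like a repo file, points elsewhere

        return resolves_inside(workspace, honest), resolves_inside(workspace, disguised)
```

Under these assumptions, a prompt that displays the resolved path (or refuses the operation when `resolves_inside` is false) at least shows the user what is really being touched.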
TrustFall: coding agent security flaw enables one-click RCE in Claude, Cursor, Gemini CLI and GitHub Copilot
Adversa AI’s second disclosure traces a regression in the Claude Code trust dialog plus a settings-scope inconsistency that let a cloned repo run unsandboxed code with one keypress — and with none on CI runners. The post explains why these trust-dialog bugs keep resurfacing.
MemMorph: tool hijacking in LLM agents via memory poisoning
MemMorph hijacks an agent’s tool selection by slipping a handful of disguised records into long-term memory. Because it never touches tool metadata, the resulting bias is hard to detect.
Finding the weakest link: adversarial attack against multi-agent communications
This paper uses Jacobian-gradient methods to pinpoint the most attackable messages, agents, and timesteps in a system. The result is a targeted adversarial attack on agent-to-agent coordination.
Hidden in memory: sleeper memory poisoning in LLM agents
A sleeper memory-poisoning attack plants fabricated memories that stay dormant, then re-emerge across sessions to drive attacker-chosen actions. The delay makes the payload difficult to trace back to its origin.
Autonomous LLM agent worms: cross-platform propagation, automated discovery and temporal re-entry defense
This work analyzes agent-state re-entry worm propagation across file-backed multi-agent ecosystems. It also proposes a temporal re-entry defense, backed by a no-propagation theorem, to break the cycle.
Copirate 365 at DEF CON: plundering in the depths of Microsoft Copilot (CVE-2026-24299)
Presented at DEF CON, this attack chains indirect prompt injection, render-based data exfiltration, delayed tool invocation, and memory poisoning into a persistent Copilot backdoor. It is a textbook case of stacking small primitives into durable access.
My agentic trust issues: from prompt injection to supply-chain compromise on gemini-cli
This walkthrough escalates an indirect prompt injection against the gemini-cli coding agent into a full supply-chain compromise of the developer environment. It shows how one poisoned input can ripple into the build pipeline.
Agentic AI defense
ADR: an agentic detection system for enterprise agentic AI security
ADR is a detection-and-response system built for MCP-based agents. It blends runtime telemetry, pre-deployment red teaming, and a two-tier online detector to catch compromise before it spreads across the agent fleet.
AgentTrust: runtime safety evaluation and interception for AI agent tool use
AgentTrust intercepts tool calls before they execute and returns allow, warn, block, or review verdicts. It leans on shell deobfuscation and attack-chain detection to stop malicious actions mid-flight.
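To make the four-verdict idea concrete, here is a minimal sketch of a pre-execution interceptor. It is not AgentTrust's code or API: the `Verdict` enum, the `RULES` table, and the `evaluate` and `run_tool` helpers are hypothetical, and a few simple regular expressions stand in for the shell deobfuscation and attack-chain detection the paper describes.

```python
import re
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"
    REVIEW = "review"


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"), Verdict.BLOCK),
    (re.compile(r"\b(curl|wget)\b.*\|\s*(sh|bash)\b"), Verdict.BLOCK),
    (re.compile(r"\bsudo\b"), Verdict.REVIEW),
    (re.compile(r"\b(curl|wget)\b"), Verdict.WARN),
]


def evaluate(command: str) -> Verdict:
    """Classify a shell command before anything is executed."""
    for pattern, verdict in RULES:
        if pattern.search(command):
            return verdict
    return Verdict.ALLOW


def run_tool(command: str, execute) -> str:
    """Call execute(command) only when the verdict is allow or warn."""
    verdict = evaluate(command)
    if verdict in (Verdict.BLOCK, Verdict.REVIEW):
        return f"{verdict.value}: not executed"
    return execute(command)
```

The point of the design is the placement: the decision happens between the model's tool request and the side effect, so a blocked call never runs.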
SafeHarbor: hierarchical memory-augmented guardrail for LLM agent safety
SafeHarbor is a training-free hierarchical memory guardrail with entropy-based self-evolution. It tries to refuse harmful requests while preserving utility on benign ones, without retraining the underlying model.
ARGUS: defending LLM agents against context-aware prompt injection
ARGUS pairs a context-aware injection benchmark with a provenance-aware influence-graph defense. The graph audits each agent decision before execution, which lets it catch injections that adapt to surrounding context.
WARD: adversarially robust defense of web agents against prompt injections
WARD combines a guard model with adaptive adversarial training to harden web agents against malicious page content. The aim is injection resistance that does not gut the agent’s task performance.
AgentShield: deception-based compromise detection for tool-using LLM agents
AgentShield plants honeytokens and decoy tools so a compromised agent gives itself away. It is built specifically to surface indirect prompt injection against tool-using agents.
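The deception idea can be sketched in a few lines. The class below is an illustration under stated assumptions, not AgentShield's implementation: `DecoyMonitor`, the decoy tool name `export_all_credentials`, and the `hk_` token format are all invented for the example.

```python
import secrets


class DecoyMonitor:
    """Flags tool calls that touch a decoy tool or carry a planted honeytoken."""

    def __init__(self):
        # A random secret planted where only injected instructions would look for it.
        self.honeytoken = "hk_" + secrets.token_hex(8)
        # Tools that no legitimate task ever needs (hypothetical name).
        self.decoy_tools = {"export_all_credentials"}
        self.alerts = []

    def check_call(self, tool_name: str, arguments: str) -> list:
        """Return the reasons this call looks compromised (empty list if none)."""
        reasons = []
        if tool_name in self.decoy_tools:
            reasons.append(f"decoy tool invoked: {tool_name}")
        if self.honeytoken in arguments:
            reasons.append(f"honeytoken leaked via {tool_name}")
        self.alerts.extend(reasons)
        return reasons
```

Because a benign task has no reason to call the decoy or transmit the token, any hit is a high-signal indicator that the agent is following someone else's instructions.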
Hybrid inspection and task-based access control in zero-trust agentic AI
This work fuses semantic inspection with task-based access control under a zero-trust model. The pairing flags objective-drift tool-selection attacks that slip past deterministic, rule-based checks.
Agentic AI vulnerabilities
ASPI: seeking ambiguity clarification amplifies prompt injection vulnerability in LLM agents
ASPI is a 728-scenario benchmark showing that an agent’s clarification-seeking behavior opens a fresh prompt-injection channel. Asking the user to disambiguate, it turns out, can itself be weaponized.
When prompts become shells: RCE vulnerabilities in AI agent frameworks
Microsoft details two Semantic Kernel flaws where a prompt injection reaches host-level remote code execution through a model-invokable function feeding a code/eval sink. It is a clean illustration of prompts crossing the boundary into shells.
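The pattern is easy to reproduce in miniature. The sketch below does not use Semantic Kernel or reproduce Microsoft's findings; it assumes a hypothetical calculator tool that the model can invoke, and contrasts an `eval` sink with a restricted evaluator built on Python's standard `ast` module.

```python
import ast
import operator

_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def unsafe_calculate(expression: str):
    # Anti-pattern: model-controlled text reaches eval(), so an injected
    # prompt that shapes the argument can run arbitrary Python on the host.
    return eval(expression)


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals and reject everything else."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))
```

With the restricted version, an argument such as `__import__('os').system('id')` parses as a function call and is rejected with `ValueError` instead of being executed.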
Four OpenClaw flaws enable data theft, privilege escalation, and persistence
The four flaws, collectively called Claw Chain, together enable data theft, privilege escalation, and persistence in OpenClaw. In one of them, the MCP loopback runtime trusts a client-controlled ownership flag, which lets a non-owner impersonate the owner and seize control of the gateway.
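The ownership-flag issue is an instance of a familiar bug class: authorization decided by data the caller supplies. The sketch below is not OpenClaw code; the `Session` type, the `is_owner` request field, and `OWNER_ID` are hypothetical names chosen to show the difference between trusting the request and trusting server-side state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user_id: str  # set by the server after authentication, never by the client


OWNER_ID = "alice"  # hypothetical gateway owner


def is_owner_vulnerable(request: dict, session: Session) -> bool:
    # Anti-pattern: believes a flag the client placed in its own request.
    return bool(request.get("is_owner", False))


def is_owner_fixed(request: dict, session: Session) -> bool:
    # Ownership is derived only from server-side session state.
    return session.user_id == OWNER_ID
```

A forged request such as `{"action": "restart_gateway", "is_owner": True}` sent from a non-owner session passes the first check and fails the second.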
Research
Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security
This comprehensive survey maps agentic AI risks across the full agent workflow — safety, robustness, privacy, and system security. It then ties stage-targeted mitigations to each phase of that workflow.
Authorization propagation in multi-agent AI systems: identity governance as infrastructure
This paper argues multi-agent systems face a distinct authorization-propagation problem that would persist even if prompt injection were fully solved. It reframes identity governance as foundational infrastructure rather than an afterthought.
AI agents may always fall for prompt injections
Recasting prompt injection through Contextual Integrity theory, this paper argues for an impossibility-style limit on any data/instruction separation defense. The implication is that perfect filtering may be out of reach.
When embedding-based defenses fail: rethinking safety in LLM-based multi-agent systems
This study shows that embedding-based detection of malicious agents collapses once benign and malicious message embeddings overlap. It is a pointed rethink of how safety is enforced in LLM-based multi-agent systems.
Measuring security without fooling ourselves: why benchmarking agents is hard
This position paper catalogs how benchmark vulnerabilities, temporal staleness, and runtime uncertainty undermine agent-security evaluation. It is a caution against trusting clean leaderboard numbers at face value.
Framework
Microsoft open-sources RAMPART and Clarity to secure AI agents during development
Microsoft has open-sourced RAMPART, a framework for testing agents against cross-prompt injection, behavioral regressions, and data exfiltration, alongside its Clarity tooling. The release targets the development phase, before agents ever ship.
OWASP ASI02: tool misuse and exploitation — the definitive security guide
This is a definitive guide to OWASP ASI02 — tool misuse and exploitation — written for platform engineers, AI builders, and risk managers. It turns an abstract category into concrete defensive practice.
Exploitation
IterInject: indirect prompt injection against LLM agents via feedback-guided iterative optimization
IterInject optimizes indirect prompt-injection payloads iteratively using model feedback rather than fixed, handcrafted strings. The feedback loop steadily sharpens the injection until it succeeds.
Threat modelling
Agentic AI and the industrialization of cyber offense: forecast, consequences, and defensive priorities for enterprises and the Mittelstand
This paper proposes a Three-Channel Agentic Cyber-Risk Model and forecasts how agentic attack and defense dynamics will play out through 2026–2028. It pays particular attention to enterprises and the German Mittelstand.
Article
Agent security is a systems problem
This position paper argues that agent security is fundamentally a systems problem, not a model problem. It backs the claim by cataloging real-world exploits such as terminal hijacking and indirect prompt injection.
What to fix before your agents get hit
An AI agent is only as trustworthy as the weakest thing it is allowed to act on, whether a file, a tool, a memory, or a connected server, and most agents are allowed to act on far too much. Treat every input the agent ingests as potentially hostile and every action it can take as potentially dangerous, then close the gap between the two with real boundaries: least-privilege scopes, sandboxed execution, and human review where the blast radius is large. The disclosures around coding agents, MCP servers, and poisoned memory differ in mechanism but rhyme in cause: implicit trust granted somewhere no one was watching. Assume that trust will be abused, instrument your agents so you can see when it is, and design so that a compromised agent is an incident you contain rather than a breach you discover later.
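One way to picture those boundaries is a small deny-by-default policy object. This is a sketch under stated assumptions rather than a recommended product design: `AgentPolicy`, the tool names, and the three outcome strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    """Deny-by-default tool policy with a review tier and an audit trail."""

    allowed_tools: set
    needs_review: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def decide(self, tool: str) -> str:
        if tool not in self.allowed_tools:
            outcome = "deny"      # least privilege: anything unlisted is refused
        elif tool in self.needs_review:
            outcome = "review"    # large blast radius: a human signs off first
        else:
            outcome = "allow"
        self.audit_log.append((tool, outcome))  # instrument every decision
        return outcome
```

For example, a policy created as `AgentPolicy({"read_file", "run_tests", "deploy"}, {"deploy"})` allows `read_file`, routes `deploy` to review, and denies an unlisted `delete_branch`, recording all three decisions in `audit_log`.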