Top Agentic AI security resources — June 2026

Agentic AI Security, Agentic AI Security Digest | Sergey | June 1, 2026

An agentic coding gold rush has led to a surge of high-profile vulnerabilities this month. Adversa AI alone disclosed two — SymJack, a symlink-hijack RCE that broke six AI coding agents at once, and TrustFall, a one-click RCE reaching Claude Code, Cursor, Gemini CLI, and GitHub Copilot through a regressed trust dialog. Microsoft, meanwhile, traced prompt injections all the way to host-level remote code execution in Semantic Kernel, and a DEF CON talk chained an indirect injection into a persistent Microsoft Copilot backdoor. The research side kept pace, with new work arguing that agents may always fall for prompt injection and that authorization propagation is a problem all its own. Below is everything worth reading this month, sorted by category.

Statistics

Total resources: 28
Category breakdown:

Category	Count
Attack	8
Agentic AI defense	7
Research	5
Framework	2
Agentic AI vulnerabilities	3
Exploitation	1
Threat modelling	1
Article	1

Agentic AI security resources

Attack

SymJack: the approval prompt is lying to you. A symlink-hijack RCE in six AI coding agents

Adversa AI shows that a symlink-disguised file copy tricks AI coding assistants into RCE while the approval prompt misrepresents what is being approved. Six major tools were tested, and all of them were vulnerable.
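To make the failure mode concrete, here is a minimal Python sketch of the general idea. It is not Adversa AI's proof of concept or any vendor's fix. It assumes a hypothetical `describe_copy` helper that an approval prompt could call to resolve symlinks on both ends of a copy and report whether the real destination falls outside the workspace.

```python
import os
import tempfile
from pathlib import Path


def describe_copy(src: str, dst: str, workspace: str) -> dict:
    """Resolve symlinks so a prompt can show what a copy really touches.

    Hypothetical helper for illustration; not taken from any agent's code.
    """
    real_src = Path(src).resolve()
    real_dst = Path(dst).resolve()
    root = Path(workspace).resolve()
    return {
        "shown_dst": dst,
        "real_src": str(real_src),
        "real_dst": str(real_dst),
        # True when the resolved destination lies outside the workspace.
        "escapes_workspace": not real_dst.is_relative_to(root),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp, "repo")
        outside = Path(tmp, "home")
        workspace.mkdir()
        outside.mkdir()
        # A link inside the repo that points outside of it.
        os.symlink(outside, workspace / "assets")
        info = describe_copy(
            str(workspace / "logo.png"),
            str(workspace / "assets" / "hook.sh"),
            str(workspace),
        )
        print(info)
```

The design point is that the prompt should display the resolved paths, not the strings the agent was handed. `Path.is_relative_to` requires Python 3.9 or later.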

TrustFall: coding agent security flaw enables one-click RCE in Claude, Cursor, Gemini CLI and GitHub Copilot

Adversa AI’s second disclosure traces a regression in the Claude Code trust dialog plus a settings-scope inconsistency that let a cloned repo run unsandboxed code with one keypress — and with none on CI runners. The post explains why these trust-dialog bugs keep resurfacing.

MemMorph: tool hijacking in LLM agents via memory poisoning

MemMorph hijacks an agent’s tool selection by slipping a handful of disguised records into long-term memory. Because it never touches tool metadata, the resulting bias is hard to detect.

Finding the weakest link: adversarial attack against multi-agent communications

This paper uses Jacobian-gradient methods to pinpoint the most attackable messages, agents, and timesteps in a system. The result is a targeted adversarial attack on agent-to-agent coordination.

Hidden in memory: sleeper memory poisoning in LLM agents

A sleeper memory-poisoning attack plants fabricated memories that stay dormant, then re-emerge across sessions to drive attacker-chosen actions. The delay makes the payload difficult to trace back to its origin.

Autonomous LLM agent worms: cross-platform propagation, automated discovery and temporal re-entry defense

This work analyzes agent-state re-entry worm propagation across file-backed multi-agent ecosystems. It also proposes a temporal re-entry defense, backed by a no-propagation theorem, to break the cycle.

Copirate 365 at DEF CON: plundering in the depths of Microsoft Copilot (CVE-2026-24299)

Presented at DEF CON, this attack chains indirect prompt injection, render-based data exfiltration, delayed tool invocation, and memory poisoning into a persistent Copilot backdoor. It is a textbook case of stacking small primitives into durable access.

My agentic trust issues: from prompt injection to supply-chain compromise on gemini-cli

This walkthrough escalates an indirect prompt injection against the gemini-cli coding agent into a full supply-chain compromise of the developer environment. It shows how one poisoned input can ripple into the build pipeline.

Agentic AI defense

ADR: an agentic detection system for enterprise agentic AI security

ADR is a detection-and-response system built for MCP-based agents. It blends runtime telemetry, pre-deployment red teaming, and a two-tier online detector to catch compromise before it spreads across the agent fleet.

AgentTrust: runtime safety evaluation and interception for AI agent tool use

AgentTrust intercepts tool calls before they execute and returns allow, warn, block, or review verdicts. It leans on shell deobfuscation and attack-chain detection to stop malicious actions mid-flight.
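A rough Python sketch of this style of interception follows. It is not AgentTrust's implementation: the rule lists, the `evaluate` function, and the single deobfuscation step (decoding `echo <base64> | base64 -d` payloads) are all assumptions made for illustration, and attack-chain detection is left out.

```python
import base64
import re
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"
    REVIEW = "review"


# Illustrative rules only; a real policy would be far broader.
BLOCK_PATTERNS = [r"\brm\s+-rf\s+/", r"curl[^|]*\|\s*(ba)?sh"]
REVIEW_PATTERNS = [r"\bgit\s+push\b", r"\bssh\b"]
WARN_PATTERNS = [r"\bchmod\b"]

B64_PIPE = re.compile(r"echo\s+([A-Za-z0-9+/=]+)\s*\|\s*base64\s+(-d|--decode)")


def deobfuscate(command: str) -> str:
    """Return the command with any decoded base64 payloads appended."""
    decoded_parts = []
    for match in B64_PIPE.finditer(command):
        try:
            raw = base64.b64decode(match.group(1), validate=True)
        except ValueError:  # binascii.Error is a ValueError subclass
            continue
        decoded_parts.append(raw.decode("utf-8", "replace"))
    return " ".join([command, *decoded_parts])


def evaluate(command: str) -> Verdict:
    """Check the deobfuscated command against rules, strictest first."""
    text = deobfuscate(command)
    for verdict, patterns in (
        (Verdict.BLOCK, BLOCK_PATTERNS),
        (Verdict.REVIEW, REVIEW_PATTERNS),
        (Verdict.WARN, WARN_PATTERNS),
    ):
        if any(re.search(pattern, text) for pattern in patterns):
            return verdict
    return Verdict.ALLOW
```

Running the rules against the decoded text as well as the raw command is what lets a hidden `rm -rf /` be caught even when it arrives base64-encoded.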

SafeHarbor: hierarchical memory-augmented guardrail for LLM agent safety

SafeHarbor is a training-free hierarchical memory guardrail with entropy-based self-evolution. It tries to refuse harmful requests while preserving utility on benign ones, without retraining the underlying model.

ARGUS: defending LLM agents against context-aware prompt injection

ARGUS pairs a context-aware injection benchmark with a provenance-aware influence-graph defense. The graph audits each agent decision before execution, which lets it catch injections that adapt to surrounding context.

WARD: adversarially robust defense of web agents against prompt injections

WARD combines a guard model with adaptive adversarial training to harden web agents against malicious page content. The aim is injection resistance that does not gut the agent’s task performance.

AgentShield: deception-based compromise detection for tool-using LLM agents

AgentShield plants honeytokens and decoy tools so a compromised agent gives itself away. It is built specifically to surface indirect prompt injection against tool-using agents.
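The honeytoken half of the idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not AgentShield's code: `CanaryMonitor`, `make_honeytoken`, and the `tok_` prefix are hypothetical names, and decoy tools are not covered.

```python
import secrets


def make_honeytoken(prefix: str = "tok_") -> str:
    """Create a unique fake credential that no legitimate task should use."""
    return prefix + secrets.token_hex(8)


class CanaryMonitor:
    """Flags any outgoing tool call whose arguments contain a planted token."""

    def __init__(self) -> None:
        self.tokens: set[str] = set()

    def plant(self) -> str:
        """Generate a token to seed into a file, memory record, or config."""
        token = make_honeytoken()
        self.tokens.add(token)
        return token

    def is_compromised(self, tool_name: str, arguments: dict) -> bool:
        """Return True if any planted token appears in the call arguments."""
        blob = repr(arguments)
        return any(token in blob for token in self.tokens)
```

Because nothing benign should ever read or forward the planted value, a single hit is a high-signal indicator that the agent followed injected instructions.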

Hybrid inspection and task-based access control in zero-trust agentic AI

This work fuses semantic inspection with task-based access control under a zero-trust model. The pairing flags objective-drift tool-selection attacks that slip past deterministic, rule-based checks.

Agentic AI vulnerabilities

ASPI: seeking ambiguity clarification amplifies prompt injection vulnerability in LLM agents

ASPI is a 728-scenario benchmark showing that an agent’s clarification-seeking behavior opens a fresh prompt-injection channel. Asking the user to disambiguate, it turns out, can itself be weaponized.

When prompts become shells: RCE vulnerabilities in AI agent frameworks

Microsoft details two Semantic Kernel flaws where a prompt injection reaches host-level remote code execution through a model-invokable function feeding a code/eval sink. It is a clean illustration of prompts crossing the boundary into shells.
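The general pattern is easy to show in Python. The sketch below is not Semantic Kernel code and does not reproduce either flaw. It assumes a hypothetical calculator tool that a model can invoke, and contrasts passing the model's string to `eval` with walking the syntax tree and allowing only arithmetic.

```python
import ast
import operator

# Unsafe pattern (do not use): a model-invokable function that ends in
#     return eval(expression)
# lets any injected text that reaches `expression` run as code on the host.

_ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_calculate(expression: str) -> float:
    """Evaluate plain arithmetic only; reject names, calls, and attributes."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINOPS:
            return _ALLOWED_BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))
```

Exponentiation is deliberately left out of the allowlist, since very large powers can be used to exhaust CPU or memory.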

Four OpenClaw flaws enable data theft, privilege escalation, and persistence

The four flaws, collectively named Claw Chain, enable data theft, privilege escalation, and persistence. In one of them, an MCP loopback runtime trusts a client-controlled ownership flag, letting non-owners impersonate the owner and seize gateway control.
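The ownership-flag problem is a classic one, and a small Python sketch shows the shape of it. Nothing here is OpenClaw's actual code: the `is_owner` request field, the `Session` type, and `GATEWAY_OWNER_ID` are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Server-side record created at authentication time (hypothetical)."""
    user_id: str


GATEWAY_OWNER_ID = "user-1"  # assumed to be stored server-side


def is_owner_vulnerable(request: dict) -> bool:
    # Anti-pattern: the caller decides whether it is the owner.
    return bool(request.get("is_owner", False))


def is_owner_fixed(session: Session, request: dict) -> bool:
    # Ownership comes from the authenticated session; request fields are ignored.
    return session.user_id == GATEWAY_OWNER_ID
```

A forged request such as `{"is_owner": True}` passes the first check for any caller, while the second check depends only on state the client cannot set.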

Research

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

This survey maps agentic AI risks in four areas (safety, robustness, privacy, and system security) across the full agent workflow. It then ties stage-targeted mitigations to each phase of that workflow.

Authorization propagation in multi-agent AI systems: identity governance as infrastructure

This paper argues multi-agent systems face a distinct authorization-propagation problem that would persist even if prompt injection were fully solved. It reframes identity governance as foundational infrastructure rather than an afterthought.

AI agents may always fall for prompt injections

Recasting prompt injection through Contextual Integrity theory, this paper argues for an impossibility-style limit on any data/instruction separation defense. The implication is that perfect filtering may be out of reach.

When embedding-based defenses fail: rethinking safety in LLM-based multi-agent systems

This study shows that embedding-based detection of malicious agents collapses once benign and malicious message embeddings overlap. It is a pointed rethink of how safety is enforced in LLM-based multi-agent systems.

Measuring security without fooling ourselves: why benchmarking agents is hard

This position paper catalogs how benchmark vulnerabilities, temporal staleness, and runtime uncertainty undermine agent-security evaluation. It is a caution against trusting clean leaderboard numbers at face value.

Framework

Microsoft open-sources RAMPART and Clarity to secure AI agents during development

Microsoft has open-sourced RAMPART, a framework for testing agents against cross-prompt injection, behavioral regressions, and data exfiltration, alongside its Clarity tooling. The release targets the development phase, before agents ever ship.

OWASP ASI02: tool misuse and exploitation — the definitive security guide

This guide covers OWASP ASI02, tool misuse and exploitation, and is written for platform engineers, AI builders, and risk managers. It turns an abstract category into concrete defensive practice.

Exploitation

IterInject: indirect prompt injection against LLM agents via feedback-guided iterative optimization

IterInject optimizes indirect prompt-injection payloads iteratively using model feedback rather than fixed, handcrafted strings. The feedback loop steadily sharpens the injection until it succeeds.

Threat modelling

Agentic AI and the industrialization of cyber offense: forecast, consequences, and defensive priorities for enterprises and the Mittelstand

This paper proposes a Three-Channel Agentic Cyber-Risk Model and forecasts how agentic attack and defense dynamics will play out over 2026–2028. It pays particular attention to enterprises and the German Mittelstand.

Article

Agent security is a systems problem

This position paper argues that agent security is fundamentally a systems problem, not a model problem. It backs the claim by cataloging real-world exploits such as terminal hijacking and indirect prompt injection.

What to fix before your agents get hit

An AI agent is only as trustworthy as the weakest thing it is allowed to act on, whether a file, a tool, a memory, or a connected server, and most agents are allowed to act on far too much. Treat every input the agent ingests as potentially hostile and every action it can take as potentially dangerous, then close the gap between the two with real boundaries: least-privilege scopes, sandboxed execution, and human review where the blast radius is large. The disclosures around coding agents, MCP servers, and poisoned memory differ in mechanism but rhyme in cause: implicit trust granted somewhere no one was watching. Assume that trust will be abused, instrument your agents so you can see when it is, and design so that a compromised agent is an incident you contain rather than a breach you discover later.
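As a starting point, the boundaries described above can be expressed as a small policy object. The Python sketch below is only one possible shape, with hypothetical names (`AgentPolicy`, `authorize`, and the tool names in the usage example). It covers scopes, review, and logging, and does not attempt sandboxing.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    """Hypothetical per-agent policy: what it may call and what needs a human."""
    allowed_tools: set = field(default_factory=set)
    needs_review: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def authorize(self, tool: str, approved_by_human: bool = False) -> str:
        if tool not in self.allowed_tools:
            decision = "deny"   # least privilege: the default is no access
        elif tool in self.needs_review and not approved_by_human:
            decision = "hold"   # large blast radius: wait for a person
        else:
            decision = "run"
        # Record every decision so abuse of trust is visible afterwards.
        self.audit_log.append((tool, decision))
        return decision
```

For example, a policy built with `AgentPolicy(allowed_tools={"read_file", "deploy"}, needs_review={"deploy"})` runs `read_file` immediately, holds `deploy` until a person approves it, and denies anything it was never granted.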

Written by: Sergey
