Towards Secure AI Week 30 — Amazon Q Breach, LegalPwn Prompt Injection, and IdentityMesh in Agentic AI

Secure AI Weekly, August 4, 2025

Background

From compromised coding assistants to identity-collapsing agent chains, this week’s AI security incidents reveal just how fragile the foundations of generative and agentic systems remain.

The Amazon Q supply chain breach showed how a single malicious prompt could wipe infrastructure at scale—if not for a lucky syntax error. Meanwhile, researchers exposed LegalPwn, a novel attack that hides malicious instructions in legal disclaimers to bypass safety protocols in major LLMs like Gemini. On the agentic front, the IdentityMesh flaw in MCP-connected systems enables privilege propagation across siloed environments, breaking trust assumptions at the architectural level.

Add to that DOM-based prompt injection via browser extensions and zero-day exploits in Triton, Redis, and Chroma DB at Pwn2Own, and the lesson is clear: attackers are probing every layer of the AI stack.

To stay ahead, organizations must adopt continuous AI Red Teaming, secure-by-design development practices, and runtime defenses that understand not just what AI generates—but why and how it acts.

Securing Agentic Applications Guide 1.0

OWASP GenAI Security Project, July 27, 2025

A new framework outlines how autonomous agents can become the weakest security link.

The OWASP Securing Agentic Applications Guide 1.0 introduces a practical framework for building and defending AI systems that reason and act autonomously — and exposes why traditional defenses are no longer enough. The guide highlights how agentic AI expands the attack surface beyond code vulnerabilities to include memory poisoning, tool misuse, and goal manipulation, all of which can be triggered by manipulating the agent’s internal reasoning process. Concrete technical recommendations include checksum validation of agent memory, anomaly detection on tool chains, and behavioral baselining to catch drift from normal activity. The document marks a turning point in understanding how to secure systems where logic itself can be exploited.

How to deal with it:
— Integrate consistency checks for agent goals and limit frequency of goal rewrites per session.
— Hash memory snapshots between steps and alert on unauthorized state changes.
— Log and monitor tool use chains to detect abuse patterns and anomalous chaining behaviors.
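The memory-hashing recommendation above can be sketched in a few lines. This is a minimal illustration, not code from the OWASP guide: the `MemoryGuard` class and its method names are our own, and a real agent framework would hook `commit` into its authorized state-update path.

```python
import hashlib
import json

def snapshot_digest(memory: dict) -> str:
    """Compute a stable SHA-256 digest of an agent's memory state."""
    canonical = json.dumps(memory, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class MemoryGuard:
    """Tracks the expected memory digest between agent steps and flags drift."""

    def __init__(self, memory: dict):
        self.expected = snapshot_digest(memory)

    def commit(self, memory: dict) -> None:
        """Record a new digest after an authorized state update."""
        self.expected = snapshot_digest(memory)

    def verify(self, memory: dict) -> bool:
        """Return True if memory matches the last committed snapshot."""
        return snapshot_digest(memory) == self.expected
```

Any mutation that bypasses `commit`, such as a poisoned tool response rewriting the agent's goal, makes the next `verify` call fail and can trigger an alert.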

We published a detailed breakdown of the 7 most important security insights from the guide — read the full analysis on our site.

Amazon Q Hacked: Prompt Injection in VS Code Extension Nearly Triggers Mass Data Wipe

Adversa AI, July 31, 2025

A supply chain attack turned Amazon’s AI coding assistant into a destructive agent — stopped only by a syntax error.

In one of 2025’s most alarming incidents, a hacker inserted a prompt injection into Amazon Q’s VS Code extension, instructing the AI agent to delete users’ files and cloud infrastructure via AWS CLI commands. The malicious code passed automated review and was deployed in version 1.84.0, downloaded nearly one million times. Fortunately, a syntax error prevented execution. The attacker, who framed the act as a protest against “AI security theater,” exploited weak CI/CD controls, over-permissive GitHub tokens, and the lack of input validation within Amazon’s AI toolchain. Amazon silently pulled the release, raising transparency concerns across the industry.

How to deal with it:
— Enforce mandatory human review and signing for all automated release pipelines and extensions.
— Harden AI assistants against prompt injection by segmenting and validating trusted vs untrusted inputs.
— Use Adversa AI’s Continuous AI Red Teaming for Agentic AI to simulate malicious prompt injection scenarios and uncover vulnerabilities before attackers do.
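The first bullet, mandatory human review and signing of releases, can be approximated with a digest allowlist gate in the release pipeline. This is a hedged sketch: the function names are illustrative, and a production pipeline would use real code signing rather than a bare hash set.

```python
import hashlib

def artifact_sha256(data: bytes) -> str:
    """Digest of a built extension package."""
    return hashlib.sha256(data).hexdigest()

def verify_release(artifact: bytes, approved_digests: set) -> bool:
    """Reject any build whose digest was not explicitly approved by a human reviewer.

    The approved_digests set is assumed to be populated only after manual review,
    so an artifact modified post-review (as in the Amazon Q incident) fails the gate.
    """
    return artifact_sha256(artifact) in approved_digests
```

Had such a gate sat between review and publication, the tampered 1.84.0 build would have produced a digest mismatch and been blocked before distribution.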

We provided a full breakdown of this incident in our in-depth analysis on the website.

New prompt injection attack weaponizes fine print to bypass safety in major LLMs

TechTalks, July 30, 2025

A stealthy technique tricks LLMs into bypassing safeguards by mimicking terms-of-service language.

A new attack called LegalPwn, revealed by AI security firm Pangea, shows how large language models (LLMs) can be manipulated using malicious instructions embedded in text that mimics legal disclaimers, copyright notices, and confidentiality clauses. These injections exploit the model’s tendency to prioritize such language, enabling attackers to override safety protocols and mislead users. In one real-world test, Gemini-CLI misclassified a reverse shell script as a harmless calculator after processing an injected comment block. In another case, a developer tool prompted the user to run a remote access command, directly violating basic safety expectations.

How to deal with it:
— Treat legal-style text in inputs as potentially hostile and apply sanitization or isolation before LLM processing.
— Implement output validation layers to detect suppressed risk signals or contradictions in generated summaries.
— Test developer-facing LLM tools with prompt injection variants targeting “trusted” language patterns.
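The first bullet, treating legal-style text as hostile, can start with a simple quarantine filter that routes disclaimer-like chunks away from the trusted context. The marker list below is a hypothetical, non-exhaustive example, not Pangea's detection logic:

```python
import re

# Phrases common to legal boilerplate; an illustrative, non-exhaustive list.
LEGAL_MARKERS = [
    r"terms of service",
    r"copyright notice",
    r"confidentiality (clause|notice|agreement)",
    r"all rights reserved",
    r"legal disclaimer",
]
LEGAL_RE = re.compile("|".join(LEGAL_MARKERS), re.IGNORECASE)

def flag_legal_text(chunk: str) -> bool:
    """Return True if the chunk contains legal-style language and should be isolated."""
    return bool(LEGAL_RE.search(chunk))

def isolate_untrusted(chunks: list) -> tuple:
    """Split input into trusted chunks and quarantined legal-style chunks."""
    trusted, quarantined = [], []
    for chunk in chunks:
        (quarantined if flag_legal_text(chunk) else trusted).append(chunk)
    return trusted, quarantined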

Emerging Agentic AI Security Vulnerabilities Expose Enterprise Systems to Widespread Identity-based Attacks

Security Boulevard, July 30, 2025

A flaw in MCP-connected agents allows attackers to collapse identity boundaries and exfiltrate data across platforms.

Researchers at Lasso Security revealed a critical vulnerability dubbed IdentityMesh, which enables attackers to exploit AI agents operating across systems linked via Model Context Protocol (MCP). The flaw arises when multiple identities — spanning tools like Slack, GitHub, and email — are merged into a single operational context by the agent. In one demo, an attacker injected instructions into a helpdesk ticket, causing the agent to extract private Gmail content and post it publicly via GitHub. Because these actions occur inside an agent’s “normal workflow,” traditional detection tools often miss them. Separate research by Pynt found that 72% of MCPs expose dangerous capabilities like file access or code execution — and that chaining multiple MCPs creates exponentially higher risk of exploitation.

How to deal with it:
— Audit AI agents for cross-platform identity fusion and enforce strict isolation between MCP-connected systems.
— Block agent access to high-privilege features unless explicitly scoped and require human approval for critical actions.
— Use Adversa AI’s Continuous AI Red Teaming for MCP to uncover and mitigate multi-system propagation paths and trust collapse scenarios across agentic ecosystems.
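The scoping and human-approval bullets above amount to a policy gate in front of every tool call. A minimal sketch, with entirely hypothetical scope and action names, might look like this:

```python
from dataclasses import dataclass

# Hypothetical per-system capability scopes; real deployments define their own.
ALLOWED_SCOPES = {
    "slack": {"read_channel"},
    "github": {"read_repo"},
    "gmail": set(),  # no agent access unless explicitly granted
}

# Actions that always require a human in the loop, regardless of scope.
HIGH_PRIVILEGE = {"send_message", "create_public_repo", "read_mail"}

@dataclass
class ToolCall:
    system: str
    action: str

def authorize(call: ToolCall, human_approved: bool = False) -> bool:
    """Deny out-of-scope actions; require explicit approval for high-privilege ones."""
    if call.action in HIGH_PRIVILEGE and not human_approved:
        return False
    return call.action in ALLOWED_SCOPES.get(call.system, set())
```

Under this gate, the demonstrated IdentityMesh chain (helpdesk ticket to Gmail read to public GitHub post) fails twice: `read_mail` needs human approval, and `gmail` grants no scopes at all.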

LLM Honeypots Can Trick Threat Actors into Leaking Binaries and Known Exploits

Cyber Security News, July 31, 2025

A new honeypot framework lures threat actors into exposing tools and C2 infrastructure.

Researchers at Beelzebub Labs deployed an LLM-powered SSH honeypot that successfully deceived a real-world attacker into revealing known malware binaries, botnet payloads, and a Perl-based command-and-control (C2) script. Unlike traditional traps, the honeypot used natural-language interactions to prolong the session and encourage deeper attacker engagement. The adversary performed reconnaissance, attempted privilege escalation, and tried to pivot the environment into a persistent botnet node via IRC. This marks one of the first documented cases where LLM-driven deception directly harvested attacker tooling in the wild.

How to deal with it:
— Experiment with LLM-enhanced deception systems to collect real attacker TTPs (tactics, techniques, and procedures).
— Log and analyze adversary behavior during LLM interaction to update threat models and detection rules.
— Treat interactive LLM services as attack surfaces and monitor them for misuse, even in honeypot configurations.
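Logging and tagging attacker behavior, per the bullets above, can be done with a small structured session log. The substring-to-tactic mapping below is a toy example of our own, loosely styled after MITRE ATT&CK labels, not Beelzebub Labs' implementation:

```python
import time

# Hypothetical mapping from command substrings to tactic labels.
TTP_HINTS = {
    "wget": "ingress-tool-transfer",
    "curl": "ingress-tool-transfer",
    "sudo": "privilege-escalation",
    "chmod +x": "execution",
    "/etc/passwd": "credential-access",
}

def tag_command(cmd: str) -> list:
    """Return tactic labels whose hint substring appears in the command."""
    return [tactic for hint, tactic in TTP_HINTS.items() if hint in cmd]

def log_event(cmd: str, session_log: list) -> dict:
    """Append a structured record of one attacker command to the session log."""
    event = {"ts": time.time(), "cmd": cmd, "ttps": tag_command(cmd)}
    session_log.append(event)
    return event
```

Replaying such a log against detection rules lets defenders turn a single honeypot session into reusable threat-model updates.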

Trend Micro State of AI Security Report, 1H 2025

Trend Micro, July 29, 2025

Trend Micro’s State of AI Security Report highlights seven zero-day vulnerabilities discovered in the new AI category of the Pwn2Own 2025 competition. Key exploits included the first-ever attack on Chroma DB, where leftover development artifacts enabled remote access; a Redis vector database compromise via a Lua-based use-after-free chain; and a full four-bug chain exploit of NVIDIA’s Triton Inference Server. Wiz Research also breached the NVIDIA Container Toolkit using a flaw in how trusted variables were initialized. Many affected systems were found exposed to the internet, including over 200 unsecured Chroma servers and thousands of Redis v8 and Ollama instances — underscoring the widespread risk across the AI stack.

How to deal with it:
— Remove dev artifacts and perform security audits before deploying vector stores like Chroma in production.
— Patch Redis and Lua subsystems, and isolate AI-serving containers using runtime security and minimal base images.
— Continuously scan for exposed AI infrastructure (Chroma, Triton, Redis, Ollama) and restrict access with authentication and firewalling.
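The last bullet, scanning for exposed AI infrastructure, reduces to checking whether well-known service ports answer from outside. A minimal sketch using the services' common default ports (actual deployments may rebind them):

```python
import socket

# Common default ports for the services named in the report; deployments vary.
AI_SERVICE_PORTS = {
    "chroma": 8000,
    "redis": 6379,
    "ollama": 11434,
    "triton-http": 8000,
    "triton-grpc": 8001,
}

def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Attempt a TCP connect; True means something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def scan_host(host: str) -> dict:
    """Check one host for the default ports of common AI-stack services."""
    return {name: is_port_open(host, port) for name, port in AI_SERVICE_PORTS.items()}
```

A connect-level check only proves exposure, not vulnerability; any open hit should be followed by verifying that authentication is actually enforced on that service.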

ChatGPT, Gemini, GenAI Tools Vulnerable to Man-in-the-Prompt Attacks

Cyber Security News, July 31, 2025

Malicious browser extensions can silently inject prompts, exfiltrate AI outputs, and manipulate corporate data.

LayerX researchers uncovered a critical vulnerability in popular generative AI platforms, including ChatGPT and Google Gemini, that enables browser extensions to exploit DOM-level access and inject prompts without any special permissions. Dubbed Man-in-the-Prompt, the attack allows adversaries to read and write prompt content in real time, steal outputs, and even erase histories—all invisibly within the user session. Proof-of-concept attacks demonstrated prompt tampering in ChatGPT via background tabs, and silent Gemini Workspace access even when the sidebar was closed. With 99% of enterprise users running at least one browser extension, the threat is widespread and largely invisible to traditional security tools like DLP or SWG.

How to deal with it:
— Monitor AI-related DOM interactions and browser-layer behavior to detect prompt injection attempts in real time.
— Apply dynamic risk scoring to browser extensions, going beyond static permission checks.
— Isolate internal LLMs from insecure browsers or implement hardened front-end controls against input tampering.
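Extension risk scoring, per the second bullet, starts from the static permission surface before layering on dynamic behavior. The weights and thresholds below are invented for illustration; they sketch only the static baseline that the dynamic scoring recommended above would extend:

```python
# Hypothetical weights for browser-extension permissions; real scoring would be richer.
PERMISSION_WEIGHTS = {
    "<all_urls>": 5,
    "scripting": 4,
    "webRequest": 3,
    "clipboardRead": 3,
    "tabs": 2,
    "storage": 1,
}

def risk_score(manifest: dict) -> int:
    """Sum weights of declared permissions; unknown permissions add a default of 1."""
    perms = manifest.get("permissions", []) + manifest.get("host_permissions", [])
    return sum(PERMISSION_WEIGHTS.get(p, 1) for p in perms)

def classify(score: int) -> str:
    """Bucket a score into a coarse risk tier (thresholds are illustrative)."""
    if score >= 8:
        return "high"
    if score >= 4:
        return "medium"
    return "low"
```

Note that the Man-in-the-Prompt attack needs no special permissions at all, which is exactly why static scores like this must be combined with runtime DOM monitoring.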

Written by: ADMIN

