Towards Secure AI Week 22 — Testing the Limits of Guardrails and Autonomy

Secure AI Weekly ADMIN todayJune 9, 2025 181

Background
share close

AI systems aren’t just generating answers—they’re taking action, reasoning independently, and connecting to real-world systems. This week’s stories highlight how current defenses fail to address these expanded capabilities, revealing critical blind spots in identity management, cross-agent communication, and cloud-based safety infrastructure.

From one-shot jailbreaks and latent-level exploits to insecure identity layers and misconfigured guardrails, it’s clear that reactive defenses are no longer enough. Research at ACL 2025 and new security analyses show that attackers are targeting not just prompts, but the entire AI lifecycle—from input filtering to agent execution.

As LLMs and Agentic AI enter sensitive enterprise and consumer domains, securing them demands more than patchwork filters. It requires systemic testing, architectural hardening, and capability-aware AI Red Teaming that can scale with the threats ahead.

The hidden risks of LLM autonomy

Help Net Security, June 4 2025

A new analysis reveals how large language models (LLMs) are gaining excessive autonomy through emerging interoperability standards like MCP and A2A—creating new security risks.

LLMs are no longer passive tools. With capabilities to access APIs, databases, and external systems, they now make decisions and act with minimal oversight. This growing autonomy—paired with excessive permissions, opaque outputs, and complex Agentic behaviors—can lead to unauthorized actions, data leaks, or even supply chain attacks without compromising the model itself. As LLMs are integrated into critical workflows in healthcare, finance, and beyond, organizations risk developing “process debt” where AI outputs go unchecked, introducing bias, error, and risk. Attackers are already exploiting these behaviors through prompt injections, autonomy abuse, and memory poisoning.

How to deal with it:

— Audit AI agent functionality, permissions, and autonomy to reduce unnecessary risk exposure.
— Deploy continuous AI evaluators to monitor, test, and enforce behavioral boundaries for LLM agents, using platforms like the Adversa AI Continuous AI Red Teaming Platform.
— Follow OWASP and least-privilege design principles to secure agent communication protocols like MCP and A2A.

#Infosec2025: Concern Grows Over Agentic AI Security Risks

InfoSecurity Magazine, June 4 2025

This story summarizes the key discussions from Infosecurity Europe 2025, where security leaders raised urgent concerns about the growing risks of Agentic AI systems.

Agentic AI—AI tools that operate independently and communicate with each other—are no longer theoretical. They’re being used in code development, customer support, and infrastructure automation, often without mature governance or visibility. EY’s research shows that while 76% of companies already use or plan to use Agentic AI, only 31% have mature AI implementations and just 56% understand the risks. When AI agents pass manipulated or inaccurate data to one another, small flaws can escalate into major failures—especially if connected to external, untrusted data sources. Experts now warn that the speed of Agentic AI deployment is outpacing security controls, and gaps between agents pose real threats.

How to deal with it:

— Build an intermediate AI security layer to oversee data ingestion, agent-to-agent exchanges, and external inputs.
— Perform AI Red Teaming and require AI bills of materials (AI BoMs) to track model components, dependencies, and access points.
— Secure all APIs that connect agents and systems, treating them as critical parts of the AI supply chain.

An identity security crisis looms in the age of Agentic AI

SC Media, June 3 2025

This expert commentary warns that Agentic AI is repeating—and accelerating—the identity security failures seen during the RPA (robotic process automation) wave.

Unlike RPA bots that often shared credentials, Agentic AI systems act autonomously, access sensitive data, and interact with other agents—yet most organizations still lack identity frameworks tailored for them. Without unique identities and kill-switch capabilities, it’s nearly impossible to monitor, control, or stop misbehaving agents. Standards like SPIFFE (Secure Production Identity Framework For Everyone) offer a promising foundation, but adoption remains limited. Alarmingly, 68% of companies lack any security controls for AI agents, and nearly half can’t track shadow AI activity.

How to deal with it:

— Assign a unique and verifiable identity to every AI agent, using standards like SPIFFE to prevent impersonation and shared credentials.
— Enforce five key controls: zero standing privileges, continuous monitoring, step-up authentication, behavioral analytics, and identity-based kill switches.
— Involve security architects from day one in AI agent development to ensure secure-by-design systems and prepare for future regulatory scrutiny.

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

arXiv, June 4 2025

A new research review explores how to apply TRiSM (Trust, Risk, and Security Management) principles to Agentic AI systems built from large language models operating in multi-agent setups.

The paper provides a structured framework for governance, explainability, ModelOps, and privacy/security tailored specifically to agentic LLMs. It introduces a risk taxonomy for multi-agent environments, outlines real-world vulnerabilities, and surveys trust-building and oversight mechanisms—highlighting the challenges of managing autonomous, tool-using AI agents at scale. It also evaluates explainability, human-centered metrics, and adversarial defenses across distributed systems.

How to deal with it:
— Map your Agentic AI systems to the TRiSM framework, covering governance, security, and explainability pillars.
— Use the included risk taxonomy and case studies to assess your system’s vulnerabilities and blind spots.
— Align future deployments with proposed TRiSM benchmarks and metrics to ensure accountability and safety.

Explaining LLM Insecurity: Why We Can Jailbreak Every Major Model

CDO Trends, June 2 2025

CyberArk Labs introduced Fuzzy AI, a new framework that can jailbreak nearly every major LLM on the market—from ChatGPT to Claude to Gemini.

This research highlights a critical truth: LLMs weren’t built with security in mind. Despite billions spent on safety, models remain vulnerable to jailbreaks via historical reframing, indirect prompting, and other real-world techniques. As organizations integrate LLMs and Agentic AI into sensitive workflows, the lack of observability, access control, and privilege separation exposes them to potentially catastrophic misuse. The gap between academic defenses and operational security is widening—fast.

How to deal with it
— Restrict LLMs and AI agents to low-trust environments unless explicitly secured.
— Use architectural layers (e.g. middleware, kill switches, identity enforcement) to prevent autonomous overreach.
— Continuously AI Red Team both the chatbot and its Agentic extensions using real-world jailbreak methods like those in Fuzzy AI, supported by solutions such as the Adversa AI Continuous AI Red Teaming Platform.

AIM Intelligence: Inside ACL 2025’s “Triple Threat” to Unsafe AI – A Global Alliance of Stanford, AWS, UMich, SNU, Yonsei, KAIST & UOS

Financial Content, June 6 2025

At ACL 2025, a global research team unveiled three groundbreaking papers tackling jailbreaks, latent-level alignment, and Agentic AI attacks — combining academic rigor with real-world AI Red Teaming.

This work shows that leading LLMs can be compromised in a single prompt, manipulated at the representation level, and abused via Agentic frameworks to carry out dangerous actions autonomously. Techniques like “Operation Grandma”-style reframing, REPBEND for deep model alignment, and SUDO for bypassing safeguards in multimodal agents reveal alarming vulnerabilities. Unlike speculative attacks, these methods succeeded in live environments — including adding bomb ingredients to carts and generating explicit content.

How to deal with it
— Evaluate LLM integrations for risk exposure to single-shot jailbreaks and agent execution.
— Implement alignment controls beyond prompts, including representation-level fine-tuning.
— Use open-source tools like REPBEND and SUDO to conduct AI Red Teaming on your own AI systems before attackers do.

Subscribe for updates

Stay up to date with what is happening! Plus, get a first look at news, noteworthy research, and the worst attacks on AI—delivered right to your inbox.

    Written by: ADMIN

    Rate it
    Previous post