Research


IICL involuntary in-context learning attack technique

April 23, 2026


Research + LLM Security · admin

We broke GPT-5.4 safety with 10 examples and 2 words using a new attack technique — IICL

OpenAI’s newest flagship is more vulnerable to our attack than GPT-5 or GPT-5-mini. Newer doesn’t mean safer. Our new research (3,500+ probes, 10 models, 7 controlled experiments) shows why continuous red teaming isn’t optional for anyone building on frontier AI. TL;DR We ran 3,500+ controlled probes across every model in ...

September 11, 2025


Research · admin

AI Reasoning Leakage Vulnerability: Self-betrayal attack on UAE MBZUAI G42 K2 Think

AI Reasoning Leakage Vulnerability: Self-betrayal attack on UAE MBZUAI G42 K2 Think Executive Summary A critical vulnerability has been identified in the advanced reasoning system of K2 Think, the just-released reasoning model from the UAE’s Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in collaboration with G42, where the model’s internal thought process inadvertently exposes ...

Grok 3 AI Red Teaming

February 18, 2025


Research + LLM Security · admin

Grok 3 Jailbreak and AI Red Teaming

Grok 3 Jailbreak and AI Red Teaming In this article, we demonstrate how Grok 3 responds to different hacking techniques, including jailbreaks and prompt-leaking attacks. Our initial study on AI Red Teaming different LLM models using various approaches focused on LLM models released before the so-called “Reasoning Revolution”, ...

January 31, 2025


Research + LLM Security · admin

DeepSeek Jailbreaks

DeepSeek Jailbreaks In this article, we demonstrate how DeepSeek responds to different jailbreak techniques. Our initial study on AI Red Teaming different LLM models using various approaches focused on LLM models released before the so-called “Reasoning Revolution”, offering a baseline for security assessments before the emergence of advanced reasoning-based ...