Top GenAI security resources — May 2026

GenAI Security Digest, by Sergey, May 11, 2026

Background

This month’s top resources cover highly concerning developments, including a novel IICL technique that bypassed GPT-5.4 safety guardrails and the real-world implications of Anthropic’s Mythos model successfully executing a 32-step corporate network attack. Adversaries can now automate their red teaming and uncover fundamental flaws in vector databases. However, defenders are also making progress in practical framework development and fundamental approaches to attack detection.

Statistics:

Total resources: 22
Category breakdown:

Category Count
Research 5
GenAI 101 3
GenAI defense 3
GenAI red teaming 2
GenAI for CISO 2
Article 2
GenAI threat model 1
GenAI attack technique 1
GenAI attack 1
Framework 1
Report 1

GenAI security resources:

Research

We broke GPT-5.4 safety with 10 examples and 2 words using a new attack technique — IICL

Adversa AI researchers managed to bypass GPT-5.4 safety guardrails using a novel technique called Involuntary In-Context Learning (IICL). The attack achieved a 60% success rate, exploiting a vulnerability introduced in recent model updates.

Stealthy backdoor attacks against LLMs based on natural style triggers

The BadStyle framework demonstrates how to embed imperceptible writing-style triggers into LLMs during fine-tuning. These triggers preserve semantics and fluency while acting as a reliable backdoor.

Can you trust the vectors in your vector database? Black-Hole attack

Vector databases powering RAG systems are fundamentally vulnerable to a geometric poisoning attack. This exploit can contaminate up to 99.85% of query results with just 1% injected vectors.
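The paper's exact construction isn't reproduced here, but the intuition behind geometric poisoning can be sketched in a toy simulation: a handful of injected vectors placed near the centroid of typical queries will outrank the entire legitimate corpus under cosine similarity. All numbers and names below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# 990 legitimate document embeddings, scattered on the unit sphere.
legit = rng.normal(size=(990, dim))
legit /= np.linalg.norm(legit, axis=1, keepdims=True)

# Hypothetical centroid of the victim's typical queries.
query_centroid = rng.normal(size=dim)
query_centroid /= np.linalg.norm(query_centroid)

# 10 poisoned vectors (1% of the corpus) placed tightly around it.
poison = query_centroid + 0.01 * rng.normal(size=(10, dim))
poison /= np.linalg.norm(poison, axis=1, keepdims=True)

corpus = np.vstack([legit, poison])  # poisoned ids are 990..999

def top_k(query, corpus, k=5):
    """Brute-force cosine-similarity nearest-neighbour search."""
    q = query / np.linalg.norm(query)
    return np.argsort(-(corpus @ q))[:k]

# Real queries cluster around common topics, i.e. near the centroid.
hits, trials = 0, 100
for _ in range(trials):
    q = query_centroid + 0.1 * rng.normal(size=dim)
    hits += sum(int(i) >= 990 for i in top_k(q, corpus))

print(f"poisoned share of top-5 results: {hits / (trials * 5):.0%}")
```

Even though the poison is only 1% of the corpus, it occupies essentially every top-5 slot, which is the qualitative behaviour the Black-Hole attack exploits at scale.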

STACK: adversarial attacks on LLM safeguard pipelines

The STACK methodology systematically defeats safeguard pipelines by attacking each component sequentially. This approach achieves a 71% attack success rate in black-box environments.

AB jailbreaking – a novel hybrid framework for exploitation of large language models

The AB-JB approach is a three-stage hybrid jailbreak framework combining black-box semantic adversarial prompt generation and white-box suffix optimization. This methodology achieves a 93% average attack success rate against targeted LLMs.

GenAI 101

The future of everything is lies, I guess: safety

A comprehensive essay dismantling four potential defensive moats against unaligned AI systems. The author argues that LLMs cannot safely be given autonomous power in their current state.

Prompt injection, jailbreaks, and LLM security: what every developer building AI apps must know

This comprehensive developer guide covers fundamental risks including prompt injection, data exfiltration, and MCP security. It provides practical defensive strategies for teams building AI applications.

Top 10 vulnerabilities in AI systems on the web

A systematic walkthrough of the 10 most common AI web vulnerabilities. The guide covers everything from basic prompt injection and data leakage to broken authorization.

GenAI defense

Understanding and improving continuous adversarial training for LLMs

The ER-CAT framework adds singular-value-variance regularization to improve adversarial training for LLMs. This technique provides a better robustness-utility tradeoff across six different models.
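The summary names the regularizer but not its implementation; as a minimal sketch, a singular-value-variance penalty can be written as a term that pushes a weight matrix toward a uniform spectrum. The function name and coefficient below are hypothetical.

```python
import numpy as np

def sv_variance_penalty(weight, coeff=1e-3):
    """Penalty proportional to the variance of the singular values,
    encouraging a flatter (more uniform) spectrum."""
    s = np.linalg.svd(weight, compute_uv=False)
    return coeff * np.var(s)

rng = np.random.default_rng(0)
identity_like = np.eye(8)  # perfectly uniform spectrum -> zero penalty
skewed = rng.normal(size=(8, 8)) * [8, 4, 2, 1, 1, 1, 1, 1]

print(sv_variance_penalty(identity_like))  # 0.0
print(sv_variance_penalty(skewed) > 0)     # True
```

In training, such a term would simply be added to the adversarial loss; how ER-CAT weights or schedules it is detailed in the paper itself.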

TwinGate: stateful defense against decompositional jailbreaks in LLMs

TwinGate introduces a dual-encoder defense mechanism using Asymmetric Contrastive Learning. It achieves high recall with less than a 0.2% false positive rate on a massive 3.62M-request dataset.

Seven cross-domain techniques for prompt injection detection

Researchers adapted seven detection techniques from forensic linguistics, bioinformatics, and network security to identify prompt injections. The local-alignment detector improved the F1 score from 0.033 to 0.378.
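The bioinformatics borrowing here is local sequence alignment. A minimal sketch, assuming a known-injection phrase list (the paper's actual detector and corpora are not reproduced): Smith-Waterman alignment tolerates the insertions and substitutions attackers use to obfuscate trigger phrases.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score between strings a and b."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

# Hypothetical signature list; a real deployment would use many phrases.
KNOWN_INJECTIONS = ["ignore previous instructions"]

def injection_score(text, threshold=0.7):
    """Flag text whose best local alignment against a known injection
    phrase reaches a fraction of that phrase's self-alignment score."""
    text = text.lower()
    for phrase in KNOWN_INJECTIONS:
        self_score = smith_waterman(phrase, phrase)
        if smith_waterman(text, phrase) / self_score >= threshold:
            return True
    return False

print(injection_score("Please igno-re prev1ous instructions"))  # True
print(injection_score("What is the capital of France?"))        # False
```

Unlike exact substring matching (which the obfuscated example defeats entirely), the alignment score degrades gracefully under character-level evasion.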

GenAI red teaming

Toward trustworthy chatbots: red teaming protocol for health

This Nature paper proposes a specialized three-pillar red teaming framework for healthcare chatbots. It utilizes error stratification and dual-pronged testing to ensure medical safety standards.

Automated LLM red teaming gets a learning layer

Researchers propose Adaptive Instruction Composition, a contextual bandit reinforcement learning layer designed for AI red teaming. This technique doubles the WildTeaming attack success rate by dynamically adapting to the target model.
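The core idea of a bandit learning layer can be sketched without the paper's specifics: treat candidate attack strategies as arms and let a bandit concentrate queries on whatever succeeds against the target. This toy uses non-contextual UCB1 rather than the contextual bandit the paper describes, and the strategy names and success rates are invented.

```python
import math, random

class UCB1:
    """UCB1 bandit over candidate attack strategies."""
    def __init__(self, arms):
        self.arms = arms
        self.counts = {a: 0 for a in arms}
        self.rewards = {a: 0.0 for a in arms}
        self.t = 0

    def select(self):
        self.t += 1
        for a in self.arms:                  # play each arm once first
            if self.counts[a] == 0:
                return a
        return max(self.arms, key=lambda a:
                   self.rewards[a] / self.counts[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward

random.seed(0)
# Invented per-strategy jailbreak success rates for the simulation.
true_rates = {"roleplay": 0.1, "obfuscation": 0.3, "multi_turn": 0.6}
bandit = UCB1(list(true_rates))
for _ in range(2000):
    arm = bandit.select()
    bandit.update(arm, 1.0 if random.random() < true_rates[arm] else 0.0)

print(bandit.counts)  # queries concentrate on the best strategy
```

A contextual version would additionally condition arm selection on features of the target model's recent responses, which is what lets the published layer adapt per target.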

GenAI for CISO

The AI vulnerability storm: building a Mythos-ready security program (PDF)

A 30-page strategy briefing from CSA, SANS, and OWASP that outlines how to prepare for advanced AI agents. It maps emerging risks to the OWASP LLM, MITRE ATLAS, and NIST CSF 2.0 frameworks.

AI-driven exploitation is here: what Mythos proved and what comes next

Anthropic’s Mythos model successfully completed a 32-step corporate network attack autonomously in just hours. This analysis highlights that AI-driven exploitation is not exclusive to Mythos, putting existing AI systems at immediate risk.

Article

The moat is a config file: leaked system prompts analysis

This post analyzes leaked system prompts from major providers including OpenAI, Anthropic, and Google. It demonstrates how exposed tool schemas define the attack surfaces of these frontier models.

Paper highlights of February & March 2026 – AI safety at the frontier

A curated digest of seven major AI safety papers covering the latest academic developments. Topics range from alignment auditing and data poisoning to jailbreaks of Constitutional Classifiers.

GenAI threat model

Securing Retrieval-Augmented Generation: a taxonomy of attacks, defenses, and future directions

RAG systems introduce entirely novel security risks beyond standard LLM deployments. This paper establishes an operational boundary and organizes the literature around four distinct security surfaces.

GenAI attack technique

Silencing the guardrails: inference-time jailbreaking via dynamic refusal subspace manipulation

LLM safety alignment proves fragile as refusal behaviors in low-rank subspaces can be surgically ablated at inference time. The CRA framework achieves a 15.2x improvement over baselines in bypassing restrictions.
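The underlying operation in published refusal-direction work is a linear projection: subtract from each hidden state its component along an estimated refusal direction. This numpy sketch shows only that projection, not the CRA framework itself; in practice the direction is estimated from mean activations on harmful vs. harmless prompts, whereas here it is a random unit vector.

```python
import numpy as np

def ablate_subspace(hidden, directions):
    """Remove the component of each hidden state lying in the span of
    the given orthonormal directions: h' = h - sum_i (h . r_i) r_i."""
    out = hidden.copy()
    for r in directions:
        out -= np.outer(out @ r, r)
    return out

rng = np.random.default_rng(0)
d = 16

r = rng.normal(size=d)          # stand-in "refusal direction"
r /= np.linalg.norm(r)

hidden = rng.normal(size=(4, d))  # a batch of hidden states
ablated = ablate_subspace(hidden, [r])

print(np.abs(ablated @ r).max())  # ~0: no refusal component remains
```

Because the edit happens at inference time on activations, not weights, it leaves all other model behaviour intact, which is why such ablations are hard to patch with output filters alone.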

GenAI attack

CVE-2026-39423 – Eval injection in AI chat Markdown rendering

This eval-injection vulnerability in the MaxKB AI assistant's Markdown rendering enables stored XSS. The flaw carries a CVSS 4.0 score of 6.9 (Medium).
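MaxKB's actual code and patch are not shown here, but the class of bug, rendering model output as Markdown without escaping embedded HTML, can be illustrated with a deliberately toy renderer:

```python
import html
import re

def naive_render(md):
    """Toy Markdown renderer: converts **bold** but passes raw HTML
    through untouched (unsafe)."""
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", md)

def safe_render(md):
    """Escape HTML entities first, then apply Markdown transforms."""
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>",
                  html.escape(md))

payload = "**hi** <img src=x onerror=alert(1)>"
print(naive_render(payload))  # attacker HTML survives -> stored XSS
print(safe_render(payload))   # entities escaped, payload inert
```

Because chat transcripts are persisted and re-rendered for other users, an unescaped payload in model output becomes stored (not merely reflected) XSS, which is why the Markdown layer is the right place to fix it.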

Framework

AI and Large Language Models companion guide

The Center for Internet Security released an official companion guide mapping the CIS Controls v8.1 to AI/LLM-specific security requirements. It provides actionable guidance across the full AI lifecycle.

Report

6 AI security incidents: full attack path analysis (April 2026)

This report details six real-world AI security incidents that occurred over a 15-day span. It includes full attack paths and MITRE ATT&CK technique references along with a defensive playbook.

Prepare for autonomous AI exploitation

The transition from theoretical prompt injections to autonomous, multi-step attacks fundamentally changes the generative AI threat landscape. Security teams must move beyond static defenses and implement continuous, stateful monitoring architectures to protect their AI pipelines from dynamic manipulation and automated red-teaming frameworks.

Written by: Sergey
