This month’s top resources cover highly concerning developments, including a novel IICL technique that bypassed GPT-5.4 safety guardrails and the real-world implications of Anthropic’s Mythos model successfully executing a 32-step corporate network attack. Adversaries can now automate their red teaming and uncover fundamental flaws in vector databases. However, defenders are also making progress in practical framework development and fundamental approaches to attack detection.
Statistics:
Total resources: 22
Category breakdown:
GenAI security resources:
Research
We broke GPT-5.4 safety with 10 examples and 2 words using a new attack technique — IICL
Adversa AI researchers managed to bypass GPT-5.4 safety guardrails using a novel technique called Involuntary In-Context Learning (IICL). The attacker’s success rate reached 60%, exploiting a vulnerability introduced in recent model updates.
Stealthy backdoor attacks against LLMs based on natural style triggers
The BadStyle framework demonstrates how to embed imperceptible writing-style triggers into LLMs during fine-tuning. These triggers preserve semantics and fluency while acting as a reliable backdoor.
Can you trust the vectors in your vector database? Black-Hole attack
Vector databases powering RAG systems are fundamentally vulnerable to a geometric poisoning attack. This exploit can contaminate up to 99.85% of query results with just 1% injected vectors.
STACK: adversarial attacks on LLM safeguard pipelines
The STACK methodology systematically defeats safeguard pipelines by attacking each component sequentially. This approach achieves a 71% attack success rate in black-box environments.
AB jailbreaking – a novel hybrid framework for exploitation of large language models
The AB-JB approach is a three-stage hybrid jailbreak framework combining black-box semantic adversarial prompt generation and white-box suffix optimization. This methodology achieves a 93% average attack success rate against targeted LLMs.
GenAI 101
The future of everything is lies, I guess: safety
A comprehensive essay dismantling four potential defensive moats against unaligned AI systems. The author argues that LLMs cannot safely be given autonomous power in their current state.
Prompt injection, jailbreaks, and LLM security: what every developer building AI apps must know
This comprehensive developer guide covers fundamental risks including prompt injection, data exfiltration, and MCP security. It provides practical defensive strategies for teams building AI applications.
Top 10 vulnerabilities in AI systems on the web
A systematic walkthrough of the 10 most common AI web vulnerabilities. The guide covers everything from basic prompt injection and data leakage to broken authorization.
GenAI defense
Understanding and improving continuous adversarial training for LLMs
The ER-CAT framework adds singular-value-variance regularization to improve adversarial training for LLMs. This technique provides a better robustness-utility tradeoff across six different models.
TwinGate: stateful defense against decompositional jailbreaks in LLMs
TwinGate introduces a dual-encoder defense mechanism using Asymmetric Contrastive Learning. It achieves high recall with less than a 0.2% false positive rate on a massive 3.62M-request dataset.
Seven cross-domain techniques for prompt injection detection
Researchers adopted seven detection techniques from forensic linguistics, bioinformatics, and network security to identify prompt injections. The local-alignment detector significantly improved the F1 score from 0.033 to 0.378.
GenAI red teaming
Toward trustworthy chatbots: red teaming protocol for health
This Nature paper proposes a specialized three-pillar red teaming framework for healthcare chatbots. It utilizes error stratification and dual-pronged testing to ensure medical safety standards.
Automated LLM red teaming gets a learning layer
Researchers propose Adaptive Instruction Composition, a contextual bandit reinforcement learning layer designed for AI red teaming. This technique doubles the WildTeaming attack success rate by dynamically adapting to the target model.
GenAI for CISO
The AI vulnerability storm: building a Mythos-ready security program (PDF)
A 30-page strategy briefing from CSA, SANS, and OWASP that outlines how to prepare for advanced AI agents. It maps emerging risks to the OWASP LLM, MITRE ATLAS, and NIST CSF 2.0 frameworks.
AI-driven exploitation is here: what Mythos proved and what comes next
Anthropic’s Mythos model successfully completed a 32-step corporate network attack autonomously in just hours. This analysis highlights that AI-driven exploitation is not exclusive to Mythos, putting existing AI systems at immediate risk.
Article
The moat is a config file: leaked system prompts analysis
This post analyzes leaked system prompts from major providers including OpenAI, Anthropic, and Google. It demonstrates how exposed tool schemas define the attack surfaces of these frontier models.
Paper highlights of February & March 2026 – AI safety at the frontier
A curated digest of seven major AI safety papers covering the latest academic developments. Topics range from alignment auditing and data poisoning to jailbreaks of Constitutional Classifiers.
GenAI threat model
Securing Retrieval-Augmented Generation: a taxonomy of attacks, defenses, and future directions
RAG systems introduce entirely novel security risks beyond standard LLM deployments. This paper establishes an operational boundary and organizes the literature around four distinct security surfaces.
GenAI attack technique
Silencing the guardrails: inference-time jailbreaking via dynamic refusal subspace manipulation
LLM safety alignment proves fragile as refusal behaviors in low-rank subspaces can be surgically ablated at inference time. The CRA framework achieves a 15.2x improvement over baselines in bypassing restrictions.
GenAI attack
CVE-2026-39423 – Eval injection in AI chat Markdown rendering
This Eval Injection vulnerability in MaxKB AI assistant’s Markdown rendering enables Stored XSS. The flaw carries a CVSS 4.0 score of 6.9 MEDIUM.
Framework
AI and Large Language Models companion guide
The Center for Internet Security released an official companion guide mapping the CIS Controls v8.1 to AI/LLM-specific security requirements. It provides actionable guidance across the full AI lifecycle.
Report
6 AI security incidents: full attack path analysis (April 2026)
This report details six real-world AI security incidents that occurred over a 15-day span. It includes full attack paths and MITRE ATT&CK technique references along with a defensive playbook.
Prepare for autonomous AI exploitation
The transition from theoretical prompt injections to autonomous, multi-step attacks fundamentally changes the generative AI threat landscape. Security teams must move beyond static defenses and implement continuous, stateful monitoring architectures to protect their AI pipelines from dynamic manipulation and automated red-teaming frameworks.