Top GenAI security resources — April 2026

GenAI Security + GenAI Security Digest Sergey todayApril 8, 2026

Background
share close

Last week proved that AI systems now officially contribute to critical infrastructure vulnerabilities. The unprecedented LiteLLM supply chain compromise exposes how adversaries are targeting the underlying routing layers of AI deployments. Coupled with the emergence of context window poisoning in massive 128K+ token contexts and the active operationalization of AI by state-sponsored threat actors, defending these systems now requires defense in depth approaches and extreme vigilance. “LLM firewalls” are not enough.

Statistics:

Total resources: 19
Category breakdown:

Category Count
Technique 3
Exploitation 2
GenAI research 2
GenAI defense 2
AI Red Teaming 2
GenAI threat model 2
Report 2
Incident 1
GenAI vulnerability 1
GenAI 101 1
Video 1

GenAI security resources:

Technique

Defamiliarization attack: Literary theory enabled discussion of LLM jailbreaking

This paper introduces a novel multi-turn LLM jailbreaking technique called “Defamiliarization” based on literary theory. By altering the conceptual framing, this opens a new class of attacks capable of bypassing standard safety training parameters.

Document poisoning in RAG systems

Demonstrating the fragility of retrieval-augmented generation pipelines, attackers achieved a 95% success rate through RAG poisoning with vocabulary-engineered documents. However, the study shows that implementing robust embedding anomaly detection successfully reduced the attack effectiveness to 20%.

Insecure output handling: SQL injection through LLM output

Analyzing three Hack The Box (HTB) lab scenarios, this post explores critical guardrail bypass and data exfiltration risks. It practically demonstrates how attackers can achieve complete database manipulation via LLM-generated SQL injection vulnerabilities.

Exploitation

ChatGPT data leakage via a hidden outbound channel

Researchers discovered a hidden DNS tunneling channel residing directly within ChatGPT’s code execution sandbox. This critical vulnerability allowed domain name resolution to be weaponized as a covert transport for silently exfiltrating sensitive user data to external servers.

AI prompt injection & coupon bypass in a live e-commerce system

This case study breaks down a sophisticated, multi-step exploit targeting a live e-commerce chatbot platform. The attacker successfully chained context-window flooding and developer impersonation with indirect prompt injection via RAG to bypass restricted coupon logic.

GenAI research

Revealing the intrinsic ethical vulnerability of aligned large language models

Researchers formally prove that current alignment methods in artificial intelligence only create highly localized, superficial safety regions rather than generalized defense. The empirical study demonstrated a staggering 100% attack success rate across 22 of the 26 evaluated LLMs.

PIDP-Attack: Combining prompt injection with database poisoning

This academic paper proposes devastating compound attacks that combine direct prompt injection with targeted database poisoning methods. Against standard RAG systems, the PIDP-Attack achieved an alarming 98.125% attack success rate across three recognized benchmarks and eight different models.

GenAI defense

Harmonizing safety and utility in LVLMs via inference-time feature steering

The newly proposed TBOP method projects model representations onto the null space of a specified nuisance subspace to reduce jailbreak viability. It dramatically cuts the attack success rate down to 2.56% while actively improving overall model utility and operating 60x faster than the next-best defense mechanism.

Prompt injection still beats production LLMs

A detailed engineering walkthrough on training a robust “LLM firewall” by fine-tuning the Ministral-3B architecture to act as an asynchronous safety judge. The authors successfully baked the defense into the model weights using a dedicated SFT and GRPO pipeline spanning over 8,300 adversarial prompts.

AI Red Teaming

We built an AI agent that breaks AI defenses. It ranked top globally.

This post details an autonomous AI agent that achieved top-three ranking in the global Gandalf CTF by mathematically predicting vulnerabilities before executing any attack. It reveals the core methodology, competitive results, and the critical questions organizations must ask about defenses in current production systems.

Red-teaming medical AI: Systematic adversarial evaluation

Addressing severe safety risks in modern healthcare environments, this study develops a comprehensive taxonomy featuring eight adversarial attack categories and 24 sub-strategies. An authority impersonation strategy notably achieved an 83.3% success rate against medical AI deployments.

GenAI threat model

A systematic review of threats and mitigations to real-world LLM systems

By rigorously evaluating 384 separate references, researchers have generated a structured taxonomy of known vulnerabilities in real-world deployments. The identified threats are carefully classified by the CIA triad and effectively mapped to severity levels using the proven STRIDE methodology.

Towards secure RAG: A comprehensive review of threats, defenses and benchmarks

This technical paper represents the very first end-to-end academic survey mapping the entirety of the retrieval-augmented generation security pipeline. It equips engineers with a dual-perspective taxonomy of defenses and highlights the latest standardized benchmarking protocols.

Report

The foundry problem: World models and missing liability framework

Stanford Law examines the currently missing legal and accountability frameworks for foundational models trained exclusively via self-supervised learning methods. The resulting proposal outlines a rigorous liability taxonomy distinguishing structural defects within architectures from purely instructional flaws.

AI as tradecraft: How threat actors operationalize AI – Microsoft

Microsoft Threat Intelligence released crucial documentation outlining how state-sponsored threat actors are actively and efficiently weaponizing artificial intelligence tools today. The report systematically tracks the operationalization of AI across the entire cyberattack lifecycle.

Incident

Your AI gateway was a backdoor: Inside the LiteLLM supply chain compromise

While this digest primarily focuses on LLM-specific flaws, the sheer scale of the LiteLLM compromise by the malicious group TeamPCP fundamentally mandates its inclusion. This deep technical analysis covers the widespread supply chain compromise spanning critical distribution nodes like PyPI, npm, and Docker Hub.

GenAI vulnerability

Context window poisoning: Long-context LLM attacks in 128K+

Context window poisoning is rapidly emerging as one of the most critically under-defended operational vulnerabilities in live production environments. Attackers can now reliably embed malicious instructions deep within massive 128K+ token document contexts to hijack generation.

GenAI 101

Prompt injection: The #1 LLM security threat every developer must know

A highly practical development guide exploring the underlying mechanics of prompt injection alongside its growing list of real-world CVEs. Developers can immediately utilize the provided six-layer defense architecture accompanied by production-ready TypeScript code examples.

Video

Claudini: When AI designs its own jailbreaks

An autonomous adversarial AI agent known as Claudini practically demonstrates how automated systems can programmatically craft their own complex security bypasses. The agent successfully designs jailbreaks achieving a 100% success rate in testing scenarios where human-crafted methods peaked at only 56%.

Audit the entire AI stack, not just the prompt

The LiteLLM incident and multi-step RAG exploitations prove that modern adversaries are sometimes bypassing the LLM level entirely to target the surrounding architecture and data gateways. Stop relying solely on semantic filters or weak alignment guardrails. Instead, enforce robust anomaly detection, strict input sanitization, rigorous least-privilege principles across all your agentic workflows, and validate these defenses hold using continuous, autonomous security assessment.

Written by: Sergey

Rate it
Previous post