Revealing Claude 4.6 system prompt using a chain of partial-to-full prompt leak attack
How we extracted the Opus 4.6 system prompt the day after its release and what we learned about the model’s security constraints and guardrails.
GenAI Security + GenAI Security Digest Sergey todayFebruary 9, 2026
As we settle into 2026, the theoretical risks of Generative AI are rapidly materializing into tangible security incidents. This month’s digest highlights a wide variety of attacks, from academic research to practical exploits, particularly regarding indirect prompt injection attacks targeting integrated systems like Google Gemini and Perplexity. We are also seeing a maturation of defense standards, with new frameworks providing concrete guidance for securing agentic workflows against increasingly sophisticated threats.
Total resources: 41
Category breakdown:
| Category | Count |
|---|---|
| Attack techniques | 7 |
| Videos | 7 |
| Article | 6 |
| GenAI security 101 | 6 |
| Training materials | 6 |
| CISO reading | 2 |
| GenAI report | 2 |
| GenAI research | 2 |
| GenAI vulnerability | 2 |
| Tool | 1 |
This research paper presents Paraphrasing Adversarial Attack (PAA), a black-box optimization technique. It demonstrates how to successfully manipulate LLM-based peer review systems to yield higher scores through semantic-preserving paraphrases.
Academic research demonstrating working end-to-end indirect prompt injection attacks in RAG and agentic systems. The paper presents an attack algorithm ensuring retrieval of malicious content, achieving over 80% success in SSH key exfiltration through GPT-4o in multi-agent workflows.
A proof-of-concept demonstrating an indirect prompt injection attack against LLM-powered email agents, specifically Perplexity Comet AI. The author shows how malicious email content can hijack an AI email assistant with Gmail access to exfiltrate sensitive inbox data.
This academic paper introduces the Stochastic Response Backdoor (SRB) attack against the RLVR training paradigm. It demonstrates that injecting poisoned data with triggers can manipulate LLMs to produce harmful responses with equal probability using only 200 poisoned samples.
Penetration testing findings from Predatech demonstrating prompt injection in an LLM chatbot. The post shows exploitation using Burp Suite to modify exposed system prompts and achieve unintended behavior.
Security researchers from Miggo discovered an indirect prompt injection vulnerability in Google Gemini that bypasses authorization controls. The exploit allowed attackers to evade multiple defense layers to access sensitive meeting data.
This article describes a new jailbreak technique called “semantic chaining” that exploits how AI models evaluate modifications to existing content. The technique uses a four-step process to trick AI models into generating malicious outputs by splitting the request into discrete chunks.
A Black Hat webinar examining the evolution of prompt injection attacks into a five-stage kill chain. It covers the progression from initial access to full remote code execution, including evasion techniques across multiple modalities.
A YouTube playlist covering essential enterprise AI security and governance topics. These videos include an AI security checklist for organizations implementing large language models.
An educational video demonstrating prompt injection in the OWASP Juice Shop chatbot challenge. It provides a visual guide on how hackers jailbreak AI interfaces in a controlled environment.
This video covers Google’s SAIF approach to prompt injection defense. It details strategies for mitigating indirect prompt injection and data poisoning threats within the framework.
A video explaining what prompt injection is and when a threat actor inserts instructions into text to manipulate AI agent behavior. It serves as educational video content about the mechanics of prompt injection attacks.
This video explains model poisoning attacks where hackers insert corrupt or misleading data into AI training datasets. It offers educational content specifically focused on adversarial training data attacks.
Video content providing an in-depth discussion of AI security topics. The session covers threats, vulnerabilities, and defense strategies for AI systems and LLM applications.
A Medium article comparing prompt injection to SQL injection, explaining why indirect prompt injection is particularly concerning for AI systems. It discusses the LLM security inflection point and various defense strategies.
This post explains direct and indirect prompt injection attacks in cloud AI systems like AWS, Azure, and GCP. It details methods attackers use to hijack AI behavior through hidden instructions in data targeting SaaS AI applications.
A business-focused overview of AI security risks and best practices. It covers data protection, system security, and building trust in AI systems with practical guidance for organizations adopting these technologies.
A real-world experience report about prompt injection attacks that only became apparent in production. The discussion describes how users actively attempted to jailbreak the system once deployed, highlighting the gap between testing and real-world security challenges.
In-depth analysis by Bruce Schneier explaining why LLMs are fundamentally vulnerable to prompt injection attacks. He explores the security trilemma for AI agents and questions whether current LLM architectures can ever be fully secured against prompt injection.
Industry insiders have launched the Poison Fountain project to deliberately poison AI training data through malicious code samples. This is based on Anthropic research showing how few poisoned documents are needed to degrade model quality significantly.
An educational guide differentiating prompt injection from jailbreak attacks. The article provides examples and defense strategies to help practitioners understand the nuances between these two attack types.
A comprehensive explanation of data poisoning attacks targeting AI/ML systems in government. It covers prevention strategies including data governance, versioning, and trusted data sources to mitigate consequences like accuracy drops.
Palo Alto Networks provides an educational article explaining prompt injection attacks with examples and prevention methods. It covers direct and indirect prompt injection, differences from jailbreaking, and defense strategies.
An educational article explaining the distinction between AI security and AI safety. It covers the threat landscape including prompt injection, data poisoning, model inversion, and evasion attacks.
A comprehensive guide covering 14 AI security risks relevant to the current landscape. Topics include data poisoning, model inversion, adversarial examples, and backdoor attacks.
A guide to five essential AI security frameworks: OWASP LLM Top-10, NIST AI RMF 1.0, MITRE ATLAS, Google SAIF, and ISO/IEC 42001. It includes implementation guidance and practical advice for resource-constrained organizations.
A collection of articles on dev.to covering practical prompt injection attacks and AI security topics. It includes hands-on examples like 3 prompt injection attacks you can test right now.
PortSwigger Web Security Academy resource covering LLM attack vectors including exploiting LLM APIs with excessive agency. It includes methodology for detecting vulnerabilities and mapping attack surfaces.
PortSwigger Web Security Academy hands-on lab for practicing indirect prompt injection attacks. This interactive training environment allows learners to execute indirect prompt injection techniques and understand defense mechanisms.
A comprehensive offline, paid technical training course on AI systems security. The curriculum covers prompt injection, data poisoning, model theft, and cross-modal exploits.
A hands-on walkthrough of Lakera’s Gandalf Challenge demonstrating prompt injection techniques across 8 levels. It emphasizes practical pentesting skills for AI systems using obfuscation and social engineering.
A highly technical tutorial on implementing multilingual, semantic guardrails using Quarkus and LangChain4j. It provides complete code examples showing how ONNX approaches can be significantly faster than HTTP-based solutions.
A CIO perspective on preparing security teams to work alongside AI. It discusses governance frameworks and the importance of making employees co-designers of AI-enabled workflows while balancing innovation with data protection.
The WEF’s comprehensive outlook shows that 94% view AI as the most significant cybersecurity driver. The report covers agentic AI adoption, AI-enabled cybercrime, and supply chain risks.
A comprehensive guide explaining how training data can be manipulated to alter model behavior. It covers defense strategies including data vetting and monitoring against scenarios like IP theft prevention tools and criminal activity.
Zscaler’s 2026 AI threat report shows a 91% year-over-year surge in AI activity. The report highlights that enterprise AI systems are increasingly vulnerable to breach at machine speed.
Technical research analyzing the limitations of encoder-based classifiers and small language models in detecting prompt injection. The author argues that detection-based approaches treat symptoms rather than causes.
Radware researchers created the ZombieAgent exploit demonstrating persistent threats in LLMs. The research shows how ChatGPT memory and connector features can be weaponized for persistent indirect prompt injection attacks.
Google Gemini was found to have an indirect prompt injection vulnerability allowing unauthorized access to private calendar data. The flaw enabled attackers to bypass authorization controls through malicious meeting invites.
Analysis of a prompt injection vulnerability in Perplexity’s BrowseSafe feature. This discovery demonstrates the limitations of single-layer defense mechanisms in AI systems.
An open-source local security lab environment for testing LLM and AI agent vulnerabilities. This tool provides hands-on testing capabilities for security researchers.
The era of trusting LLM outputs by default is over; organizations must move beyond simple prompt filters and single-layer defenses. To stay secure in 2026, security teams need to rigorously test agentic workflows against the emerging kill chains highlighted in this month’s research. Prioritizing human oversight and verifiable rewards in training will be critical in mitigating the risks of data poisoning and indirect injection.
Stay up to date with what is happening! Plus, get a first look at news, noteworthy research, and the worst attacks on AI — delivered right to your inbox.
Written by: Sergey
GenAI Security admin
How we extracted the Opus 4.6 system prompt the day after its release and what we learned about the model’s security constraints and guardrails.
Adversa AI, Trustworthy AI Research & Advisory