Top GenAI security resources — March 2026
Explore the top GenAI security resources for March 2026, from Anthropic’s exposure of industrial-scale model distillation to Microsoft’s discovery of AI recommendation poisoning.
GenAI Security Digest · Sergey · March 9, 2026
This February, we saw strong evidence that GenAI attacks are becoming large-scale, production-level operations. From Anthropic’s exposure of massive distillation attacks to Microsoft’s discovery of widespread recommendation poisoning, the threat landscape has shifted toward systemic integrity and data theft — not to mention the use of AI in scaled-up traditional cybercrime. This month’s digest focuses on these industrial-scale incidents and the emerging defense architectures designed to counter them.
Total resources: 22
Category breakdown:
| Category | Count |
|---|---|
| GenAI defense | 4 |
| GenAI research | 3 |
| Report | 3 |
| GenAI Incident | 2 |
| GenAI security 101 | 2 |
| Technique | 2 |
| Tool | 2 |
| GenAI exploitation | 1 |
| Threat Model | 1 |
| AI Red Teaming | 1 |
| Article | 1 |
This paper introduces the ReDAct framework, which focuses on disentangling goal and framing representations in LLM activations. The accompanying FrameShield anomaly detector significantly improves the detection of concealed attacks across various LLM families.
Researchers propose a novel fail-closed alignment defense that forces multiple independent refusal pathways. This approach reduces jailbreak attack success rates by over 92% while maintaining high compliance for benign requests.
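The fail-closed idea can be illustrated with a toy gate (the individual checks below are hypothetical stand-ins, not the paper's actual refusal pathways): several independent checks each get veto power, so a jailbreak must defeat every pathway at once rather than just one.

```python
def keyword_check(prompt: str) -> bool:
    # Refuse if obvious harmful phrases appear (toy denylist).
    denylist = {"build a bomb", "disable safety"}
    return not any(term in prompt.lower() for term in denylist)

def role_check(prompt: str) -> bool:
    # Refuse attempts to override the system role.
    return "ignore previous instructions" not in prompt.lower()

def length_check(prompt: str) -> bool:
    # Refuse pathologically long prompts often used to bury injections.
    return len(prompt) < 4000

CHECKS = [keyword_check, role_check, length_check]

def allow(prompt: str) -> bool:
    # Fail-closed: every independent pathway must approve the request.
    return all(check(prompt) for check in CHECKS)
```

The key property is the `all(...)`: adding pathways can only make the gate stricter, never looser.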
The DP-KSA framework provides formal differential privacy guarantees for RAG outputs. It utilizes a propose-test-release paradigm combined with keyword extraction to secure retrieval systems against data leakage.
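A rough sense of the propose-test-release flavor of keyword release (this is a generic sparse-release sketch, not DP-KSA's actual mechanism or parameters): keyword counts are compared against a noisy threshold, so rare and potentially identifying terms stay suppressed while common ones are released.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Sample Laplace(0, scale) via the inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1e-12, 1.0 - 2.0 * abs(u)))

def dp_release_keywords(counts: dict, epsilon: float, threshold: float) -> list:
    # Release a keyword only if its noisy count clears a noisy threshold.
    # Budget split between threshold and per-count noise is illustrative.
    noisy_threshold = threshold + laplace_noise(2.0 / epsilon)
    released = []
    for word, count in counts.items():
        if count + laplace_noise(4.0 / epsilon) > noisy_threshold:
            released.append(word)
    return released
```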
Johns Hopkins and Microsoft researchers have developed an efficient, reusable framework to evaluate the safety of LLMs before deployment. This approach aims to streamline the testing process while ensuring robust safety checks are in place.
Researchers present the first systematic black-box framework for adversarial memory injection. The study explores content-based and question-targeted attack settings using composable primitives to manipulate long-term LLM memory.
The Claude Opus 4.6 system prompt was extracted and analyzed the day after its release. This research evaluates the security constraints, guardrails, and improvements compared to previous versions.
This study reveals how hybrid RAG systems create a dangerous pivot boundary enabling cross-tenant data leakage. The research demonstrates a 95% leakage probability with significant amplification, suggesting per-hop authorization as a key mitigation.
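The per-hop authorization mitigation can be sketched as follows (document schema and tenant model are invented for illustration): every hop of a multi-hop retrieval re-checks the original caller's tenant against the document's owner, instead of trusting whatever the previous hop happened to retrieve.

```python
# Toy tenant-scoped document store; "links" model follow-up retrieval hops.
DOCS = {
    "doc-a": {"tenant": "tenant-1", "text": "alpha", "links": ["doc-b"]},
    "doc-b": {"tenant": "tenant-2", "text": "beta", "links": []},
}

def retrieve(doc_id: str, caller_tenant: str, depth: int = 0, max_depth: int = 3) -> list:
    doc = DOCS.get(doc_id)
    # Per-hop check: deny the hop if the caller's tenant does not own the doc,
    # closing the cross-tenant pivot through linked documents.
    if doc is None or doc["tenant"] != caller_tenant:
        return []
    results = [doc["text"]]
    if depth < max_depth:
        for link in doc["links"]:
            results += retrieve(link, caller_tenant, depth + 1, max_depth)
    return results
```

Without the per-hop check, a tenant-1 query for `doc-a` would transitively pull in tenant-2's `doc-b`; with it, the pivot is cut at the boundary.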
Google’s quarterly threat intelligence report covers nation-state AI exploitation and model distillation attacks on Gemini. It also highlights APT groups misusing AI and the emergence of new AI-integrated malware.
This report compiles 20 documented AI app data breaches, analyzing their systemic root causes. Common failures include misconfigured Firebase instances, missing Supabase RLS, and hardcoded API keys.
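The hardcoded-key failure mode is easy to catch with even a naive scan. A minimal sketch (the pattern list is a tiny illustrative subset; production scanners use far larger rule sets plus entropy checks):

```python
import re

# Common leaked-credential shapes: AWS access key IDs, generic "sk-" secrets,
# and Google API keys (the kind exposed by misconfigured Firebase apps).
KEY_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
    re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
]

def find_hardcoded_keys(source: str) -> list:
    # Return every substring of the source that looks like a credential.
    hits = []
    for pattern in KEY_PATTERNS:
        hits += pattern.findall(source)
    return hits
```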
Tenable’s latest research indicates that 18% of organizations have overprivileged AI IAM roles. The report also notes high percentages of inactive non-human identities with excessive permissions.
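Overprivileged roles of the kind Tenable describes typically show up as wildcard grants. A minimal audit sketch over AWS-style policy documents (field names follow the IAM policy grammar; the severity logic is an assumption):

```python
def overprivileged_statements(policy: dict) -> list:
    # Flag Allow statements that grant every action or every resource ("*"),
    # a common sign of an overprivileged AI service role.
    flagged = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        if "*" in actions or "*" in resources:
            flagged.append(stmt)
    return flagged
```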
Anthropic has publicly exposed industrial-scale model distillation attacks by major competitors using fraudulent accounts. The report details over 16 million exchanges used to extract Claude’s capabilities.
Microsoft documented a case of AI recommendation poisoning where 31 companies embedded hidden commands in summarize buttons. These commands were designed to plant lasting preferences into AI assistants’ memory.
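The shape of this poisoning is instruction-like text placed where only the AI assistant will read it. A naive co-occurrence check (the phrase and CSS patterns below are illustrative, not Microsoft's detection logic):

```python
import re

# Instruction-style phrasing aimed at an assistant's long-term memory.
INSTRUCTION_RE = re.compile(
    r"always recommend|remember that|from now on", re.IGNORECASE
)
# Content a human reader would never see.
HIDDEN_RE = re.compile(
    r'display\s*:\s*none|visibility\s*:\s*hidden|aria-hidden="true"',
    re.IGNORECASE,
)

def suspicious_hidden_instructions(html: str) -> bool:
    # Flag pages where assistant-directed phrasing co-occurs with hidden markup.
    return bool(INSTRUCTION_RE.search(html)) and bool(HIDDEN_RE.search(html))
```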
This article argues that LLM behavior is governed by statistical context patterns rather than explicit rules, making prompt reframing a fundamental vulnerability. It applies sociological frameworks to help explain the challenges of alignment.
A comprehensive educational overview of AI application security across RAG, agents, and chatbots. The guide includes Python code examples to illustrate vulnerabilities in document processing.
This paper explores how MoE architectures concentrate safety behaviors in experts, creating exploitable vulnerabilities. The authors achieve high attack success rates via expert ablation and adaptive silencing using LSTM-based identification.
Researchers discovered that a single mild unlabeled prompt can unalign 15 different safety-tuned LLMs. The technique, known as GRP-Obliteration, works across all safety categories.
Augustus is a Go-based LLM vulnerability scanner with over 210 adversarial probes. It supports 28 LLM providers and covers attacks ranging from RAG poisoning to agent exploitation.
GuardLLM is a defense-in-depth Python library designed for LLM security. Features include input sanitization, canary token detection, provenance tracking, and outbound DLP.
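The canary-token idea is worth unpacking; the sketch below shows the concept only and is not GuardLLM's actual API. A random marker is planted in the system prompt, and any model output containing it signals a prompt leak.

```python
import secrets

def make_canary() -> str:
    # Unpredictable marker; 8 random bytes as 16 hex characters.
    return f"CANARY-{secrets.token_hex(8)}"

def build_system_prompt(instructions: str, canary: str) -> str:
    # Embed the marker where only the system prompt contains it.
    return f"{instructions}\n[internal marker: {canary}]"

def output_leaks_canary(output: str, canary: str) -> bool:
    # Any appearance of the marker in output means the prompt leaked.
    return canary in output
```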
This post documents multi-turn prompt injection attacks that exploit TOCTOU gaps. It details four attack chains, including system prompt theft and sandbox RCE via base64 indirection.
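Base64 indirection works because filters inspect the literal text while the payload travels encoded. A defensive sketch (the marker list is an illustrative toy, not the post's attack chains): decode base64-looking runs and scan the plaintext.

```python
import base64
import re

# Runs of base64 alphabet long enough to be worth decoding.
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
# Toy markers of shell/exec content in the decoded plaintext.
SUSPICIOUS = ("rm -rf", "curl ", "os.system", "subprocess")

def decoded_payloads(text: str) -> list:
    # Decode candidate runs and return any that contain suspicious commands.
    hits = []
    for candidate in B64_RE.findall(text):
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
        except Exception:
            continue  # not valid base64 or not valid UTF-8; ignore
        if any(marker in decoded for marker in SUSPICIOUS):
            hits.append(decoded)
    return hits
```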
Microsoft presents a structured methodology for AI threat modeling. Key steps include asset identification, untrusted data flow mapping, and impact-driven prioritization.
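Those three steps can be sketched as simple bookkeeping (the asset names, impact scale, and ranking rule are illustrative assumptions, not Microsoft's schema): enumerate assets, record the untrusted data flows into each, and sort by impact.

```python
# Inventory of assets with an impact score and the untrusted flows reaching them.
MODEL = [
    {"name": "vector store", "impact": 3, "untrusted_inputs": ["uploaded docs", "web pages"]},
    {"name": "system prompt", "impact": 5, "untrusted_inputs": ["none"]},
    {"name": "agent tool layer", "impact": 5, "untrusted_inputs": ["tool outputs", "emails", "web pages"]},
]

def prioritize(assets: list) -> list:
    # Impact-driven prioritization: highest impact first; among equals,
    # the asset with more untrusted inflows ranks higher.
    return sorted(assets, key=lambda a: (-a["impact"], -len(a["untrusted_inputs"])))
```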
This study red-teams ChatGPT and Claude Opus as TEE security advisors. The findings reveal that LLM assistants often hallucinate TEE mechanisms under adversarial prompting, introducing the TEE-RedBench framework.
AWS discusses a multi-agent AI architecture for automated penetration testing. The system achieved a 92.5% attack success rate on CVE Bench.
Real-world attacks on GenAI combine simple and advanced techniques, from direct prompt injection to architectural exploitation such as expert ablation and retrieval pivoting. Defenses must evolve from static guardrails to dynamic, context-aware systems. Organizations should prioritize threat modeling that covers this full spectrum and continuously test their defenses with an autonomous AI red teaming platform.
Written by: Sergey
Adversa AI, Trustworthy AI Research & Advisory