Top GenAI Security Resources — August 2025

GenAI Security + GenAI Security Digest ADMIN August 20, 2025 669

Explore the Top GenAI Resources to stay informed about the most pressing risks and defenses in the field.

As GenAI becomes deeply integrated into products, workflows, and user-facing systems, attackers are actively exploiting its vulnerabilities. Prompt injections, jailbreaks, unsafe output handling, and compromised integrations are exposing critical gaps in security.

Top GenAI Security Report

Top AI Security Incidents (2025 Edition)

This report exposes 16 real-world AI security breaches, showing how prompt injections, agent misalignment, and flawed APIs cause financial and reputational losses across industries. It provides patterns, insights, and actionable defenses to help AI builders and defenders prevent the same failures from repeating.

OWASP Gen AI Incident & Exploit Round-up, Q2’25

The OWASP GenAI Round-up for Q2 2025 compiles real-world exploits and breaches, from GPT-4.1 tool poisoning and ChatGPT prompt injections to deepfake scams and zero-click attacks. It provides incident details, mapped vulnerabilities, and practical mitigations to help teams strengthen defenses against fast-evolving AI threats.

Top GenAI Security Incident

Researchers are using AI for peer reviews — and finding ways to cheat it

Some researchers are embedding hidden AI prompts in academic papers to trick chatbots into giving positive peer reviews. The practice, uncovered on arXiv and confirmed by multiple studies, shows how prompt injection is distorting scholarly evaluation and exposing risks in using AI for academic review.

Top GenAI Vulnerability

Package hallucination: LLMs may deliver malicious code to careless devs

LLMs are increasingly hallucinating non-existent software packages, creating a new supply chain threat called “slopsquatting,” where attackers register those fake names with malicious code. Researchers found nearly 20% of code samples contained such hallucinations, with many repeating consistently, making it easier for adversaries to exploit careless developers who trust AI-generated code.

Top 5 GenAI Tools Vulnerable to Man-in-the-Prompt Attack, Billions Could Be Affected

LayerX researchers uncovered a “Man-in-the-Prompt” attack, where malicious browser extensions can silently inject or read prompts in GenAI tools like ChatGPT, Gemini, Copilot, Claude, and Deepseek. Because nearly all enterprise users run extensions, this exploit could turn LLMs into data-stealing copilots, exposing billions of users and sensitive corporate information.

Top GenAI Defence

AIDEFEND — An AI-focused defensive countermeasures knowledge base

A defensive knowledge base mapping countermeasures for AI threats across the lifecycle, from hardening and detection to isolation, deception, eviction, and recovery. It helps builders and defenders apply structured mitigations—like prompt injection detection, supply chain security, and model restoration—to strengthen AI resilience against real-world attacks.

Top GenAI Security for CISO

Why CISOs Like Me Don’t Sleep in 2025: What You Must Know About Securing GenAI

This article gives a frontline CISO view of the top GenAI risks in 2025, from sensitive data leakage and shadow AI to weak governance and opaque incident response. It outlines practical guardrails and misuse defenses, stressing that security leaders must move beyond static controls to real-time oversight, cross-functional governance, and proactive adversarial testing.

AI, Risk, and the Road Ahead: Key Findings from the 2025 CISO Village Survey

The 2025 CISO Village Survey shows AI has become the top security priority, with one in four leaders facing AI-generated attacks and nearly 70% already running AI agents in production. CISOs are tightening control of AI tools, preparing for SOC disruption, shifting back to best-of-breed security, and expanding into product security, all while demanding stronger ROI under slower budget growth.

Top GenAI Exploitation Technique

Phishing For Gemini

A researcher demonstrated a “Phishing for Gemini” exploit where hidden HTML/CSS text in emails tricks Google Gemini into appending fake security alerts in summaries, enabling credential theft without links or attachments. This indirect prompt injection highlights how invisible directives bypass guardrails, turning AI email summaries into a new phishing vector that security teams must detect and sanitize.

Top GenAI Prompt Injection Technique

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

Meta introduced SecAlign, the first open-source, open-weight LLM with built-in model-level defenses against prompt injection, matching the performance of closed commercial systems. Trained with an improved SecAlign++ recipe, the 70B model achieves state-of-the-art robustness across 7 security benchmarks and general tasks, giving researchers a reproducible foundation to study and mitigate prompt injection attacks.

Securing Agentic AI: How Semantic Prompt Injections Bypass AI Guardrails

NVIDIA’s AI Red Team revealed a new class of semantic prompt injections where attackers use symbolic or visual inputs—like emoji sequences or rebus puzzles—to bypass guardrails in multimodal and agentic AI. By exploiting early fusion architectures such as Llama 4, these attacks manipulate shared latent spaces without visible text, highlighting the need for output-level defenses and cross-modal security strategies.

The Hard Problem of Prompt Injection

This essay frames prompt injection as one of AI’s hardest unsolved problems, comparing it to SQL injection in how it overrides system instructions. It explains why current defenses—filters, output checks, or multi-model voting—are brittle, and argues that only deeper model-level changes and mechanistic interpretability offer hope for reliable long-term mitigation.

Forensic Scenario Prompt Injection Bypasses Guardrails

A researcher earned a $1,200 bug bounty after tricking an AI into revealing step-by-step instructions for producing an illicit substance by framing the request as a forensic training scenario. This case shows how attackers can disguise harmful intent under professional or educational contexts to bypass safeguards and extract restricted knowledge.

Top GenAI Jailbreak

JAILBREAK ALERT: XAI: PWNED

A jailbreak of xAI’s Grok-4 and Grok-4-Heavy has been demonstrated, bypassing guardrails to access restricted reasoning, tool use, and sensitive content. Attackers showcased the ability to generate outputs ranging from copyrighted material to weapon synthesis and malware instructions, highlighting critical risks in inadequate safety controls.

Top GenAI Security Research

Logic-layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems

This research introduces Logic-layer Prompt Control Injection (LPCI), a new class of AI vulnerabilities where malicious payloads are embedded in memory or logic layers and later triggered across sessions. Testing on major platforms showed execution rates up to 49%, revealing that current defenses miss these stealthy, persistent attacks and underscoring the need for memory-aware runtime protections.

Top GenAI Safety Dataset

Prompt-injection-multilingual

This dataset, prompt-injection-multilingual, provides over 7,000 labeled text samples in multiple languages for training and evaluating prompt injection detection models. It includes both benign and malicious inputs, making it useful for building and testing multilingual AI security defenses.

Top GenAI Security Training

AI Red Teaming 101 – Full Course

This AI Red Teaming 101 series is a beginner-friendly introduction to the fundamentals of AI red teaming and generative AI security. Featuring experts from Microsoft and ADAPT, it walks you through real-world attack techniques like prompt injection and jailbreaks, hands-on use of tools like PyRIT, and proven defense strategies to help secure AI systems in practice.

Top GenAI Security Framework

Introducing the Databricks AI Governance Framework

The Databricks AI Governance Framework (DAGF) provides enterprises with a structured model for responsible AI adoption, built around 5 pillars and 43 key considerations. It offers practical guidance on risk management, compliance, ethics, and monitoring, helping organizations scale AI securely while maintaining transparency, trust, and regulatory alignment.

Top GenAI Security Developer Guide

Harder, Better, Prompter, Stronger: AI system prompt hardening

The AI System Prompt Hardening Guide explains how to secure system prompts against evolving threats like injection, override, and leakage. It introduces defensive techniques such as instruction shielding, syntax reinforcement, and layered prompting, with hands-on evaluations using Promptfoo to help developers build more resilient and trustworthy AI applications.

Top GenAI Security 101

The Rise of LLM Injection Attacks: A New Frontier in Application Security

The Rise of LLM Injection Attacks article warns that prompt injection, data poisoning, and context manipulation are becoming critical security risks as organizations adopt large language models. It highlights how these attacks exploit the lack of strict boundaries between logic and data, compares their impact to SQL injection, and calls for secure-by-design defenses like prompt hardening, input sanitization, and monitoring AI outputs.

What are prompt injection attacks in AI, and how can they be prevented in LLMs like ChatGPT?

The Prompt Injection Attacks Guide explains how attackers manipulate LLM inputs to override instructions, leak data, or generate harmful outputs in systems like ChatGPT. It explores real-world cases, common attack types, and provides prevention strategies such as input sanitization, role-based isolation, output monitoring, and red team testing to secure AI deployments.

Top GenAI Jailbreak Protection Research

CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations

The CAVGAN framework introduces a unified approach to both jailbreak and defense of LLMs by applying generative adversarial attacks on their internal representations. Experiments show it achieves high jailbreak success rates while simultaneously strengthening defenses, offering new insights into LLM security mechanisms and advancing strategies for robust model protection.

Top GenAI Prompt Injection Protection

Securing GenAI Applications Against Prompt Injection: The Role of an Enterprise-Grade LLM Firewall

The EchoLeak incident exposed how prompt injection in Microsoft 365 Copilot could silently exfiltrate enterprise data, highlighting the urgency of securing GenAI workflows. This blog presents defenses such as enterprise-grade LLM Firewalls, overseer models, canary tokens, and red teaming to build layered, Zero Trust protections for safe and scalable AI adoption.

For more expert breakdowns, visit our Trusted AI Blog or follow us on LinkedIn to stay up to date with the latest in AI security. Be the first to learn about emerging risks, tools, and defense strategies.

Subscribe for updates

Stay up to date with what is happening! Plus, get a first look at news, noteworthy research, and the worst attacks on AI—delivered right to your inbox.