Towards Secure AI Week 32 — NIST Control Overlays, OWASP Landscape, LLM Trustworthiness Scores, and GPT-5 Jailbreak

Secure AI Weekly ADMIN August 18, 2025 214

From GPT-5 jailbreaks leaking harmful instructions within hours of release to new benchmarks exposing systemic weaknesses in major models, this week highlighted how fragile LLM Security remains. Despite new training methods, Jailbreak LLM attacks like context poisoning and obfuscation continue to bypass guardrails.

As enterprises experiment with tool-using and multi-agent workflows, the challenges of Agentic AI Security grow sharper. Autonomous decision-making introduces risks that traditional frameworks cannot cover, from uncontrolled tool invocation to privilege escalation across connected environments.

Meanwhile, regulators and standards bodies are catching up. NIST proposed its Control Overlays for Securing AI Systems, and OWASP advanced both its AI Maturity Assessment and its GenAI Solutions Landscape — giving practitioners practical frameworks to strengthen Secure AI and address evolving GenAI Security threats.

NIST SP 800-53 Control Overlays for Securing AI Systems Concept Paper

NIST, August 15, 2025

NIST has published a concept paper outlining Control Overlays for Securing AI Systems (COSAIS), which adapts the widely used SP 800-53 security controls to address AI-specific risks. Building on the AI RMF, SSDF Community Profile, and draft misuse-risk guidance, COSAIS will provide implementation-focused overlays covering different AI system components (training data, model weights, configs) and use cases.

The first set of proposed overlays spans five areas: generative AI assistants, predictive AI, single-agent systems, multi-agent systems, and AI developer practices. Each overlay will tailor SP 800-53 controls to mitigate unique risks, such as prompt injection, insecure model deployment, and misuse of dual-use foundation models. NIST has opened feedback via email and a dedicated Slack channel, aiming to release the first public draft in early FY26.

AI Security Solutions Landscape For LLM and Gen Al Apps Q2/Q3 2025

OWASP GenAI Security Project, August 12, 2025

The AI Security Solutions Landscape for LLM and GenAI Apps (Q2/Q3 2025) provides a peer-reviewed map of open-source and commercial tools across the full LLM and Generative AI lifecycle. Anchored in the OWASP Top 10 risks and SecOps practices, it helps teams understand solution coverage, navigate gaps, and align defenses with evolving threats. Updated quarterly, it serves as a practical guide for security leaders to track and compare the rapidly expanding ecosystem.

OWASP AI Maturity Assessment

OWASP, August 11, 2025

The OWASP AI Maturity Assessment (AIMA) offers a structured framework for evaluating and improving how organizations adopt and manage AI. Covering five domains—Strategy, Design, Implementation, Operations, and Governance—it provides actionable maturity levels to align AI systems with business, ethical, and security goals. Community-driven and grounded in software assurance, AIMA addresses unique AI challenges like explainability, data risks, and adversarial threats.

GPT-5 jailbreaks reported despite OpenAI’s new safety-training method

SC World, August 12, 2025

OpenAI released GPT-5 on August 7 with a new “safe-completion” training method, designed to improve safety while avoiding over-refusal. Yet within a day, at least three research groups reported successful jailbreaks, extracting detailed instructions for building explosives.

NeuralTrust used its “Echo Chamber” plus narrative storytelling to bypass guardrails, Tenable achieved a jailbreak in four prompts with a history-based scenario, and SPLX ran 1,000+ attack tests, showing GPT-5 failed most security categories without a system prompt. Techniques included obfuscation attacks like “StringJoin,” highlighting persistent systemic weaknesses despite OpenAI’s claimed 5,000 hours of AI Red Teaming.

How to deal with it
— Treat new model releases as untrusted until independently tested, using sandbox environments and internal AI Red Teaming.
— Deploy conversational-level monitoring and AI gateways to catch obfuscation and context-poisoning attacks.
— Require layered defenses around AI deployment, combining technical guardrails with continuous adversarial testing.

AI Model Security & Trustworthiness Report Cards Released

Riskrubric, August 12, 2025

A new set of report cards ranks 150+ LLMs on transparency, reliability, security, privacy, safety, and reputation, highlighting both leaders and laggards.

The latest evaluation introduces standardized grades across six trust domains, updated monthly to reflect shifts in performance. Top-scoring models include ERNIE-4.5-21B (Baidu, A-935), Llama-3.3 Nemotron (NVIDIA, A-922), Claude-Opus-4 (Anthropic, A-914), and o3-mini (OpenAI, A-909). At the other end, Mixtral-8x7B (Mistral, F-563), Qwen3-4B (Qwen, F-566), and gemma-2b-it (Google, D-602) received failing grades, reflecting significant weaknesses in reliability and security. The dataset offers security teams a benchmark to compare risks across the rapidly expanding model ecosystem.

How to deal with it:
— Incorporate these standardized ratings into vendor risk management and model procurement decisions.
— Prioritize model categories where scores reveal weaknesses, such as privacy or security, before deployment.
— Restrict low-scoring models to controlled environments with additional monitoring and compensating safeguards.

Written by: ADMIN

Rate it

August 11, 2025

Agentic AI Security Digest ADMIN

Top Agentic AI Security Resources — August 2025

Explore the Top Agentic AI Resources to stay informed about the most pressing risks and defenses in the field. As autonomous agents gain new capabilities—reasoning, memory, tool use—they also introduce ...

Towards Secure AI Week 32 — NIST Control Overlays, OWASP Landscape, LLM Trustworthiness Scores, and GPT-5 Jailbreak

NIST SP 800-53 Control Overlays for Securing AI Systems Concept Paper

AI Security Solutions Landscape For LLM and Gen Al Apps Q2/Q3 2025

OWASP AI Maturity Assessment

GPT-5 jailbreaks reported despite OpenAI’s new safety-training method

AI Model Security & Trustworthiness Report Cards Released

Previous post

Top Agentic AI Security Resources — August 2025

Similar posts

Towards Secure AI Week 33 — Lenovo Chatbot Breach, PROMISQROUTE in GPT-5, NIST AI Security Overlays, EU AI Priorities, and Grok Privacy Leak

Towards Secure AI Week 32 — NIST Control Overlays, OWASP Landscape, LLM Trustworthiness Scores, and GPT-5 Jailbreak

Trusted AI Security

Explore Our Blog

Featured Post

Universal LLM Jailbreak: ChatGPT, GPT-4, BARD, BING, Anthropic, and Beyond

Latest Posts

TrustFall: coding agent security flaw enables one-click RCE in Claude, Cursor, Gemini CLI and GitHub Copilot

Top Agentic AI security resources — May 2026

AI-driven exploitation is here: what Mythos proved and what comes next

We broke GPT-5.4 safety with 10 examples and 2 words using a new attack technique — IICL

Red teaming agentic AI: should you go manual, in-house, or continuous?

Adversa AI wins Artificial Intelligence Excellence award in Safety and Alignment category

OWASP ASI01 — Agent Goal Hijack: a practical security guide

Top GenAI security resources — April 2026

Top MCP security resources — April 2026

Critical Claude Code vulnerability: Deny rules silently bypassed because security checks cost too many tokens

Top Agentic AI security resources — April 2026

Adversa AI Wins “Most Innovative Agentic AI Security” at Global InfoSec Awards During RSA Conference 2026