Towards Secure AI Week 32 — NIST Control Overlays, OWASP Landscape, LLM Trustworthiness Scores, and GPT-5 Jailbreak

Secure AI Weekly ADMIN todayAugust 18, 2025 214

Background
share close

From GPT-5 jailbreaks leaking harmful instructions within hours of release to new benchmarks exposing systemic weaknesses in major models, this week highlighted how fragile LLM Security remains. Despite new training methods, Jailbreak LLM attacks like context poisoning and obfuscation continue to bypass guardrails.

As enterprises experiment with tool-using and multi-agent workflows, the challenges of Agentic AI Security grow sharper. Autonomous decision-making introduces risks that traditional frameworks cannot cover, from uncontrolled tool invocation to privilege escalation across connected environments.

Meanwhile, regulators and standards bodies are catching up. NIST proposed its Control Overlays for Securing AI Systems, and OWASP advanced both its AI Maturity Assessment and its GenAI Solutions Landscape — giving practitioners practical frameworks to strengthen Secure AI and address evolving GenAI Security threats.

NIST SP 800-53 Control Overlays for Securing AI Systems Concept Paper

NIST, August 15, 2025

NIST has published a concept paper outlining Control Overlays for Securing AI Systems (COSAIS), which adapts the widely used SP 800-53 security controls to address AI-specific risks. Building on the AI RMF, SSDF Community Profile, and draft misuse-risk guidance, COSAIS will provide implementation-focused overlays covering different AI system components (training data, model weights, configs) and use cases.

The first set of proposed overlays spans five areas: generative AI assistants, predictive AI, single-agent systems, multi-agent systems, and AI developer practices. Each overlay will tailor SP 800-53 controls to mitigate unique risks, such as prompt injection, insecure model deployment, and misuse of dual-use foundation models. NIST has opened feedback via email and a dedicated Slack channel, aiming to release the first public draft in early FY26.

AI Security Solutions Landscape For LLM and Gen Al Apps Q2/Q3 2025

OWASP GenAI Security Project, August 12, 2025

The AI Security Solutions Landscape for LLM and GenAI Apps (Q2/Q3 2025) provides a peer-reviewed map of open-source and commercial tools across the full LLM and Generative AI lifecycle. Anchored in the OWASP Top 10 risks and SecOps practices, it helps teams understand solution coverage, navigate gaps, and align defenses with evolving threats. Updated quarterly, it serves as a practical guide for security leaders to track and compare the rapidly expanding ecosystem.

OWASP AI Maturity Assessment

OWASP, August 11, 2025

The OWASP AI Maturity Assessment (AIMA) offers a structured framework for evaluating and improving how organizations adopt and manage AI. Covering five domains—Strategy, Design, Implementation, Operations, and Governance—it provides actionable maturity levels to align AI systems with business, ethical, and security goals. Community-driven and grounded in software assurance, AIMA addresses unique AI challenges like explainability, data risks, and adversarial threats.

GPT-5 jailbreaks reported despite OpenAI’s new safety-training method

SC World, August 12, 2025

OpenAI released GPT-5 on August 7 with a new “safe-completion” training method, designed to improve safety while avoiding over-refusal. Yet within a day, at least three research groups reported successful jailbreaks, extracting detailed instructions for building explosives.

NeuralTrust used its “Echo Chamber” plus narrative storytelling to bypass guardrails, Tenable achieved a jailbreak in four prompts with a history-based scenario, and SPLX ran 1,000+ attack tests, showing GPT-5 failed most security categories without a system prompt. Techniques included obfuscation attacks like “StringJoin,” highlighting persistent systemic weaknesses despite OpenAI’s claimed 5,000 hours of AI Red Teaming.

How to deal with it
— Treat new model releases as untrusted until independently tested, using sandbox environments and internal AI Red Teaming.
— Deploy conversational-level monitoring and AI gateways to catch obfuscation and context-poisoning attacks.
— Require layered defenses around AI deployment, combining technical guardrails with continuous adversarial testing.

AI Model Security & Trustworthiness Report Cards Released

Riskrubric, August 12, 2025

A new set of report cards ranks 150+ LLMs on transparency, reliability, security, privacy, safety, and reputation, highlighting both leaders and laggards.

The latest evaluation introduces standardized grades across six trust domains, updated monthly to reflect shifts in performance. Top-scoring models include ERNIE-4.5-21B (Baidu, A-935), Llama-3.3 Nemotron (NVIDIA, A-922), Claude-Opus-4 (Anthropic, A-914), and o3-mini (OpenAI, A-909). At the other end, Mixtral-8x7B (Mistral, F-563), Qwen3-4B (Qwen, F-566), and gemma-2b-it (Google, D-602) received failing grades, reflecting significant weaknesses in reliability and security. The dataset offers security teams a benchmark to compare risks across the rapidly expanding model ecosystem.

How to deal with it:
— Incorporate these standardized ratings into vendor risk management and model procurement decisions.
— Prioritize model categories where scores reveal weaknesses, such as privacy or security, before deployment.
— Restrict low-scoring models to controlled environments with additional monitoring and compensating safeguards.

Written by: ADMIN

Rate it

Previous post