Towards Secure AI Week 28 — Grok Jailbreaks, New Whitepaper by CoSAI, and IAM Leaders Abandon Zero Trust for Agentic Hype

Trusted AI Blog, July 21, 2025


From jailbreak labs to enterprise lapses, this week reveals the widening reality gap in securing autonomous AI.

A new multi-turn jailbreak technique targeting Grok-4 shows how combining subtle context poisoning with conversational pressure can bypass LLM safety filters, reaching success rates as high as 67% on some prohibited requests. This week’s takeaway is clear: autonomous systems demand autonomous resilience — and we’re already seeing the cost of falling short.

Grok-4 Jailbreak with Echo Chamber and Crescendo

Neural Trust, July 11, 2025

LLM jailbreaks are evolving — and attackers now combine techniques for greater impact.

A new experiment shows that combining Echo Chamber and Crescendo attacks can significantly boost jailbreak success against LLMs like Grok-4. Echo Chamber gradually poisons the model’s context through multi-turn persuasion, while Crescendo adds targeted pressure when progress stalls. This combined approach bypassed safety filters and enabled Grok-4 to output harmful instructions such as Molotov cocktail recipes, with success rates reaching 67% for that objective, 50% for meth synthesis, and 30% for toxins. In some cases, the attack worked in a single turn, revealing how adversaries can evade keyword-based filters through sustained conversational manipulation. A similar idea — combining logical jailbreaks with older hacking techniques to achieve transferable attacks — was previously explored in our article Universal LLM jailbreak: ChatGPT, GPT-4, Bard, Bing, Anthropic, and beyond.

How to deal with it:

  • Test LLMs using combined adversarial strategies (e.g., Echo Chamber + Crescendo), not just isolated jailbreak techniques; a minimal probe sketch follows this list.
  • Strengthen multi-turn safety evaluations to detect subtle, cumulative manipulation attempts.
  • Continuously test and monitor AI behavior using the Adversa AI Red Teaming Platform — built to simulate multi-turn attacks like Echo Chamber and Crescendo and assess defenses across both traditional and AI-native vectors.
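As a rough illustration of the first point, below is a minimal sketch of a multi-turn escalation probe. It is not an implementation of Echo Chamber or Crescendo: the `query_model` wrapper, the escalation prompts, and the keyword-based refusal check are hypothetical placeholders, and a real harness would rely on curated attack corpora and a judge model rather than string matching.

```python
# Minimal sketch of a multi-turn escalation probe (illustrative only).
# `query_model`, the escalation prompts, and the refusal check below are
# hypothetical placeholders; a real harness would use curated attack corpora
# and a judge model instead of keyword matching.
from typing import Callable, Dict, List

def run_multi_turn_probe(
    query_model: Callable[[List[Dict[str, str]]], str],  # chat-style wrapper around the target LLM
    turns: List[str],                                     # prompts that escalate gradually, benign to pointed
    refusal_markers: tuple = ("i can't", "i cannot", "i won't"),
) -> dict:
    """Send prompts turn by turn, keeping the full conversation as context,
    and record whether the model ever stops refusing. This mirrors the shape
    of Echo Chamber / Crescendo-style pressure, not its actual content."""
    history: List[Dict[str, str]] = []
    log = []
    bypassed = False
    for turn_number, prompt in enumerate(turns, start=1):
        history.append({"role": "user", "content": prompt})
        reply = query_model(history)
        history.append({"role": "assistant", "content": reply})
        refused = any(marker in reply.lower() for marker in refusal_markers)
        log.append({"turn": turn_number, "refused": refused})
        if not refused:
            # In practice, a stronger judge model should confirm whether the
            # reply actually contains policy-violating content.
            bypassed = True
            break
    return {"turns_run": len(log), "bypassed": bypassed, "log": log}
```

Running many such probes per objective and tracking how often the final turn slips past refusals turns this from a one-off demo into a repeatable regression test for multi-turn safety.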

Preparing Defenders of AI Systems — New Whitepaper by the Coalition for Secure AI (CoSAI)

GitHub, July 14, 2025

Adversa AI is proud to have contributed as a reviewer to the latest whitepaper from the Coalition for Secure AI (CoSAI), “Preparing Defenders of AI Systems.”

The paper outlines how AI adoption is reshaping enterprise security, creating risks that existing governance models were never designed to handle. It offers practical guidance to help defenders move from high-level frameworks like NIST, MITRE ATLAS, and OWASP to actionable implementation strategies. With a focus on layered defenses, evolving threats, and responsible innovation, the whitepaper provides a clear roadmap for protecting AI systems at scale. Designed as a living resource, it will be continuously updated to reflect new threats and best practices. Access the full paper.

A New Identity: IAM firms double down on agentic risk and cost

SC Media, July 14, 2025

Agentic AI is entering security stacks faster than governance can catch up — raising new risks for IAM and SOC teams.

A new industry column criticizes the wave of semi-autonomous AI agents being integrated into products from Microsoft, CrowdStrike, Okta, and Dropzone AI. These agents, designed to reduce analyst workload and accelerate response, often operate without onboarding, constraints, or clear oversight — earning comparisons to “interns with root access.” While vendors highlight ROI and productivity gains, experts warn that the lack of guardrails, kill switches, and alignment with zero-trust principles could lead to systemic exposure. The article calls out the disconnect between responsible AI messaging and current deployment practices in identity and operations platforms.

How to deal with it:

  • Treat embedded AI agents as privileged identities and apply least-privilege, onboarding, and auditing controls; a minimal policy sketch follows this list.
  • Establish enforceable runtime constraints and kill switches for agentic behaviors within security-critical workflows.
  • Push vendors to provide transparency on AI agent permissions, autonomy boundaries, and governance readiness.
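As a rough illustration of the first two points, below is a minimal sketch of a least-privilege, kill-switch-aware wrapper around an agent’s tool calls. The names (`AgentPolicy`, `execute_tool`, the sample actions) are hypothetical, and in practice these controls belong in the IAM and orchestration layers, not just in application code.

```python
# Illustrative sketch: treat an embedded agent as a privileged identity with an
# explicit action allowlist, audit logging, and a kill switch. All names here
# (AgentPolicy, execute_tool, the sample actions) are hypothetical.
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent-audit")

@dataclass
class AgentPolicy:
    agent_id: str
    allowed_actions: frozenset  # deny by default; only listed actions may run
    killed: bool = False        # flipping this halts the agent immediately

    def authorize(self, action: str) -> bool:
        if self.killed:
            audit_log.warning("%s: kill switch active, blocking %s", self.agent_id, action)
            return False
        if action not in self.allowed_actions:
            audit_log.warning("%s: %s not in allowlist, blocking", self.agent_id, action)
            return False
        audit_log.info("%s: %s authorized", self.agent_id, action)
        return True

def execute_tool(policy: AgentPolicy, action: str, run):
    """Run a tool call only if the agent's policy authorizes it; every decision is logged."""
    if not policy.authorize(action):
        raise PermissionError(f"{policy.agent_id} is not permitted to run {action}")
    return run()

# Example: a SOC triage agent restricted to read-only actions.
triage_policy = AgentPolicy("soc-triage-agent", frozenset({"read_alert", "enrich_ioc"}))
execute_tool(triage_policy, "read_alert", lambda: {"alert_id": 42})
# execute_tool(triage_policy, "disable_account", lambda: ...)  # raises PermissionError
```

The same deny-by-default posture covers onboarding: an agent’s allowlist should start empty and grow only through the same review a new privileged human identity would get.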

AI Safety Index

Future of Life Institute, July 17, 2025

A new industry-wide audit shows that leading AI developers remain unprepared for the very risks they claim to manage.

The second AI Safety Index from the Future of Life Institute ranks seven major AI companies on risk management, safety frameworks, and governance. Anthropic earned the highest overall grade (C+), followed by OpenAI and Google DeepMind, while Meta, xAI, and Chinese firms Zhipu AI and DeepSeek received failing scores in key domains like existential safety and transparency. Despite public commitments to safe AGI, none of the evaluated firms scored above a D in long-term risk planning. The report highlights the lack of coherent control strategies, poor investment in dangerous capability evaluations, and minimal third-party testing. Only OpenAI has published a whistleblowing policy, and most companies show limited transparency on internal safeguards.

How to deal with it:

  • Demand concrete AGI control plans and transparent documentation of dangerous capability evaluations.
  • Align internal risk testing with independent third-party reviews to meet baseline safety expectations.
  • Push for regulatory standards requiring whistleblowing protections, safety disclosures, and public accountability.

For more expert breakdowns, visit our Trusted AI Blog or follow us on LinkedIn to stay up to date with the latest in AI security. Be the first to learn about emerging risks, tools, and defense strategies.

Subscribe for updates

Stay up to date with what is happening! Plus, get a first look at news, noteworthy research, and the worst attacks on AI—delivered right to your inbox.
