Towards Secure AI Week 28 — Grok Jailbreaks, New Whitepaper by CoSAI, and IAM Leaders Abandon Zero Trust for Agentic Hype
From jailbreak labs to enterprise lapses, this week reveals the widening reality gap in securing autonomous AI. A new multi-turn jailbreak technique targeting Grok-4 shows how combining subtle context poisoning with conversational pressure can bypass LLM safety filters—reaching success rates above 60% on prohibited content. This week’s takeaway is clear: ...