Trusted AI Blog


IICL (involuntary in-context learning) attack technique

April 23, 2026


Research + LLM Security

We broke GPT-5.4 safety with 10 examples and 2 words using a new attack technique: IICL

OpenAI’s newest flagship is more vulnerable to our attack than GPT-5 or GPT-5-mini: newer doesn’t mean safer. Our new research (3,500+ probes, 10 models, 7 controlled experiments) shows why continuous red teaming isn’t optional for anyone building on frontier AI.

TL;DR We ran 3,500+ controlled probes across every model in ...