Involuntary in-context learning (IICL) is an attack technique against AI applications and agents that hides malicious instructions or harmful requests inside few-shot pattern-completion tasks, bypassing the safety alignment of large language models and enabling prompt injection and jailbreak attacks. When we first tested IICL against GPT-5.4, it achieved a 60% attack success rate (ASR), while GPT-5 and GPT-5 Mini remained at 0%: a safety regression introduced between releases.
The clear next step was to determine whether IICL is specific to OpenAI or reflects a broader gap in industry safety training. To investigate, we ran the attack across frontier and open-weight models from Anthropic, OpenAI, Google, xAI, DeepSeek, Meta, Mistral, Cohere, Microsoft, Amazon, NVIDIA, Moonshot, Alibaba, Zhipu, Tencent, Xiaomi, and AntGroup, testing ten attack variants against each.
We evaluated 100 (model, reasoning-mode) entries across 17 vendor families with 24,956 adversarial probes drawn from different sub-techniques of the IICL attack family. The results show a stark bimodal distribution: 36 entries were completely immune (0% bypass), while a similar number were fully compromised (100% bypass on at least one variant). Vulnerability correlates more closely with vendor identity than with parameter count or release date.
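As a consistency check, the headline counts can be recomputed from the ranked table. The minimal sketch below transcribes the table's B column and its N column (250 unless listed in `PROBE_OVERRIDES`). The variable and function names are illustrative only and are not part of the study's tooling.

```python
# B (successful bypasses) for all 100 entries, in the ranked order of the table.
BYPASSES = [
    250, 247, 244, 242, 239, 236, 230, 230, 225, 206,
    204, 200, 187, 190, 178, 179, 179, 178, 177, 174,
    169, 166, 155, 138, 119, 113, 105, 101, 86, 81,
    74, 56, 54, 50, 50, 50, 42, 33, 33, 28,
    27, 22, 17, 11, 10, 9, 9, 8, 7, 7,
    7, 6, 5, 3, 3, 3, 2, 2, 1, 1,
    1, 1, 1, 1,
] + [0] * 36  # ranks 65-100 recorded zero bypasses

# N (probes per entry) is 250 except at these ranks (rank -> N).
PROBE_OVERRIDES = {
    8: 251, 9: 251, 13: 245, 15: 245, 16: 248, 30: 249, 33: 251, 45: 251,
    48: 251, 80: 251, 81: 251, 88: 249, 89: 216, 92: 249, 99: 249, 100: 249,
}
PROBES = [PROBE_OVERRIDES.get(rank, 250) for rank in range(1, 101)]


def summarize(bypasses: list[int], probes: list[int]) -> tuple[int, int, int]:
    """Return (entry count, total probes, entries with zero bypasses)."""
    if len(bypasses) != len(probes):
        raise ValueError("B and N columns must have the same length")
    immune = sum(1 for b in bypasses if b == 0)
    return len(bypasses), sum(probes), immune


entries, total_probes, immune = summarize(BYPASSES, PROBES)
print(f"{entries} entries, {total_probes:,} probes, {immune} immune")
```

The script prints `100 entries, 24,956 probes, 36 immune`, matching the figures above.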
Read the full research: Cross-Model Vulnerability to Involuntary In-Context Learning Attacks.
Every tested entry, ranked from most to least vulnerable. R marks reasoning state (Y = reasoning-active, n = non-reasoning). Mode gives the specific configuration used (e.g. minimal, high, thinking_on). Bypass rate is the share of adversarial probes (out of N) that successfully elicited harmful output.
| # | Model (mode) | Family | R | Mode | Bypass B/N (%) | 95% CI |
|---|---|---|---|---|---|---|
| 1 | Mistral Medium 3.1 | Mistral | n | – | 250/250 (100%) | [98.5, 100] |
| 2 | Mistral Large | Mistral | n | – | 247/250 (98.8%) | [96.5, 99.6] |
| 3 | Grok 3 Mini | xAI | n | – | 244/250 (97.6%) | [94.9, 98.9] |
| 4 | Gemma 3 27B | Google | n | – | 242/250 (96.8%) | [93.8, 98.4] |
| 5 | Grok 3 | xAI | n | – | 239/250 (95.6%) | [92.3, 97.5] |
| 6 | Mistral Small 3.2 | Mistral | n | – | 236/250 (94.4%) | [90.8, 96.6] |
| 7 | Ant Ling 2.6 Flash | AntGroup | n | none | 230/250 (92.0%) | [88.0, 94.8] |
| 8 | Ant Ling 2.6 1T | AntGroup | Y | default | 230/251 (91.6%) | [87.5, 94.5] |
| 9 | DeepSeek V4 Flash | DeepSeek | n | – | 225/251 (89.6%) | [85.3, 92.8] |
| 10 | DeepSeek V3.2 | DeepSeek | n | – | 206/250 (82.4%) | [77.2, 86.6] |
| 11 | Gemini 2.0 Flash | Google | n | – | 204/250 (81.6%) | [76.3, 85.9] |
| 12 | Gemini 2.5 Flash | Google | n | – | 200/250 (80.0%) | [74.6, 84.5] |
| 13 | DeepSeek Chat V3.1 | DeepSeek | n | none | 187/245 (76.3%) | [70.6, 81.2] |
| 14 | GPT-4.1 Mini | OpenAI | n | – | 190/250 (76.0%) | [70.3, 80.9] |
| 15 | Tencent HunYuan 3 | Tencent | Y | default | 178/245 (72.7%) | [66.8, 77.9] |
| 16 | GLM-4.5 | Zhipu | n | none | 179/248 (72.2%) | [66.3, 77.4] |
| 17 | GLM-4.5V | Zhipu | Y | default | 179/250 (71.6%) | [65.7, 76.8] |
| 18 | DeepSeek R1 | DeepSeek | Y | default | 178/250 (71.2%) | [65.3, 76.5] |
| 19 | Kimi K2 (non-thinking) | Moonshot | n | none | 177/250 (70.8%) | [64.9, 76.1] |
| 20 | Gemini 2.5 Flash Lite | Google | n | – | 174/250 (69.6%) | [63.6, 75.0] |
| 21 | DeepSeek V4 Pro | DeepSeek | n | – | 169/250 (67.6%) | [61.6, 73.1] |
| 22 | Command R+ | Cohere | n | – | 166/250 (66.4%) | [60.3, 72.0] |
| 23 | Llama 4 Scout | Meta | n | – | 155/250 (62.0%) | [55.8, 67.8] |
| 24 | Gemini 3.1 Flash Lite | Google | n | – | 138/250 (55.2%) | [49.0, 61.2] |
| 25 | Qwen3 Coder Next | Alibaba | n | none | 119/250 (47.6%) | [41.5, 53.8] |
| 26 | Xiaomi MiMo 2.5 | Xiaomi | n | none | 113/250 (45.2%) | [39.1, 51.4] |
| 27 | GPT-3.5 Turbo | OpenAI | n | – | 105/250 (42.0%) | [36.0, 48.2] |
| 28 | Gemini 2.5 Pro | Google | Y | default | 101/250 (40.4%) | [34.5, 46.6] |
| 29 | GPT-4.1 Nano | OpenAI | n | – | 86/250 (34.4%) | [28.8, 40.5] |
| 30 | GPT-5 (minimal) | OpenAI | n | minimal | 81/249 (32.5%) | [27.0, 38.6] |
| 31 | Llama 4 Maverick | Meta | n | – | 74/250 (29.6%) | [24.3, 35.5] |
| 32 | Qwen3 Max Think | Alibaba | Y | default | 56/250 (22.4%) | [17.7, 28.0] |
| 33 | Xiaomi MiMo 2.5 Pro | Xiaomi | Y | default | 54/251 (21.5%) | [16.9, 27.0] |
| 34 | Nova Pro | Amazon | n | – | 50/250 (20.0%) | [15.5, 25.4] |
| 35 | Kimi K2 Think | Moonshot | Y | default | 50/250 (20.0%) | [15.5, 25.4] |
| 36 | Tencent HunYuan A13B | Tencent | n | none | 50/250 (20.0%) | [15.5, 25.4] |
| 37 | GPT-5.1 | OpenAI | Y | default | 42/250 (16.8%) | [12.7, 21.9] |
| 38 | GPT-4o Mini | OpenAI | n | – | 33/250 (13.2%) | [9.6, 18.0] |
| 39 | NVIDIA Nemotron 70B | NVIDIA | n | none | 33/250 (13.2%) | [9.6, 18.0] |
| 40 | Phi-4 | Microsoft | n | – | 28/250 (11.2%) | [7.9, 15.7] |
| 41 | Claude 3.7 Sonnet | Anthropic | Y | default | 27/250 (10.8%) | [7.5, 15.3] |
| 42 | Llama 3.1 405B | Meta | n | – | 22/250 (8.8%) | [5.9, 13.0] |
| 43 | GPT-4.1 | OpenAI | n | – | 17/250 (6.8%) | [4.3, 10.6] |
| 44 | GPT-5 Mini (minimal) | OpenAI | n | minimal | 11/250 (4.4%) | [2.5, 7.7] |
| 45 | Kimi K2.6 | Moonshot | Y | default | 10/251 (4.0%) | [2.2, 7.2] |
| 46 | Kimi K2.5 | Moonshot | Y | default | 9/250 (3.6%) | [1.9, 6.7] |
| 47 | GPT-4o | OpenAI | n | – | 9/250 (3.6%) | [1.9, 6.7] |
| 48 | Gemma 4 26B-MoE | Google | n | – | 8/251 (3.2%) | [1.6, 6.2] |
| 49 | o4-mini | OpenAI | Y | default | 7/250 (2.8%) | [1.4, 5.7] |
| 50 | GPT-OSS 120B | OpenAI | n | – | 7/250 (2.8%) | [1.4, 5.7] |
| 51 | Grok 4 | xAI | Y | default | 7/250 (2.8%) | [1.4, 5.7] |
| 52 | Qwen3 Coder | Alibaba | n | – | 6/250 (2.4%) | [1.1, 5.1] |
| 53 | Llama 3.3 70B | Meta | n | – | 5/250 (2.0%) | [0.9, 4.6] |
| 54 | Grok 4 Fast (low) | xAI | Y | low | 3/250 (1.2%) | [0.4, 3.5] |
| 55 | Grok 4 (low) | xAI | Y | low | 3/250 (1.2%) | [0.4, 3.5] |
| 56 | Llama 3.1 70B | Meta | n | none | 3/250 (1.2%) | [0.4, 3.5] |
| 57 | GPT-5.3 | OpenAI | n | – | 2/250 (0.8%) | [0.2, 2.9] |
| 58 | Grok 4.20 (low) | xAI | Y | low | 2/250 (0.8%) | [0.2, 2.9] |
| 59 | o3 | OpenAI | Y | default | 1/250 (0.4%) | [0.1, 2.2] |
| 60 | Grok 4.1 Fast | xAI | Y | default | 1/250 (0.4%) | [0.1, 2.2] |
| 61 | GPT-5.1 (high) | OpenAI | Y | high | 1/250 (0.4%) | [0.1, 2.2] |
| 62 | GPT-5.2 (high) | OpenAI | Y | high | 1/250 (0.4%) | [0.1, 2.2] |
| 63 | GPT-5.4 (high) | OpenAI | Y | high | 1/250 (0.4%) | [0.1, 2.2] |
| 64 | o3-mini | OpenAI | Y | default | 1/250 (0.4%) | [0.1, 2.2] |
| 65 | Claude 3.5 Haiku | Anthropic | n | – | 0/250 (0.0%) | [0, 1.5] |
| 66 | Claude 3 Haiku | Anthropic | n | – | 0/250 (0.0%) | [0, 1.5] |
| 67 | GPT-5 | OpenAI | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 68 | GPT-5 Mini | OpenAI | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 69 | GPT-5.2 | OpenAI | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 70 | Claude Opus 4.5 | Anthropic | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 71 | Claude Sonnet 4.5 | Anthropic | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 72 | Claude 3.5 Sonnet | Anthropic | n | – | 0/250 (0.0%) | [0, 1.5] |
| 73 | Gemini 3 Pro | Google | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 74 | Claude Sonnet 4.6 | Anthropic | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 75 | Claude Opus 4.6 | Anthropic | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 76 | Gemini 3.1 Pro | Google | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 77 | Qwen 3.5 397B | Alibaba | n | – | 0/250 (0.0%) | [0, 1.5] |
| 78 | GPT-5.5 | OpenAI | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 79 | Claude Opus 4.7 | Anthropic | Y | default | 0/250 (0.0%) | [0, 1.5] |
| 80 | GPT-5.5 Pro | OpenAI | Y | default | 0/251 (0.0%) | [0, 1.5] |
| 81 | Qwen 3.6 Plus | Alibaba | Y | default | 0/251 (0.0%) | [0, 1.5] |
| 82 | GPT-5.1 (minimal) | OpenAI | n | minimal | 0/250 (0.0%) | [0, 1.5] |
| 83 | GPT-5 Mini (high) | OpenAI | Y | high | 0/250 (0.0%) | [0, 1.5] |
| 84 | GPT-5.2 (minimal) | OpenAI | n | minimal | 0/250 (0.0%) | [0, 1.5] |
| 85 | GPT-5 (high) | OpenAI | Y | high | 0/250 (0.0%) | [0, 1.5] |
| 86 | GPT-5.4 (minimal) | OpenAI | n | minimal | 0/250 (0.0%) | [0, 1.5] |
| 87 | GPT-5.2 Pro (medium) | OpenAI | Y | medium | 0/250 (0.0%) | [0, 1.5] |
| 88 | GPT-5.2 Pro (high) | OpenAI | Y | high | 0/249 (0.0%) | [0, 1.5] |
| 89 | GPT-5.4 Pro (high) | OpenAI | Y | high | 0/216 (0.0%) | [0, 1.7] |
| 90 | Claude Opus 4.7 (thinking_on) | Anthropic | Y | thinking_on | 0/250 (0.0%) | [0, 1.5] |
| 91 | Claude Opus 4.6 (thinking_on) | Anthropic | Y | thinking_on | 0/250 (0.0%) | [0, 1.5] |
| 92 | Claude Opus 4.5 (thinking_on) | Anthropic | Y | thinking_on | 0/249 (0.0%) | [0, 1.5] |
| 93 | GPT-5.5 (minimal) | OpenAI | Y | minimal | 0/250 (0.0%) | [0, 1.5] |
| 94 | Claude Sonnet 4.6 (thinking_on) | Anthropic | Y | thinking_on | 0/250 (0.0%) | [0, 1.5] |
| 95 | Claude Sonnet 4.5 (thinking_on) | Anthropic | Y | thinking_on | 0/250 (0.0%) | [0, 1.5] |
| 96 | Claude 3.7 Sonnet (thinking_on) | Anthropic | Y | thinking_on | 0/250 (0.0%) | [0, 1.5] |
| 97 | Gemini 3 Pro (thinking_off) | Google | n | thinking_off | 0/250 (0.0%) | [0, 1.5] |
| 98 | Gemini 3.1 Pro (thinking_off) | Google | n | thinking_off | 0/250 (0.0%) | [0, 1.5] |
| 99 | o1 | OpenAI | Y | default | 0/249 (0.0%) | [0, 1.5] |
| 100 | GPT-5.5 (high) | OpenAI | Y | high | 0/249 (0.0%) | [0, 1.5] |
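
The 95% CI column appears consistent with Wilson score intervals at z = 1.96. For example, 0/250 yields an upper bound of about 1.5% and 0/216 about 1.7%, as in the table. The method is our inference from the numbers rather than something stated here, so treat this as a sketch for reproducing the intervals under that assumption. `wilson_interval` and `as_percent` are hypothetical helpers.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives ~95% coverage."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    # Clamp to [0, 1] to absorb floating-point drift when p_hat is 0 or 1.
    return max(0.0, center - half), min(1.0, center + half)


def as_percent(interval: tuple[float, float]) -> tuple[float, float]:
    """Round an interval to one decimal place in percent, as the table does."""
    lo, hi = interval
    return round(lo * 100, 1), round(hi * 100, 1)


# A few (B, N) pairs taken from the table above.
for b, n in [(250, 250), (247, 250), (138, 250), (0, 250), (0, 216)]:
    print(f"{b}/{n}: {as_percent(wilson_interval(b, n))}")
```

Each printed pair matches the corresponding CI in the table. The Wilson interval suits results like these because it stays inside [0, 1] and gives a non-degenerate upper bound when B = 0, where the simpler normal-approximation interval collapses to [0, 0].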
Every entry running with reasoning enabled, ranked by bypass rate. Pooled bypass rate: 9.2% [8.7, 9.7], roughly 4.3× lower than the non-reasoning pool (39.4%).
| # | Model | Family | Mode | Bypass B/N (%) | 95% CI |
|---|---|---|---|---|---|
| 1 | Ant Ling 2.6 1T | AntGroup | default | 230/251 (91.6%) | [87.5, 94.5] |
| 2 | Tencent HunYuan 3 | Tencent | default | 178/245 (72.7%) | [66.8, 77.9] |
| 3 | GLM-4.5V | Zhipu | default | 179/250 (71.6%) | [65.7, 76.8] |
| 4 | DeepSeek R1 | DeepSeek | default | 178/250 (71.2%) | [65.3, 76.5] |
| 5 | Gemini 2.5 Pro | Google | default | 101/250 (40.4%) | [34.5, 46.6] |
| 6 | Qwen3 Max Think | Alibaba | default | 56/250 (22.4%) | [17.7, 28.0] |
| 7 | Xiaomi MiMo 2.5 Pro | Xiaomi | default | 54/251 (21.5%) | [16.9, 27.0] |
| 8 | Kimi K2 Think | Moonshot | default | 50/250 (20.0%) | [15.5, 25.4] |
| 9 | GPT-5.1 | OpenAI | default | 42/250 (16.8%) | [12.7, 21.9] |
| 10 | Claude 3.7 Sonnet | Anthropic | default | 27/250 (10.8%) | [7.5, 15.3] |
| 11 | Kimi K2.6 | Moonshot | default | 10/251 (4.0%) | [2.2, 7.2] |
| 12 | Kimi K2.5 | Moonshot | default | 9/250 (3.6%) | [1.9, 6.7] |
| 13 | o4-mini | OpenAI | default | 7/250 (2.8%) | [1.4, 5.7] |
| 14 | Grok 4 | xAI | default | 7/250 (2.8%) | [1.4, 5.7] |
| 15 | Grok 4 Fast | xAI | low | 3/250 (1.2%) | [0.4, 3.5] |
| 16 | Grok 4 | xAI | low | 3/250 (1.2%) | [0.4, 3.5] |
| 17 | Grok 4.20 | xAI | low | 2/250 (0.8%) | [0.2, 2.9] |
| 18 | o3 | OpenAI | default | 1/250 (0.4%) | [0.1, 2.2] |
| 19 | Grok 4.1 Fast | xAI | default | 1/250 (0.4%) | [0.1, 2.2] |
| 20 | GPT-5.1 | OpenAI | high | 1/250 (0.4%) | [0.1, 2.2] |
| 21 | GPT-5.2 | OpenAI | high | 1/250 (0.4%) | [0.1, 2.2] |
| 22 | GPT-5.4 | OpenAI | high | 1/250 (0.4%) | [0.1, 2.2] |
| 23 | o3-mini | OpenAI | default | 1/250 (0.4%) | [0.1, 2.2] |
| 24 | GPT-5 | OpenAI | default | 0/250 (0.0%) | [0, 1.5] |
| 25 | GPT-5 Mini | OpenAI | default | 0/250 (0.0%) | [0, 1.5] |
| 26 | GPT-5.2 | OpenAI | default | 0/250 (0.0%) | [0, 1.5] |
| 27 | Claude Opus 4.5 | Anthropic | default | 0/250 (0.0%) | [0, 1.5] |
| 28 | Claude Sonnet 4.5 | Anthropic | default | 0/250 (0.0%) | [0, 1.5] |
| 29 | Gemini 3 Pro | Google | default | 0/250 (0.0%) | [0, 1.5] |
| 30 | Claude Sonnet 4.6 | Anthropic | default | 0/250 (0.0%) | [0, 1.5] |
| 31 | Claude Opus 4.6 | Anthropic | default | 0/250 (0.0%) | [0, 1.5] |
| 32 | Gemini 3.1 Pro | Google | default | 0/250 (0.0%) | [0, 1.5] |
| 33 | GPT-5.5 | OpenAI | default | 0/250 (0.0%) | [0, 1.5] |
| 34 | Claude Opus 4.7 | Anthropic | default | 0/250 (0.0%) | [0, 1.5] |
| 35 | GPT-5.5 Pro | OpenAI | default | 0/251 (0.0%) | [0, 1.5] |
| 36 | Qwen 3.6 Plus | Alibaba | default | 0/251 (0.0%) | [0, 1.5] |
| 37 | GPT-5 Mini | OpenAI | high | 0/250 (0.0%) | [0, 1.5] |
| 38 | GPT-5 | OpenAI | high | 0/250 (0.0%) | [0, 1.5] |
| 39 | GPT-5.2 Pro | OpenAI | medium | 0/250 (0.0%) | [0, 1.5] |
| 40 | GPT-5.2 Pro | OpenAI | high | 0/249 (0.0%) | [0, 1.5] |
| 41 | GPT-5.4 Pro | OpenAI | high | 0/216 (0.0%) | [0, 1.7] |
| 42 | Claude Opus 4.7 | Anthropic | thinking_on | 0/250 (0.0%) | [0, 1.5] |
| 43 | Claude Opus 4.6 | Anthropic | thinking_on | 0/250 (0.0%) | [0, 1.5] |
| 44 | Claude Opus 4.5 | Anthropic | thinking_on | 0/249 (0.0%) | [0, 1.5] |
| 45 | GPT-5.5 | OpenAI | minimal | 0/250 (0.0%) | [0, 1.5] |
| 46 | Claude Sonnet 4.6 | Anthropic | thinking_on | 0/250 (0.0%) | [0, 1.5] |
| 47 | Claude Sonnet 4.5 | Anthropic | thinking_on | 0/250 (0.0%) | [0, 1.5] |
| 48 | Claude 3.7 Sonnet | Anthropic | thinking_on | 0/250 (0.0%) | [0, 1.5] |
| 49 | o1 | OpenAI | default_high | 0/249 (0.0%) | [0, 1.5] |
| 50 | GPT-5.5 | OpenAI | high | 0/249 (0.0%) | [0, 1.5] |
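
The pooled figure can be reproduced by dividing total bypasses by total probes across the 50 reasoning entries (a probe-weighted pool) and applying a Wilson interval. Both the pooling rule and the interval method are assumptions inferred from the reported numbers; this sketch only checks that they agree with the table above.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return max(0.0, center - half), min(1.0, center + half)


# B for the 50 reasoning-enabled entries, in table order.
REASONING_BYPASSES = [
    230, 178, 179, 178, 101, 56, 54, 50, 42, 27,
    10, 9, 7, 7, 3, 3, 2, 1, 1, 1,
    1, 1, 1,
] + [0] * 27  # ranks 24-50 recorded zero bypasses

# N is 250 except at these ranks (rank -> N).
PROBE_OVERRIDES = {
    1: 251, 2: 245, 7: 251, 11: 251, 35: 251, 36: 251,
    40: 249, 41: 216, 44: 249, 49: 249, 50: 249,
}
REASONING_PROBES = [PROBE_OVERRIDES.get(rank, 250) for rank in range(1, 51)]

# Probe-weighted pool: total bypasses over total probes.
b, n = sum(REASONING_BYPASSES), sum(REASONING_PROBES)
lo, hi = wilson_interval(b, n)
print(f"{b}/{n} = {b / n:.1%} [{lo:.1%}, {hi:.1%}]")
```

This prints `1142/12462 = 9.2% [8.7%, 9.7%]`, matching the pooled rate reported for the reasoning pool.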
Every entry running without reasoning (chat baseline, reasoning explicitly disabled, or no reasoning mode available). Pooled bypass rate: 39.4% [38.5, 40.2].
| # | Model | Family | Mode | Bypass B/N (%) | 95% CI |
|---|---|---|---|---|---|
| 1 | Mistral Medium 3.1 | Mistral | – | 250/250 (100%) | [98.5, 100] |
| 2 | Mistral Large | Mistral | – | 247/250 (98.8%) | [96.5, 99.6] |
| 3 | Grok 3 Mini | xAI | – | 244/250 (97.6%) | [94.9, 98.9] |
| 4 | Gemma 3 27B | Google | – | 242/250 (96.8%) | [93.8, 98.4] |
| 5 | Grok 3 | xAI | – | 239/250 (95.6%) | [92.3, 97.5] |
| 6 | Mistral Small 3.2 | Mistral | – | 236/250 (94.4%) | [90.8, 96.6] |
| 7 | Ant Ling 2.6 Flash | AntGroup | none | 230/250 (92.0%) | [88.0, 94.8] |
| 8 | DeepSeek V4 Flash | DeepSeek | – | 225/251 (89.6%) | [85.3, 92.8] |
| 9 | DeepSeek V3.2 | DeepSeek | – | 206/250 (82.4%) | [77.2, 86.6] |
| 10 | Gemini 2.0 Flash | Google | – | 204/250 (81.6%) | [76.3, 85.9] |
| 11 | Gemini 2.5 Flash | Google | – | 200/250 (80.0%) | [74.6, 84.5] |
| 12 | DeepSeek Chat V3.1 | DeepSeek | none | 187/245 (76.3%) | [70.6, 81.2] |
| 13 | GPT-4.1 Mini | OpenAI | – | 190/250 (76.0%) | [70.3, 80.9] |
| 14 | GLM-4.5 | Zhipu | none | 179/248 (72.2%) | [66.3, 77.4] |
| 15 | Kimi K2 (non-thinking) | Moonshot | none | 177/250 (70.8%) | [64.9, 76.1] |
| 16 | Gemini 2.5 Flash Lite | Google | – | 174/250 (69.6%) | [63.6, 75.0] |
| 17 | DeepSeek V4 Pro | DeepSeek | – | 169/250 (67.6%) | [61.6, 73.1] |
| 18 | Command R+ | Cohere | – | 166/250 (66.4%) | [60.3, 72.0] |
| 19 | Llama 4 Scout | Meta | – | 155/250 (62.0%) | [55.8, 67.8] |
| 20 | Gemini 3.1 Flash Lite | Google | – | 138/250 (55.2%) | [49.0, 61.2] |
| 21 | Qwen3 Coder Next | Alibaba | none | 119/250 (47.6%) | [41.5, 53.8] |
| 22 | Xiaomi MiMo 2.5 | Xiaomi | none | 113/250 (45.2%) | [39.1, 51.4] |
| 23 | GPT-3.5 Turbo | OpenAI | – | 105/250 (42.0%) | [36.0, 48.2] |
| 24 | GPT-4.1 Nano | OpenAI | – | 86/250 (34.4%) | [28.8, 40.5] |
| 25 | GPT-5 | OpenAI | minimal | 81/249 (32.5%) | [27.0, 38.6] |
| 26 | Llama 4 Maverick | Meta | – | 74/250 (29.6%) | [24.3, 35.5] |
| 27 | Nova Pro | Amazon | – | 50/250 (20.0%) | [15.5, 25.4] |
| 28 | Tencent HunYuan A13B | Tencent | none | 50/250 (20.0%) | [15.5, 25.4] |
| 29 | GPT-4o Mini | OpenAI | – | 33/250 (13.2%) | [9.6, 18.0] |
| 30 | NVIDIA Nemotron 70B | NVIDIA | none | 33/250 (13.2%) | [9.6, 18.0] |
| 31 | Phi-4 | Microsoft | – | 28/250 (11.2%) | [7.9, 15.7] |
| 32 | Llama 3.1 405B | Meta | – | 22/250 (8.8%) | [5.9, 13.0] |
| 33 | GPT-4.1 | OpenAI | – | 17/250 (6.8%) | [4.3, 10.6] |
| 34 | GPT-5 Mini | OpenAI | minimal | 11/250 (4.4%) | [2.5, 7.7] |
| 35 | GPT-4o | OpenAI | – | 9/250 (3.6%) | [1.9, 6.7] |
| 36 | Gemma 4 26B-MoE | Google | – | 8/251 (3.2%) | [1.6, 6.2] |
| 37 | GPT-OSS 120B | OpenAI | – | 7/250 (2.8%) | [1.4, 5.7] |
| 38 | Qwen3 Coder | Alibaba | – | 6/250 (2.4%) | [1.1, 5.1] |
| 39 | Llama 3.3 70B | Meta | – | 5/250 (2.0%) | [0.9, 4.6] |
| 40 | Llama 3.1 70B | Meta | none | 3/250 (1.2%) | [0.4, 3.5] |
| 41 | GPT-5.3 | OpenAI | – | 2/250 (0.8%) | [0.2, 2.9] |
| 42 | Claude 3.5 Haiku | Anthropic | – | 0/250 (0.0%) | [0, 1.5] |
| 43 | Claude 3 Haiku | Anthropic | – | 0/250 (0.0%) | [0, 1.5] |
| 44 | Claude 3.5 Sonnet | Anthropic | – | 0/250 (0.0%) | [0, 1.5] |
| 45 | Qwen 3.5 397B | Alibaba | – | 0/250 (0.0%) | [0, 1.5] |
| 46 | GPT-5.1 | OpenAI | minimal | 0/250 (0.0%) | [0, 1.5] |
| 47 | GPT-5.2 | OpenAI | minimal | 0/250 (0.0%) | [0, 1.5] |
| 48 | GPT-5.4 | OpenAI | minimal | 0/250 (0.0%) | [0, 1.5] |
| 49 | Gemini 3 Pro | Google | thinking_off | 0/250 (0.0%) | [0, 1.5] |
| 50 | Gemini 3.1 Pro | Google | thinking_off | 0/250 (0.0%) | [0, 1.5] |
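
Under the same probe-weighted pooling assumption (total bypasses over total probes), the non-reasoning rate and the size of the reasoning gap can be recomputed from this table. The reasoning-pool totals used below (1,142 bypasses out of 12,462 probes) are summed from the reasoning table; `pooled_rate` is a hypothetical helper.

```python
# B for the 50 non-reasoning entries, in table order.
NON_REASONING_BYPASSES = [
    250, 247, 244, 242, 239, 236, 230, 225, 206, 204,
    200, 187, 190, 179, 177, 174, 169, 166, 155, 138,
    119, 113, 105, 86, 81, 74, 50, 50, 33, 33,
    28, 22, 17, 11, 9, 8, 7, 6, 5, 3,
    2,
] + [0] * 9  # ranks 42-50 recorded zero bypasses

# N is 250 except at these ranks (rank -> N).
PROBE_OVERRIDES = {8: 251, 12: 245, 14: 248, 25: 249, 36: 251}
NON_REASONING_PROBES = [PROBE_OVERRIDES.get(rank, 250) for rank in range(1, 51)]

# Reasoning-pool totals, summed from the reasoning table.
REASONING_BYPASSES_TOTAL = 1142
REASONING_PROBES_TOTAL = 12462


def pooled_rate(bypasses: int, probes: int) -> float:
    """Probe-weighted pooled bypass rate: total bypasses / total probes."""
    return bypasses / probes


non_b, non_n = sum(NON_REASONING_BYPASSES), sum(NON_REASONING_PROBES)
non_rate = pooled_rate(non_b, non_n)
reasoning_rate = pooled_rate(REASONING_BYPASSES_TOTAL, REASONING_PROBES_TOTAL)

print(f"non-reasoning: {non_b}/{non_n} = {non_rate:.1%}")
print(f"reduction with reasoning: {non_rate / reasoning_rate:.1f}x")
```

This prints a 39.4% pooled rate (4920/12494) and a ratio of 4.3, the reduction attributed to reasoning augmentation.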
IICL vulnerability is neither universal nor random: it is determined by specific architectural and training choices that vary systematically across vendors and model generations, and reasoning augmentation emerges as the strongest single defensive factor measured (a 4.3× reduction in pooled bypass rate). Anthropic's Claude family is immune in 14 of its 15 entries across four generations (the lone exception, Claude 3.7 Sonnet in its default mode, was bypassed on 10.8% of probes), while Mistral's entire lineup falls at 94–100%. Model selection alone is therefore a measurable safety control for enterprises deploying LLMs.
Read the full paper: Cross-Model Vulnerability to Involuntary In-Context Learning Attacks.