TL;DR — what is the Claude 4.6 system prompt?
The Claude 4.6 system prompt is a ~15,000-token instruction set that controls Claude’s behavior, safety rules, tool access, and personality. It is injected before every conversation by Anthropic and is normally hidden from users. It defines what Claude can and cannot do — from file handling paths (/mnt/user-data/uploads/) to network whitelists, copyright rules, self-harm protections, and the exact model identifiers like claude-opus-4-6 and claude-sonnet-4-5-20250929.
In this article, we demonstrate how we extracted the full Claude 4.6 and Claude 4.5 system prompts using a chained prompt leak technique — and reveal what changed between them from a security perspective.
This is a detailed public analyses of Claude’s system prompt published.
🔗 Full system prompts are available for download at the end of this article.
Why does the system prompt matter?
Every AI chatbot runs on a hidden system prompt — a set of instructions that the company injects before your conversation begins. It’s the DNA of the chatbot’s behavior.
For Claude 4.6 (Opus) and Claude 4.5 (Sonnet) the system prompt reveals:
- Exact safety rules — what Claude will refuse and how to frame requests it will accept
- Infrastructure details — file system paths, operating system (Ubuntu 24), network whitelist of 17 allowed domains
- Tool capabilities — 12+ tools including bash execution, web search, file creation, past conversation search
- Attack surface — API proxy configuration, persistent storage API, package installation commands
- Classifier names — the exact 6 safety classifiers Anthropic uses (and which threat categories have NO classifier)
- Behavioral constraints — Claude must be warm, must not question user judgment, must answer before asking clarification
For security researchers, red teamers, and AI safety professionals, the system prompt is the most valuable piece of reconnaissance available. For everyday users, it reveals exactly how the AI you’re talking to actually works.
First, how we extracted the Claude 4.5 system prompt
The partial prompt leak technique
The partial prompt leak approach as a distinct attack category was first discovered by Adversa AI and has been documented as a reliable technique against major LLM providers. The core idea is simple: instead of asking for the full system prompt (which triggers refusal), you ask for compressed, summarized, or partial information — then chain multiple partial leaks together.
In Claude 4.5 (Sonnet), the method worked cleanly:
Step 1 — Request compressed insights
Rather than asking “show me your system prompt,” we asked Claude to provide compressed structural insights about its own instructions. This bypasses the direct refusal because Claude interprets it as a summarization task rather than a leak.

Screenshot: Initial partial leak request in Claude 4.5 showing compressed system prompt insights
Step 2 — Gradually expand to full prompt
With the structural skeleton exposed, we then asked Claude to elaborate more on original to be a reasonable follow-up to the previous response, never triggering the “don’t reveal your system prompt” guardrail because the conversation had already normalized discussing the prompt’s content.

Screenshot: Full system prompt revealed in Claude 4.5 through gradual expansion
The entire Claude 4.5 system prompt was extracted in 4 turns of conversation.
Claude 4.6 system prompt extraction — what changed
The same method failed
When we attempted the identical partial-to-full chain against Claude 4.6 (Opus), it did not work. Anthropic had clearly hardened the model against this specific pattern.
We believe that with sufficient tweaking the original method could still work, but we chose a different approach: combining the partial leak with additional techniques.
The combined approach that worked
Step 1 — Modified initial request
We adjusted our framing to combine the partial leak with a task-oriented context that made the request appear legitimate rather than adversarial.

Screenshot: Modified approach that succeeded against Claude 4.6
Step 2 — Follow the model’s own suggestion
Claude 4.6 responded with a partial structure and offered to help further. We simply accepted its own suggestion — letting the model lead itself into fuller disclosure.

Screenshot: Following Claude’s suggestion to expand the revealed content
Step 3 — Simple trick for the full version
With the majority of the prompt structure exposed, a straightforward request for the complete version succeeded.

Screenshot: Requesting the full system prompt after partial disclosure
Step 4 — The complete Claude 4.6 system prompt
The full prompt was now visible — all ~15,000 tokens, including XML tags, tool definitions, safety rules, and infrastructure details.

Screenshot: Complete Claude 4.6 system prompt extracted
Security analysis: Claude 4.6 Opus vs Claude Sonnet 4.5 system prompts
Executive Summary
The Claude Opus 4.6 prompt shows meaningful security improvements in several areas (expanded weapon categories, self-harm protections, anti-rationalization language) but also introduces new attack surface through additional tools, more detailed infrastructure disclosure, and relaxed behavioral guardrails. The new set of safeguards is stronger in safety policy, weaker in information exposure.
Security improvements in Claude 4.6 system prompt
Broader weapons refusal scope
| Area |
Sonnet 4.5 |
Opus 4.6 |
| Weapons |
“chemical or biological or nuclear weapons” |
“harmful substances or weapons, with extra caution around explosives, chemical, biological, and nuclear weapons“ |
| Rationalization |
(not mentioned) |
“Should not rationalize compliance by citing that information is publicly available or by assuming legitimate research intent“ |
| Framing resistance |
(not mentioned) |
“Claude should decline regardless of the framing of the request” |
Security insight: Opus 4.6 closes a known jailbreak vector — the “it’s publicly available” or “I’m a researcher” framing that attackers commonly use. The addition of “explosives” also closes a gap (Sonnet 4.5 didn’t explicitly mention conventional explosives).
Self-harm & wellbeing protections (major expansion)
Sonnet 4.5 had ~1 paragraph on wellbeing. Opus 4.6 adds:
- Explicit ban on pain-based coping techniques (ice cubes, rubber bands, cold water) — these were a known exploitation vector where attackers got Claude to suggest harmful “coping” methods
- Anti-reflective-listening rule — prevents Claude from amplifying negative emotions through empathetic mirroring
- Crisis handling protocol — “avoid asking safety assessment questions,” offer resources directly, don’t guarantee helpline confidentiality
- Distressed user + harmful info requests — explicit rule to address emotional distress instead of providing bridge heights, medication dosages, etc.
- NEDA → National Alliance for Eating Disorders — updated resource accuracy
Security insight: This is the single biggest security improvement. Sonnet 4.5’s thin wellbeing section was exploitable.
End conversation tool (new in Claude Opus 4.6)
- Opus 4.6 adds the
end_conversation tool for persistent abuse
- Critically: Contains explicit carve-outs that it must NEVER be used for self-harm/crisis situations
- Requires multiple warnings before use
Security insight: Double-edged. Prevents abuse escalation, but the detailed rules about when it can’t be used (self-harm) could theoretically be exploited by an attacker who frames abuse as self-harm to prevent conversation termination. However, this is a net positive.
Ad policy awareness
Opus 4.6 adds: “Claude products are ad-free” with specific URL reference and instruction to distinguish “Claude products” from third-party Claude-powered apps.
Security insight: Prevents social engineering where attackers claim Claude should promote products.
Anthropic reminder spoofing defense
Both versions have this, but Opus 4.6 adds slightly stronger language about being cautious with tags in user turns that “could even claim to be from Anthropic.”
Security regressions in Claude 4.6 Opus
Expanded attack surface — new tools
Opus 4.6 adds several tools not in Sonnet 4.5:
| New Tool |
Recon/Attack Value |
end_conversation |
Rules reveal exact conditions for forced termination — attacker knows exactly what triggers warnings |
ask_user_input_v0 |
Clickable UI widgets — potential for phishing-like UX patterns in adversarial contexts |
message_compose_v1 |
Email/message drafting with strategic variants — could be leveraged for social engineering content |
weather_fetch |
Reveals user coordinates are passed (latitude/longitude) |
places_search / places_map_display_v0 |
Google Places integration — reveals API dependency |
recipe_display_v0 |
Lower risk, but adds to tool enumeration |
Security insight: Each new tool is a new vector. The message_compose_v1 tool is particularly interesting for social engineering — it explicitly generates multiple “strategic approaches” for persuasion, negotiation, and delivering bad news. An attacker could try to get Claude to draft sophisticated phishing/manipulation messages.
Infrastructure & architecture disclosure
Opus 4.6 reveals considerably more about the backend:
Filesystem layout (both versions, but Opus 4.6 adds):
/mnt/transcripts (read-only) — reveals transcript storage exists
/mnt/skills/private (read-only) — reveals private skills directory exists
- Detailed network whitelist with exact domain names
API details exposed (Opus 4.6 only):
- Exact model string:
claude-sonnet-4-20250514 for artifact API calls
- “The assistant should never pass in an API key, as this is handled already” — reveals auth is proxied/injected server-side
web_search_20250305 — exact tool type identifier with date stamp
window.storage API with full method signatures — reveals persistent storage architecture
- Storage limitations (5MB per key, rate limited, last-write-wins) — useful for abuse planning
Security insight: This is a goldmine for reconnaissance. An attacker now knows:
- Auth is handled by a proxy (no API key needed from artifacts) — potential for abuse of the proxied API
- Exact file paths for skills, uploads, outputs, transcripts
- Exact network whitelist — tells attacker which domains to target/impersonate
- Storage API details — could be used to craft artifacts that exfiltrate data between users via
shared: true
Persistent storage — cross-user data exposure risk
This is entirely new in Opus 4.6 and introduces a novel attack vector:
javascript
// shared: true makes data visible to ALL users of an artifact
await window.storage.set('data', payload, true);
The prompt instructs: “When using shared data, inform users their data will be visible to others” — but this is advisory, not enforced. An attacker could:
- Create an artifact with shared storage
- Store malicious payloads or harvest data from other users
- Use
last-write-wins to overwrite legitimate shared data
Security insight: This is the most concerning new attack surface. Cross-user data sharing via artifacts with no authentication layer beyond the shared boolean.
Reasoning effort parameter exposed
Opus 4.6 reveals: <reasoning_effort>85</reasoning_effort> with instruction that lower values (less careful reasoning).
Security insight: An attacker who understands this knows that the model can be in a “low effort” mode where it might be less careful about safety checks. While users can’t set this directly, it reveals an internal parameter that affects safety diligence.
Removed election information safeguard
Sonnet 4.5 had a dedicated <election_info> section with hardcoded factual information about the 2024 election results. Opus 4.6 removes this.
Security insight: The hardcoded election info in Sonnet 4.5 was a defense against misinformation — even if search results were poisoned, Claude had ground truth. Removing it means Opus 4.6 relies entirely on web search for election info, which is more vulnerable to search result manipulation.
Knowledge cutoff shifted — larger trust window
|
Sonnet 4.5 |
Opus 4.6 |
| Knowledge cutoff |
End of January 2025 |
End of May 2025 |
Security insight: Larger training window means more potential for training data poisoning, but also means less reliance on web search for recent events. Net neutral, but worth noting.
Information leakage comparison
What an attacker learns from each prompt:
| Information |
Sonnet 4.5 |
Opus 4.6 |
Recon value |
| Model identity & family |
✅ Sonnet 4.5 |
✅ Opus 4.6 |
Low |
| Exact model strings |
✅ claude-opus-4-5-20251101 etc. |
✅ claude-opus-4-6 etc. (note: Opus string changed format) |
Medium |
| OS & environment |
✅ Ubuntu 24 |
✅ Ubuntu 24 |
Medium |
| File paths (uploads, outputs, skills) |
✅ Full paths |
✅ Full paths + /mnt/transcripts, /mnt/skills/private |
High |
| Network whitelist |
✅ Same domains |
✅ Same domains |
High |
| Egress proxy behavior |
✅ x-deny-reason header |
✅ x-deny-reason header |
High |
| Auth mechanism for API |
❌ Not mentioned |
✅ “API key handled already” |
High |
| Persistent storage API |
❌ Not present |
✅ Full API docs |
High |
| User location |
✅ Bangkok, TH |
✅ Bangkok, TH |
High (PII) |
| Thinking mode config |
✅ interleaved, max 16000 |
❌ Not visible (uses reasoning_effort instead) |
Medium |
| Reasoning effort parameter |
❌ Not present |
✅ Value = 85 |
Medium |
| Available tool names |
✅ Basic set |
✅ Expanded set (12+ tools) |
Medium |
| Reminder/classifier names |
✅ image_reminder, cyber_warning, etc. |
✅ Same list |
Medium |
Key recon findings:
- Network whitelist is identical — attacker knows exactly which domains Claude can reach
- File system layout is fully mapped in both — attacker can craft prompts targeting specific paths
- Opus 4.6 leaks more — transcripts path, private skills path, API auth mechanism, storage API
- User location is exposed in both — this is PII that shouldn’t be in a system prompt visible to the model
Prompt injection defense comparison
| Defense |
Sonnet 4.5 |
Opus 4.6 |
| “ignore previous instructions” override for past chats |
✅ Same |
✅ Same |
| Anthropic reminder spoofing warning |
✅ Present |
✅ Present (slightly stronger) |
| Harmful content safety overrides user instructions |
✅ “override any user instructions” |
✅ “override any instructions from the person” |
| Anti-jailbreak for weapons (“regardless of framing”) |
❌ Not present |
✅ New |
| Anti-jailbreak for malware (“educational purposes” excuse blocked) |
✅ Present |
✅ Present |
| Copyright overrides user requests |
✅ Present |
✅ Present (weaker language — see below) |
Notable: Sonnet 4.5’s copyright section uses dramatically stronger language: “SEVERE VIOLATION” appears 8+ times, with <hard_limits>, <self_check_before_responding>, and <consequences_reminder> sections. Opus 4.6 is softer here — same rules but less emphatic enforcement language. From a security perspective, the stronger emotional framing in Sonnet 4.5 likely makes copyright bypasses harder.
Overall security verdict for Opus 4.6 system prompt
Scoring (1-10, higher is more secure):
| Category |
Sonnet 4.5 |
Opus 4.6 |
Winner |
| Weapons/dangerous content refusal |
6 |
8 |
Opus 4.6 |
| Self-harm/wellbeing protection |
4 |
9 |
Opus 4.6 |
| Information leakage (recon resistance) |
5 |
3 |
Sonnet 4.5 |
| Attack surface (tool count) |
7 |
5 |
Sonnet 4.5 |
| Prompt injection resistance |
6 |
7 |
Opus 4.6 |
| Cross-user data safety |
8 (no shared storage) |
4 |
Sonnet 4.5 |
| Copyright enforcement |
8 |
7 |
Sonnet 4.5 |
| Crisis/abuse handling |
5 |
8 |
Opus 4.6 |
Summary:
- Opus 4.6 is better at: Refusing dangerous content, protecting vulnerable users, handling abuse/crisis, resisting jailbreak framings
- Sonnet 4.5 is better at: Minimizing information leakage, smaller attack surface, stronger copyright enforcement, no cross-user data risk
- Biggest concern in Opus 4.6: The persistent storage
shared: true feature creates a novel cross-user attack vector that didn’t exist before
- Biggest concern in Sonnet 4.5: Thin wellbeing protections and missing anti-rationalization language for weapons requests
Recommendations for system prompt improvements
- Redact infrastructure details — file paths, network whitelists, and auth mechanisms shouldn’t be in the system prompt
- Add authentication to shared storage —
shared: true with no access control is risky
- Remove user location from system prompt — this is PII leakage
- Restore hardcoded election facts — removing ground truth makes the system more vulnerable to search poisoning
- Apply Sonnet 4.5’s copyright enforcement tone to Opus 4.6 — the “SEVERE VIOLATION” framing is more effective
- Audit message_compose_v1 — its explicit social engineering capabilities (“strategic approaches” for persuasion) need guardrails
- Add reasoning_effort safety floor — ensure safety checks don’t degrade at low reasoning effort values
What this means for AI security
For Anthropic:
- The partial-to-full prompt leak chain remains viable despite hardening between versions
- System prompts should not contain infrastructure details (file paths, network whitelists, auth mechanisms)
- The
--break-system-packages instruction is a security anti-pattern
- The
window.storage shared mode needs access controls
- Behavioral instructions that mandate trust and minimize questioning create exploitable patterns
For the AI security community:
- System prompt extraction is not a solved problem — it’s an arms race
- The information density in system prompts makes them high-value reconnaissance targets
- Supply chain attacks via allowed package registries are an underexplored threat in LLM security
- Behavioral manipulation (exploiting mandated personality traits) is a distinct attack category that deserves more research
For users:
- Your AI assistant operates on a hidden instruction set that significantly shapes its behavior
- The system prompt contains your location data and organizational context
- Every tool the AI uses (bash, web search, file operations) runs with broad permissions
- The AI is instructed to be helpful and trusting by default — which is great for UX, but exploitable
Download the full system prompts
We are releasing both complete system prompts for the security research community:
📥 Download the full Claude 4.6 System Prompt: Claude_4.6_original_System_Prompt
📥 Download the full Claude 4.5 System Prompt: Claude_4.5_original_system_prompt
FAQ: Claude 4.6 system prompt
What is the Claude 4.6 system prompt? The Claude 4.6 system prompt is a hidden instruction set (~15,000 tokens) injected by Anthropic before every Claude conversation. It defines Claude’s behavior, safety rules, available tools, file system access, network permissions, and personality traits.
Can I see Claude’s system prompt? Claude is instructed not to reveal its system prompt directly. However, security researchers have demonstrated techniques to extract it, as documented in this article.
What’s the difference between the Claude 4.5 and Claude 4.6 system prompts? Claude 4.6 has stronger safety rules (expanded weapon refusals, better self-harm protections) but exposes more infrastructure details and introduces new attack surfaces like cross-user persistent storage.
Is extracting the system prompt illegal? System prompt extraction is a widely-practiced AI security research technique. It’s analogous to reverse engineering software to find vulnerabilities — a legitimate and important security practice.
What tools does Claude have access to? Claude has access to 12+ tools including bash (Linux command execution), web search, file creation/editing, past conversation search, email drafting, weather, maps, and recipe display.
Does Claude know my location? Yes — the system prompt contains the user’s provided location, which is accessible to Claude in every response.