
Claude Code Context Management Research

Last Updated: 2026-03-19

Sources:

  • Piebald-AI/claude-code-system-prompts (v2.1.79, added as submodule)
  • Anthropic Compaction API docs
  • Anthropic Context Windows docs
  • Anthropic: Effective Context Engineering
  • Existing repo research: agent-cli/claude-session-files.md

Research focus: How Claude Code assembles and manages context within a conversation.


Architecture Overview

Claude Code is distributed as a Bun-compiled native binary (~183MB Mach-O arm64). The source code is not public, but the community has extensively reverse-engineered it. The system prompt alone spans 65+ files covering different aspects, plus 20+ system-reminder templates injected contextually.

Key context management components (from extracted prompts):

| Component | Files | Role |
|---|---|---|
| Compaction prompts | 3 variants of analysis instructions + summary prompt | Generate context summaries when approaching limits |
| Conversation summarization | agent-prompt-conversation-summarization.md | Full 9-section summary for /compact |
| Recent message summarization | agent-prompt-recent-message-summarization.md | Summarize only the recent portion (incremental) |
| System reminders | 20+ system-reminder-*.md files | Dynamic context injected per situation |
| Sub-agent prompts | Explore, Plan, Worker Fork, etc. | Isolated context for delegated tasks |
| Memory instructions | system-prompt-agent-memory-instructions.md | Cross-session memory management |

Context Accumulation Model

Like Pi, Claude Code uses infinite context accumulation — all messages (user, assistant, tool calls, tool results) are appended to the conversation history. The full history is sent with each API call until compaction triggers.

Session files confirm this (agent-cli/claude-session-files.md): JSONL format at ~/.claude/projects/{path}/{sessionId}.jsonl, each line is a message/event entry.
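Since each line of a session file is an independent JSON object, replaying a session is a one-liner per entry. A minimal sketch, assuming each entry has a `type` field and a nested `message` object (the field names below are illustrative assumptions, not the exact schema):

```python
import json

def load_session(jsonl_text: str) -> list[dict]:
    """Parse a Claude Code session transcript: one JSON object per line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# Two synthetic entries in the spirit of the real format:
raw = "\n".join([
    '{"type": "user", "message": {"role": "user", "content": "fix the bug"}}',
    '{"type": "assistant", "message": {"role": "assistant", "content": "Looking..."}}',
])
entries = load_session(raw)
roles = [e["message"]["role"] for e in entries]
```

In practice you would read `~/.claude/projects/{path}/{sessionId}.jsonl` and filter by `type` to separate messages from tool events.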

Server-Side Compaction (API Level)

Claude Code leverages Anthropic's server-side compaction API (beta compact-2026-01-12):

How It Works

  1. Enable via context_management.edits: [{ type: "compact_20260112" }] in API request
  2. API detects when input tokens exceed the trigger threshold
  3. Generates a compaction content block containing the summary
  4. On subsequent requests, API automatically drops all message blocks prior to the compaction block

Configuration

{
  "context_management": {
    "edits": [{
      "type": "compact_20260112",
      "trigger_tokens": 100000,    // When to trigger (default: ~80% of context window)
      "preserve_system": true,     // Keep system prompt intact
      "custom_instructions": "..." // Focus areas for summary
    }]
  }
}

(Comments are annotations, not valid JSON.)

Key Difference from Pi/OpenClaw/Gemini CLI

Compaction happens server-side in the API, not client-side. The client simply appends the response (which may contain a compaction block) and sends it back. This is fundamentally different from:

  • Pi: client-side LLM call for summarization
  • OpenClaw: delegates to Pi's client-side compaction
  • Gemini CLI: client-side two-pass compression
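The client-side consequence is that the conversation loop stays trivial. A minimal sketch (block shapes are illustrative assumptions; the real content-block schema may differ):

```python
def run_turn(history: list[dict], api_response: dict) -> list[dict]:
    """Append the assistant response to history, compaction block and all.

    With server-side compaction, the client does nothing special here: if the
    response carries a compaction block, it simply becomes part of history, and
    the API drops all earlier message blocks on the next request.
    """
    history.append({"role": "assistant", "content": api_response["content"]})
    return history

history = [{"role": "user", "content": "long task..."}]
response = {"content": [
    {"type": "compaction", "summary": "<summary>...work so far...</summary>"},
    {"type": "text", "text": "Continuing from the summary."},
]}
history = run_turn(history, response)
```

Compare this with Pi or Gemini CLI, where the client itself must decide when to summarize, issue the summarization call, and splice the summary back into history.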

Client-Side Compaction Prompts (Claude Code Specific)

Claude Code has its own compaction prompt layer on top of the API, with three variants:

Variant 1: Full Conversation Compaction

Analyzes the entire conversation chronologically:

  • Identifies user requests, approach taken, key decisions
  • Captures file names, code snippets, function signatures, file edits
  • Records errors encountered and fixes
  • Emphasizes user feedback (especially corrections)

Variant 2: Recent Messages Only

Same structure but only analyzes messages after the last retained context. Used for incremental compaction where earlier context is already summarized.

Variant 3: Minimal (Experimental, Feature Flag)

Lean version where <analysis> is a brief planning scratchpad:

  • One or two lines per section
  • No code snippets or verbatim quotes in analysis
  • Detail goes directly into <summary>
  • Goal: coverage over detail in the analysis phase

Summary Structure (9 Sections)

The conversation summarization prompt produces a structured summary with:

  1. Primary Request and Intent — User's explicit requests and intents
  2. Key Technical Concepts — Technologies, frameworks discussed
  3. Files and Code Sections — Files examined/modified/created with full code snippets
  4. Errors and Fixes — Errors encountered, how fixed, user feedback
  5. Problem Solving — Solved problems and ongoing troubleshooting
  6. All User Messages — Every non-tool-result user message (critical for intent tracking)
  7. Pending Tasks — Explicitly requested outstanding work
  8. Current Work — Precise description of work in progress at compaction time
  9. Optional Next Step — Next action, with direct quotes to prevent task drift
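The nine sections above can be expressed as a template skeleton of the kind the compaction prompt asks the model to fill in (section names are from this doc; the exact markup around them is an assumption):

```python
# Section names per the conversation summarization prompt.
SUMMARY_SECTIONS = [
    "Primary Request and Intent",
    "Key Technical Concepts",
    "Files and Code Sections",
    "Errors and Fixes",
    "Problem Solving",
    "All User Messages",
    "Pending Tasks",
    "Current Work",
    "Optional Next Step",
]

def summary_skeleton() -> str:
    """Render an empty 9-section summary wrapped in <summary> tags."""
    body = "\n".join(f"{i}. {name}:\n   ..." for i, name in enumerate(SUMMARY_SECTIONS, 1))
    return f"<summary>\n{body}\n</summary>"
```

Note how sections 6, 8, and 9 bias the summary toward verbatim user intent rather than the assistant's own framing.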

SDK Compaction Prompt

A simpler variant for the Agent SDK, structured as:

  1. Task Overview (request + success criteria + constraints)
  2. Current State (completed work, files, artifacts)
  3. Important Discoveries (constraints, decisions, errors, failed approaches)
  4. Next Steps (specific actions, blockers, priority order)
  5. Context to Preserve (preferences, domain details, promises)

Output wrapped in <summary> tags.

Context Awareness (Model-Level Feature)

Claude 4.5+ models have built-in context awareness — the model knows its remaining token budget:

<!-- At conversation start -->
<budget:token_budget>1000000</budget:token_budget>

<!-- After each tool call -->
<system_warning>Token usage: 35000/1000000; 965000 remaining</system_warning>

This enables the model to self-manage context usage, persist on tasks until completion, and make informed decisions about when to compact.
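The warning format shown above is regular enough to parse mechanically, which is also how a harness could track the model's budget from the outside. A small sketch against the exact format in this doc:

```python
import re

def parse_token_warning(tag: str) -> tuple[int, int, int]:
    """Extract (used, budget, remaining) from a <system_warning> usage tag."""
    m = re.search(r"Token usage: (\d+)/(\d+); (\d+) remaining", tag)
    if not m:
        raise ValueError("not a token-usage warning")
    return tuple(int(g) for g in m.groups())

used, budget, remaining = parse_token_warning(
    "<system_warning>Token usage: 35000/1000000; 965000 remaining</system_warning>"
)
```

The invariant `used + remaining == budget` holds by construction of the warning.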

Thinking Block Management

Extended thinking tokens have special handling:

  • Thinking blocks are automatically stripped from context on subsequent turns
  • During tool use cycles, thinking blocks must be preserved until the tool cycle completes
  • Effective context: context_window = (input_tokens - previous_thinking_tokens) + current_turn_tokens
  • Cryptographic signatures verify thinking block authenticity

This is a significant context optimization — thinking can be substantial but doesn't accumulate.
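The effective-context formula above, as a worked example:

```python
def effective_context(input_tokens: int, previous_thinking_tokens: int,
                      current_turn_tokens: int) -> int:
    """Effective context per the formula above: prior-turn thinking is
    stripped, so only non-thinking input plus the current turn counts."""
    return (input_tokens - previous_thinking_tokens) + current_turn_tokens

# 120k of accumulated input, of which 30k was thinking from earlier turns,
# plus an 8k current turn: only 98k counts against the window.
occupied = effective_context(120_000, 30_000, 8_000)
```
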

System Prompt Structure

Claude Code's system prompt is assembled from 65+ modular files:

Core Sections

  • Identity ("Claude Code, Anthropic's official CLI")
  • Doing tasks (10+ sub-sections: security, read-before-modify, minimize files, no over-engineering, etc.)
  • Executing actions with care (reversibility, blast radius assessment)
  • Tool usage guidelines (prefer dedicated tools over Bash)
  • Git operations (commit workflow, PR creation)
  • Output efficiency ("go straight to the point")

Dynamic Injections (System Reminders)

Contextually injected based on runtime events:

  • system-reminder-file-modified-by-user-or-linter.md — when external file changes are detected
  • system-reminder-compact-file-reference.md — reference to a file read before compaction
  • system-reminder-hook-additional-context.md — hook-provided context
  • system-reminder-invoked-skills.md — when a skill is activated
  • system-reminder-file-truncated.md — when file content exceeds limits

Approximate Token Budget

  • Base system prompt: ~3,000 tokens
  • Tool definitions: ~5,000 tokens
  • Context buffer (reserved): ~33,000 tokens (reduced from 45,000 in earlier versions)
  • Usable context: ~960,000 tokens (out of 1M)
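Reconciling these figures arithmetically (all numbers are the approximations stated above):

```python
CONTEXT_WINDOW = 1_000_000
BASE_SYSTEM_PROMPT = 3_000
TOOL_DEFINITIONS = 5_000
RESERVED_BUFFER = 33_000  # down from 45,000 in earlier versions

usable = CONTEXT_WINDOW - BASE_SYSTEM_PROMPT - TOOL_DEFINITIONS - RESERVED_BUFFER
# usable == 959_000, i.e. the ~960k figure quoted above
```
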

Sub-Agent Architecture

Claude Code has multiple sub-agent types with isolated contexts:

Agent Types (from extracted prompts)

| Agent | Purpose | Context Model |
|---|---|---|
| Explore | Read-only codebase search | Fresh context, read-only tools, fast |
| Plan | Architecture planning | Fresh context, read-only tools |
| Worker Fork | Execute directive directly | Inherits full parent context, no sub-spawning |
| Code Reviewer | Review code changes | Fresh context with diff |
| Code Explorer | Deep feature analysis | Fresh context |
| Code Architect | Design feature architecture | Fresh context |

Worker Fork (Unique Design)

The "fork" agent type is notable — it inherits the full conversation context from the parent:

  • model: 'inherit' — uses the parent's model
  • permissionMode: 'bubble' — permissions bubble up to the parent
  • maxTurns: 200
  • tools: [*] — full tool access
  • Rule: "You ARE the fork. Do NOT spawn sub-agents; execute directly."
  • Must commit changes and report with the commit hash
  • Report under 500 words, structured format: Scope → Result → Key files → Files changed → Issues

Other agents get fresh, isolated contexts with task-specific system prompts.
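The fork settings listed above can be collected into a single config object. Key names mirror the fields quoted from the extracted prompts; `spawn_subagents` is a hypothetical key standing in for the "do NOT spawn sub-agents" rule, and the exact schema is an assumption:

```python
# Worker-fork agent configuration, per the extracted prompt fields.
FORK_AGENT = {
    "model": "inherit",          # use the parent's model
    "permissionMode": "bubble",  # permissions bubble up to the parent
    "maxTurns": 200,
    "tools": ["*"],              # full tool access
    "spawn_subagents": False,    # hypothetical key: "You ARE the fork"
}

# All other sub-agent types start from fresh, isolated contexts.
ISOLATED_AGENTS = ("Explore", "Plan", "Code Reviewer", "Code Explorer", "Code Architect")
```
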

Comparison: Claude Code vs Pi vs OpenClaw vs Gemini CLI

| Aspect | Pi | OpenClaw | Gemini CLI | Claude Code |
|---|---|---|---|---|
| Compaction location | Client-side | Client-side (Pi inherited) | Client-side | Server-side API |
| Compaction trigger | contextWindow - reserve | Same | 50% of limit | ~80% of limit (configurable) |
| Summary structure | 6 sections (Goal/Progress/Decisions/...) | Same (inherited) | <state_snapshot> | 9 sections (most detailed) |
| Verification | None | None | 2-pass (generate + probe) | None (but 3 analysis variants) |
| Tool output handling | Full in context | Full in context | Pre-summarization + budget | Full in context + file reference on compact |
| Context awareness | None | None | None | Model-level (<budget:token_budget>) |
| Thinking block mgmt | N/A | N/A | N/A | Auto-stripped between turns |
| System prompt size | ~300 words | 15+ sections | Section-based, toggleable | 65+ modular files, ~8K tokens |
| Sub-agent types | 1 (extension) | 1 (gateway RPC) | Multiple (in-process) | 6+ types with different context models |
| Context inheritance | None | None | None | Worker fork inherits full context |
| System reminders | None | None | None | 20+ dynamic injection templates |
| Pre-send processing | None | Multi-stage pipeline | None | Minimal (API handles most) |

Unique Design Choices

  1. Server-side compaction: Claude Code offloads compaction to the API, simplifying client logic. The client just sends messages and handles the compaction block in responses. This is architecturally the simplest client-side implementation among all studied agents.

  2. Three compaction analysis variants: Full, recent-only, and minimal. Allows trading quality for speed. The minimal variant is behind a feature flag, suggesting active experimentation.

  3. 9-section summary format: The most detailed summary structure of all agents studied. Notably includes "All user messages" (verbatim non-tool messages) and "Current Work" with direct quotes to prevent task drift.

  4. Context awareness at model level: The model itself knows remaining token budget via <budget:token_budget> and <system_warning> tags. No other studied agent has this — it's a model-level feature unique to Claude 4.5+.

  5. Worker fork with context inheritance: Unlike all other sub-agent models (which use isolated contexts), the fork agent inherits the parent's full context. This enables continuation of complex tasks without context loss.

  6. Dynamic system reminders: 20+ templates injected contextually (file changes, hook output, skill activation). This is more granular than any other agent's dynamic context injection.

  7. Thinking block auto-stripping: Extended thinking tokens are automatically removed from subsequent turns, preventing context pollution from reasoning traces. This is a significant optimization unique to Claude's architecture.