# Codex CLI Context Management Research
Last Updated: 2026-03-19
Sources:
- openai/codex (open source, Rust, added as submodule)
- Unrolling the Codex agent loop (OpenAI blog)
- Codex Prompting Guide
- Codex CLI features
- Simon Willison reverse engineering
- Existing repo research: agent-cli/codex-session-files.md
Research focus: How Codex CLI assembles and manages context within a conversation.
## Architecture Overview

Codex CLI is open source (unlike Claude Code), written in Rust (`codex-rs/`). The core context management code is at `codex-rs/core/src/`:

| Component | Path | Role |
|---|---|---|
| Context Manager | `context_manager/history.rs` | Manages conversation history vector, token estimation |
| Context Normalization | `context_manager/normalize.rs` | Ensures tool call/result pairing, image handling |
| Compaction (local) | `compact.rs` | Client-side LLM-based compaction |
| Compaction (remote) | `compact_remote.rs` | Server-side compaction via Responses API `/responses/compact` |
| System prompt | `prompt.md` (template) | Base instructions for the agent |
| Compaction prompt | `templates/compact/prompt.md` | Compaction summarization instructions |
| Custom prompts | `custom_prompts.rs` | AGENTS.md / prompt file loading |
| Truncation | `truncate.rs` | Tool output truncation policies |
| Turn management | `state/turn.rs` | Turn lifecycle |
| Message history | `message_history.rs` | Historical message storage |
## Context Accumulation Model

Like all the other studied agents, Codex uses a single history vector (`ContextManager.items: Vec<ResponseItem>`) that accumulates indefinitely:

```rust
// context_manager/history.rs
pub(crate) struct ContextManager {
    items: Vec<ResponseItem>, // Oldest → newest
    token_info: Option<TokenUsageInfo>,
    reference_context_item: Option<TurnContextItem>, // Baseline for settings diffing
}
```

Items are appended via `record_items()` with per-item truncation applied. Before sending to the model, `for_prompt()` normalizes and filters:

```rust
pub(crate) fn for_prompt(mut self, input_modalities: &[InputModality]) -> Vec<ResponseItem> {
    self.normalize_history(input_modalities);
    self.items.retain(|item| !matches!(item, ResponseItem::GhostSnapshot { .. }));
    self.items
}
```
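To make the record → prompt lifecycle concrete, here is a minimal, self-contained sketch; the types are drastically simplified stand-ins for the real codex-rs definitions, and `record_items`/`for_prompt` omit the truncation and normalization arguments the real code takes:

```rust
// Simplified stand-ins; the real types carry many more variants and fields.
enum ResponseItem {
    Message { role: String, text: String },
    GhostSnapshot,
}

struct ContextManager {
    items: Vec<ResponseItem>, // oldest → newest
}

impl ContextManager {
    // Append items in arrival order (real code truncates each item here).
    fn record_items(&mut self, new: Vec<ResponseItem>) {
        self.items.extend(new);
    }

    // Strip items the model should never see before each request.
    fn for_prompt(mut self) -> Vec<ResponseItem> {
        self.items
            .retain(|item| !matches!(item, ResponseItem::GhostSnapshot));
        self.items
    }
}

fn main() {
    let mut ctx = ContextManager { items: Vec::new() };
    ctx.record_items(vec![
        ResponseItem::Message { role: "user".into(), text: "hi".into() },
        ResponseItem::GhostSnapshot,
    ]);
    // Ghost snapshots stay in history but never reach the prompt.
    assert_eq!(ctx.for_prompt().len(), 1);
}
```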
## Dual Compaction Strategy

Codex has two compaction paths depending on the provider:

```rust
pub(crate) fn should_use_remote_compact_task(provider: &ModelProviderInfo) -> bool {
    provider.is_openai()
}
```
### Path 1: Remote Compaction (OpenAI providers)

Uses the Responses API `/responses/compact` endpoint:

- Server-side compaction — the API returns a `type=compaction` item with `encrypted_content`
- The encrypted content preserves the model's latent understanding (opaque to client)
- Can trigger mid-stream: if context crosses the threshold during inference, the server compacts and continues
- Trims function call history to fit the context window before sending to the API
- Returns replacement history that the client swaps in
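A hedged sketch of that swap, with `post_compact` as a placeholder for the real HTTP client and placeholder item shapes; the key behavior from above is that the server's reply replaces local history wholesale:

```rust
// Placeholder types: the real ResponseItem has many more variants.
enum Item {
    Message(String),
    // Server-produced `type=compaction` item; the payload is opaque here.
    Compaction { encrypted_content: Vec<u8> },
}

// Stand-in for the HTTP call to the Responses API /responses/compact.
fn post_compact(history: &[Item]) -> Vec<Item> {
    // Pretend the server collapsed everything into one encrypted item;
    // a real reply may carry additional items.
    let _ = history;
    vec![Item::Compaction { encrypted_content: vec![0u8; 16] }]
}

fn remote_compact(history: &mut Vec<Item>) {
    // Real code first trims function-call history to fit the window (elided).
    // The returned replacement history is swapped in wholesale.
    *history = post_compact(history);
}

fn main() {
    let mut history = vec![Item::Message("long conversation...".into())];
    remote_compact(&mut history);
    assert!(matches!(history[0], Item::Compaction { .. }));
}
```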
### Path 2: Local/Inline Compaction (non-OpenAI providers)

Client-side LLM summarization, similar to Pi:

- Uses a dedicated compaction prompt (`templates/compact/prompt.md`)
- Prompt: "You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary..."
- Summary structure: progress/decisions, constraints/preferences, remaining work, critical data
- Summary injected as a user message with `SUMMARY_PREFIX`
- Collects all user messages from history and preserves them alongside the summary
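A sketch of the local rebuild under those rules; the `SUMMARY_PREFIX` value and the `summarize` helper are illustrative, standing in for the LLM call driven by `templates/compact/prompt.md`:

```rust
const SUMMARY_PREFIX: &str = "[context summary] "; // illustrative value

struct Msg {
    role: &'static str,
    text: String,
}

// Stand-in for the LLM call that runs the compaction prompt over history.
fn summarize(_history: &[Msg]) -> String {
    "progress/decisions, constraints, remaining work, critical data".to_string()
}

fn local_compact(history: Vec<Msg>) -> Vec<Msg> {
    let summary = summarize(&history);
    // The summary goes first, injected as a user message with the prefix...
    let mut rebuilt = vec![Msg {
        role: "user",
        text: format!("{}{}", SUMMARY_PREFIX, summary),
    }];
    // ...then every original user message is preserved verbatim after it.
    rebuilt.extend(history.into_iter().filter(|m| m.role == "user"));
    rebuilt
}

fn main() {
    let history = vec![
        Msg { role: "user", text: "do X".into() },
        Msg { role: "assistant", text: "done".into() },
    ];
    let rebuilt = local_compact(history);
    assert_eq!(rebuilt.len(), 2); // summary + one preserved user message
}
```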
## Compaction Trigger

- `auto_compact_token_limit` — per-model configurable threshold
- `effective_context_window_percent` — default 95% of context window
- When context overflow occurs during compaction itself, falls back to removing oldest items one by one
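A sketch of the threshold check implied by those settings; the struct and field types here are assumptions, not the real config schema:

```rust
struct ModelConfig {
    context_window: u64,
    auto_compact_token_limit: Option<u64>, // per-model override, if set
    effective_context_window_percent: u64, // default 95
}

fn should_compact(cfg: &ModelConfig, used_tokens: u64) -> bool {
    // An explicit per-model limit wins; otherwise use the percentage default.
    let threshold = cfg.auto_compact_token_limit.unwrap_or(
        cfg.context_window * cfg.effective_context_window_percent / 100,
    );
    used_tokens >= threshold
}

fn main() {
    let cfg = ModelConfig {
        context_window: 200_000,
        auto_compact_token_limit: None,
        effective_context_window_percent: 95,
    };
    assert!(should_compact(&cfg, 190_000)); // 95% of 200k = 190k
    assert!(!should_compact(&cfg, 100_000));
}
```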
## Post-Compaction History Structure

```
[summary message: SUMMARY_PREFIX + LLM summary]   ← Compacted summary
[preserved user messages]                         ← All user messages retained
[initial context injection (mid-turn only)]       ← Re-injected context settings
[ghost snapshots]                                 ← Preserved across compaction
```

Key detail: the `InitialContextInjection` enum controls whether context is re-injected:

- `DoNotInject` — Pre-turn/manual compaction; next turn will re-inject naturally
- `BeforeLastUserMessage` — Mid-turn compaction; must inject because the model expects it
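A sketch of how that enum might gate the rebuild step; the variant names come from the source, the surrounding glue is illustrative:

```rust
enum InitialContextInjection {
    DoNotInject,           // pre-turn/manual: next turn re-injects naturally
    BeforeLastUserMessage, // mid-turn: the model expects context to be present
}

fn maybe_inject(mode: InitialContextInjection, history: &mut Vec<String>, ctx: String) {
    if let InitialContextInjection::BeforeLastUserMessage = mode {
        // Slot the context settings in just before the trailing user message.
        let at = history.len().saturating_sub(1);
        history.insert(at, ctx);
    }
}

fn main() {
    let mut history = vec!["summary".into(), "last user message".into()];
    maybe_inject(
        InitialContextInjection::BeforeLastUserMessage,
        &mut history,
        "initial context".into(),
    );
    assert_eq!(history[1], "initial context");
}
```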
## Tool Output Truncation

Codex has a per-model truncation policy applied at record time:

```rust
// From model_info.rs - default fallback
truncation_policy: TruncationPolicyConfig::bytes(/*limit*/ 10_000),
```

- Tool outputs are truncated when they exceed the model's configured limit
- Two modes: `Bytes` (byte limit) or `Tokens` (token limit)
- Applied per-item as items are recorded into the context manager
- This is pre-compaction — large tool outputs never fully enter the context
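A sketch of what applying such a policy at record time could look like; the `Tokens` arm uses a crude bytes-per-token estimate in place of a real tokenizer:

```rust
enum TruncationPolicy {
    Bytes(usize),
    Tokens(usize),
}

fn truncate_output(output: &str, policy: &TruncationPolicy) -> String {
    match policy {
        TruncationPolicy::Bytes(limit) => {
            // Cut at or below the byte limit, on a valid char boundary.
            let mut end = (*limit).min(output.len());
            while !output.is_char_boundary(end) {
                end -= 1;
            }
            output[..end].to_string()
        }
        // Crude ~4 bytes/token estimate standing in for real token counting.
        TruncationPolicy::Tokens(limit) => {
            truncate_output(output, &TruncationPolicy::Bytes(limit * 4))
        }
    }
}

fn main() {
    let big = "x".repeat(50_000);
    let policy = TruncationPolicy::Bytes(10_000); // the default fallback above
    assert_eq!(truncate_output(&big, &policy).len(), 10_000);
}
```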
## System Prompt

The base system prompt (`prompt.md`) is a single comprehensive file covering:

- Identity — "You are a coding agent running in the Codex CLI"
- Capabilities — Receive prompts, stream responses, emit function calls
- AGENTS.md spec — How instruction files are discovered and applied (scoped by directory)
- Responsiveness — Preamble message guidelines before tool calls
- Planning — `update_plan` tool usage with quality examples
- Task execution — Autonomy, validation, `apply_patch` usage
- Ambition vs precision — New tasks → creative; existing code → surgical
- Progress updates — Concise status messages for long tasks
- Final answer formatting — Detailed style guidelines (headers, bullets, monospace, tone)
- Tool guidelines — Shell commands (prefer `rg`), `update_plan`
## Instruction File Loading

Codex enumerates instruction files from multiple locations:

- `~/.codex/` (global)
- Each directory from repo root to CWD
- Optional fallback names, size cap
- Merged in order: later directories override earlier ones
- Priority: system > developer > user > assistant
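A sketch of that discovery walk, assuming the instruction file is named AGENTS.md at every level; fallback names and the size cap are elided:

```rust
use std::path::{Path, PathBuf};

fn instruction_files(home: &Path, repo_root: &Path, cwd: &Path) -> Vec<PathBuf> {
    // Global file first, so everything below can override it.
    let mut files = vec![home.join(".codex").join("AGENTS.md")];

    // Collect cwd and its ancestors up to (and including) the repo root...
    let mut dirs: Vec<PathBuf> = cwd
        .ancestors()
        .take_while(|d| d.starts_with(repo_root))
        .map(Path::to_path_buf)
        .collect();
    // ...then flip to repo-root → cwd order, so later (deeper) files win.
    dirs.reverse();

    files.extend(dirs.into_iter().map(|d| d.join("AGENTS.md")));
    files.retain(|f| f.exists()); // skip levels with no instruction file
    files
}

fn main() {
    let files = instruction_files(
        Path::new("/home/me"),
        Path::new("/home/me/repo"),
        Path::new("/home/me/repo/crates/core"),
    );
    println!("{files:?}"); // merge order: global, root, crates, crates/core
}
```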
## Sub-Agent Architecture

Codex does not appear to have a built-in sub-agent spawning mechanism in the same sense as Claude Code's Agent tool or OpenClaw's `sessions_spawn`. The architecture relies on:

- Single agent loop — one `ContextManager` per session
- Guardian/review sessions — a separate review process for security, but using the same model infrastructure
- No explicit sub-agent tool in the base tool set

This is the simplest agent model among those studied — a single flat loop with no delegation.
## Comparison: All Five Agents

| Aspect | Pi | OpenClaw | Gemini CLI | Claude Code | Codex |
|---|---|---|---|---|---|
| Language | TypeScript | TypeScript | TypeScript | TypeScript (Bun binary) | Rust |
| Open source | Yes | Yes | Yes | No | Yes |
| Context model | Accumulate all | Accumulate + pipeline | Accumulate all | Accumulate all | Accumulate + per-item truncation |
| Compaction | Client LLM (1 call) | Client (inherited Pi) | Client LLM (2-pass) | Server-side API | Dual: server (OpenAI) / client (others) |
| Remote compaction | No | No | No | Yes (API `compact-2026-01-12`) | Yes (Responses API `/responses/compact`) |
| Encrypted state | No | No | No | No | Yes (opaque `encrypted_content`) |
| Tool output truncation | None (full in context) | None | Pre-summarization + budget | None (file ref on compact) | Per-item truncation at record time |
| Compaction prompt | 6 sections | Same (inherited) | `<state_snapshot>` | 9 sections | 4 sections (minimal) |
| Mid-stream compaction | No | No | No | No | Yes (server triggers mid-inference) |
| Sub-agent | Extension (process) | Built-in (gateway) | Built-in (in-process) | Built-in (6+ types) | None (single loop) |
| System prompt | ~300 words | 15+ sections | Toggleable sections | 65+ files | Single comprehensive file |
| Context awareness | None | None | None | Model-level budget | None |
## Unique Design Choices

- **Dual compaction path**: OpenAI providers use server-side compaction with encrypted state preservation; non-OpenAI providers fall back to client-side LLM summarization. Best of both worlds for a multi-provider agent.
- **Encrypted compaction content**: The server returns `encrypted_content` that preserves the model's latent understanding — the client treats it as opaque. This is unique among all studied agents and potentially preserves more semantic fidelity than text summaries.
- **Mid-stream compaction**: The Responses API can trigger compaction mid-inference when context crosses the threshold, emitting a compaction item in the same stream. No other agent supports this.
- **Per-item truncation at record time**: Unlike Gemini CLI (which truncates at compression time) or the others (which don't truncate at all), Codex truncates each tool output as it enters the context. This prevents context from ever growing uncontrollably.
- **Minimal compaction prompt**: Only 4 bullet points ("progress/decisions, constraints, remaining work, critical data"). Far simpler than Claude Code's 9 sections. Trades detail for speed.
- **No sub-agents**: The simplest agent model — a single flat loop. Complex tasks are handled through planning (`update_plan`) rather than delegation. This keeps context management simple but limits parallelism.
- **Rust implementation**: The only Rust-based agent among the five studied. This enables fine-grained memory control and the per-item truncation strategy, which would be harder to implement efficiently in JavaScript.
- **AGENTS.md scoping**: Directory-scoped instruction files with merge semantics and explicit priority ordering (system > developer > user > assistant). More structured than Claude Code's CLAUDE.md approach.