Codex CLI Context Management Research

Last Updated: 2026-03-19

Sources:

  • openai/codex (open source, Rust, added as submodule)
  • Unrolling the Codex agent loop (OpenAI blog)
  • Codex Prompting Guide
  • Codex CLI features
  • Simon Willison reverse engineering
  • Existing repo research: agent-cli/codex-session-files.md

Research focus: How Codex CLI assembles and manages context within a conversation.


Architecture Overview

Codex CLI is open source (unlike Claude Code), written in Rust (codex-rs/). The core context management code is at codex-rs/core/src/:

| Component | Path | Role |
| --- | --- | --- |
| Context Manager | context_manager/history.rs | Manages conversation history vector, token estimation |
| Context Normalization | context_manager/normalize.rs | Ensures tool call/result pairing, image handling |
| Compaction (local) | compact.rs | Client-side LLM-based compaction |
| Compaction (remote) | compact_remote.rs | Server-side compaction via Responses API /responses/compact |
| System prompt | prompt.md (template) | Base instructions for the agent |
| Compaction prompt | templates/compact/prompt.md | Compaction summarization instructions |
| Custom prompts | custom_prompts.rs | AGENTS.md / prompt file loading |
| Truncation | truncate.rs | Tool output truncation policies |
| Turn management | state/turn.rs | Turn lifecycle |
| Message history | message_history.rs | Historical message storage |

Context Accumulation Model

Like all other studied agents, Codex uses a single history vector (ContextManager.items: Vec<ResponseItem>) that accumulates indefinitely:

```rust
// context_manager/history.rs
pub(crate) struct ContextManager {
    items: Vec<ResponseItem>,          // Oldest → newest
    token_info: Option<TokenUsageInfo>,
    reference_context_item: Option<TurnContextItem>,  // Baseline for settings diffing
}
```

Items are appended via record_items() with per-item truncation applied. Before sending to the model, for_prompt() normalizes and filters:

```rust
pub(crate) fn for_prompt(mut self, input_modalities: &[InputModality]) -> Vec<ResponseItem> {
    self.normalize_history(input_modalities);
    self.items.retain(|item| !matches!(item, ResponseItem::GhostSnapshot { .. }));
    self.items
}
```
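
The end-to-end flow is easy to see in miniature. Below is a self-contained sketch of the append-then-filter pattern using stand-in types; the real ResponseItem has many more variants, record_items() applies per-item truncation, and for_prompt() also normalizes for input modalities:

```rust
// Illustrative stand-ins for the codex-rs types, not the real definitions.
#[derive(Debug)]
enum ResponseItem {
    UserMessage(String),
    ToolOutput(String),
    GhostSnapshot, // client-only bookkeeping, never sent to the model
}

#[derive(Default)]
struct ContextManager {
    items: Vec<ResponseItem>, // oldest → newest
}

impl ContextManager {
    /// Append new items; the real code applies per-item truncation here.
    fn record_items(&mut self, new: Vec<ResponseItem>) {
        self.items.extend(new);
    }

    /// Build the model-facing history by filtering out client-only items.
    fn for_prompt(mut self) -> Vec<ResponseItem> {
        self.items
            .retain(|item| !matches!(item, ResponseItem::GhostSnapshot));
        self.items
    }
}

fn main() {
    let mut cm = ContextManager::default();
    cm.record_items(vec![
        ResponseItem::UserMessage("fix the bug".into()),
        ResponseItem::GhostSnapshot,
        ResponseItem::ToolOutput("cargo test: ok".into()),
    ]);
    let prompt = cm.for_prompt();
    assert_eq!(prompt.len(), 2); // the ghost snapshot was filtered out
    println!("{prompt:?}");
}
```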

Dual Compaction Strategy

Codex has two compaction paths depending on the provider:

```rust
pub(crate) fn should_use_remote_compact_task(provider: &ModelProviderInfo) -> bool {
    provider.is_openai()
}
```
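
In sketch form, a compact task branches on that check; run_compact and the placeholder bodies below are illustrative, not codex-rs code:

```rust
// Hypothetical dispatch sketch around the real provider check above.
struct ModelProviderInfo {
    name: String,
}

impl ModelProviderInfo {
    fn is_openai(&self) -> bool {
        self.name == "openai"
    }
}

fn should_use_remote_compact_task(provider: &ModelProviderInfo) -> bool {
    provider.is_openai()
}

fn run_compact(provider: &ModelProviderInfo) {
    if should_use_remote_compact_task(provider) {
        // compact_remote.rs path: POST /responses/compact, swap in the result
        println!("remote: server-side compaction");
    } else {
        // compact.rs path: one LLM call with templates/compact/prompt.md
        println!("local: client-side summarization");
    }
}

fn main() {
    run_compact(&ModelProviderInfo { name: "openai".into() });
    run_compact(&ModelProviderInfo { name: "ollama".into() });
}
```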

Path 1: Remote Compaction (OpenAI providers)

Uses the Responses API /responses/compact endpoint:

  • Server-side compaction — the API returns a type=compaction item with encrypted_content
  • The encrypted content preserves the model's latent understanding (opaque to the client)
  • Can trigger mid-stream: if context crosses the threshold during inference, the server compacts and continues
  • Trims function call history to fit the context window before sending to the API
  • Returns a replacement history that the client swaps in
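
A rough sketch of what the client sees on the wire, assuming a tagged JSON item (the field layout here is inferred from the type=compaction / encrypted_content description above, not taken from the API bindings):

```rust
// Requires serde (with the "derive" feature) and serde_json.
use serde::Deserialize;

// Assumed shape of the server's compaction item; the real wire type may differ.
#[derive(Deserialize, Debug)]
#[serde(tag = "type", rename_all = "snake_case")]
enum CompactItem {
    Compaction {
        // Opaque to the client: the server's encrypted latent state.
        encrypted_content: String,
    },
}

fn main() {
    let raw = r#"{"type":"compaction","encrypted_content":"opaque-base64-blob"}"#;
    let item: CompactItem = serde_json::from_str(raw).unwrap();
    // The client never inspects the payload; it stores it and replays it verbatim.
    println!("{item:?}");
}
```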

Path 2: Local/Inline Compaction (non-OpenAI providers)

Client-side LLM summarization, similar to Pi:

  • Uses a dedicated compaction prompt (templates/compact/prompt.md)
  • Prompt: "You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary..."
  • Summary structure: progress/decisions, constraints/preferences, remaining work, critical data
  • Summary is injected as a user message with SUMMARY_PREFIX
  • Collects all user messages from history and preserves them alongside the summary
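
Put together, the local path looks roughly like the sketch below. The types, the summarize() stand-in, and the prefix string value are illustrative; only the overall shape (summary first, then the preserved user messages) comes from the source:

```rust
// Placeholder value; the real SUMMARY_PREFIX constant differs.
const SUMMARY_PREFIX: &str = "[CONTEXT CHECKPOINT] ";

#[derive(Debug)]
enum Item {
    User(String),
    Assistant(String),
    ToolOutput(String),
}

/// Stand-in for the one-shot LLM call driven by templates/compact/prompt.md.
fn summarize(_history: &[Item]) -> String {
    "Progress: ...; Constraints: ...; Remaining work: ...; Critical data: ...".into()
}

fn compact_locally(history: Vec<Item>) -> Vec<Item> {
    let summary = summarize(&history);
    // New history: the summary first, then every original user message.
    let mut out = vec![Item::User(format!("{SUMMARY_PREFIX}{summary}"))];
    out.extend(history.into_iter().filter(|i| matches!(i, Item::User(_))));
    out
}

fn main() {
    let history = vec![
        Item::User("refactor the parser".into()),
        Item::Assistant("done, see patch".into()),
        Item::ToolOutput("tests pass".into()),
        Item::User("now add error recovery".into()),
    ];
    for item in compact_locally(history) {
        println!("{item:?}");
    }
}
```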

Compaction Trigger

  • auto_compact_token_limit — per-model configurable threshold
  • effective_context_window_percent — defaults to 95% of the context window
  • If the context overflows during compaction itself, Codex falls back to dropping the oldest items one by one
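
As a sketch, assuming the explicit per-model limit takes precedence over the percentage fallback (the exact precedence in codex-rs may differ):

```rust
struct ModelLimits {
    context_window: u64,
    auto_compact_token_limit: Option<u64>,  // per-model override
    effective_context_window_percent: u64,  // defaults to 95
}

fn should_compact(tokens_used: u64, m: &ModelLimits) -> bool {
    let threshold = m
        .auto_compact_token_limit
        .unwrap_or(m.context_window * m.effective_context_window_percent / 100);
    tokens_used >= threshold
}

fn main() {
    let m = ModelLimits {
        context_window: 200_000,
        auto_compact_token_limit: None,
        effective_context_window_percent: 95,
    };
    assert!(!should_compact(150_000, &m));
    assert!(should_compact(190_000, &m)); // 95% of a 200k window
}
```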

Post-Compaction History Structure

```
[summary message: SUMMARY_PREFIX + LLM summary]    ← Compacted summary
[preserved user messages]                           ← All user messages retained
[initial context injection (mid-turn only)]         ← Re-injected context settings
[ghost snapshots]                                   ← Preserved across compaction
```

Key detail: the InitialContextInjection enum controls whether context is re-injected:

  • DoNotInject — pre-turn/manual compaction; the next turn will re-inject naturally
  • BeforeLastUserMessage — mid-turn compaction; must inject because the model expects it
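
A sketch of how that decision plays out when rebuilding history; the enum variants come from the notes above, while the surrounding types and the rebuild_history helper are stand-ins:

```rust
#[allow(dead_code)]
enum InitialContextInjection {
    DoNotInject,           // pre-turn/manual: the next turn re-injects naturally
    BeforeLastUserMessage, // mid-turn: the model expects context to be present
}

fn rebuild_history(
    summary: String,
    user_messages: Vec<String>,
    injection: InitialContextInjection,
    initial_context: String,
) -> Vec<String> {
    let mut history = vec![summary];
    history.extend(user_messages);
    if let InitialContextInjection::BeforeLastUserMessage = injection {
        // Re-insert the context item just before the final user message.
        let at = history.len().saturating_sub(1);
        history.insert(at, initial_context);
    }
    history
}

fn main() {
    let h = rebuild_history(
        "[summary]".into(),
        vec!["msg 1".into(), "msg 2".into()],
        InitialContextInjection::BeforeLastUserMessage,
        "[initial context]".into(),
    );
    assert_eq!(h, ["[summary]", "msg 1", "[initial context]", "msg 2"]);
}
```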

Tool Output Truncation

Codex has a per-model truncation policy applied at record time:

```rust
// From model_info.rs — default fallback
truncation_policy: TruncationPolicyConfig::bytes(/*limit*/ 10_000),
```

  • Tool outputs are truncated when they exceed the model's configured limit
  • Two modes: Bytes (byte limit) or Tokens (token limit)
  • Applied per-item as items are recorded into the context manager
  • This is pre-compaction — large tool outputs never fully enter the context
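
A sketch of record-time truncation under those two modes; the Bytes/Tokens split mirrors the source, while the ~4-bytes-per-token estimate and the truncation marker are assumptions:

```rust
enum TruncationPolicy {
    Bytes(usize),
    Tokens(usize),
}

fn truncate_tool_output(output: &str, policy: &TruncationPolicy) -> String {
    match policy {
        TruncationPolicy::Bytes(limit) if output.len() > *limit => {
            // Back up to a char boundary so we never split a UTF-8 sequence.
            let mut end = *limit;
            while !output.is_char_boundary(end) {
                end -= 1;
            }
            format!("{}\n[... truncated ...]", &output[..end])
        }
        TruncationPolicy::Tokens(limit) => {
            // Assumption: ~4 bytes per token; real code uses a proper estimate.
            truncate_tool_output(output, &TruncationPolicy::Bytes(limit * 4))
        }
        _ => output.to_string(), // under the limit: record unchanged
    }
}

fn main() {
    let big = "x".repeat(20_000);
    let recorded = truncate_tool_output(&big, &TruncationPolicy::Bytes(10_000));
    assert!(recorded.len() < big.len());
    println!("{} bytes recorded", recorded.len());
}
```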

System Prompt

The base system prompt (prompt.md) is a single comprehensive file covering:

  1. Identity — "You are a coding agent running in the Codex CLI"
  2. Capabilities — Receive prompts, stream responses, emit function calls
  3. AGENTS.md spec — How instruction files are discovered and applied (scoped by directory)
  4. Responsiveness — Preamble message guidelines before tool calls
  5. Planning — update_plan tool usage with quality examples
  6. Task execution — Autonomy, validation, apply_patch usage
  7. Ambition vs precision — New tasks → creative; existing code → surgical
  8. Progress updates — Concise status messages for long tasks
  9. Final answer formatting — Detailed style guidelines (headers, bullets, monospace, tone)
  10. Tool guidelines — Shell commands (prefer rg), update_plan

Instruction File Loading

Codex enumerates instruction files from multiple locations:

  • ~/.codex/ (global)
  • Each directory from the repo root down to the CWD
  • Optional fallback names, with a size cap
  • Merged in order: later (deeper) directories override earlier ones
  • Priority: system > developer > user > assistant
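
A simplified sketch of the directory walk, omitting the global ~/.codex/ location, fallback names, and the size cap; the function names are illustrative:

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Collect AGENTS.md files from the repo root down to the CWD,
/// root first, so deeper (later) files override shallower ones.
fn instruction_files(repo_root: &Path, cwd: &Path) -> Vec<PathBuf> {
    let mut dirs = Vec::new();
    let mut dir = cwd;
    loop {
        dirs.push(dir.to_path_buf());
        if dir == repo_root {
            break;
        }
        match dir.parent() {
            Some(parent) => dir = parent,
            None => break,
        }
    }
    dirs.reverse(); // the walk produced cwd → root; we want root → cwd
    dirs.into_iter()
        .map(|d| d.join("AGENTS.md"))
        .filter(|p| p.is_file())
        .collect()
}

/// Concatenate in discovery order; later files win on conflict.
fn merged_instructions(repo_root: &Path, cwd: &Path) -> String {
    instruction_files(repo_root, cwd)
        .iter()
        .filter_map(|p| fs::read_to_string(p).ok())
        .collect::<Vec<_>>()
        .join("\n\n")
}

fn main() {
    let cwd = std::env::current_dir().unwrap();
    // For demonstration, treat the CWD as the repo root.
    println!("{}", merged_instructions(&cwd, &cwd));
}
```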

Sub-Agent Architecture

Codex does not appear to have a built-in sub-agent spawning mechanism in the same sense as Claude Code's Agent tool or OpenClaw's sessions_spawn. The architecture relies on:

  • Single agent loop — One ContextManager per session
  • Guardian/review sessions — Separate review process for security, but using the same model infrastructure
  • No explicit sub-agent tool in the base tool set

This is the simplest agent model among all studied — a single flat loop with no delegation.

Comparison: All Five Agents

| Aspect | Pi | OpenClaw | Gemini CLI | Claude Code | Codex |
| --- | --- | --- | --- | --- | --- |
| Language | TypeScript | TypeScript | TypeScript | TypeScript (Bun binary) | Rust |
| Open source | Yes | Yes | Yes | No | Yes |
| Context model | Accumulate all | Accumulate + pipeline | Accumulate all | Accumulate all | Accumulate + per-item truncation |
| Compaction | Client LLM (1 call) | Client (inherited Pi) | Client LLM (2 pass) | Server-side API | Dual: server (OpenAI) / client (others) |
| Remote compaction | No | No | No | Yes (API compact-2026-01-12) | Yes (Responses API /responses/compact) |
| Encrypted state | No | No | No | No | Yes (opaque encrypted_content) |
| Tool output truncation | None (full in context) | None | Pre-summarization + budget | None (file ref on compact) | Per-item truncation at record time |
| Compaction prompt | 6 sections | Same (inherited) | <state_snapshot> | 9 sections | 4 sections (minimal) |
| Mid-stream compaction | No | No | No | No | Yes (server triggers mid-inference) |
| Sub-agent | Extension (process) | Built-in (gateway) | Built-in (in-process) | Built-in (6+ types) | None (single loop) |
| System prompt | ~300 words | 15+ sections | Toggleable sections | 65+ files | Single comprehensive file |
| Context awareness | None | None | None | Model-level budget | None |

Unique Design Choices

  1. Dual compaction path: OpenAI providers use server-side compaction with encrypted state preservation; non-OpenAI providers fall back to client-side LLM summarization. Best of both worlds for a multi-provider agent.

  2. Encrypted compaction content: The server returns encrypted_content that preserves the model's latent understanding — the client treats it as opaque. This is unique among all studied agents and potentially preserves more semantic fidelity than text summaries.

  3. Mid-stream compaction: The Responses API can trigger compaction mid-inference when context crosses the threshold, emitting a compaction item in the same stream. No other agent supports this.

  4. Per-item truncation at record time: Unlike Gemini CLI (which truncates at compression time) or the others (which don't truncate at all), Codex truncates each tool output as it enters the context. This bounds each recorded item, so a single oversized tool output can never blow up the context, though history still grows until compaction.

  5. Minimal compaction prompt: Only 4 bullet points ("progress/decisions, constraints, remaining work, critical data"). Far simpler than Claude Code's 9 sections. Trades detail for speed.

  6. No sub-agents: The simplest agent model — a single flat loop. Complex tasks are handled through planning (update_plan) rather than delegation. This keeps the context management simple but limits parallelism.

  7. Rust implementation: The only Rust-based agent among the five studied. This enables fine-grained memory control and the per-item truncation strategy, which would be harder to implement as efficiently in JavaScript.

  8. AGENTS.md scoping: Directory-scoped instruction files with merge semantics and explicit priority ordering (system > developer > user > assistant). More structured than Claude Code's CLAUDE.md approach.