# LLM Memory Systems: A Reverse-Engineering Study
## Overview
This document summarizes findings from reverse-engineering how major AI systems implement memory. It covers consumer products (ChatGPT, Claude), developer tools (Claude Code, Codex CLI, Gemini CLI), and open-source memory frameworks (mem0, Letta).
## Part 1: Consumer AI Memory (ChatGPT vs Claude)
Based on reverse-engineering by Manthan Gupta.
### ChatGPT Memory Architecture
Four-layer context structure:
1. Session Metadata - Device, location, and usage patterns (ephemeral)
2. User Memory - Explicit facts stored long-term (33 facts in the observed example)
3. Recent Conversations Summary - Lightweight digest of past chats (titles + snippets)
4. Current Session - Sliding window of full messages

Key insight: there is no vector database and no RAG. Pre-computed summaries are injected directly into the context.
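The layered structure above can be sketched as a simple prompt-assembly function. This is a minimal sketch under stated assumptions, not ChatGPT's actual implementation. The `ContextLayers` type, its field names, the bracketed section labels, and the word-count token estimate are all hypothetical. The sketch does capture the key property of this design: every layer is pre-computed and concatenated on every request, and there is no retrieval step.

```python
from dataclasses import dataclass, field


@dataclass
class ContextLayers:
    """Hypothetical container for the four layers described above."""
    session_metadata: dict[str, str]  # ephemeral: device, location, usage patterns
    user_memory: list[str]  # explicit long-term facts
    recent_summaries: list[str]  # digest of past chats (title + snippet)
    current_session: list[str] = field(default_factory=list)  # full messages


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_context(layers: ContextLayers, window_tokens: int = 50) -> str:
    """Concatenate every layer; only the current session is trimmed (sliding window)."""
    meta = "\n".join(f"{k}: {v}" for k, v in layers.session_metadata.items())
    memory = "\n".join(f"- {fact}" for fact in layers.user_memory)
    recent = "\n".join(f"- {s}" for s in layers.recent_summaries)

    # Sliding window: walk backward from the newest message until the budget is spent.
    window: list[str] = []
    used = 0
    for msg in reversed(layers.current_session):
        cost = approx_tokens(msg)
        if used + cost > window_tokens:
            break
        window.append(msg)
        used += cost
    window.reverse()  # restore chronological order

    return "\n\n".join([
        "[Session Metadata]\n" + meta,
        "[User Memory]\n" + memory,
        "[Recent Conversations]\n" + recent,
        "[Current Session]\n" + "\n".join(window),
    ])


if __name__ == "__main__":
    layers = ContextLayers(
        session_metadata={"device": "desktop"},
        user_memory=["Prefers Python"],
        recent_summaries=["Trip planning: asked about Lisbon hotels"],
        current_session=["user: hi", "assistant: hello", "user: what's new?"],
    )
    print(build_context(layers, window_tokens=4))
```

The memory and summary layers are always present, so their token cost is paid on every request. Only the current session is windowed.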
### Claude Memory Architecture
Tool-based selective retrieval:
1. System Prompt - Static instructions
2. User Memories - Long-term facts in XML format
3. Conversation History - Rolling window + on-demand tools
4. Current Message
Key tools:
- conversation_search - Semantic search over past conversations
- recent_chats - Time-based retrieval
- memory_user_edits - Explicit memory management
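Here is a minimal sketch of how a host application might serve the three tools, which makes the on-demand pattern concrete. Only the tool names come from the reverse-engineered findings. Everything else is assumed for illustration: the argument names (`query`, `max_results`, `n`, `action`, `text`), the `Conversation` record, the in-memory store, and the keyword-overlap scoring, which stands in for real semantic search. The point is that nothing from past conversations enters the context unless the model emits a tool call.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Conversation:
    """Hypothetical record for one stored past conversation."""
    title: str
    text: str
    updated: datetime


class MemoryTools:
    """Hypothetical backend for the three memory tools named above."""

    def __init__(self, conversations: list[Conversation], memories: list[str]):
        self.conversations = conversations
        self.memories = memories

    def conversation_search(self, query: str, max_results: int = 3) -> list[str]:
        # Stand-in for semantic search: rank conversations by word overlap with the query.
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(c.text.lower().split())), c.title)
            for c in self.conversations
        ]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [title for _, title in ranked[:max_results]]

    def recent_chats(self, n: int = 3) -> list[str]:
        # Time-based retrieval: newest conversations first.
        newest = sorted(self.conversations, key=lambda c: c.updated, reverse=True)
        return [c.title for c in newest[:n]]

    def memory_user_edits(self, action: str, text: str) -> list[str]:
        # Explicit memory management: the user asks to add or remove a fact.
        if action == "add" and text not in self.memories:
            self.memories.append(text)
        elif action == "remove" and text in self.memories:
            self.memories.remove(text)
        return list(self.memories)

    def dispatch(self, name: str, args: dict) -> list[str]:
        # Route a model-issued tool call to the matching method.
        handlers = {
            "conversation_search": self.conversation_search,
            "recent_chats": self.recent_chats,
            "memory_user_edits": self.memory_user_edits,
        }
        return handlers[name](**args)
```

A turn that never calls `dispatch` adds no history to the prompt. This is the "on-demand depth" side of the trade-off compared in the table that follows.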
### Philosophy Comparison
| Aspect | ChatGPT | Claude |
|---|---|---|
| History Retrieval | Pre-computed summaries (always injected) | On-demand tools (selective) |
| Trade-off | Automatic continuity | On-demand depth |
| Efficiency | Fixed token cost | Variable (only when needed) |
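A back-of-the-envelope cost model makes the efficiency row concrete. This is an illustrative sketch with assumed symbols, not a measurement. Over $n$ turns, let $S$ be the tokens of the always-injected history summary and $T$ the per-turn tokens spent on tool definitions, which stay in the prompt even when unused. Let $k$ be the number of retrieval calls, each returning about $R$ tokens. Counting each retrieved result once, and ignoring that it stays in the history afterward:

$$
C_{\text{pre}} = nS, \qquad C_{\text{tool}} = nT + kR, \qquad C_{\text{tool}} < C_{\text{pre}} \iff kR < n\,(S - T)
$$

Under this model, the on-demand approach is cheaper when retrieval is needed rarely relative to the size of the summary it replaces. The pre-computed approach is cheaper when nearly every turn would otherwise trigger a search.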
## Part 2: AI Coding CLI Memory
This part examines the reverse-engineered local storage of Claude Code, Codex CLI (OpenAI), and Gemini CLI.
### Storage Comparison
| Tool | Location | Format | Message Types |
|---|---|---|---|
| Claude Code | ~/.claude/projects/ | JSONL | ~6 |
| Codex | ~/.codex/sessions/ | JSONL | 10+ |
| Gemini | ~/.gemini/tmp/ | JSON | 3 |
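Claude Code and Codex store sessions as JSONL (one JSON object per line), so message-type counts like those in the table can be tallied with a few lines of Python. This is a sketch under assumptions. It presumes each record carries a top-level `type` field, which should be checked against real files. The demo writes hypothetical records to a temporary file rather than reading a real session directory. Gemini's single-JSON format would instead be loaded with one `json.load` call.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_message_types(path: Path, type_key: str = "type") -> Counter:
    """Tally the records in a JSONL session file by their (assumed) type field."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            counts[record.get(type_key, "<missing>")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records; real session files carry many more fields.
    sample = [
        {"type": "user", "message": "fix the failing test"},
        {"type": "assistant", "message": "Looking at the test..."},
        {"type": "summary", "summary": "Debugging a failing unit test"},
        {"type": "user", "message": "thanks"},
    ]
    with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
        tmp.write("\n".join(json.dumps(r) for r in sample) + "\n")
    path = Path(tmp.name)
    print(count_message_types(path))
    path.unlink()  # clean up the temporary file
```

To run it over real sessions, you could iterate over `Path.home().joinpath(".claude", "projects").rglob("*.jsonl")`. Using `rglob` avoids assuming how deeply session files are nested under the directory.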
### Context Compression Strategies
| Tool | Method | Visibility |
|---|---|---|
| Claude | Plaintext summary in same file | Fully readable |
| Codex | Encrypted JWT in same file | Opaque |
| Gemini | New file, server-side storage | Not stored locally |
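A compaction routine that replaces older messages with a readable summary record can illustrate the Claude row (a plaintext summary kept in the same session file). This is a hedged sketch. The `naive_summarize` function just clips and joins messages where a real tool would call an LLM, and the `Message` record shape and `keep_last` parameter are hypothetical. Because the summary is ordinary text in the same list, it stays inspectable. The Codex and Gemini variants would instead put an opaque blob or a server-side reference in that position.

```python
from typing import Callable

Message = dict[str, str]  # hypothetical record shape: {"role": ..., "content": ...}


def naive_summarize(messages: list[Message], max_chars: int = 40) -> str:
    # Placeholder for an LLM summarization call: clip each message and join them.
    return " | ".join(f"{m['role']}: {m['content'][:max_chars]}" for m in messages)


def compact(
    history: list[Message],
    keep_last: int = 2,
    summarize: Callable[[list[Message]], str] = naive_summarize,
) -> list[Message]:
    """Replace all but the newest keep_last messages with one plaintext summary record."""
    if len(history) <= keep_last:
        return list(history)
    split = len(history) - keep_last
    old, recent = history[:split], history[split:]
    summary: Message = {"role": "summary", "content": summarize(old)}
    return [summary] + recent
```

If the returned list is written back to the session file, the summary remains plain text, which is what makes this style easy to debug.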
### Unique Strengths
- Claude Code: Most comprehensive local data (usage analytics, plugin system, planning mode, file history, IDE integration)
- Codex: Most detailed logging (encrypted compression, ghost snapshots, rate-limit tracking, agent reasoning)
- Gemini: Simplest architecture (a single JSON file per session, structured thinking), making it the easiest to parse
### Design Philosophy
| Aspect | Claude | Codex | Gemini |
|---|---|---|---|
| Transparency (compressed context) | High | Low | None |
| Server Dependency | Low | High | High |
| Debuggability | Easy | Hard | Medium |
## Part 3: Open-Source Memory Frameworks
### mem0
Core Innovation: Active memory management via LLM-driven fact extraction and conflict resolution.
Architecture:
- Two-Stage LLM Pipeline: Extract facts → Compare with existing → Decide ADD/UPDATE/DELETE
- Dual Storage: Vector store (semantic search) + Graph store (relationships) in parallel
- Session Scoping: user_id, agent_id, run_id for multi-tenant isolation
Key Differentiators from RAG:

| Aspect | Traditional RAG | mem0 |
|---|---|---|
| Storage Unit | Document chunks | Atomic facts |
| Updates | Append-only | Active CRUD |
| Deduplication | Vector distance | LLM reasoning |
Ecosystem:
- 24+ vector stores (Qdrant, Pinecone, PGVector, FAISS, etc.)
- 20+ LLM providers (OpenAI, Anthropic, Gemini, Ollama, etc.)
- Graph support via Neo4j, Memgraph, and Neptune
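The extract-then-decide pipeline can be sketched without any LLM by substituting simple rules for both stages. This is not mem0's code or API. `extract_facts`, `decide`, `MemoryStore`, the "key: value" fact format, and the no-op `NONE` outcome are all assumptions made so the example runs on its own. The method names `add` and `get_all` only loosely echo mem0's. In mem0, both stages are LLM prompts, and new facts are compared against related existing memories from its store. Here, a fact's key plays that matching role. Dual vector/graph storage is omitted. The scope tuple mirrors the user_id/agent_id/run_id isolation described above, simplified to exact matching.

```python
from dataclasses import dataclass, field
from typing import Optional

# (user_id, agent_id, run_id); exact-match scoping is a simplification.
Scope = tuple[Optional[str], Optional[str], Optional[str]]


def extract_facts(message: str) -> dict[str, str]:
    """Stage 1 stand-in: parse 'key: value' lines instead of prompting an LLM."""
    facts: dict[str, str] = {}
    for line in message.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            facts[key.strip().lower()] = value.strip()
    return facts


def decide(existing: dict[str, str], key: str, value: str) -> str:
    """Stage 2 stand-in: pick ADD / UPDATE / DELETE / NONE by rule, not LLM reasoning."""
    if not value:  # an empty value is treated as a retraction
        return "DELETE" if key in existing else "NONE"
    if key not in existing:
        return "ADD"
    return "NONE" if existing[key] == value else "UPDATE"


@dataclass
class MemoryStore:
    """Hypothetical scoped fact store; not mem0's Memory class."""
    scopes: dict[Scope, dict[str, str]] = field(default_factory=dict)

    def add(self, message: str, user_id: Optional[str] = None,
            agent_id: Optional[str] = None, run_id: Optional[str] = None) -> list[tuple[str, str]]:
        memories = self.scopes.setdefault((user_id, agent_id, run_id), {})
        events = []
        for key, value in extract_facts(message).items():
            event = decide(memories, key, value)
            if event in ("ADD", "UPDATE"):
                memories[key] = value
            elif event == "DELETE":
                del memories[key]
            events.append((event, key))
        return events

    def get_all(self, user_id: Optional[str] = None, agent_id: Optional[str] = None,
                run_id: Optional[str] = None) -> dict[str, str]:
        return dict(self.scopes.get((user_id, agent_id, run_id), {}))
```

Unlike append-only RAG, a second message about the same fact changes the stored value in place instead of adding a near-duplicate chunk.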
Detailed analysis: mem0.research.md
### Letta (MemGPT)
Core Innovation: LLM Operating System with self-editing memory and virtual context management.
Architecture:
- Three-Tier Memory: Core (always in-context) → Recall (searchable history) → Archival (vector storage)
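- Self-Editing: The agent edits its own memory through tools. core_memory_replace rewrites in-context core memory (effectively its own prompt), while archival_memory_insert writes to archival storage
- Virtual Context: Summarization + paging creates the illusion of an unbounded context window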
Key Differentiators from RAG:

| Aspect | Standard RAG | Letta |
|---|---|---|
| Retrieval | Passive (before LLM) | Active (LLM decides when) |
| Memory | Read-only | Read/Write (self-editing) |
| State | Stateless | Stateful (persistent identity) |
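The three tiers and the self-editing loop can be sketched as a small class. The tool names core_memory_replace and archival_memory_insert come from Letta, and archival_memory_search is a MemGPT-era tool name. Their signatures here are assumptions, as are the hypothetical `Agent` class, the block labels, the word-count token budget, the tag-style prompt rendering, and the eviction policy. That policy moves the oldest messages to recall storage and keeps a crude running summary. None of this is Letta's implementation, and archival search uses substring matching where Letta would use vector search.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical, simplified three-tier memory agent."""
    core: dict[str, str]  # core tier: always rendered into the prompt
    context_budget: int = 30  # hypothetical in-context token budget (words)
    messages: list[str] = field(default_factory=list)  # in-context messages
    recall: list[str] = field(default_factory=list)  # recall tier: evicted history
    archival: list[str] = field(default_factory=list)  # archival tier: long-term store
    summary: str = ""  # running summary of evicted messages

    # --- Self-editing tools the LLM would call (signatures assumed) ---
    def core_memory_replace(self, label: str, old: str, new: str) -> None:
        if old not in self.core[label]:
            raise ValueError(f"{old!r} not found in block {label!r}")
        self.core[label] = self.core[label].replace(old, new, 1)

    def archival_memory_insert(self, text: str) -> None:
        self.archival.append(text)

    def archival_memory_search(self, query: str) -> list[str]:
        # Substring match as a stand-in for vector search.
        return [t for t in self.archival if query.lower() in t.lower()]

    # --- Virtual context management ---
    def _tokens(self) -> int:
        return sum(len(m.split()) for m in self.messages)

    def append_message(self, msg: str) -> None:
        # Page the oldest messages out to recall storage while over budget.
        self.messages.append(msg)
        while self._tokens() > self.context_budget and len(self.messages) > 1:
            evicted = self.messages.pop(0)
            self.recall.append(evicted)
            # Crude summary stand-in: keep the first 20 characters of each evicted message.
            self.summary = (self.summary + " " + evicted[:20]).strip()

    def render_prompt(self) -> str:
        blocks = "\n".join(f"<{k}>{v}</{k}>" for k, v in self.core.items())
        return f"{blocks}\n[summary] {self.summary}\n" + "\n".join(self.messages)
```

Under these assumptions, the prompt always contains the core blocks and a bounded message window, while older turns remain searchable in recall and archival storage.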
Agent Types:
- memgpt_agent - Original MemGPT implementation
- letta_v1_agent - Simplified, no forced tool calls
- sleeptime_agent - Background memory management
Detailed analysis: letta.research.md
## Key Takeaways
- No complex RAG in production: Both ChatGPT and Claude use simpler approaches than expected (pre-computed summaries or selective tool calls).
- Transparency vs privacy trade-off: Claude Code favors transparency (readable summaries), while Codex and Gemini favor privacy (encrypted or server-side compression).
- Tool-based memory is flexible: Claude's use of tools such as conversation_search lets past context be loaded on demand rather than injected into every request.
- Local storage varies widely: From Gemini's minimal three-message-type JSON to Codex's 10+ event types, the complexity reflects different priorities.
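- Memory management is mostly explicit in the consumer and CLI products: They rely heavily on explicit user commands ("remember this") rather than fully autonomous memory extraction, unlike the LLM-driven extraction in mem0 and the self-editing agents in Letta.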
- Open-source frameworks differ fundamentally: mem0 focuses on LLM-driven fact extraction with dual storage (vector + graph), while Letta implements OS-inspired virtual memory with self-editing capabilities.
## References
- ChatGPT Memory Reverse Engineering - Manthan Gupta
- Claude Memory Reverse Engineering - Manthan Gupta
- mem0 - Memory layer for personalized AI
- Letta - LLM agents with self-managing memory