# LLM Memory Systems: 2026 Q1 Update
Last Updated: 2026-03-24
Previous baseline: memory.summary.md (2025-12)
## What Changed Since 2025
The 2025 landscape was dominated by Mem0 (fact extraction + vector/graph dual storage), Letta (three-tier self-editing memory), and Graphiti (temporal knowledge graph). Three months later, four new systems have emerged with fundamentally different approaches.
## The Biggest Shift: Anti-RAG Movement
In 2025, the assumption was "memory = retrieval = vector database." The 2026 Q1 projects challenge this:
| Project | Uses vector DB? | Uses LLM for retrieval? | Core approach |
|---|---|---|---|
| Mem0 (2025) | ✓ | ✓ (fact extraction) | Vector + graph dual storage |
| Letta (2025) | ✓ (archival tier) | ✓ (self-editing) | Three-tier virtual context |
| Graphiti (2025) | ✗ | ✓ | Temporal knowledge graph |
| Supermemory ASMR | ✗ | ✓ (3+3 agent pipeline) | LLM-as-retriever, ensemble voting |
| Observational Memory (Mastra) | ✗ | ✓ (Observer+Reflector) | Pure compression, no retrieval at all |
| Hindsight | Hybrid (BM25+semantic) | ✓ (Cara reflect agent) | Four cognitive networks + graph traversal |
| MemOS | Optional (Qdrant) | ✓ | OS abstraction over three memory types |
Of the four new systems, two (Supermemory ASMR and Observational Memory) use no vector database at all, and the other two demote it to one retrieval signal among several (Hindsight's hybrid) or an optional backend (MemOS). The field is moving from "embed and search" toward "compress and reason."
## New Projects
### Supermemory (ASMR)
Approach: Replace vector search entirely with LLM reasoning.
- 3 observer agents extract structured knowledge across 6 dimensions (personal info, preferences, events, temporal data, updates, assistant info)
- 3 search agents do retrieval via LLM reasoning (direct facts, contextual clues, temporal reconstruction)
- Ensemble voting: 8-variant cluster (98.6% oracle) or 12-variant decision forest (97.2% consensus); the two scoring modes are sketched after this list
- 11-15 LLM calls per question
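To make the two scoring modes concrete, here is a minimal Python sketch. The `consensus_answer` and `oracle_accuracy` names, the `judge` callable, and the data shapes are hypothetical, not Supermemory's API; only the ensemble sizes and the oracle-vs-consensus distinction come from the description above.

```python
from collections import Counter
from typing import Callable

Answer = str

def consensus_answer(answers: list[Answer]) -> Answer:
    """Decision-forest style consensus: majority vote across the
    variant pipelines' answers to one question."""
    return Counter(answers).most_common(1)[0][0]

def oracle_accuracy(per_question: list[tuple[list[Answer], Answer]],
                    judge: Callable[[Answer, Answer], bool]) -> float:
    """Oracle evaluation: a question counts as correct if ANY
    variant's answer is judged correct against the gold answer."""
    hits = sum(
        1 for answers, gold in per_question
        if any(judge(a, gold) for a in answers)
    )
    return hits / len(per_question)
```

The gap between the two reported numbers (98.6% vs 97.2%) is exactly the gap between `oracle_accuracy` and scoring only the single `consensus_answer` per question.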
Reality check: The headline 98.6% uses oracle evaluation (a question counts as correct if any of the 8 variants gets it right). The production engine uses pgvector and scores 81.6%. ASMR is experimental and not yet open-sourced (promised for April 2026).
Detail: supermemory.research.md
### Observational Memory (Mastra)
Approach: Pure compression, zero retrieval. Everything stays in the context window.
- Observer agent triggers at 30K tokens, compresses raw messages into dated observations (5-40x compression)
- Reflector agent triggers at 40K tokens, garbage-collects stale observations
- No vector DB, no graph store — "everything is text in context"
- Prompt caching enables 4-10x cost reduction (stable text prefix)
Why it matters: Simplest architecture yet highest single-model score (94.87% with gpt-5-mini on LongMemEval). Proves that careful compression alone can beat complex retrieval systems.
Source code findings: Observer uses temperature 0.3, thinking budget 215 tokens. Reflector has 5 compression escalation levels. Degenerate repetition detection prevents observation loops.
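A minimal sketch of the two-threshold loop, with hypothetical `observe` and `reflect` callables standing in for the Observer and Reflector agents; the 30K/40K triggers and the 5 escalation levels come from the findings above, everything else is illustrative.

```python
OBSERVER_TRIGGER = 30_000   # raw-message tokens before the Observer fires
REFLECTOR_TRIGGER = 40_000  # observation tokens before the Reflector fires
ESCALATION_LEVELS = 5       # Reflector compresses harder each pass

def count_tokens(texts: list[str]) -> int:
    # Placeholder: a real system would use the model's tokenizer.
    return sum(len(t.split()) for t in texts)

def maintain_context(messages: list[str], observations: list[str],
                     observe, reflect) -> tuple[list[str], list[str]]:
    """One maintenance pass. Both inputs and outputs are plain text
    kept in the context window; nothing is written to any store."""
    if count_tokens(messages) > OBSERVER_TRIGGER:
        # Observer: compress raw messages into dated observations (5-40x).
        observations = observations + observe(messages)
        messages = []

    level = 0
    while count_tokens(observations) > REFLECTOR_TRIGGER and level < ESCALATION_LEVELS:
        # Reflector: garbage-collect stale observations, escalating
        # compression aggressiveness with each level.
        observations = reflect(observations, level)
        level += 1

    return messages, observations
```

Keeping everything as a slowly-changing text prefix is also what makes prompt caching pay off: the observations change rarely, so most of the prompt is a cache hit.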
Detail: mastra.research.md
### Hindsight
Approach: Structure memory like human cognition — separate facts from experiences from opinions.
- Four memory networks: World (objective facts), Experience (personal events), Observation (inferred patterns), Opinion (deprecated in current code)
- Tempr (retain + recall): LLM fact extraction with a 5W schema, entity resolution, and four link types (semantic, temporal, causal, entity); 4-way parallel retrieval (semantic + BM25 + graph + temporal) fused via reciprocal rank fusion (RRF, sketched after this list)
- MPFP algorithm: Novel sublinear graph traversal combining meta-path patterns with Forward Push propagation
- Cara (reflect): Agentic loop with disposition traits (skepticism, literalism, empathy) and hierarchical retrieval (mental models > observations > raw facts)
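Most of this machinery is bespoke, but the RRF fusion step is a standard technique that is easy to sketch. The function below assumes four ranked lists of memory IDs coming from the four retrievers; k = 60 is the conventional RRF constant, not necessarily the value Hindsight uses.

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each memory scores sum(1 / (k + rank))
    across the retrievers that returned it, so items ranked decently
    by several retrievers beat items ranked first by only one."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical output from the four parallel retrievers:
fused = rrf_fuse([
    ["m1", "m7", "m3"],  # semantic
    ["m7", "m2", "m1"],  # BM25
    ["m3", "m7"],        # graph traversal
    ["m1", "m3", "m9"],  # temporal
])
# m1, m7, and m3 each appear in three lists and dominate the fused ranking.
```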
Why it matters: The most technically deep system. A 20B open-source model lifts accuracy from 39% to 83.6%, evidence that architecture matters more than model size; with Gemini-3 Pro it reaches 91.4%.
Detail: hindsight.research.md
### MemOS
Approach: Don't invent new memory algorithms — build an OS layer that manages all types.
- Three-layer architecture: Interface (API parsing) → Operation (MemScheduler, MemLifecycle, MemOperator) → Infrastructure (storage backends)
- MemCube: Standardized container with metadata (user_id, source, timestamp, importance_score, TTL); sketched after this list
- Three memory types:
  - Plaintext: Text memories with naive/general/tree backends (Neo4j tree is most advanced)
  - Activation: KV-cache snapshots (functional but limited to local HuggingFace models)
  - Parametric: LoRA weight memories (still a stub/placeholder — most ambitious but least implemented)
- Heavy infrastructure: Neo4j + Qdrant + Redis + MySQL
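A hedged sketch of what a MemCube-style container implies, written as a Python dataclass. The metadata fields mirror the list above; the type enum, defaults, and `expired` helper are illustrative guesses, not MemOS's actual classes.

```python
from dataclasses import dataclass, field
from enum import Enum
from time import time

class MemoryType(Enum):
    PLAINTEXT = "plaintext"    # text memories (naive/general/tree backends)
    ACTIVATION = "activation"  # KV-cache snapshots (local HF models only)
    PARAMETRIC = "parametric"  # LoRA weight memories (stub in current code)

@dataclass
class MemCube:
    """Standardized container: a payload plus the metadata that the
    Operation layer (MemScheduler, MemLifecycle, MemOperator) acts on."""
    user_id: str
    source: str
    payload: bytes                    # text, KV tensors, or LoRA weights
    memory_type: MemoryType = MemoryType.PLAINTEXT
    timestamp: float = field(default_factory=time)
    importance_score: float = 0.5     # guessed default
    ttl_seconds: float | None = None  # None = no expiry

    def expired(self, now: float | None = None) -> bool:
        now = time() if now is None else now
        if self.ttl_seconds is None:
            return False
        return now - self.timestamp > self.ttl_seconds
```

The point of a standardized envelope is that the scheduler and lifecycle manager can treat all three memory types uniformly, even though their payloads differ wildly.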
Why it matters: Only system that explicitly models the three pillars (text/cache/weights) in one framework. But the Parametric Memory vision — writing memories into model weights — remains unrealized.
Detail: memos.research.md
## Benchmark Landscape
| System | LongMemEval | LoCoMo | Model | Year |
|---|---|---|---|---|
| Supermemory ASMR (oracle) | 98.6% | — | GPT-4o × 8 | 2026 |
| Observational Memory | 94.87% | — | gpt-5-mini | 2026 |
| Hindsight | 91.4% | 89.61% | Gemini-3 Pro | 2025-12 |
| Supermemory (production) | 81.6% | — | — | 2026 |
| Mem0 | — | ~58% | — | 2025 |
Note: Benchmark numbers are self-reported and methodologies vary. ASMR's 98.6% uses oracle evaluation. Direct comparison across systems requires caution.
## Emerging Patterns
### Pattern 1: Compression Is Becoming a First-Class Memory Strategy
In 2025, compression (compaction) was an emergency measure for context overflow. In 2026, Observational Memory proves it can be the entire memory architecture — and beat retrieval-based systems on benchmarks.
### Pattern 2: LLM-as-Retriever Replacing Vector Search
Both Supermemory ASMR and Hindsight's Cara use LLM reasoning for retrieval instead of (or alongside) embedding similarity. The tradeoff: better semantic understanding vs higher cost per query.
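A back-of-envelope model makes the cost side of that tradeoff visible. Every number below is a placeholder assumption for illustration; only the 11-15 calls per question figure comes from the ASMR section above.

```python
def cost_per_question(llm_calls: int, tokens_per_call: int,
                      usd_per_1k_tokens: float) -> float:
    """Rough per-question cost; all three inputs are assumptions."""
    return llm_calls * tokens_per_call * usd_per_1k_tokens / 1000

# Hypothetical prices and token counts, same model for both routes:
vector_route = cost_per_question(1, 2_000, 0.002)    # embed + one answer call
llm_retriever = cost_per_question(13, 2_000, 0.002)  # midpoint of 11-15 calls

print(f"{llm_retriever / vector_route:.0f}x")  # ~13x more per question
```

Under these assumptions the LLM-as-retriever route is roughly an order of magnitude more expensive per query; the bet is that the gain in semantic understanding justifies it.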
### Pattern 3: Cognitive Architecture Emerging
Hindsight's four-network model (facts/experiences/observations/opinions) and MemOS's three memory types (text/cache/weights) both try to model memory the way humans think about memory, not just as a key-value store.
### Pattern 4: The Weight-Writing Frontier Remains Unrealized
MemOS is the only system that explicitly attempts Parametric Memory (LoRA weight updates). It's still a stub. The gap between "external memory" and "learned memory" identified in our findings remains wide.
## Updated Technology Landscape (2025 → 2026 Q1)
| Dimension | 2025 State | 2026 Q1 State |
|---|---|---|
| Primary storage | Vector DB (Qdrant, Chroma, pgvector) | Shifting: plain text in context, graph, or no persistent store |
| Retrieval | Embedding similarity (RAG) | Diversifying: LLM reasoning, graph traversal, pure compression |
| Compression | Emergency compaction only | First-class strategy (Observational Memory) |
| Cognitive modeling | Flat fact lists (Mem0) | Structured networks (Hindsight 4-network, MemOS 3-type) |
| Weight updates | Not attempted | Attempted but not realized (MemOS LoRA stub) |
| Benchmark SOTA | ~80% LongMemEval | ~95% LongMemEval (single model, non-oracle) |
| Cost efficiency | High (multiple LLM calls + embedding) | Improving (Mastra: 10x reduction via prompt caching) |