LLM Memory Systems: 2026 Q1 Update

Last Updated: 2026-03-24

Previous baseline: memory.summary.md (2025-12)


What Changed Since 2025

The 2025 landscape was dominated by Mem0 (fact extraction + vector/graph dual storage), Letta (three-tier self-editing memory), and Graphiti (temporal knowledge graph). Three months later, four new systems have emerged with fundamentally different approaches.

The Biggest Shift: Anti-RAG Movement

In 2025, the assumption was "memory = retrieval = vector database." The 2026 Q1 projects challenge this:

| Project | Uses vector DB? | Uses LLM for retrieval? | Core approach |
|---|---|---|---|
| Mem0 (2025) | ✓ (fact extraction) | | Vector + graph dual storage |
| Letta (2025) | ✓ (archival tier) | ✓ (self-editing) | Three-tier virtual context |
| Graphiti (2025) | | | Temporal knowledge graph |
| Supermemory ASMR | | ✓ (3+3 agent pipeline) | LLM-as-retriever, ensemble voting |
| Observational Memory (Mastra) | | ✓ (Observer+Reflector) | Pure compression, no retrieval at all |
| Hindsight | Hybrid (BM25+semantic) | ✓ (Cara reflect agent) | Four cognitive networks + graph traversal |
| MemOS | Optional (Qdrant) | | OS abstraction over three memory types |

Three of the four new systems (Supermemory ASMR, Observational Memory, and MemOS, where Qdrant is optional) can run without a vector database. The field is moving from "embed and search" toward "compress and reason."


New Projects

Supermemory (ASMR)

Approach: Replace vector search entirely with LLM reasoning.

  • 3 observer agents extract structured knowledge across 6 dimensions (personal info, preferences, events, temporal data, updates, assistant info)
  • 3 search agents do retrieval via LLM reasoning (direct facts, contextual clues, temporal reconstruction)
  • Ensemble voting: 8-variant cluster (98.6% oracle) or 12-variant decision forest (97.2% consensus)
  • 11-15 LLM calls per question

Reality check: The ~99% score uses oracle evaluation (any of 8 attempts correct = correct). Production engine uses pgvector and scores 81.6%. ASMR is experimental, not yet open-sourced (promised April 2026).
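The gap between the oracle and consensus numbers above comes down to how votes are scored. A minimal sketch of the two scoring modes (the variant answers and grading function here are hypothetical stand-ins, not ASMR's actual pipeline):

```python
from collections import Counter

def oracle_score(variant_answers, is_correct):
    """Oracle: a question counts as correct if ANY variant got it right."""
    return any(is_correct(a) for a in variant_answers)

def consensus_score(variant_answers, is_correct):
    """Consensus: majority vote across variants, then grade the winner."""
    winner, _ = Counter(variant_answers).most_common(1)[0]
    return is_correct(winner)

# Toy example: 8 variants, only one happens to answer correctly.
answers = ["Paris"] + ["Lyon"] * 7
correct = lambda a: a == "Paris"
print(oracle_score(answers, correct))     # True  (oracle credits the lone hit)
print(consensus_score(answers, correct))  # False (the majority was wrong)
```

This is why oracle numbers always upper-bound consensus numbers for the same ensemble: one lucky variant is enough for the oracle but not for the vote.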

Detail: supermemory.research.md

Observational Memory (Mastra)

Approach: Pure compression, zero retrieval. Everything stays in the context window.

  • Observer agent triggers at 30K tokens, compresses raw messages into dated observations (5-40x compression)
  • Reflector agent triggers at 40K tokens, garbage-collects stale observations
  • No vector DB, no graph store — "everything is text in context"
  • Prompt caching enables 4-10x cost reduction (stable text prefix)
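The two-threshold loop above can be sketched as follows. The token counter, trigger constants, and the `observe`/`reflect` callables are simplified stand-ins for Mastra's actual implementation (which would make LLM calls and use a real tokenizer):

```python
OBSERVER_TRIGGER = 30_000   # compress raw messages past this many tokens
REFLECTOR_TRIGGER = 40_000  # garbage-collect observations past this many

def count_tokens(texts):
    # Crude whitespace proxy for a real tokenizer.
    return sum(len(t.split()) for t in texts)

def maybe_compress(raw_messages, observations, observe, reflect):
    """observe() turns raw messages into dated observations;
    reflect() prunes stale observations. Both would be LLM calls."""
    if count_tokens(raw_messages) > OBSERVER_TRIGGER:
        observations.extend(observe(raw_messages))
        raw_messages.clear()
    if count_tokens(observations) > REFLECTOR_TRIGGER:
        observations[:] = reflect(observations)
    return raw_messages, observations
```

Because both agents only rewrite the tail of the context while the compressed prefix stays byte-stable, the prefix stays cacheable, which is where the cost reduction comes from.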

Why it matters: Simplest architecture yet highest single-model score (94.87% with gpt-5-mini on LongMemEval). Proves that careful compression alone can beat complex retrieval systems.

Source code findings: Observer uses temperature 0.3, thinking budget 215 tokens. Reflector has 5 compression escalation levels. Degenerate repetition detection prevents observation loops.

Detail: mastra.research.md

Hindsight

Approach: Structure memory like human cognition — separate facts from experiences from opinions.

  • Four memory networks: World (objective facts), Experience (personal events), Observation (inferred patterns), Opinion (deprecated in current code)
  • Tempr (retain+recall): LLM fact extraction with 5W schema, entity resolution, four link types (semantic, temporal, causal, entity). 4-way parallel retrieval (semantic + BM25 + graph + temporal) with RRF fusion
  • MPFP algorithm: Novel sublinear graph traversal combining meta-path patterns with Forward Push propagation
  • Cara (reflect): Agentic loop with disposition traits (skepticism, literalism, empathy) and hierarchical retrieval (mental models > observations > raw facts)
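Reciprocal Rank Fusion, named above as the merge step for the four parallel channels, can be sketched in a few lines. The channel names and the conventional k=60 constant are illustrative choices, not taken from Hindsight's code:

```python
def rrf_fuse(ranked_lists, k=60):
    """Each ranked list holds doc ids, best first.
    score(d) = sum over lists of 1 / (k + rank_of_d_in_list)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["m3", "m1", "m7"]
bm25     = ["m1", "m3", "m9"]
graph    = ["m1", "m5"]
temporal = ["m7", "m1"]
print(rrf_fuse([semantic, bm25, graph, temporal])[0])  # m1: ranked highly
# by all four channels, so its reciprocal-rank sum dominates
```

RRF needs only ranks, not comparable scores, which is why it is a natural fit for fusing channels as different as BM25, embeddings, graph traversal, and temporal search.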

Why it matters: Most technically deep system. Uses a 20B open-source model to lift accuracy from 39% to 83.6%, proving architecture matters more than model size. 91.4% with Gemini-3 Pro.

Detail: hindsight.research.md

MemOS

Approach: Don't invent new memory algorithms — build an OS layer that manages all types.

  • Three-layer architecture: Interface (API parsing) → Operation (MemScheduler, MemLifecycle, MemOperator) → Infrastructure (storage backends)
  • MemCube: Standardized container with metadata (user_id, source, timestamp, importance_score, TTL)
  • Three memory types:
      • Plaintext: Text memories with naive/general/tree backends (Neo4j tree is most advanced)
      • Activation: KV-cache snapshots (functional but limited to local HuggingFace models)
      • Parametric: LoRA weight memories (still a stub/placeholder — most ambitious but least implemented)
  • Heavy infrastructure: Neo4j + Qdrant + Redis + MySQL

Why it matters: Only system that explicitly models the three pillars (text/cache/weights) in one framework. But the Parametric Memory vision — writing memories into model weights — remains unrealized.

Detail: memos.research.md


Benchmark Landscape

| System | LongMemEval | LoCoMo | Model | Year |
|---|---|---|---|---|
| Supermemory ASMR (oracle) | 98.6% | | GPT-4o × 8 | 2026 |
| Observational Memory | 94.87% | | gpt-5-mini | 2026 |
| Hindsight | 91.4% | 89.61% | Gemini-3 Pro | 2025-12 |
| Supermemory (production) | 81.6% | | | 2026 |
| Mem0 | ~58% | | | 2025 |

Note: Benchmark numbers are self-reported and methodologies vary. ASMR's 98.6% uses oracle evaluation. Direct comparison across systems requires caution.


Emerging Patterns

Pattern 1: Compression Is Becoming a First-Class Memory Strategy

In 2025, compression (compaction) was an emergency measure for context overflow. In 2026, Observational Memory proves it can be the entire memory architecture — and beat retrieval-based systems on benchmarks.

Pattern 2: LLM Reasoning Supplements Embedding Similarity

Both Supermemory ASMR and Hindsight's Cara use LLM reasoning for retrieval instead of (or alongside) embedding similarity. The tradeoff: richer semantic understanding at a higher cost per query.

Pattern 3: Cognitive Architecture Emerging

Hindsight's four-network model (facts/experiences/observations/opinions) and MemOS's three memory types (text/cache/weights) both try to model memory the way humans think about memory, not just as a key-value store.

Pattern 4: The Weight-Writing Frontier Remains Unrealized

MemOS is the only system that explicitly attempts Parametric Memory (LoRA weight updates). It's still a stub. The gap between "external memory" and "learned memory" identified in our findings remains wide.


Updated Technology Landscape (2025 → 2026 Q1)

| Dimension | 2025 State | 2026 Q1 State |
|---|---|---|
| Primary storage | Vector DB (Qdrant, Chroma, pgvector) | Shifting: plain text in context, graph, or no persistent store |
| Retrieval | Embedding similarity (RAG) | Diversifying: LLM reasoning, graph traversal, pure compression |
| Compression | Emergency compaction only | First-class strategy (Observational Memory) |
| Cognitive modeling | Flat fact lists (Mem0) | Structured networks (Hindsight 4-network, MemOS 3-type) |
| Weight updates | Not attempted | Attempted but not realized (MemOS LoRA stub) |
| Benchmark SOTA | ~80% LongMemEval | ~95% LongMemEval (single model, non-oracle) |
| Cost per query | High (multiple LLM calls + embedding) | Improving (Mastra: 4-10x reduction via prompt caching) |

References