Skip to content

Real-World LLM Memory System Adoption in Production (2025)

Last Updated: 2025-12-18

Executive Summary

The landscape of LLM memory in production has moved beyond simple "chat history" into sophisticated, multi-layered systems often referred to as "Agentic Memory." This research separates verifiable production use from marketing claims.

Key Findings

1. Framework Adoption Status

Mem0

Status: Strongest verifiable traction among startups and developers

Production Users: - Sunflower - Healthcare/addiction recovery platform - RevisionDojo - EdTech platform - Browserbase - Headless browser agents - OpenNote - Note-taking AI assistant

Major Validation: - AWS selected Mem0 as the exclusive memory provider for its AWS Agent SDK - Pushing it into enterprise environments - Primary benefits: ~40% token cost reduction and lower latency vs. full chat history

Use Case Pattern: Cost optimization and latency reduction in conversational AI

Letta (formerly MemGPT)

Status: Gaining ground in complex, logic-heavy agent workflows

Production Users: - 11x - AI sales automation ("Deep Research" agents) - Kognitos - Business automation (enterprise analytics tools)

Use Case Pattern: "Stateful" agents that need to "self-edit" memory - Agents explicitly decide what to remember (e.g., "user is vegetarian") - Not relying on context window retention - Complex multi-step reasoning with memory state management

Graphiti

Status: Heavily specialized for dynamic knowledge graphs

Production Users: - Zep AI - Core engine for agent memory platform - FutureSmart AI - Enterprise knowledge graphs

Use Case Pattern: Scenarios where relationships change over time - Example: CRM where "John is the CEO" → "John is a consultant" - Temporal relationship tracking - Entity evolution over time

2. Common Production Architectures

Most "home-grown" production systems (Walmart, JP Morgan, large SaaS companies) don't use a single library. Instead, they build a Dual-Memory Architecture:

Architecture Pattern: Dual-Memory System

1. Short-Term / Working Memory (Hot) - Technology: Redis or in-memory caches - Function: Stores immediate conversation history (last 10-20 turns) and active task state - Characteristics: Fast, ephemeral

2. Long-Term / Episodic Memory (Cold) - Technology: Vector Databases (Pinecone, Qdrant, Milvus) - Function: Stores "snapshots" of past conversations, user preferences, facts as embeddings - Retrieval: Semantic search of past interactions (not just recent ones)

3. Vector Databases: Beyond RAG

Critical Finding: Vector Databases are definitively used for Conversation Memory, not just RAG.

Distinction: - RAG: "Search my documents to answer this question" - Memory: "Search my past conversations to remember who this user is"

Production Examples: - Twilio (AI Assistants) - Vector stores for chat history indexing - Aquant - Conversational memory retrieval - OpenAI (via Azure Cosmos DB) - Vector-based conversation memory

Capability: Enables agents to answer "What did we talk about last week?" by treating past chat logs as a retrieval dataset

4. Production Deployment Summary

Company/Project Memory Tech Stack Use Case
AWS Agent SDK Mem0 Standardized memory for AWS-built agents
Zep AI Graphiti Memory-as-a-service platform
11x (Sales AI) Letta Autonomous sales development reps (SDRs)
Sunflower Mem0 Long-term therapy/health chat context
Twilio Vector DB (custom) AI assistant conversation indexing
Standard Enterprise Redis + Vector DB Custom implementations with LangChain/autogen

Market Segmentation (2025)

Startups & Mid-Market

Approach: Aggressively adopting Mem0 and Letta - Rationale: Ship features faster - Focus: Reduce development time - Trade-off: Less customization for faster time-to-market

Large Enterprises

Approach: Building custom Dual-Memory architectures - Stack: Redis + Vector Database - Orchestration: LangGraph or CrewAI frameworks - Drivers: Security, scale, customization requirements

Conclusions

  1. Production memory is no longer just "sending the whole transcript"
  2. Multi-layered memory systems are now standard
  3. Hot/cold memory separation is common pattern

  4. Framework adoption is real but segmented

  5. Mem0: Most broadly adopted (especially post-AWS partnership)
  6. Letta: Niche for complex stateful agents
  7. Graphiti: Specialized for temporal knowledge graphs

  8. Vector databases have dual purposes

  9. RAG: Document/knowledge retrieval
  10. Memory: Conversation history and user context retrieval

  11. Architecture patterns are converging

  12. Short-term: Redis/in-memory (fast access)
  13. Long-term: Vector DB (semantic retrieval)
  14. Orchestration: Framework-based (LangGraph, CrewAI)

  15. Production reality vs. marketing

  16. Real adoption exists but is concentrated in startups and specific enterprise features
  17. Not yet universal across all "big tech" production systems
  18. Enterprise tends toward custom solutions for security/compliance

Research Methodology

Source: Internet search conducted via Gemini CLI on 2025-12-18 Focus: Real adoption data, GitHub issues, developer discussions, case studies Exclusions: Marketing claims without verifiable production use