# MemOS Technical Research Report
Last Updated: 2026-03-24
Research Methodology: This document was generated through source code analysis of the MemOS repository (v2.0.10) combined with the project's arXiv papers and web research, analyzed by Claude Code.
## Overview
MemOS is an open-source Memory Operating System for LLMs and AI agents, developed by MemTensor (Shanghai). Unlike memory layers that bolt onto existing frameworks (Mem0) or agent runtimes that manage memory as part of execution (Letta), MemOS treats memory as a first-class system resource and introduces a three-layer OS architecture with a unified MemCube abstraction that encapsulates heterogeneous memory types -- plaintext, activation (KV-cache), and parametric (weights) -- under standardized scheduling and orchestration.
MemOS originated from the Memory3 model (WAIC 2024), which introduced explicit memory carriers in attention mechanisms. The short paper was released May 28, 2025 (arXiv:2505.22101), and the full paper July 4, 2025 (arXiv:2507.03724).
Source: MemTensor/MemOS on GitHub | PyPI: MemoryOS | Apache-2.0 License
## 1. Three-Layer OS Architecture
MemOS adopts a modular three-layer architecture analogous to an operating system:
```text
┌─────────────────────────────────────────────────────────────────┐
│                   Interface Layer (MemReader)                   │
│      REST API · MCP Server · Python SDK · OpenClaw Plugins      │
│          Parses requests → structured memory operations         │
├─────────────────────────────────────────────────────────────────┤
│                   Operation Layer (MemOS Core)                  │
│       MOSCore · MemScheduler · MemLifecycle · MemFeedback       │
│  Scheduling, lifecycle, CoT decomposition, conflict resolution  │
├─────────────────────────────────────────────────────────────────┤
│                       Infrastructure Layer                      │
│  Neo4j (graph) · Qdrant (vector) · MySQL/SQLite · Redis Streams │
│      Memory storage, access control, multi-user management      │
└─────────────────────────────────────────────────────────────────┘
```
### Interface Layer
The entry point for all memory operations. Key components:
| Component | Location | Purpose |
|---|---|---|
| REST API | `src/memos/api/routers/` | Product, server, and admin endpoints |
| MCP Server | `src/memos/api/mcp_serve.py` | Model Context Protocol for tool-based memory access |
| MemReader | `src/memos/mem_reader/` | Translates raw input (messages, docs, images) into structured memory items |
| Python SDK | `MOS` class | Direct programmatic access (`MOS.simple()`) |
MemReader has three backends:

- `SimpleStructMemReader` -- basic text-to-memory extraction via LLM prompts
- `StrategyStructMemReader` -- strategy-based extraction with conflict resolution
- `MultiModalStructMemReader` -- handles images, tool traces, documents, URLs, and preference/skill memory
### Operation Layer
The central controller orchestrating memory lifecycle:
| Component | Location | Purpose |
|---|---|---|
| `MOSCore` | `src/memos/mem_os/core.py` | Core engine: manages MemCubes, users, chat, search |
| `MOS` | `src/memos/mem_os/main.py` | Extended core with CoT query decomposition (PRO mode) |
| `GeneralScheduler` | `src/memos/mem_scheduler/` | Async task scheduling via Redis Streams with priority queues |
| `MemFeedback` | `src/memos/mem_feedback/` | Natural-language feedback loop for memory correction |
| `UserManager` | `src/memos/mem_user/` | Multi-user access control with role-based permissions |
### Infrastructure Layer
Storage backends supporting the system:
| Backend | Used For | Notes |
|---|---|---|
| Neo4j | Graph-structured tree text memory | Hierarchical topic/concept/fact organization |
| Qdrant | Vector similarity search | General text memory embedding store |
| Milvus | Preference memory vectors | MinHash deduplication + vector search |
| MySQL/SQLite | User management, history, metadata | Relational data and audit logs |
| Redis | Task scheduling, message queues | Redis Streams for async memory operations |
## 2. MemCube: The Unified Memory Container
MemCube is the core abstraction -- a standardized container that bundles heterogeneous memory types into a single manageable unit. Each MemCube has a cube_id, belongs to a user_id, and encapsulates up to four memory slots:
```python
# src/memos/mem_cube/general.py
class GeneralMemCube(BaseMemCube):
    _text_mem: BaseTextMemory | None  # Plaintext memory
    _act_mem: BaseActMemory | None    # Activation memory (KV-cache)
    _para_mem: BaseParaMemory | None  # Parametric memory (LoRA)
    _pref_mem: BaseTextMemory | None  # Preference memory
```
### MemCube Configuration
```python
# src/memos/configs/mem_cube.py
class GeneralMemCubeConfig(BaseMemCubeConfig):
    text_mem: MemoryConfigFactory  # backends: naive_text, general_text, tree_text
    act_mem: MemoryConfigFactory   # backends: kv_cache, vllm_kv_cache
    para_mem: MemoryConfigFactory  # backends: lora
    pref_mem: MemoryConfigFactory  # backends: pref_text
```
### Key MemCube Operations

- `load(dir)` / `dump(dir)` -- serialize/deserialize all memory types to/from a directory
- `init_from_dir(dir)` -- reconstruct a MemCube from a saved directory (includes `config.json`)
- `init_from_remote_repo(cube_id)` -- load from HuggingFace datasets
- Selective loading -- load only specific memory types: `load(dir, memory_types=["text_mem"])`
### Multi-Cube Composition

```python
# src/memos/multi_mem_cube/composite_cube.py
class CompositeCubeView:
    cube_views: list[SingleCubeView]

    def add_memories(...)       # Fan-out writes to all cubes
    def search_memories(...)    # Parallel search, merge results (text, act, para, pref, tool, skill)
    def feedback_memories(...)  # Fan-out feedback to all cubes
```

`MOSCore` manages a dictionary of MemCubes (`mem_cubes: Dict[str, GeneralMemCube]`) and routes operations based on user-cube access permissions.
## 3. Three Memory Types

### A. Plaintext (Textual) Memory
The most developed and feature-rich memory type. Three backend implementations:
#### NaiveTextMemory (`naive_text`)
Simple in-memory list of TextualMemoryItem objects. No external storage dependency. Suitable for testing and lightweight use.
#### GeneralTextMemory (`general_text`)
Vector-based memory using Qdrant for similarity search:
- Extract: LLM extracts structured facts from conversation as {"key": ..., "value": ..., "tags": [...]}
- Store: Each memory embedded and stored in Qdrant vector DB
- Search: Query embedding → cosine similarity search → top-k retrieval
- Dependencies: Qdrant + LLM (extractor) + Embedding model
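The extract → embed → search loop reduces to cosine similarity over stored embeddings. A self-contained sketch with toy three-dimensional vectors (the real system uses Qdrant and a learned embedding model):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(store: list[dict], query_vec: list[float], top_k: int = 2) -> list[dict]:
    """Rank stored memories by cosine similarity to the query embedding, return top-k."""
    ranked = sorted(store, key=lambda m: cosine(m["embedding"], query_vec), reverse=True)
    return ranked[:top_k]

# Toy memory store in the {"key", "value", "tags"}-style format described above,
# with hand-made embeddings standing in for a real embedder's output.
store = [
    {"key": "drink", "value": "prefers green tea", "embedding": [0.9, 0.1, 0.0]},
    {"key": "pet",   "value": "has a corgi",       "embedding": [0.0, 0.2, 0.9]},
    {"key": "work",  "value": "backend engineer",  "embedding": [0.1, 0.9, 0.1]},
]
```

A vector database replaces the `sorted` scan with an approximate nearest-neighbor index, but the retrieval semantics are the same.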
#### TreeTextMemory (`tree_text`) -- Most Advanced
Hierarchical graph-structured memory using Neo4j:
- Three-level hierarchy: Topics → Concepts → Facts (stored as Neo4j nodes/edges)
- Dual retrieval: BM25 keyword search + embedding-based vector search
- Reranking: Pluggable rerankers (cosine local, BGE, etc.) with configurable level weights
- Background reorganization: Async MemoryManager periodically restructures the graph (merges, deduplicates, builds relations)
- Relation reasoning: LLM-based detection of conflict/duplicate/complementary relationships between memories
- Internet search integration: BochaSearch and XinyuSearch for augmenting memory with web results
- Dependencies: Neo4j + Qdrant + LLM + Embedder + optional Reranker
Memory Item Structure:
```python
class TextualMemoryItem(BaseModel):
    id: str                               # UUID
    memory: str                           # The actual text content
    metadata: TextualMemoryMetadata       # Rich metadata (see below)

class TreeNodeTextualMemoryMetadata(TextualMemoryMetadata):
    memory_type: Literal[
        "WorkingMemory", "LongTermMemory", "UserMemory",
        "OuterMemory", "ToolSchemaMemory", "ToolTrajectoryMemory",
        "RawFileMemory", "SkillMemory", "PreferenceMemory"
    ]
    sources: list[SourceMessage]          # Provenance tracking
    embedding: list[float]                # Vector embedding
    status: Literal["activated", "resolving", "archived", "deleted"]
    version: int                          # Version tracking
    history: list[ArchivedTextualMemory]  # Archived previous versions
    confidence: float                     # 0-100 reliability score
    tags: list[str]                       # Categorization labels
    visibility: Literal["private", "public", "session"]
```
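The `version`/`history`/`status` fields imply an archive-on-update pattern: the old text is never overwritten in place, it is moved into `history` first. A minimal illustrative sketch (`MemItem` and `update_memory` are hypothetical names, not the MemOS API):

```python
from dataclasses import dataclass, field

@dataclass
class MemItem:
    """Toy memory item carrying just the versioning fields from the metadata model."""
    memory: str
    version: int = 1
    status: str = "activated"
    history: list = field(default_factory=list)

def update_memory(item: MemItem, new_text: str) -> MemItem:
    """Archive the current content into history, then install the new text and bump version."""
    item.history.append(
        {"version": item.version, "memory": item.memory, "status": "archived"}
    )
    item.memory = new_text
    item.version += 1
    return item
```

This is what makes the feedback loop in section 4 safe: a bad correction can always be rolled back from `history`.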
### B. Activation Memory (KV-Cache)
Stores transformer key-value caches as memory, enabling pre-computed context injection:
```python
# src/memos/memories/activation/kv.py
class KVCacheMemory(BaseActMemory):
    def extract(self, text: str) -> KVCacheItem:
        kv_cache = self.llm.build_kv_cache(text)  # Forward pass to build cache
        return KVCacheItem(memory=kv_cache, metadata={...})

    def add(self, memories: list[KVCacheItem]) -> None: ...
    def get_all(self) -> list[KVCacheItem]: ...
```
- Requires a HuggingFace backend (local models only, not API-based LLMs)
- Also supports vLLM-based KV cache (`VLLMKVCacheMemory`)
- Used during chat: retrieved KV caches are passed as `past_key_values` to the LLM's `generate()` call
- Stored as pickle files on disk
Use case: Pre-encode long documents or user profiles into KV caches, then inject them at inference time to reduce re-computation latency.
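The economics of activation memory are extract-once, reuse-many. A toy sketch of that contract, with a stand-in for `build_kv_cache` and pickle-on-disk persistence as described above (illustrative only; real KV caches are transformer `past_key_values` tensors, not dicts):

```python
import hashlib
import pickle
from pathlib import Path

def expensive_encode(text: str) -> dict:
    """Stand-in for llm.build_kv_cache(text): pretend this is a costly forward pass."""
    return {"tokens": text.split(), "layers": 2}

class ToyKVCacheMemory:
    """Illustrative extract-once / reuse-many cache, pickled to disk like MemOS's KV items."""

    def __init__(self, cache_dir: str):
        self.cache_dir = Path(cache_dir)
        self.encode_calls = 0  # counts how often the "forward pass" actually runs

    def extract(self, text: str) -> dict:
        key = hashlib.sha256(text.encode()).hexdigest()
        path = self.cache_dir / f"{key}.pkl"
        if path.exists():                          # cache hit: skip recomputation
            return pickle.loads(path.read_bytes())
        self.encode_calls += 1
        cache = expensive_encode(text)
        path.write_bytes(pickle.dumps(cache))      # persist for future sessions
        return cache
```

The second request for the same text never pays the encode cost, which is exactly the latency win the design targets.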
### C. Parametric Memory (LoRA)
Placeholder for model weight-level memory via Low-Rank Adaptation:
```python
# src/memos/memories/parametric/lora.py -- PLACEHOLDER
class LoRAMemory(BaseParaMemory):
    # Currently a stub: load/dump produce placeholder files
    # Intended for storing per-user or per-task LoRA adapters as memory
    ...
```
The parametric memory concept is described in the paper as encoding knowledge directly into model weights through fine-tuning, but the open-source implementation is not yet functional. This is the most ambitious memory type -- it envisions a "Mem-training Paradigm" where learning and inference are unified through continuous memory-driven parameter updates.
### D. Preference Memory (Additional)
Beyond the three types from the paper, the implementation adds a fourth slot:
```python
class PreferenceTextMemory:  # backend: "pref_text"
    # Stores explicit and implicit user preferences
    # Uses Milvus for vector storage + MinHash for deduplication
    # Extracts preferences from conversation patterns
    ...
```
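MinHash deduplication estimates how much two texts overlap from hashed shingles, so near-duplicate preference notes can be dropped before vector storage. A self-contained toy version (MemOS's actual signature size, shingling, and threshold are not documented here):

```python
import hashlib

def minhash_signature(text: str, num_hashes: int = 32) -> list[int]:
    """Toy MinHash over word bigrams: per seed, keep the minimum shingle hash."""
    words = text.lower().split()
    shingles = {" ".join(pair) for pair in zip(words, words[1:])} or {text.lower()}
    sig = []
    for seed in range(num_hashes):
        sig.append(
            min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in shingles)
        )
    return sig

def minhash_similarity(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Two notes with high estimated similarity can then be merged instead of stored twice.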
## 4. Memory Lifecycle & Scheduling

### MemScheduler

The async memory scheduling system is built on Redis Streams.
Task types processed by the scheduler:
- ADD_TASK -- Extract and store memories from new messages
- QUERY_TASK -- Track search queries for analytics
- ANSWER_TASK -- Track assistant responses
- MEM_READ_TASK -- Document/URL ingestion into memory
- PREF_ADD_TASK -- Preference extraction
Features:

- Priority queues with quota-based scheduling
- Auto-recovery for failed tasks
- Queue isolation per user/cube
- Working memory management with periodic promotion to long-term memory
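The quota-plus-priority behavior can be illustrated with an in-process sketch (`ToyScheduler` is hypothetical; the real scheduler runs on Redis Streams and recovers failed tasks, which this sketch omits):

```python
import heapq
from collections import defaultdict

class ToyScheduler:
    """Illustrative priority queue with a per-user dispatch quota per drain cycle."""

    def __init__(self, quota_per_user: int = 2):
        self.heap: list[tuple[int, int, str, str]] = []
        self.counter = 0  # tie-breaker keeps FIFO order within a priority level
        self.quota = quota_per_user

    def submit(self, user: str, task: str, priority: int) -> None:
        heapq.heappush(self.heap, (priority, self.counter, user, task))
        self.counter += 1

    def drain(self) -> list[str]:
        """Dispatch tasks in priority order; over-quota users' tasks wait for the next cycle."""
        dispatched, deferred = [], []
        used: dict[str, int] = defaultdict(int)
        while self.heap:
            prio, n, user, task = heapq.heappop(self.heap)
            if used[user] >= self.quota:          # quota exhausted: defer, don't drop
                deferred.append((prio, n, user, task))
                continue
            used[user] += 1
            dispatched.append(task)
        for entry in deferred:
            heapq.heappush(self.heap, entry)
        return dispatched
```

The quota is what keeps one chatty user or cube from starving everyone else's memory extraction.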
### Memory Feedback Loop
```python
# src/memos/mem_feedback/feedback.py
class MemFeedback:
    # Natural language feedback → memory correction pipeline:
    #   1. Keyword extraction & replacement
    #   2. Judgment (should this feedback change memory?)
    #   3. Comparison with existing memories
    #   4. Operation execution (update/delete/supplement)
    # Supports both English and Chinese prompts
```
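The judgment → comparison → operation steps can be reduced to a toy rule-based version to show the control flow (the real pipeline is LLM-driven; the `forget`/`correct` grammar here is invented purely for illustration):

```python
def apply_feedback(memories: dict[str, str], feedback: str) -> dict[str, str]:
    """Toy feedback pipeline dispatching to update / delete / supplement.

    Hypothetical grammar: 'forget <id>' deletes, 'correct <id>: <text>' updates,
    anything else is stored as a new (supplementary) memory.
    """
    if feedback.startswith("forget "):
        memories.pop(feedback.removeprefix("forget ").strip(), None)       # delete
    elif feedback.startswith("correct ") and ":" in feedback:
        head, _, text = feedback.removeprefix("correct ").partition(":")
        if head.strip() in memories:                                       # update
            memories[head.strip()] = text.strip()
    else:
        memories[f"m{len(memories) + 1}"] = feedback.strip()               # supplement
    return memories
```

In MemOS the "judgment" and "comparison" steps are LLM calls that decide which branch to take; the dispatch structure is the same.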
### Memory Reorganization
TreeTextMemory includes a background MemoryManager that periodically:
1. Detects conflict/duplicate relations between memory nodes
2. Merges overlapping facts
3. Builds hierarchical topic → concept → fact trees
4. Archives superseded memory versions (full version history preserved)
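Step 2 (merging overlapping facts) can be approximated with a greedy word-overlap pass; a toy sketch (MemOS uses LLM-based relation detection, not Jaccard similarity, so this only illustrates the dedup-merge shape):

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two facts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def merge_duplicates(facts: list[str], threshold: float = 0.6) -> list[str]:
    """Greedy pass: drop a fact when it heavily overlaps an already-kept one."""
    kept: list[str] = []
    for fact in facts:
        if all(jaccard(fact, k) < threshold for k in kept):
            kept.append(fact)
    return kept
```

The real reorganizer additionally archives the dropped version (step 4) rather than discarding it.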
## 5. Agent Integration

### OpenClaw Cloud Plugin

Located at `apps/MemOS-Cloud-OpenClaw-Plugin/`. Two-phase lifecycle:
- Before each turn (recall): Sends semantic search to MemOS Cloud API → injects relevant memory fragments into agent context
- After each turn (capture): Extracts key information from conversation → persists as structured memory
Key features:
- Multi-agent memory sharing via user_id + agent_id isolation
- Conversation ID tracking with configurable prefix/suffix
- Rate limiting with MIN_CAPTURE_INTERVAL
- Claims 72% lower token usage vs loading full chat history
### OpenClaw Local Plugin

Located at `apps/memos-local-openclaw/`. A full on-device memory system:

- Storage: SQLite with FTS5 full-text search + in-process vector search
- Recall engine (`src/recall/engine.ts`): hybrid FTS + vector search → RRF fusion → MMR reranking → recency decay
- Ingest pipeline (`src/ingest/`): LLM-based summarization with multiple providers (OpenAI, Anthropic, Gemini, Bedrock)
- Skill memory (`src/skill/`): reusable skills that self-evolve -- generator, evaluator, evolver, upgrader
- Task summarization: auto-summarizes completed tasks for future reference
- Memory Viewer: Web dashboard for browsing/managing memories
- Multi-agent: Memory isolation per agent with skill sharing across agents
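The RRF step in the recall engine follows the standard reciprocal rank fusion formula, score(d) = Σ 1/(k + rank_i(d)) over the input rankings. A minimal sketch, shown in Python for consistency with the rest of this report (the plugin itself is TypeScript):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy inputs: a keyword (FTS5) ranking and an embedding ranking over memory IDs.
fts_hits    = ["m3", "m1", "m7"]
vector_hits = ["m1", "m7", "m4"]
```

Items appearing high in both rankings (here `m1`) dominate, which is why RRF is a common first stage before MMR reranking and recency decay.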
### MCP Server

```python
# src/memos/api/mcp_serve.py
# FastMCP server exposing memory operations as MCP tools:
#   memory_add, memory_search, memory_delete, memory_feedback
# Enables any MCP-compatible client to use MemOS as a memory backend
```
### OpenWork Integration

Located at `apps/openwork-memos-integration/`. An Electron desktop app integrating MemOS with OpenCode (coding agent), providing memory-augmented task execution with a full GUI.
## 6. How Memory is Stored, Retrieved, and Managed

### Storage Flow (Add)
```text
Messages/Documents
        │
        ▼
MemReader (LLM extraction) ──→ list[TextualMemoryItem]
        │                                │
        │ (async via MemScheduler)       │ (sync)
        ▼                                ▼
GeneralTextMemory:                TreeTextMemory:
  embed → Qdrant                    embed → Neo4j graph nodes
                                    BM25 index update
                                    Relation detection (async)
```
### Retrieval Flow (Search)
```text
Query
  │
  ├──→ TreeTextMemory:    BM25 + embedding search → rerank → top-k
  ├──→ GeneralTextMemory: embed query → Qdrant similarity → top-k
  ├──→ PreferenceMemory:  Milvus search → preference notes
  └──→ ActivationMemory:  return KV cache for context injection
        │
        ▼
MOSSearchResult { text_mem, act_mem, para_mem, pref_mem }
        │
        ▼
System prompt injection: "## Memories:\n1. fact1\n2. fact2\n..."
```
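The final injection step is a plain string template. A sketch matching the documented format (`build_memory_prompt` is a hypothetical name, not the MemOS function):

```python
def build_memory_prompt(memories: list[str]) -> str:
    """Render retrieved memories as the numbered block injected into the system prompt."""
    lines = [f"{i}. {m}" for i, m in enumerate(memories, start=1)]
    return "## Memories:\n" + "\n".join(lines)
```

The numbered format matters in practice: it lets the model (and the feedback loop) refer to individual memories by index.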
### Chat Pipeline
Simplified flow of `MOSCore.chat()`:

1. Resolve target user → get accessible cubes
2. For each cube: `text_mem.search(query, top_k)` → collect memories
3. Build system prompt with memory context
4. If activation memory enabled: load KV cache as `past_key_values`
5. `LLM.generate(messages, past_key_values?)` → response
6. Update chat history
7. Submit to MemScheduler for async memory extraction
### PRO Mode (CoT Enhancement)
When PRO_MODE=True, complex queries are decomposed:
1. LLM decomposes query into sub-questions (JSON: {is_complex, sub_questions})
2. Each sub-question searched against memory in parallel
3. Sub-answers generated with memory context
4. Final synthesis combines all sub-answers into a coherent response
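Putting the four steps together, a self-contained toy sketch with stubbed LLM and search calls (all function names here are hypothetical; only the `{is_complex, sub_questions}` JSON shape comes from the source):

```python
from concurrent.futures import ThreadPoolExecutor

def fake_decompose(query: str) -> dict:
    """Stand-in for the LLM decomposition call; returns the documented JSON shape."""
    return {
        "is_complex": True,
        "sub_questions": ["Where does the user live?", "What does the user drink?"],
    }

def search_memory(sub_q: str) -> str:
    """Stand-in for per-sub-question memory search against a toy store."""
    store = {"live": "user lives in Munich", "drink": "user prefers green tea"}
    return next((v for k, v in store.items() if k in sub_q.lower()), "")

def answer_pro(query: str) -> str:
    plan = fake_decompose(query)
    if not plan["is_complex"]:
        return search_memory(query)
    with ThreadPoolExecutor() as pool:              # sub-questions searched in parallel
        sub_answers = list(pool.map(search_memory, plan["sub_questions"]))
    return " ".join(a for a in sub_answers if a)    # toy stand-in for LLM synthesis
```

In the real pipeline both `fake_decompose` and the final join are LLM calls; the parallel fan-out over sub-questions is the structural point.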
## 7. Evaluation & Benchmarks
MemOS includes evaluation scripts for four benchmarks:
| Benchmark | Focus | MemOS Claim |
|---|---|---|
| LoCoMo | Long-term contextual memory & reasoning | 75.80 accuracy |
| LongMemEval | Cross-session reasoning, temporal reasoning, abstention | +40.43% vs baselines |
| PrefEval | Preference tracking and personalization | +2568% improvement |
| PersonaMem | Persona consistency across conversations | +40.75% improvement |
The evaluation framework (evaluation/) includes unofficial implementations of competitor systems (Mem0, Zep, Memobase, SuperMemory, MemU) for fair comparison.
Claims from README: +43.70% accuracy vs. OpenAI Memory and saves 35.24% memory tokens.
## 8. Comparison with Other Memory Systems
| Feature | MemOS | Mem0 | Letta (MemGPT) |
|---|---|---|---|
| Architecture | Three-layer OS (Interface/Operation/Infrastructure) | Memory layer (bolt-on) | Agent runtime with memory |
| Memory Types | Plaintext + Activation (KV-cache) + Parametric (LoRA) + Preference | Vector facts + Graph relationships | Core (in-context) + Recall (logs) + Archival (vector) |
| Core Abstraction | MemCube (multi-memory container) | Memory (fact store) | Agent with Block-based memory |
| Memory Extraction | LLM-based via MemReader (multi-modal) | LLM fact extraction + conflict resolution | Agent self-edits via tool calls |
| Storage | Neo4j + Qdrant + Milvus + MySQL + Redis | 24+ vector stores + Neo4j/Memgraph | Postgres + vector extensions |
| Graph Memory | Core feature (tree hierarchy: topic/concept/fact) | Optional (parallel to vector) | Not native |
| KV-Cache Memory | Native support (HuggingFace/vLLM) | No | No |
| Weight-Level Memory | LoRA placeholder (not yet functional) | No | No |
| Async Scheduling | Redis Streams with priority queues | No (sync pipeline) | Background processing via tasks |
| Multi-Modal | Images, documents, URLs, tool traces | Text only | Text + file attachments |
| Memory Feedback | Natural language correction loop | Manual update API | Agent self-correction via tools |
| Multi-User | Role-based access control, cube-per-user | user_id/agent_id/run_id scoping | Per-agent isolation |
| Agent Integration | OpenClaw plugins (cloud + local), MCP server | SDKs, Vercel AI SDK | Native agent runtime |
| Deployment | Docker Compose, uvicorn, Cloud API | Managed cloud or self-hosted | Server + Docker |
| Language | Python (core) + TypeScript (plugins) | Python | Python |
| License | Apache-2.0 | Apache-2.0 | Apache-2.0 |
| Primary Use Case | Enterprise memory OS with multi-modal KB | Quick memory layer for existing apps | Stateful agent development |
| Maturity | Parametric memory is stub; tree memory most advanced | Production-ready vector+graph | Production-ready agent framework |
## 9. Unique Features & Contributions
-
Unified heterogeneous memory abstraction -- MemCube bundles plaintext, activation, and parametric memory under one container, enabling holistic memory management rather than treating each type separately.
-
Tree-structured memory organization -- Unlike flat vector stores, TreeTextMemory organizes knowledge hierarchically (topic → concept → fact) in Neo4j, enabling multi-granularity retrieval with BM25 + embedding + reranking.
-
KV-cache as first-class memory -- The only system to treat transformer KV caches as a manageable memory type that can be extracted, stored, loaded, and injected at inference time.
-
Memory feedback loop -- Natural language corrections flow through a judgment → comparison → operation pipeline to refine existing memories, supporting both English and Chinese.
-
Skill memory and evolution -- The local OpenClaw plugin introduces self-upgrading skill memory -- tasks generate reusable skills that evolve through evaluation and improvement cycles.
-
Mem-training paradigm (vision) -- The paper proposes blurring the line between learning and inference through continuous memory-driven parameter updates, though the LoRA implementation remains a placeholder.
-
Rich provenance tracking -- Every memory item records its source (chat, doc, web, file, system), version history, archived versions, confidence scores, and visibility settings.
## 10. Key Takeaways
-
OS-level ambition -- MemOS is the most architecturally ambitious memory system, treating memory as a full OS concern rather than a simple retrieval layer. The three-layer design mirrors traditional OS architecture.
-
Implementation maturity varies -- Plaintext/tree memory is production-quality with sophisticated retrieval and reorganization. KV-cache memory works but only with local HuggingFace models. Parametric memory (LoRA) is a placeholder.
-
Chinese ecosystem focus -- Heavy integration with Alibaba Cloud services (Bailian API, OSS, DashScope), Neo4j Community/Enterprise editions, and Chinese-language prompt support throughout.
-
Benchmark leadership -- Claims SOTA across LoCoMo, LongMemEval, PrefEval, and PersonaMem, with built-in comparison scripts against Mem0, Zep, and others.
-
Agent integration via plugins -- Rather than being an agent framework itself, MemOS integrates with agent platforms (OpenClaw, MCP) as a dedicated memory backend, separating concerns cleanly.
-
Heavy infrastructure requirements -- Full deployment requires Neo4j + Qdrant + Redis + MySQL/SQLite + LLM API, making it significantly more complex to self-host than Mem0's single-vector-store approach.
## References
- MemOS GitHub Repository
- MemOS Paper (Long) -- "MemOS: A Memory OS for AI System" (Jul 2025)
- MemOS Paper (Short) -- "MemOS: An Operating System for Memory-Augmented Generation" (May 2025)
- Memory3 Paper -- "Memory3: Language Modeling with Explicit Memory" (2024)
- MemOS Documentation
- MemOS Architecture Docs
- MemOS Cloud OpenClaw Plugin
- Awesome-AI-Memory
- MemOS OpenClaw Token Savings Analysis
- AI Memory Architecture: MemOS Governance Framework
- Core implementation: `src/memos/mem_os/core.py` (~600 lines), `src/memos/mem_cube/general.py`
- Memory backends: `src/memos/memories/` (textual, activation, parametric)
- Scheduler: `src/memos/mem_scheduler/general_scheduler.py`