# Supermemory Technical Research Report
Last Updated: 2026-03-24
Research Methodology: Open-source repository analysis (GitHub supermemoryai/supermemory), blog post analysis, documentation review, and web search for community discussion. Note: Supermemory's backend (API server, ingestion pipeline, memory engine) is closed source. The open-source repo contains client SDKs, framework integrations, the MCP server, and the web application shell.
## Overview
Supermemory is a commercial memory and context platform for AI agents, positioned as an all-in-one solution combining memory extraction, RAG, user profiles, and data connectors into a single managed API. It claims #1 performance on three major benchmarks: LongMemEval, LoCoMo, and ConvoMem.
Separately, Supermemory has published research on ASMR (Agentic Search and Memory Retrieval) -- an experimental multi-agent system that replaces vector database retrieval with LLM-powered agentic reasoning, achieving ~99% on LongMemEval-s. ASMR is not the production engine; it is a research prototype announced in March 2026.
Source: GitHub Repository | Documentation | ASMR Blog Post | Research Page
## 1. Production System Architecture
### What's in the Open-Source Repo
The GitHub repository (supermemoryai/supermemory) is a Turbo monorepo containing client-side code only. The actual memory engine, ingestion pipeline, and search infrastructure run on Supermemory's cloud (Cloudflare Workers + PostgreSQL).
supermemory/ (open-source monorepo)
├── apps/
│ ├── web/ # Next.js consumer application
│ ├── mcp/ # MCP server (Cloudflare Workers + Durable Objects)
│ ├── browser-extension/ # Chrome extension
│ ├── docs/ # Documentation site
│ └── raycast-extension/ # Raycast plugin
├── packages/
│ ├── tools/ # SDK integrations (Vercel AI SDK, Mastra, OpenAI, Claude)
│ ├── lib/ # Shared client utilities
│ ├── ui/ # React UI components + memory graph visualization
│ ├── validation/ # Zod schemas (mirrors DB schema)
│ ├── hooks/ # React hooks
│ ├── ai-sdk/ # AI SDK provider
│ ├── openai-sdk-python/ # Python OpenAI SDK integration
│ ├── agent-framework-python/ # Microsoft Agent Framework integration
│ └── memory-graph/ # Force-directed graph visualization
└── skills/ # Claude Code / OpenCode skills
### Production Memory Engine (Closed Source)
Based on the research page and documentation, the production system implements a 5-component architecture:
Content Input (text, conversations, files, URLs)
↓
┌─────────────────────────────────────────────────────────────┐
│ Ingestion Pipeline │
│ 1. Chunk-based Ingestion → semantic decomposition │
│ 2. Memory Extraction → atomic facts from sessions │
│ 3. Relational Versioning → updates/extends/derives edges │
│ 4. Temporal Grounding → documentDate + eventDate │
│ 5. Embedding → Cloudflare AI vectors + pgvector │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Storage Layer │
│ PostgreSQL + pgvector (Hyperdrive connection pooling) │
│ • Documents (content, chunks, embeddings, summaries) │
│ • MemoryEntries (versioned, with relations + temporal) │
│ • Spaces / ContainerTags (user/project scoping) │
│ • Graph edges (similarity-based connections) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Retrieval Layer │
│ • Hybrid search: RAG chunks + extracted memories │
│ • User profiles: static (stable) + dynamic (recent) │
│ • Semantic similarity + metadata filtering │
│ • Graph-based relationship traversal │
└─────────────────────────────────────────────────────────────┘
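A minimal sketch of the retrieval layer's "semantic similarity + metadata filtering" step, assuming in-memory candidates: filter by containerTag scope, then rank by cosine similarity. All names here (Candidate, retrieve) are illustrative, not Supermemory's API.

```typescript
// Filter candidates by container tag, then rank by cosine similarity.
interface Candidate {
  memory: string;
  embedding: number[];
  containerTag: string;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(query: number[], tag: string, pool: Candidate[], k: number): Candidate[] {
  return pool
    .filter((c) => c.containerTag === tag)           // metadata scoping first
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)               // highest similarity wins
    .slice(0, k)
    .map(({ c }) => c);
}

const pool: Candidate[] = [
  { memory: "likes tea", embedding: [1, 0], containerTag: "user_1" },
  { memory: "ships Rust", embedding: [0, 1], containerTag: "user_1" },
  { memory: "other user's fact", embedding: [1, 0], containerTag: "user_2" },
];
const top = retrieve([1, 0], "user_1", pool, 1);
```

In the actual service, the filter and ranking run inside PostgreSQL via pgvector rather than in application code.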
### Memory Data Model (from Zod schemas in packages/validation/schemas.ts)
The MemoryEntrySchema reveals the internal memory structure:
| Field | Type | Purpose |
|---|---|---|
| memory | string | The atomic fact content |
| version | number | Version counter for updates |
| isLatest | boolean | Whether this is the current version |
| parentMemoryId | string? | Previous version of this memory |
| rootMemoryId | string? | Original memory in the chain |
| memoryRelations | Record<"updates"\|"extends"\|"derives"> | Graph edges to related memories |
| isStatic | boolean | Stable fact (profile) vs dynamic (recent) |
| isForgotten | boolean | Soft-delete flag |
| forgetAfter | date? | TTL for automatic expiration |
| isInference | boolean | Whether this was inferred, not explicitly stated |
| sourceCount | number | How many documents support this memory |
| memoryEmbedding | number[]? | Vector embedding |
Three relationship types model fact evolution:
- updates: Contradictions/corrections (e.g., "moved from NYC to SF" supersedes "lives in NYC")
- extends: Supplements existing facts with details
- derives: Inferred logic from multiple memories
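The version-chain mechanics implied by these fields can be sketched as follows. Field names come from MemoryEntrySchema; the applyUpdate helper is hypothetical, not part of the SDK.

```typescript
// Sketch: an "updates" relation supersedes a fact by appending a new
// version to the chain and flipping isLatest on the old entry.
interface MemoryEntry {
  id: string;
  memory: string;
  version: number;
  isLatest: boolean;
  parentMemoryId?: string;
  rootMemoryId?: string;
}

function applyUpdate(prev: MemoryEntry, newFact: string, newId: string): [MemoryEntry, MemoryEntry] {
  // The old entry is preserved as history but no longer marked latest.
  const superseded: MemoryEntry = { ...prev, isLatest: false };
  const next: MemoryEntry = {
    id: newId,
    memory: newFact,
    version: prev.version + 1,
    isLatest: true,
    parentMemoryId: prev.id,                    // back-pointer to the superseded version
    rootMemoryId: prev.rootMemoryId ?? prev.id, // anchor of the whole chain
  };
  return [superseded, next];
}

const v1: MemoryEntry = { id: "m1", memory: "lives in NYC", version: 1, isLatest: true };
const [old, v2] = applyUpdate(v1, "moved from NYC to SF", "m2");
```

Because old versions are retained, queries can reconstruct how a fact evolved by walking parentMemoryId back to rootMemoryId.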
### User Profile System
Profiles split into two tiers, delivered via /v4/profile:
interface ProfileStructure {
  profile: {
    static?: Array<{ memory: string; metadata?: unknown }>   // Stable facts (name, preferences)
    dynamic?: Array<{ memory: string; metadata?: unknown }>  // Recent context (current projects)
  }
  searchResults: {
    results: Array<{ memory: string; similarity: number }>   // Query-relevant memories
  }
}
The SDK client-side deduplicates across tiers with priority: Static > Dynamic > Search Results (packages/tools/src/tools-shared.ts:deduplicateMemories).
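A sketch of that tiered deduplication, assuming entries dedupe on exact memory text (the real deduplicateMemories may key on IDs or normalized strings):

```typescript
// Tiered deduplication: static > dynamic > search results.
// First occurrence in priority order wins; later duplicates are dropped.
interface Mem { memory: string }

function deduplicate(staticM: Mem[], dynamicM: Mem[], search: Mem[]): Mem[] {
  const seen = new Set<string>();
  const out: Mem[] = [];
  for (const tier of [staticM, dynamicM, search]) {
    for (const m of tier) {
      if (!seen.has(m.memory)) {
        seen.add(m.memory);
        out.push(m);
      }
    }
  }
  return out;
}

const merged = deduplicate(
  [{ memory: "name: Ada" }],
  [{ memory: "name: Ada" }, { memory: "project: compiler" }],
  [{ memory: "project: compiler" }, { memory: "likes tea" }],
);
```

The duplicate "name: Ada" survives only in the static tier and "project: compiler" only in the dynamic tier, so higher-priority provenance is preserved.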
### API Surface
| Endpoint | Purpose |
|---|---|
| client.add() | Ingest content (text, conversations, files) |
| client.profile() | User profile + optional semantic search (~50ms) |
| client.search.memories() | Hybrid search: RAG + memory |
| client.search.documents() | Document search with metadata filters |
| client.memories.forget() | Soft-delete a memory (exact match or semantic fallback) |
| client.documents.uploadFile() | Upload PDFs, images, videos |
| /v4/conversations | Structured conversation ingestion with smart diffing |
| /v3/graph/viewport | Memory graph visualization data |
### MCP Server (apps/mcp/)
Runs on Cloudflare Workers with Durable Objects for per-session state. Exposes three core tools to AI clients:
| MCP Tool | Description |
|---|---|
| memory | Save or forget information (save/forget action) |
| recall | Search memories + user profile in one call |
| listProjects | List available container tags for scoping |
Plus resources (supermemory://profile, supermemory://projects) and a context prompt for system prompt injection.
### Framework Integrations (packages/tools/)
| Integration | Mechanism | File |
|---|---|---|
| Vercel AI SDK | Middleware: intercepts doGenerate/doStream, injects memories into system prompt, saves conversation after response | src/vercel/middleware.ts |
| Mastra | Wrapper + processor for Mastra agent config | src/mastra/ |
| OpenAI SDK (Python) | Middleware for OpenAI Responses API | src/openai/middleware.ts |
| Claude Memory Tool | File-system metaphor: view, create, str_replace, insert, delete on /memories/ paths | src/claude-memory.ts |
| Microsoft Agent Framework | Python SDK wrapper | packages/agent-framework-python/ |
Memory retrieval modes: "profile" (static+dynamic only), "query" (semantic search only), "full" (both).
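The three modes can be sketched as a simple selector; the mode names are from the docs, while Bundle and selectMemories are illustrative stand-ins.

```typescript
// Select which memory tiers are injected into the prompt, by mode.
type Mode = "profile" | "query" | "full";
interface Bundle { profile: string[]; searchResults: string[] }

function selectMemories(mode: Mode, bundle: Bundle): string[] {
  switch (mode) {
    case "profile": return bundle.profile;                          // static + dynamic only
    case "query":   return bundle.searchResults;                    // semantic search only
    case "full":    return [...bundle.profile, ...bundle.searchResults]; // both
  }
}

const b: Bundle = { profile: ["name: Ada"], searchResults: ["project: compiler"] };
const full = selectMemories("full", b);
const profileOnly = selectMemories("profile", b);
```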
## 2. ASMR: Agentic Search and Memory Retrieval (Experimental)
ASMR is an experimental research system, separate from the production engine, that replaces vector-based retrieval with multi-agent orchestration. Published March 22, 2026.
### Architecture
INGESTION (3 Observer Agents)
─────────────────────────────
Raw sessions ──→ Agent 1 (sessions 1,3,5...) ──→ ┐
Agent 2 (sessions 2,4,6...) ──→ ├→ Structured findings
Agent 3 (sessions 7,9...) ──→ ┘ (6 dimensions)
[All: Gemini 2.0 Flash]
RETRIEVAL (3 Search Agents)
────────────────────────────
Question ──→ Agent 1: Direct facts & explicit statements ──→ ┐
Agent 2: Related context, social cues ──→ ├→ Compiled
Agent 3: Temporal timelines, relationship maps ──→ ┘ findings
[Orchestrator retrieves verbatim session excerpts]
ANSWERING (Ensemble Voting)
────────────────────────────
Compiled findings ──→ Strategy 1: 8-variant cluster (98.6% oracle)
Strategy 2: 12-variant decision forest (97.2% single output)
### Six Knowledge Extraction Dimensions
During ingestion, each observer agent extracts across all six:
- Personal Information -- Names, relationships, biographical facts
- Preferences -- Likes, dislikes, style choices
- Events -- What happened, when, with whom
- Temporal Data -- Timelines, date references, sequences
- Updates -- Changes, corrections, contradictions
- Assistant Info -- What the AI previously said/did
### Answer Strategies
| Strategy | Variants | Accuracy | Output |
|---|---|---|---|
| 8-Variant Cluster | Specialized prompts (e.g., "Precise Counter", "Time Specialist") | 98.6% (oracle: success if any variant correct) | 8 parallel answers |
| 12-Variant Decision Forest | GPT-4o-mini agents + Aggregator LLM with majority voting | 97.2% (single consensus output) | 1 aggregated answer |
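The decision-forest aggregation step reduces N variant answers to a single output. In production the aggregator is an LLM; a plain majority vote stands in for it in this sketch.

```typescript
// Reduce N variant answers to one consensus answer by counting votes.
function majorityVote(answers: string[]): string {
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);
  let best = answers[0];
  let bestCount = 0;
  for (const [answer, n] of counts) {
    if (n > bestCount) { best = answer; bestCount = n; }
  }
  return best;
}

// 12 variants: 7 agree, 5 scatter across two wrong answers.
const votes = ["Paris", "Paris", "Paris", "Lyon", "Paris", "Nice",
               "Paris", "Lyon", "Paris", "Paris", "Nice", "Lyon"];
const consensus = majorityVote(votes);
```

This also shows why the oracle score (any variant correct) is an upper bound on the single-output score: a correct minority answer still loses the vote.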
### Key Innovation: LLM-as-Retriever
ASMR completely eliminates vector database queries. Instead of embedding + cosine similarity, it uses LLM agents that read and reason over stored findings. The blog claims this was "the single biggest unlock" -- agentic retrieval overcomes the "semantic similarity trap" where temporal changes and contradictions fool vector search.
### Cost Analysis
Each question requires 11-15 LLM calls:
- 3 ingestion agents (one-time, amortized)
- 3 search agents per question
- 1 orchestrator for compilation
- 8 or 12 answer variants (depending on strategy)
- 1 aggregator (for the decision forest strategy)
This makes ASMR impractical for production at scale but demonstrates the ceiling of what's achievable with agentic reasoning.
### Open-Source Status (as of 2026-03-24)
The blog post promised open-sourcing the ASMR agent flow in "beginning of April" (2026). As of this writing, no ASMR-specific repository has been published on the supermemoryai GitHub organization. The existing open-source repository contains only the production SDK/client code, not the ASMR agent pipeline. This should be re-checked in April 2026.
## 3. Benchmark Performance
### LongMemEval-s (Production Engine)
| Category | Supermemory | Zep | Full Context (115k tokens) |
|---|---|---|---|
| Overall | 81.6% | 71.2% | 60.2% |
| Multi-Session | 71.43% | 57.9% | 44.3% |
| Temporal Reasoning | 76.69% | 62.4% | 45.1% |
| Knowledge Update | 88.46% | 83.3% | 78.2% |
### ASMR Experimental Results (LongMemEval-s)
| Configuration | Accuracy |
|---|---|
| 8-Variant Cluster (oracle) | 98.6% |
| 12-Variant Decision Forest | 97.2% |
| Previous production engine | ~85% |
### Cross-Benchmark Claims
| Benchmark | What It Tests | Claim |
|---|---|---|
| LongMemEval | Long-term memory across sessions with knowledge updates | #1 (81.6%) |
| LoCoMo | Fact recall across extended conversations | #1 |
| ConvoMem | Personalization and preference learning | #1 |
Note: Independent benchmarks from third parties sometimes report different numbers. A DEV.to comparison article estimated Supermemory at ~70% on LoCoMo, lower than self-reported figures. Hindsight achieves 91.4% on LongMemEval, significantly above Supermemory's 81.6%.
## 4. Comparison with Other Memory Systems
| Feature | Supermemory | Mem0 | Letta (MemGPT) | Graphiti/Zep | Hindsight |
|---|---|---|---|---|---|
| Architecture | Memory graph + RAG (unified cloud service) | Vector + optional graph memory | OS-inspired tiered memory (agent self-manages) | Temporal knowledge graph (bi-temporal) | Multi-strategy hybrid (4 parallel retrievers) |
| Open Source | Client SDKs only; engine is closed | Core open source; graph features Pro-tier | Fully open source | Fully open source (Graphiti) | Open source |
| Self-Hosting | Enterprise agreement required | Yes (open-source core) | Yes | Yes | Yes |
| Storage | PostgreSQL + pgvector (Cloudflare) | 24+ vector stores + optional Neo4j | PostgreSQL + vector extensions | Neo4j knowledge graph | Embedded PostgreSQL |
| Memory Extraction | Automatic (LLM-driven) | LLM-driven fact extraction | Agent-controlled (proactive) | Triplet extraction + entity resolution | Fact extraction at write-time |
| Contradiction Handling | Version chain (updates/extends/derives) | LLM semantic deduplication | Agent edits core memory blocks | Temporal edge invalidation | Cross-encoder reranking |
| Temporal Reasoning | Dual timestamp (documentDate + eventDate) + forgetAfter TTL | Basic (via metadata) | No built-in temporal logic | Bi-temporal (valid_at + invalid_at) | Temporal filtering |
| User Profiles | Built-in (static + dynamic, ~50ms) | Not built-in | Core memory blocks (always in context) | Not built-in | Not built-in |
| LongMemEval | 81.6% (production) / ~99% (ASMR) | 58-66% (varies by source) | ~83.2% (estimated) | 71.2% | 91.4% |
| Retrieval Method | Hybrid: vector similarity + memory search; ASMR: LLM-as-retriever | Vector similarity | LLM decides when/what to retrieve | Graph traversal + semantic search | 4-channel RRF fusion |
| Connectors | Google Drive, Gmail, Notion, OneDrive, GitHub | No built-in | No built-in | No built-in | No built-in |
| Pricing | Free tier (1M tokens, 10K searches); paid plans | Open-source free; Pro $249/mo | Open source | Open source | Open source |
### Key Differentiators
vs Mem0: Supermemory bundles more features (RAG, profiles, connectors) into one API. Mem0 offers broader open-source flexibility with 24+ vector store backends. Supermemory's memory versioning (updates/extends/derives) is more structured than Mem0's LLM-driven deduplication.
vs Letta: Letta's "LLM OS" approach gives agents direct control over memory management (proactive retrieval/editing). Supermemory's memory management is automatic and opaque to the agent -- simpler integration but less agent autonomy.
vs Graphiti/Zep: Graphiti's bi-temporal knowledge graph with explicit entity nodes and edges provides richer relationship modeling. Supermemory's memory graph is similarity-based rather than entity-based. Graphiti is fully open source.
vs Hindsight: Hindsight outperforms Supermemory's production engine on LongMemEval (91.4% vs 81.6%) with a depth-focused, open-source approach. Supermemory offers breadth (RAG + memory + profiles + connectors) over depth.
vs Observational Memory (Mastra Research): Observational Memory achieves 95% on LongMemEval with a specialized observation extraction approach. Supermemory's ASMR exceeds this experimentally (~99%) but at much higher compute cost.
## 5. Unique Features
### Automatic Forgetting
Memories with forgetAfter dates are automatically expired. Temporal facts like "I have an exam tomorrow" are recognized and given appropriate TTLs. The isForgotten flag allows soft deletion while preserving history.
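A sketch of read-time expiry under these flags; field names follow MemoryEntrySchema, while visibleMemories is a hypothetical filter, not the service's actual implementation.

```typescript
// A memory is visible only if it is not soft-deleted and its TTL
// (forgetAfter) has not yet passed.
interface Expirable { memory: string; isForgotten: boolean; forgetAfter?: Date }

function visibleMemories(entries: Expirable[], now: Date): Expirable[] {
  return entries.filter(
    (e) => !e.isForgotten && (e.forgetAfter === undefined || e.forgetAfter > now),
  );
}

const now = new Date("2026-03-24");
const entries: Expirable[] = [
  // Temporal fact whose TTL has already passed -> expired.
  { memory: "has an exam tomorrow", isForgotten: false, forgetAfter: new Date("2026-03-20") },
  // Stable fact with no TTL -> always visible.
  { memory: "prefers dark mode", isForgotten: false },
  // Soft-deleted fact -> hidden but preserved in storage.
  { memory: "old address", isForgotten: true },
];
const visible = visibleMemories(entries, now);
```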
### Memory Graph Visualization
A force-directed graph visualization (packages/memory-graph) renders documents, extracted memories, and similarity-based edges. It uses a WebGL canvas for performance, with spatial viewport-based loading via /v3/graph/viewport.
### Session-Based Ingestion
Content is processed session-by-session rather than round-by-round. The /v4/conversations endpoint supports structured messages with smart diffing -- the backend detects when messages are appended to existing conversations and processes only new content.
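Assuming the backend detects appends via a prefix check (an assumption; the actual diffing logic is closed source), smart diffing might look like:

```typescript
// If the incoming snapshot extends the stored conversation, ingest only
// the new trailing messages; otherwise fall back to reprocessing.
interface Msg { role: string; content: string }

function diffNewMessages(stored: Msg[], incoming: Msg[]): Msg[] {
  const isPrefix =
    stored.length <= incoming.length &&
    stored.every((m, i) => m.role === incoming[i].role && m.content === incoming[i].content);
  return isPrefix ? incoming.slice(stored.length) : incoming;
}

const stored: Msg[] = [
  { role: "user", content: "hi" },
  { role: "assistant", content: "hello" },
];
const incoming: Msg[] = [...stored, { role: "user", content: "what's new?" }];
const fresh = diffNewMessages(stored, incoming);
```

Clients can then resend whole conversations idempotently without triggering re-extraction of already-processed turns.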
### Framework Middleware Pattern
The Vercel AI SDK integration (packages/tools/src/vercel/) implements a middleware pattern that intercepts LLM calls to: (1) inject memories into the system prompt before generation, (2) save the full conversation after the response. Per-turn caching avoids redundant API calls during tool-call loops.
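A synchronous sketch of that wrap order (the real middleware is async and hooks doGenerate/doStream; fetchMemories and saveConversation are hypothetical stand-ins):

```typescript
// Wrap a generate function so memories are injected before the call
// and the exchange is persisted after it.
type Generate = (system: string, user: string) => string;

function withMemory(
  generate: Generate,
  fetchMemories: (user: string) => string[],
  saveConversation: (user: string, reply: string) => void,
): Generate {
  return (system, user) => {
    const memories = fetchMemories(user);            // (1) retrieve before generation
    const augmented = memories.length
      ? `${system}\nRelevant memories:\n- ${memories.join("\n- ")}`
      : system;
    const reply = generate(augmented, user);         // (2) call the model
    saveConversation(user, reply);                   // (3) persist the exchange
    return reply;
  };
}

// Stub model that reveals whether the memory reached its system prompt.
const saved: string[] = [];
const wrapped = withMemory(
  (system, _user) => (system.includes("likes tea") ? "grounded reply" : "ungrounded reply"),
  () => ["likes tea"],
  (_user, reply) => { saved.push(reply); },
);
const reply = wrapped("You are helpful.", "hello");
```

Because the wrapper has the same signature as the original function, it composes with other middleware and stays invisible to calling code.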
## 6. Technical Stack
| Component | Technology |
|---|---|
| Runtime | Cloudflare Workers |
| Database | PostgreSQL (Hyperdrive pooling) + pgvector |
| Embeddings | Cloudflare AI |
| Web Framework | Hono (API), Next.js (web app) |
| Auth | Better Auth + OAuth 2.0 for MCP |
| MCP | agents/mcp (Cloudflare Durable Objects) |
| Monorepo | Turborepo + Bun |
| Language | TypeScript (primary), Python (SDK, agent-framework) |
| Monitoring | Sentry + PostHog analytics |
## 7. Open Questions & Limitations
- Closed-source core: The memory engine, ingestion pipeline, and search infrastructure are proprietary. Evaluating the production system requires trusting published benchmarks or running MemoryBench yourself.
- ASMR cost: 11-15 LLM calls per question make ASMR impractical for real-time production use. The blog acknowledges it is "not our main production engine (yet)."
- Vendor lock-in: No self-hosting option outside enterprise agreements, which raises data sovereignty concerns for regulated industries.
- Benchmark discrepancies: Self-reported numbers (#1 on all benchmarks) and independent testing sometimes diverge. Hindsight's 91.4% on LongMemEval significantly exceeds Supermemory's 81.6%.
- ASMR open-source timeline: Promised for April 2026; not yet available as of 2026-03-24. Should be re-checked.
- Production vs ASMR gap: The production engine scores 81.6% while ASMR scores ~99%. Bridging this gap for production use is the key engineering challenge.
## Sources
- Supermemory GitHub Repository (accessed: 2026-03-24)
- ASMR Blog Post (accessed: 2026-03-24)
- Supermemory Research Page (accessed: 2026-03-24)
- Supermemory Documentation (accessed: 2026-03-24)
- Memory vs RAG Concepts (accessed: 2026-03-24)
- Vectorize: Best AI Agent Memory Systems 2026 (accessed: 2026-03-24)
- Vectorize: Hindsight vs SuperMemory (accessed: 2026-03-24)
- DEV.to: 5 AI Agent Memory Systems Compared (accessed: 2026-03-24)