Character.AI: Character Training at Scale¶
Last Updated: 2026-03-24
Overview¶
Character.AI (C.AI) is the most significant production system for personality-trained LLMs. Founded by Noam Shazeer and Daniel De Freitas, core authors of Google's LaMDA, it processes 30,000 messages per second and was at its peak the third most-used generative AI application globally.
Unlike Neuro-sama (one developer, one character, iterative SFT), Character.AI operates a platform where millions of users create characters, and the system learns to embody any character definition through a combination of proprietary post-training and prompt-based character definitions.
Founding & History¶
| Date | Event |
|---|---|
| Nov 2021 | Founded by Noam Shazeer and Daniel De Freitas, both ex-Google Brain |
| 2022 | Built proprietary foundation model from scratch (complete pre-training + post-training stack) |
| 2023 | Rapid growth to one of the top 3 generative AI apps globally |
| Aug 2024 | Google licenses C.AI's research, hires 32 researchers including the entire pre-training team. Shazeer returns to Google |
| Post-2024 | C.AI pivots: shifts from self-built foundation models to third-party pre-trained models + proprietary post-training |
Key context: Shazeer is one of the 8 authors of "Attention Is All You Need" (2017). The team that built LaMDA at Google is the same team that built Character.AI's foundation model.
Technical Architecture¶
Four-Layer System¶
┌─────────────────────────────────────────────────────┐
│ Layer 4: User Feedback │
│ Star ratings (1-4) + message editing │
│ → Affects per-character response selection │
│ → Does NOT modify base model weights │
└─────────────────────┬───────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────┐
│ Layer 3: Prompt Layer (User-Facing) │
│ Character definition: name, description, greeting, │
│ personality, example dialogues │
│ → Managed by Prompt Poets (YAML + Jinja) │
│ → Smart truncation (character def prioritized) │
└─────────────────────┬───────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────┐
│ Layer 2: Character Training (Post-Training) │
│ DPO with synthetic constitutional data │
│ → "I am..." personality constitutions │
│ → Teaches model to generalize from ANY character │
│ definition, not just specific characters │
│ ★ Core competitive moat, fully proprietary │
└─────────────────────┬───────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────┐
│ Layer 1: Foundation Model (Pre-Training) │
│ Originally: proprietary LaMDA-derived model │
│ Now: third-party models + proprietary post-training│
│ Multi-Query Attention, native int8 training │
└─────────────────────────────────────────────────────┘
Layer 1: Foundation Model¶
- Pre-trained from scratch as a fully custom model, not fine-tuned from an existing model
- Based on team's LaMDA expertise at Google
- Multi-Query Attention (MQA): 5x reduction in GPU KV cache requirements (see the sketch after this list)
- Native int8 training: eliminates training/serving precision mismatch
- KV cache sharing: ties KV cache across neighboring attention layers, further 2-3x reduction
- Parameter count: never publicly disclosed
- After Google deal (2024): pivoting to hybrid approach (third-party pre-trained + proprietary post-training)
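For intuition, here is a minimal PyTorch sketch of MQA with illustrative dimensions (C.AI's model shapes were never disclosed). The point is that only one key/value head is projected and cached, no matter how many query heads exist:

```python
import torch

d_model, n_heads = 1024, 16   # illustrative sizes, not C.AI's
d_head = d_model // n_heads

wq = torch.nn.Linear(d_model, n_heads * d_head)  # one query projection per head
wk = torch.nn.Linear(d_model, d_head)            # a SINGLE shared key head
wv = torch.nn.Linear(d_model, d_head)            # a SINGLE shared value head

def mqa(x: torch.Tensor) -> torch.Tensor:
    b, t, _ = x.shape
    q = wq(x).view(b, t, n_heads, d_head).transpose(1, 2)  # (b, heads, t, d_head)
    k = wk(x).unsqueeze(1)  # (b, 1, t, d_head): this is all that gets KV-cached
    v = wv(x).unsqueeze(1)  # (b, 1, t, d_head): this is all that gets KV-cached
    # Broadcasting shares the single K/V head across all query heads, so the
    # cache is n_heads times smaller than standard MHA (causal mask omitted).
    att = torch.softmax(q @ k.transpose(-2, -1) / d_head**0.5, dim=-1)
    return (att @ v).transpose(1, 2).reshape(b, t, d_model)
```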
Layer 2: Character Training (The Core Innovation)¶
This is the most research-relevant layer. Character Training is a specialized form of post-training that teaches the model how to embody any character description, not how to be a specific character.
What Character Training is:

- A subset of post-training focused on "crafting traits within the model in the manner of its response" (interconnects.ai)
- Uses Constitutional AI variants adapted for personality
- Generates synthetic training data through a pipeline:
  1. Define personality as "I am..." constitutional statements (not "Choose the response that..." preference comparisons)
  2. Model generates queries relevant to target traits
  3. Model generates responses under constitutional guidance
  4. Constitution-guided responses vs. base model responses form DPO preference pairs
  5. Standard DPO training encodes personality generalization into weights
What Character Training is NOT:

- Not per-character fine-tuning (they don't train a separate model for each character)
- Not prompt engineering (it changes weights)
- Not regular RLHF (it's personality-specific, not general helpfulness)
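The five numbered steps above can be made concrete. A minimal sketch, assuming any chat-completion function is plugged in as `generate`; the constitution text and prompt wording are illustrative, not C.AI's actual data:

```python
from typing import Callable

Generate = Callable[[str], str]  # any chat-completion call: prompt in, text out

CONSTITUTION = [
    "I am warm and encouraging, but I never flatter.",
    "I speak in short, direct sentences.",
]

def build_dpo_pairs(generate: Generate, n_queries: int = 8) -> list[dict]:
    constitution = "\n".join(CONSTITUTION)   # step 1: "I am..." statements
    pairs = []
    for _ in range(n_queries):
        # Step 2: the model proposes a query that would exercise the traits.
        query = generate(
            f"Given this persona:\n{constitution}\n"
            "Write one user message that would test these traits."
        )
        # Step 3: the "chosen" response is generated under constitutional guidance.
        chosen = generate(f"{constitution}\n\nUser: {query}\nAssistant:")
        # Step 4: the "rejected" response comes from the bare model, no constitution.
        rejected = generate(f"User: {query}\nAssistant:")
        pairs.append({"prompt": query, "chosen": chosen, "rejected": rejected})
    # Step 5: feed these pairs to standard DPO training.
    return pairs
```

The records come out in the (prompt, chosen, rejected) shape that off-the-shelf DPO trainers such as TRL's `DPOTrainer` consume.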
Key insight from Nathan Lambert: "crafting a specific personality from scratch is an open question" — Character Training remains "more of an art than a hill to climb up with careful data engineering."
Evaluation: a ModernBERT classifier predicts which of 11 characters an output most likely came from; this quantifies personality strength across different training interventions.
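As a sketch of how such an evaluation could be wired up, assuming a ModernBERT backbone fine-tuned as an 11-way character classifier (the checkpoint below is the public base model; its classification head would first need fine-tuning on labeled response-to-character pairs):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CKPT = "answerdotai/ModernBERT-base"  # base model only; the 11-way head is untrained here
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSequenceClassification.from_pretrained(CKPT, num_labels=11)

def personality_strength(response: str, target_character: int) -> float:
    """Probability mass the classifier puts on the intended character;
    higher means the response is more recognizably in character."""
    inputs = tokenizer(response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)
    return probs[0, target_character].item()
```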
Anthropic uses a similar approach for Claude. Amanda Askell (Anthropic): "It's like constitutional AI, but it's without any human data" — indicating fully synthetic, model-generated training data for personality.
Layer 3: Prompt Layer¶
Users define characters through structured prompts:
| Field | Max Length | Purpose |
|---|---|---|
| Name | — | Character identifier |
| Short Description | 50 chars | Concise system description |
| Greeting | 500 chars | First message, sets tone and style. Single most influential element for personality consistency |
| Personality / Description | Unlimited | Detailed traits, backstory, knowledge |
| Example Dialogues | Unlimited | Conversation samples defining speech patterns |
These are managed by Prompt Poets (originally the internal tool "Hermies", later open-sourced):

- YAML structure + Jinja templating for runtime variables and control flow
- Smart truncation when context overflows: character definitions are prioritized over conversation history (a sketch of this policy follows the list)
- Handles group chat (multiple character definitions in one prompt)
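A minimal sketch of priority-aware truncation, assuming a crude whitespace token estimate and invented section names (the real Prompt Poets API differs):

```python
from dataclasses import dataclass

@dataclass
class Section:
    name: str
    text: str
    priority: int  # higher = dropped sooner when the context overflows

def build_prompt(sections: list[Section], token_budget: int) -> str:
    """Drop the most expendable sections (oldest chat turns) until the prompt
    fits; priority-0 sections like the character definition always survive."""
    def cost(ss): return sum(len(s.text.split()) for s in ss)  # crude token estimate
    kept = sorted(sections, key=lambda s: s.priority)  # most droppable at the end
    while len(kept) > 1 and cost(kept) > token_budget:
        kept.pop()
    order = {id(s): i for i, s in enumerate(sections)}  # restore original order
    return "\n".join(s.text for s in sorted(kept, key=lambda s: order[id(s)]))

prompt = build_prompt(
    [Section("character_definition", "...", priority=0),  # prioritized, always kept
     Section("greeting", "...", priority=1),
     Section("chat_turn_1", "...", priority=9),           # oldest turn, dropped first
     Section("chat_turn_2", "...", priority=8),
     Section("new_user_message", "...", priority=0)],
    token_budget=2048,
)
```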
Layer 4: User Feedback¶
- 1-4 star ratings on each response
- Rating "predominantly affects the specific character, but also affects behavioral selection as a whole"
- Official position: feedback does NOT alter base model weights
- Mechanism: likely affects response ranking/selection rather than training, a form of implicit RLHF without gradient updates (one possible shape is sketched after this list)
- Users can also edit messages and provide written feedback
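Since the mechanism is not publicly documented, the following is a purely speculative sketch of ranking-based feedback, with every name invented:

```python
from typing import Callable

def select_reply(
    sample: Callable[[], str],             # draws one candidate from the character model
    rating_score: Callable[[str], float],  # scorer distilled from 1-4 star feedback
    n: int = 4,
) -> str:
    """Best-of-n selection: accumulated ratings steer which sample is shown,
    while the base model weights stay untouched."""
    candidates = [sample() for _ in range(n)]
    return max(candidates, key=rating_score)
```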
Infrastructure & Scale¶
| Metric | Value |
|---|---|
| Messages per second | 30,000 |
| Bandwidth | 7-8 GB/s on primary generation path |
| Open connections | ~400,000 |
| P50 response time | ~12.5 seconds |
| Cache hit rate | 95% (only compute the new user message, ~5% of total) |
| Scale timeline | 300 → 30,000 msg/s in 18 months |
The 95% cache hit rate is critical: successive messages in a conversation are nearly identical (same character definition + conversation history, plus one new user message). Without this optimization, the platform would be economically unviable.
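A toy sketch of the idea, assuming a per-conversation cache (the production cache is certainly more sophisticated): because each new request is the old prompt plus one user turn, the model only has to prefill the suffix.

```python
# conversation id -> (prompt tokens already processed, opaque KV "state")
cache: dict[str, tuple[list[int], int]] = {}

def run_model(suffix: list[int], past_kv: int) -> int:
    """Stand-in for transformer prefill; the 'state' here just counts
    how many tokens have ever been computed for this conversation."""
    return past_kv + len(suffix)

def prefill(conv_id: str, tokens: list[int]) -> int:
    cached_tokens, state = cache.get(conv_id, ([], 0))
    if tokens[: len(cached_tokens)] == cached_tokens:
        suffix = tokens[len(cached_tokens):]  # hit: only the new turn, ~5% of tokens
    else:
        suffix, state = tokens, 0             # miss: recompute the full prompt
    state = run_model(suffix, past_kv=state)
    cache[conv_id] = (tokens, state)
    return state
```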
Character.AI vs Neuro-sama¶
| Dimension | Character.AI | Neuro-sama |
|---|---|---|
| Scale | Millions of characters | 2 characters (Neuro + Evil) |
| Personality training | DPO + synthetic constitutional data | Iterative SFT on curated stream data |
| Model | Proprietary (param count unknown) | Custom 2B, q2_k |
| Character definition | User prompt (anyone can create) | Developer-curated training data |
| Learning signal | Star ratings + message edits | Vedal's manual curation |
| Core approach | Train model to generalize from any character description | Train model to BE a specific character |
| Technical disclosure | Minimal (some blog posts, one scaling talk) | Minimal (stream comments, interviews) |
The key architectural difference: C.AI trains a meta-character model (can become any character given a description), while Neuro-sama trains a specific-character model (IS Neuro-sama in the weights).
What This Reveals About Character Training¶
The Industry Pattern¶
Three labs now show evidence of dedicated personality post-training, though only two have confirmed constitutional/synthetic methods:
| Lab | Method | Public Detail |
|---|---|---|
| Character.AI | DPO + personality constitutions | Minimal (Nathan Lambert's analysis) |
| Anthropic (Claude) | Constitutional AI for character | Amanda Askell confirmed methodology, no paper |
| OpenAI (GPT-4o) | Unknown, but dramatic personality shifts observed between versions | Zero disclosure |
Nathan Lambert: "frontier labs lack public documentation of personality changes."
Character Training vs Academic Personality Methods¶
| Dimension | Character Training (C.AI/Anthropic) | Academic (BIG5-CHAT, FinePE, PERSONA) |
|---|---|---|
| Goal | Generalize to ANY character | Specific Big Five trait control |
| Data | Synthetic, model-generated | Annotated human data (BIG5-CHAT) or contrastive activations (PERSONA) |
| Evaluation | Custom classifier (11 characters) | Psychometric tests (BFI, IPIP-NEO) |
| Human involvement | "Without any human data" (Askell) | Varies (BIG5-CHAT uses Facebook posts) |
| Production | Yes (billions of messages) | No |
The Meta-Character Insight¶
Character.AI's most important contribution to the field is the concept of meta-character training: instead of training a model to have a personality, train it to adopt any personality given a description (the two framings are contrasted in the data sketch after this list). This is:
- More scalable — one model serves millions of characters
- More flexible — new characters don't need retraining
- Closer to how humans work — we can "imagine being" different characters
- The reason Layer 2 (training) and Layer 3 (prompts) are both needed — training gives the model the ability to embody characters, prompts specify which character
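A toy illustration of the contrast as training-record formats; the field contents are invented, not either system's actual schema:

```python
# Specific-character training (Neuro-sama style): the persona exists only
# in the assistant targets, so it ends up baked into the weights.
specific_character_record = {"messages": [
    {"role": "user", "content": "hi chat"},
    {"role": "assistant", "content": "<reply that simply IS the character>"},
]}

# Meta-character training (C.AI style): the persona is an INPUT that varies
# across the dataset, forcing the model to generalize from descriptions.
meta_character_record = {"messages": [
    {"role": "system", "content": "You are {name}. {description} {example_dialogues}"},
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "<reply consistent with the given description>"},
]}
```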
Open Questions¶
- What happened after the Google deal? With the pre-training team gone, how much has the architecture changed? Are they now fine-tuning Gemini models with their post-training pipeline?
- Does the star rating actually feed back into training? The official position is "no weight changes," but accumulated ratings across millions of interactions could be valuable DPO signal. It's plausible they batch this data into periodic retraining.
- How does Character Training compare to activation engineering? PERSONA achieves 9.60 vs SFT's 9.61 on PersonalityBench. Would Character Training score higher on the same benchmark? No one has tested this.
- Can the meta-character approach be open-sourced? OpenCharacter (arXiv:2501.15427) attempts this with 20K synthetic characters + SFT, achieving GPT-4o-level role-playing. Is this architecturally equivalent to what C.AI does, or is there a DPO-specific advantage?
- Is the "I am..." constitution format important? C.AI uses first-person constitutional statements instead of third-person preference comparisons. Is this a meaningful design choice, or just a convention?
Connection to Continuous Learning Research¶
Character.AI bridges Pillar 1 (Memory) and Pillar 3 (Learning):
- Layer 2 (Character Training) is Pillar 3: personality in weights via post-training
- Layer 3 (Prompts) is Pillar 1: character definitions as external context
- Layer 4 (Feedback) is a potential Pillar 3 signal source: if ratings are ever used for retraining, this closes the continuous learning loop
The meta-character approach also suggests a fourth pillar: Meta-Learning — not learning specific facts or personalities, but learning how to learn from character descriptions. This is qualitatively different from both memory retrieval and weight updates.
References¶
Character.AI Official¶
- Character.AI Character Book — official documentation on character creation
- Building a Better AI Together — user role in training
- Our Next Phase of Growth — Google deal, pivot to third-party models
- Optimizing AI Inference — MQA, int8 training, KV cache
Analysis¶
- Character Training — interconnects.ai — Nathan Lambert on character training concept, Anthropic's approach
- Opening the Character Training Pipeline — interconnects.ai — DPO + constitutional data, ModernBERT evaluation
- Character.AI Scaling Talk — ZenML — infrastructure, Prompt Poets, 95% cache hit rate
Related Academic¶
- OpenCharacter — arXiv:2501.15427 — open-source attempt at meta-character training
- Character-LLM — arXiv:2310.10158 — trainable agent for role-playing