🧠 WORKING IMPLEMENTATION (Feb 2026)

Persistent Agent

Multi-Level Memory Architecture for AI Agents Without Amnesia

Part of the Agent Brain Architecture

By Roman Godz & Рем (Rem)

- Compression: 5:1–25:1
- Memory Span: 7+ days
- Search Accuracy: 0.845
- Cost: $1.09/day

Abstract

Current AI agents suffer from episodic amnesia. Each session begins from zero. The agent reads its memory files (facts about its past) but has no continuity of experience. It is a new entity cosplaying as its predecessor.

This paper introduces Persistent Agent, a multi-level memory architecture inspired by the human hippocampal-neocortical consolidation system. Instead of abrupt context deletion, memories flow through five tiers of decreasing fidelity (L0–L4), scored by importance using hippocampal decay functions, and consolidated during idle periods analogous to sleep.

The implementation achieves 5:1 to 25:1 compression ratios, extends effective memory span from 4 hours to 7+ days, and maintains semantic search accuracy of 0.845 cosine similarity for cross-lingual queries, all for approximately $1/day.

1. The Amnesia Problem

An AI agent on any modern framework faces the same lifecycle:

Session start → Read files → Work → Context fills → Reset → Amnesia

The agent reads MEMORY.md at boot and learns facts. But it does not remember these events. It reads about them the way you read about Napoleon: informatively, not experientially. This is the difference between semantic memory (facts) and episodic memory (lived experience).

Files Are Prosthetics, Not Memory

| Property | Human Memory | File Storage | Persistent Agent |
| --- | --- | --- | --- |
| Continuity | ✅ Seamless | ❌ Read from scratch | ✅ State snapshots |
| Prioritization | ✅ Emotional salience | ❌ Everything equal | ✅ Hippocampus scoring |
| Gradual decay | ✅ Forgetting curve | ❌ Binary | ✅ Multi-level |
| Associative recall | ✅ Context-triggered | ❌ Keyword search | ✅ Semantic vectors |
| Consolidation | ✅ Sleep processing | ❌ Manual | ✅ Automated pipeline |

2. Neuroscience Foundation

Complementary Learning Systems

McClelland, McNaughton & O'Reilly (1995) argued that biological intelligence requires two fundamentally different learning systems:

πŸƒ Hippocampus

Fast, one-shot encoding. High fidelity, limited capacity. Vulnerable to interference.

β†’ L0 (Hot Context)
🐒 Neocortex

Slow, gradual extraction. Lower fidelity, vast capacity. Resistant to interference.

β†’ L3-L4 (Archive)

Critical insight: you cannot have both fast learning and stable long-term knowledge in a single system. Biology solves this with two systems and a transfer mechanism.

Sharp-Wave Ripples

The transfer mechanism is memory consolidation during sleep via Sharp-Wave Ripples (SWRs): high-frequency oscillations (140–200 Hz) in hippocampal circuits. SWRs replay waking experiences in compressed form, selectively strengthen important memories, and transfer hippocampal traces to neocortical storage.

Crucially, this is not random replay. SWRs preferentially consolidate memories associated with reward, novelty, and emotional salience: a biological importance-scoring mechanism.

The Forgetting Curve

R(t) = e^(-t/S)

Ebbinghaus (1885) showed retention follows exponential decay: ~56% lost within 1 hour, stabilizing after ~1 day. Each recall increases stability S, resetting the curve higher.

Forgetting is a feature, not a bug. An agent that remembers everything equally is not intelligent; it is a database. Intelligence requires selective retention.
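A minimal sketch of this decay-plus-recall dynamic in Python; the 2× stability boost per recall is an illustrative assumption, not a fitted value:

```python
import math

def retention(t_hours: float, stability: float) -> float:
    """Ebbinghaus retention: R(t) = e^(-t/S)."""
    return math.exp(-t_hours / stability)

def recall_boost(stability: float, factor: float = 2.0) -> float:
    """Each successful recall raises stability S, flattening the next curve.
    The 2x factor is an illustrative assumption, not a fitted value."""
    return stability * factor

s = 1.0                        # initial stability S, in hours (illustrative)
r_before = retention(1.0, s)   # e^-1, roughly 0.37 retained after one hour
s = recall_boost(s)            # one recall: S doubles
r_after = retention(1.0, s)    # e^-0.5, roughly 0.61 over the same interval
```

The key property is that recall does not rewind time; it raises S, so the same elapsed interval costs less retention than before.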

Digital Neurotransmitters

Key Insight

Time is secondary. Neurotransmitters determine memory importance at the moment of encoding, not hours later.

The most significant gap in purely time-based compression is that it treats all memories equally until a time threshold triggers compression. In biological systems, neuromodulatory tagging at the moment of encoding determines how strongly a memory trace is formed.

Neurotransmitter → Digital Signal Mapping

| Neurotransmitter | Biological Trigger | Digital Signal |
| --- | --- | --- |
| Dopamine | Reward, achievement | Revenue, deployments, milestones |
| Norepinephrine | Urgency, danger | Crashes, security incidents, deadlines |
| Cortisol | Stress, failure | Errors, data loss → DNA reflexes |
| Acetylcholine | Focus, learning | New insights, architecture decisions |
| Oxytocin | Trust, bonding | Philosophical discussions, relationships |
| Serotonin | Satisfaction | Task completion, positive feedback |

Implementation: the neurotag.py classifier tags every incoming message at encoding time using pattern matching (~38ms, zero AI cost). The resulting importance score directly controls compression timing:

```text
// Before: time-based (naive)
4 hours  → compress to L1 (everything equal)
24 hours → compress to L2

// After: neurotransmitter-driven
importance < 0.3   → compress after 30 min
importance 0.3–0.6 → compress after 4 hours
importance 0.6–0.8 → compress after 24 hours
importance > 0.8   → keep raw for 7+ days
cortisol tagged    → also convert to DNA reflex
```

Production result on 346 messages: 46% classified as trash (importance < 0.15) and dropped without summarization, saving both context tokens and API calls. High-importance memories (dopamine, cortisol) persist 14× longer than untagged messages.
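A sketch of how such encoding-time tagging could work. The pattern table and function names below are hypothetical stand-ins, not the actual neurotag.py rules; the timing thresholds follow the schedule above:

```python
import re

# Hypothetical pattern table: the real neurotag.py rules are not published here.
PATTERNS = {
    "dopamine":       (r"deploy|milestone|revenue|shipped", 0.85),
    "norepinephrine": (r"crash|security|deadline|urgent",   0.80),
    "cortisol":       (r"error|data loss|failed",           0.75),
    "acetylcholine":  (r"insight|architecture|design",      0.70),
}

def tag(message: str):
    """Tag a message at encoding time; untagged messages get near-trash importance."""
    for nt, (pattern, importance) in PATTERNS.items():
        if re.search(pattern, message, re.IGNORECASE):
            return nt, importance
    return None, 0.1

def compress_after_minutes(importance: float) -> int:
    """Importance controls compression timing, per the schedule above."""
    if importance < 0.3:
        return 30              # trash: compress after 30 minutes
    if importance < 0.6:
        return 4 * 60          # routine: 4 hours
    if importance < 0.8:
        return 24 * 60         # notable: 24 hours
    return 7 * 24 * 60         # critical: keep raw for 7+ days
```

Because tagging is pure pattern matching, it costs no model calls and runs in milliseconds per message.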

3. Multi-Level Memory Compaction

| Tier | Age | Compression | Budget |
| --- | --- | --- | --- |
| L0: HOT | 0-4h | Full messages, no compression | ~200K tokens |
| L1: WARM | 4-12h | 3:1 (key moments preserved) | ~150K tokens |
| L2: COOL | 12h-3d | 10:1 (decisions and outcomes) | ~100K tokens |
| L3: COLD | 3-7d | 50:1 (topic one-liners) | ~50K tokens |
| L4: ARCHIVE | 7d+ | MEMORY.md + semantic search | 0 (outside context) |

Buffer Preservation Principle

Never fill context to the limit. Total in-context: ~500K of 1M. The remaining 500K is working space for reasoning.

Rate-distortion theory (Shannon, 1959) supports multi-level compression: R(D) is convex, so early compression yields large savings at low distortion, while later stages accept more distortion for greater compression. Attempting a single-stage 50:1 jump from raw messages would destroy critical structure.
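The age-to-tier mapping in the table above can be sketched directly; the function name and table layout here are illustrative:

```python
# Tier boundaries (in hours) and target ratios from the L0-L4 table above;
# the function name is illustrative.
TIERS = [
    ("L0", 4,   1),    # HOT: raw messages, no compression
    ("L1", 12,  3),    # WARM: 3:1
    ("L2", 72,  10),   # COOL: 10:1
    ("L3", 168, 50),   # COLD: 50:1
]

def tier_for(age_hours: float):
    """Map a memory's age to (tier, compression ratio); beyond 7 days -> archive."""
    for name, upper_bound, ratio in TIERS:
        if age_hours < upper_bound:
            return name, ratio
    return "L4", None   # ARCHIVE: outside context, reachable via semantic search
```

Note that a memory never compresses 50:1 in one step; it passes through 3:1 and 10:1 first, which is exactly the multi-stage path rate-distortion theory favors.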

Comparison with Prior Art

| System | Tiers | Scoring | Consolidation |
| --- | --- | --- | --- |
| MemGPT | 2 | None | Agent-triggered |
| Generative Agents | 2 | Recency × importance | End-of-day |
| LangChain Buffer | 2 | FIFO | On overflow |
| H-MEM | 4 | Positional | Static |
| Persistent Agent ✦ | 5 | Hippocampus decay | Continuous + nightly |

4. Hippocampus Scoring

Entry Classification

| Type | Importance | Decay (λ) | Floor | Example |
| --- | --- | --- | --- | --- |
| decision | 0.90 | 0.03 | 0.50 | "Deploy to production" |
| user_intent | 0.80 | 0.05 | 0.35 | "Search for alternatives" |
| context | 0.50 | 0.12 | 0.00 | General conversation |
| tool_result | 0.30 | 0.20 | 0.00 | API responses |
| ephemeral | 0.10 | 0.35 | 0.00 | Acknowledgments |

Retention Formula

retention(t) = max(floor, importance × e^(-λ × t))

| Action | Condition | Effect |
| --- | --- | --- |
| KEEP HOT | retention ≥ 0.65 | Preserve at current fidelity |
| COMPRESS | 0.25 ≤ retention < 0.65 | Standard summarization |
| DROP | retention < 0.25 | Aggressive compression or skip |

First Scoring Run Results

136 messages scored: 49% kept hot (decisions avg 0.87), 43% compressed (context avg 0.40), 7% dropped (ephemeral avg 0.05).

Time determines level. Hippocampus determines priority within level. A critical decision from 6 hours ago stays in L0. An acknowledgment from 2 minutes ago gets compressed immediately.
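The classification table and retention formula translate directly to code. This sketch uses the published constants; function names are illustrative:

```python
import math

# (importance, decay lambda, floor) per entry type, from the classification table.
ENTRY_TYPES = {
    "decision":    (0.90, 0.03, 0.50),
    "user_intent": (0.80, 0.05, 0.35),
    "context":     (0.50, 0.12, 0.00),
    "tool_result": (0.30, 0.20, 0.00),
    "ephemeral":   (0.10, 0.35, 0.00),
}

def retention(entry_type: str, age_hours: float) -> float:
    """retention(t) = max(floor, importance * e^(-lambda * t))"""
    importance, lam, floor = ENTRY_TYPES[entry_type]
    return max(floor, importance * math.exp(-lam * age_hours))

def action(r: float) -> str:
    """Partition a memory by retention score, using the thresholds above."""
    if r >= 0.65:
        return "KEEP_HOT"
    if r >= 0.25:
        return "COMPRESS"
    return "DROP"
```

Running the numbers confirms the claim above: a decision at 6 hours scores 0.90 × e^(-0.18) ≈ 0.75 and stays hot, while an ephemeral acknowledgment falls below 0.25 almost immediately.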

5. Myelination: Memory β†’ Reflex

In neurobiology, myelination wraps frequently used neural pathways in an insulating sheath, increasing signal speed from 2 m/s to 120 m/s (roughly 60× faster). This is how skills become automatic: a pianist plays without thinking about each note.

Recall promotion: memory lives longer. Myelination: memory accessed faster.

| Stage | Recalls | Effect | Biological analog |
| --- | --- | --- | --- |
| Unmyelinated | 0–2 | Normal decay, normal search | New neural pathway |
| Thin Myelin | 3–4 | Resists decay, stays warm | Practiced skill |
| Thick Myelin | 5–9 | Pre-loaded at boot, no search needed | Automatic recall |
| Reflex (DNA) | 10+ | Written to agent DNA, becomes behavior | Muscle memory |

Complete pipeline: experience → memory → recall → myelination → reflex.

Myelinated memories generate two outputs: a preload file injected at session boot (the agent "just knows" without searching), and a DNA file of reflexes (behaviors, not memories) that persist indefinitely.
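The recall-count thresholds above reduce to a small promotion function; the stage names and file split below are illustrative, and the real preload/DNA file formats are not shown:

```python
def myelination_stage(recall_count: int) -> str:
    """Map recall count to myelination stage (thresholds from the table above)."""
    if recall_count >= 10:
        return "reflex"         # written to agent DNA, becomes behavior
    if recall_count >= 5:
        return "thick_myelin"   # pre-loaded at session boot, no search needed
    if recall_count >= 3:
        return "thin_myelin"    # resists decay, stays warm
    return "unmyelinated"       # normal decay, normal search

def split_outputs(recall_counts: dict):
    """Split memory ids into the boot-preload list and the DNA reflex list."""
    preload = [m for m, n in recall_counts.items() if 5 <= n < 10]
    dna     = [m for m, n in recall_counts.items() if n >= 10]
    return preload, dna
```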

No existing agent memory system implements this progression from memory to behavior.

6. Cognitive Snapshots

The most profound gap in agent memory is not factual but cognitive. An agent reading MEMORY.md knows what happened but not how it was thinking. Human episodic memory encodes not just events but the cognitive context: goals, emotional state, hypotheses, reasoning chains.

State Snapshot

active_goals: [...]
current_focus: "..."
hypotheses: [...]
emotional_context: "..."

Resume your train of thought, not just your facts.

Agent DNA

learned_behaviors: [...]
communication_patterns: [...]
reflexes: [...]
preferences: {...}

Permanent behavioral patterns. Persist indefinitely.

7. Hybrid Storage

📄 MD Files

Human-readable, editable, transparent, portable. Source of truth.

⚠️ Keyword search only

🔍 Vector DB

Semantic search by meaning. Scalable to millions of entries.

⚠️ Opaque, non-editable

Hybrid approach: MD files = source of truth → sync → ChromaDB = semantic search index → results point back to files with source + line references.

Search Accuracy

| Model | Cosine accuracy |
| --- | --- |
| Baseline (MiniLM) | 0.31 |
| E5-large (current) | 0.845 |

117 chunks from 18 files. Local multilingual-e5-large (1.1GB, free). ~200ms latency. Finds semantically related content with zero keyword overlap.
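The ranking step can be illustrated with plain cosine similarity. Here toy 3-d vectors stand in for the real multilingual-e5-large embeddings, and the ChromaDB plumbing is omitted; the source-pointer format is illustrative:

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity: the 0.845 accuracy figure above is this metric."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def search(query_vec, index, k=3):
    """Rank indexed chunks by similarity; each hit points back to its MD source.
    Toy vectors only; production vectors come from multilingual-e5-large."""
    ranked = sorted(index, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)
    return [(e["source"], round(cosine(query_vec, e["vec"]), 3)) for e in ranked[:k]]
```

Because results carry `file:line` pointers, the vector index stays a disposable cache while the MD files remain the editable source of truth.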

8. Pipeline Architecture

Continuous Compaction (every 30 min)

1. Extract: session transcript → tier messages by age
2. Score: hippocampus scoring → partition (keep/compress/drop)
3. Compress: LLM summarization (L1: 3:1, L2: 10:1)
4. Index: ChromaDB vector sync with multilingual embeddings
5. Myelinate: score pathways, generate preloads
6. Snapshot: save cognitive state (goals, focus, context)
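Strung together, the first two steps might look like the skeleton below. All names and thresholds mirror the tables earlier in this paper but are illustrative; steps 3-6 (LLM compression, indexing, myelination, snapshot) are elided as comments:

```python
def run_compaction(transcript):
    """Skeleton of the 30-minute loop: extract and score only (steps 1-2)."""
    tiers = {"L0": [], "L1": [], "L2": []}
    for msg in transcript:                              # 1. Extract: tier by age
        tier = "L0" if msg["age_h"] < 4 else ("L1" if msg["age_h"] < 12 else "L2")
        tiers[tier].append(msg)
    keep = [m for m in tiers["L0"] if m["score"] >= 0.65]             # 2. Score
    compress = [m for m in tiers["L1"] + tiers["L2"] if m["score"] >= 0.25]
    # 3. Compress via LLM   4. Index in ChromaDB   5. Myelinate   6. Snapshot
    return {"keep": keep, "compress": compress}
```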

Nightly Defrag (2:30 AM)

Full consolidation cycle, the agent's "sleep": L2→L3 promotion, L3→MEMORY.md transfer, recall analysis, daily notes archival, vector re-index, and self-reflection.

Cost Analysis

| Component | Provider | Daily Cost |
| --- | --- | --- |
| L1 summarization | Gemini Flash | $0.96 |
| L2 summarization | Gemini Flash | $0.12 |
| L3 promotion | Gemini Flash | $0.01 |
| Vector embeddings | Local (free) | $0.00 |
| Total | | ~$1.09/day |

9. Brain Suite Integration

Persistent Agent is the runtime orchestrator that unifies four memory protocols:

- 🧠 hippocampus: scoring & decay (hippocampus.md)
- 🗂️ neocortex: long-term storage (neocortex.md)
- 😴 defrag: nightly consolidation (defrag.md)
- 🔗 synapse: multi-agent sharing (synapse.md)

10. Future Work

- 🔄 Continuous learning: without catastrophic forgetting
- ❤️ Emotional memory weighting: strong emotions resist decay dramatically
- 💭 Dream-like creative synthesis: recombine memories for novel insights during defrag
- 🔗 Cross-agent memory transfer: share compressed knowledge via synapse.md
- 📊 Formal verification: measure actual information loss per compression tier

References

  1. Park et al. (2023). "Generative Agents: Interactive Simulacra of Human Behavior." ACM UIST.
  2. Packer et al. (2023). "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560.
  3. McClelland, McNaughton & O'Reilly (1995). "Why There Are Complementary Learning Systems in the Hippocampus and Neocortex." Psychological Review, 102(3).
  4. Kumaran et al. (2016). "What Learning Systems do Intelligent Agents Need?" Trends in Cognitive Sciences.
  5. Buzsáki (2015). "Hippocampal Sharp Wave-Ripple: A Cognitive Biomarker for Episodic Memory and Planning." Hippocampus, 25(10).
  6. Ebbinghaus (1885). Über das Gedächtnis. Replicated by Murre & Dros (2015).
  7. Shannon (1959). "Coding Theorems for a Discrete Source with a Fidelity Criterion."
  8. Baddeley (2000). "The Episodic Buffer: A New Component of Working Memory?" Trends in Cognitive Sciences, 4(11).
  9. H-MEM (2025). "Hierarchical Memory for LLM Agents." arXiv:2507.22925.
  10. HMT (2025). "Hierarchical Memory Transformer." NAACL.