Persistent Agent
Multi-Level Memory Architecture for AI Agents Without Amnesia
Part of the Agent Brain Architecture
By Roman Godz & Рем (Rem)
Abstract
Current AI agents suffer from episodic amnesia. Each session begins from zero. The agent reads its memory files (facts about its past) but has no continuity of experience. It is a new entity cosplaying as its predecessor.
This paper introduces Persistent Agent, a multi-level memory architecture inspired by the human hippocampal-neocortical consolidation system. Instead of abrupt context deletion, memories flow through five tiers of decreasing fidelity (L0–L4), scored by importance using hippocampal decay functions and consolidated during idle periods analogous to sleep.
The implementation achieves 5:1 to 25:1 compression ratios, extends effective memory span from 4 hours to 7+ days, and maintains semantic search accuracy of 0.845 cosine similarity for cross-lingual queries, all for approximately $1/day.
1. The Amnesia Problem
An AI agent on any modern framework faces the same lifecycle:
The agent reads MEMORY.md at boot and learns facts. But it does not remember these events. It reads about them the way you read about Napoleon: informatively, not experientially. This is the difference between semantic memory (facts) and episodic memory (lived experience).
Files Are Prosthetics, Not Memory
| Property | Human Memory | File Storage | Persistent Agent |
|---|---|---|---|
| Continuity | ✅ Seamless | ❌ Read from scratch | ✅ State snapshots |
| Prioritization | ✅ Emotional salience | ❌ Everything equal | ✅ Hippocampus scoring |
| Gradual decay | ✅ Forgetting curve | ❌ Binary | ✅ Multi-level |
| Associative recall | ✅ Context-triggered | ❌ Keyword search | ✅ Semantic vectors |
| Consolidation | ✅ Sleep processing | ❌ Manual | ✅ Automated pipeline |
2. Neuroscience Foundation
Complementary Learning Systems
McClelland, McNaughton & O'Reilly (1995) argued that biological intelligence requires two fundamentally different learning systems:
- **Hippocampus:** fast, one-shot encoding. High fidelity, limited capacity. Vulnerable to interference.
- **Neocortex:** slow, gradual extraction. Lower fidelity, vast capacity. Resistant to interference.
Critical insight: you cannot have both fast learning and stable long-term knowledge in a single system. Biology solves this with two systems and a transfer mechanism.
Sharp-Wave Ripples
The transfer mechanism is memory consolidation during sleep via Sharp-Wave Ripples (SWRs): high-frequency oscillations (140–200 Hz) in hippocampal circuits. SWRs replay waking experiences in compressed form, selectively strengthen important memories, and transfer hippocampal traces to neocortical storage.
Crucially, this is not random replay. SWRs preferentially consolidate memories associated with reward, novelty, and emotional salience: a biological importance-scoring mechanism.
The Forgetting Curve
Ebbinghaus (1885) showed retention follows exponential decay. ~56% lost within 1 hour, stabilizing after ~1 day. Each recall increases stability S, resetting the curve higher.
Forgetting is a feature, not a bug. An agent that remembers everything equally is not intelligent; it is a database. Intelligence requires selective retention.
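The Ebbinghaus curve can be written as R = exp(−t/S). A minimal sketch, with the recall boost factor as an illustrative assumption (the paper only states that recall increases S):

```python
import math

def retention(t_hours: float, stability: float) -> float:
    """Ebbinghaus forgetting curve: R = exp(-t / S)."""
    return math.exp(-t_hours / stability)

def recall(stability: float, boost: float = 2.0) -> float:
    """Each successful recall increases stability S, resetting the
    curve higher (the x2 boost is an assumption, not from the paper)."""
    return stability * boost

# With S around 1.2 hours, roughly 56% is lost within the first hour,
# matching the figure quoted above.
```

Each recall flattens the curve: after one boost, the same memory retains far more at the one-hour mark.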
Digital Neurotransmitters
Key Insight
Time is secondary. Neurotransmitters determine memory importance at the moment of encoding, not hours later.
The most significant gap in purely time-based compression is that it treats all memories equally until a time threshold triggers compression. In biological systems, neuromodulatory tagging at the moment of encoding determines how strongly a memory trace is formed.
Neurotransmitter → Digital Signal Mapping
| Neurotransmitter | Biological Trigger | Digital Signal |
|---|---|---|
| Dopamine | Reward, achievement | Revenue, deployments, milestones |
| Norepinephrine | Urgency, danger | Crashes, security incidents, deadlines |
| Cortisol | Stress, failure | Errors, data loss → DNA reflexes |
| Acetylcholine | Focus, learning | New insights, architecture decisions |
| Oxytocin | Trust, bonding | Philosophical discussions, relationships |
| Serotonin | Satisfaction | Task completion, positive feedback |
Implementation: the neurotag.py classifier tags every incoming message at encoding time using pattern matching (~38 ms, zero AI cost). The resulting importance score directly controls compression timing.
Production result on 346 messages: 46% classified as trash (importance < 0.15) and dropped without summarization, saving both context tokens and API calls. High-importance memories (dopamine, cortisol) persist 14× longer than untagged messages.
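A minimal sketch of an encoding-time tagger in this style. The regexes, scores, and helper names are illustrative assumptions; neurotag.py's actual rules are not shown here:

```python
import re

# Illustrative pattern table: regex -> (neurotransmitter tag, importance).
PATTERNS = [
    (r"revenue|deploy|milestone|shipped", ("dopamine", 0.90)),
    (r"crash|security|deadline|urgent",   ("norepinephrine", 0.85)),
    (r"error|data loss|failed",           ("cortisol", 0.80)),
    (r"insight|architecture|decided",     ("acetylcholine", 0.75)),
    (r"\b(thanks|ok|got it)\b",           (None, 0.05)),  # ephemeral
]
COMPILED = [(re.compile(p, re.I), tag) for p, tag in PATTERNS]

TRASH_THRESHOLD = 0.15  # below this, drop without summarization

def neurotag(message: str):
    """Tag a message at encoding time with pure pattern matching
    (no model call); returns (tag, importance)."""
    for pattern, (tag, score) in COMPILED:
        if pattern.search(message):
            return tag, score
    return None, 0.30  # untagged default

def is_trash(message: str) -> bool:
    return neurotag(message)[1] < TRASH_THRESHOLD
```

Because matching is pure regex, the tagger runs in tens of microseconds per message and costs nothing per call, which is what makes tagging every message at encoding time feasible.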
3. Multi-Level Memory Compaction
- **L0:** full messages, no compression. Budget: ~200K tokens
- **L1:** 3:1 compression, key moments preserved. Budget: ~150K tokens
- **L2:** 10:1 compression, decisions and outcomes. Budget: ~100K tokens
- **L3:** 50:1 compression, topic one-liners. Budget: ~50K tokens
- **L4:** MEMORY.md + semantic search. Budget: 0 (outside context)
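The tier layout can be captured as a small config table. The age windows below are assumptions for illustration (only the 4-hour hot span and the 7-day total span are stated in the paper):

```python
# (level, compression ratio, in-context token budget, max age in hours).
# L4 lives outside the context window entirely.
TIERS = [
    ("L0", 1,    200_000, 4),      # full messages
    ("L1", 3,    150_000, 24),     # key moments
    ("L2", 10,   100_000, 72),     # decisions and outcomes
    ("L3", 50,   50_000,  168),    # topic one-liners (7 days)
    ("L4", None, 0,       None),   # MEMORY.md + semantic search
]

def tier_for_age(age_hours: float) -> str:
    """Route a memory to a tier by age; hippocampus scoring then
    sets priority within the tier."""
    for level, _ratio, _budget, max_age in TIERS:
        if max_age is None or age_hours <= max_age:
            return level
    return "L4"

# In-context budgets sum to ~500K of a 1M window, leaving the
# other half as working space.
assert sum(budget for _, _, budget, _ in TIERS) == 500_000
```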
Buffer Preservation Principle
Never fill the context to its limit. Total in-context budget: ~500K of the 1M window; the remaining ~500K is working space for reasoning.
Rate-distortion theory (Shannon, 1959) supports multi-level compression: R(D) is convex, so early compression stages give significant savings at low distortion, while later stages accept more distortion for maximum compression. Attempting single-stage 50:1 compression directly from raw messages would destroy critical structure.
Comparison with Prior Art
| System | Tiers | Scoring | Consolidation |
|---|---|---|---|
| MemGPT | 2 | None | Agent-triggered |
| Generative Agents | 2 | Recency × importance | End-of-day |
| LangChain Buffer | 2 | FIFO | On overflow |
| H-MEM | 4 | Positional | Static |
| **Persistent Agent** | 5 | Hippocampus decay | Continuous + nightly |
4. Hippocampus Scoring
Entry Classification
| Type | Importance | Decay (Ξ») | Floor | Example |
|---|---|---|---|---|
| decision | 0.90 | 0.03 | 0.50 | "Deploy to production" |
| user_intent | 0.80 | 0.05 | 0.35 | "Search for alternatives" |
| context | 0.50 | 0.12 | 0.00 | General conversation |
| tool_result | 0.30 | 0.20 | 0.00 | API responses |
| ephemeral | 0.10 | 0.35 | 0.00 | Acknowledgments |
Retention Formula
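A plausible form, assuming exponential decay toward a per-type floor as the importance, decay (λ), and floor columns of the table above suggest:

```latex
R(t) = \max\!\bigl(\text{floor},\; I_0 \, e^{-\lambda t}\bigr)
```

where \(I_0\) is the entry type's initial importance and \(t\) is its age in hours; a recall resets \(t\), the digital analog of the Ebbinghaus stability boost. This reconstruction is an assumption, not the paper's exact formula.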
First Scoring Run Results
136 messages scored: 49% kept hot (decisions avg 0.87), 43% compressed (context avg 0.40), 7% dropped (ephemeral avg 0.05).
Time determines level. Hippocampus determines priority within level. A critical decision from 6 hours ago stays in L0. An acknowledgment from 2 minutes ago gets compressed immediately.
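A sketch of scoring with the table's parameters. The decay form and the 0.60 keep threshold are illustrative assumptions; 0.15 is the stated trash cutoff:

```python
import math

# (importance, lambda, floor) per entry type, from the table above.
PARAMS = {
    "decision":    (0.90, 0.03, 0.50),
    "user_intent": (0.80, 0.05, 0.35),
    "context":     (0.50, 0.12, 0.00),
    "tool_result": (0.30, 0.20, 0.00),
    "ephemeral":   (0.10, 0.35, 0.00),
}

def score(entry_type: str, age_hours: float) -> float:
    """Exponential decay toward a per-type floor (assumed form)."""
    importance, lam, floor = PARAMS[entry_type]
    return max(floor, importance * math.exp(-lam * age_hours))

def disposition(entry_type: str, age_hours: float) -> str:
    """Keep hot / compress / drop; 0.60 is an illustrative keep
    threshold, 0.15 is the stated trash cutoff."""
    s = score(entry_type, age_hours)
    if s >= 0.60:
        return "keep"
    return "compress" if s >= 0.15 else "drop"

# A critical decision from 6 hours ago still scores ~0.75 and stays
# hot; a fresh acknowledgment starts below the keep threshold.
```

The per-type floor is what keeps decisions from ever decaying out of reach: no matter how old, a decision never scores below 0.50.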
5. Myelination: Memory → Reflex
In neurobiology, myelination wraps frequently used neural pathways in an insulating sheath, increasing signal speed from ~2 m/s to ~120 m/s (roughly 60× faster). This is how skills become automatic: a pianist plays without thinking about each note.
1. Normal decay, normal search. Biological analog: new neural pathway.
2. Resist decay, stay warm. Biological analog: practiced skill.
3. Pre-loaded at boot, no search needed. Biological analog: automatic recall.
4. Written to agent DNA, becomes behavior. Biological analog: muscle memory.
Complete pipeline: experience → memory → recall → myelination → reflex.
Myelinated memories generate two outputs: a preload file injected at session boot (the agent "just knows" without searching), and a DNA file of reflexes (behaviors, not memories) that persist indefinitely.
No existing agent memory system implements this progression from memory to behavior.
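The progression can be sketched as a recall-count state machine. The stage thresholds and helper names below are illustrative assumptions, not the paper's values:

```python
# Myelination stages and the recall counts that unlock them
# (thresholds are assumptions for illustration).
STAGES = [
    (0,  "memory"),     # normal decay, normal search
    (3,  "warm"),       # resists decay, stays warm
    (7,  "preloaded"),  # injected at session boot
    (15, "reflex"),     # written to agent DNA as behavior
]

def myelination_stage(recall_count: int) -> str:
    """Map how often a memory was recalled to its current stage."""
    stage = STAGES[0][1]
    for threshold, name in STAGES:
        if recall_count >= threshold:
            stage = name
    return stage

def boot_preload(memories) -> list:
    """Memories at 'preloaded' or beyond are injected at boot, so
    the agent knows them without a search; takes (memory, count) pairs."""
    return [m for m, count in memories
            if myelination_stage(count) in ("preloaded", "reflex")]
```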
6. Cognitive Snapshots
The most profound gap in agent memory is not factual; it is cognitive. An agent reading MEMORY.md knows what happened but not how it was thinking. Human episodic memory encodes not just events but the cognitive context: goals, emotional state, hypotheses, reasoning chains.
- **State Snapshot:** resume your train of thought, not just your facts.
- **Agent DNA:** permanent behavioral patterns; persist indefinitely.
7. Hybrid Storage
Human-readable, editable, transparent, portable. Source of truth.
β οΈ Keyword search only
Semantic search by meaning. Scalable to millions of entries.
β οΈ Opaque, non-editable
Hybrid approach: MD files (source of truth) → sync → ChromaDB (semantic search index) → results point back to files with source + line references.
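A sketch of the file-to-index side of the sync, using a plain dataclass in place of a ChromaDB record; the chunk size and ID scheme are assumptions:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Chunk:
    """One indexed unit; metadata points back to the source file,
    so search results can cite file + line."""
    id: str
    text: str
    source: str
    line: int

def chunk_file(path: Path, lines_per_chunk: int = 8) -> list:
    """Split a memory file into chunks for the vector index;
    the MD file remains the source of truth."""
    lines = path.read_text(encoding="utf-8").splitlines()
    chunks = []
    for start in range(0, len(lines), lines_per_chunk):
        text = "\n".join(lines[start:start + lines_per_chunk]).strip()
        if text:
            chunks.append(Chunk(
                id=f"{path.name}:{start + 1}",
                text=text,
                source=str(path),
                line=start + 1,
            ))
    return chunks
```

Because each chunk carries its source path and starting line, the vector store never becomes authoritative: a hit is always resolvable back to an editable line in a markdown file.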
Search Accuracy
117 chunks from 18 files. Local multilingual-e5-large (1.1 GB, free). ~200 ms latency. Finds semantically related content with zero keyword overlap.
8. Pipeline Architecture
Continuous Compaction (every 30 min)
Nightly Defrag (2:30 AM)
Full consolidation cycle, the agent's "sleep": L2→L3 promotion, L3→MEMORY.md transfer, recall analysis, daily-notes archival, vector re-index, and self-reflection.
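The cycle can be sketched as an ordered runner in which one failing step does not abort the rest; the step names follow the cycle described above, and the stub bodies are placeholders:

```python
import logging

# Ordered consolidation steps; lambdas stand in for the real jobs.
NIGHTLY_STEPS = [
    ("l2_to_l3_promotion",   lambda: None),
    ("l3_to_memory_md",      lambda: None),
    ("recall_analysis",      lambda: None),
    ("daily_notes_archival", lambda: None),
    ("vector_reindex",       lambda: None),
    ("self_reflection",      lambda: None),
]

def run_defrag(steps) -> list:
    """Run consolidation steps in order and return the names of the
    steps that completed; a failure is logged, not fatal."""
    completed = []
    for name, step in steps:
        try:
            step()
            completed.append(name)
        except Exception:
            logging.exception("defrag step failed: %s", name)
    return completed
```

Making each step independent matters for a nightly job: a transient failure in, say, re-indexing should not prevent the memory transfer that precedes sleep-like self-reflection.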
Cost Analysis
| Component | Provider | Daily Cost |
|---|---|---|
| L1 summarization | Gemini Flash | $0.96 |
| L2 summarization | Gemini Flash | $0.12 |
| L3 promotion | Gemini Flash | $0.01 |
| Vector embeddings | Local (free) | $0.00 |
| **Total** | | **~$1.09** |
9. Brain Suite Integration
Persistent Agent is the runtime orchestrator that unifies four memory protocols:
10. Future Work
References
- Park et al. (2023). "Generative Agents: Interactive Simulacra of Human Behavior." ACM UIST.
- Packer et al. (2023). "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560.
- McClelland et al. (1995). "Why There Are Complementary Learning Systems." Psychological Review, 102(3).
- Kumaran et al. (2016). "What Learning Systems do Intelligent Agents Need?" Trends in Cognitive Sciences.
- Buzsáki (2015). "Hippocampal Sharp Wave-Ripple: A Cognitive Biomarker." Hippocampus, 25(10).
- Ebbinghaus (1885). Über das Gedächtnis. Replicated: Murre & Dros (2015).
- Shannon (1959). "Coding Theorems for a Discrete Source with a Fidelity Criterion."
- Baddeley (2000). "The episodic buffer." Trends in Cognitive Sciences, 4(11).
- H-MEM (2025). "Hierarchical Memory for LLM Agents." arXiv:2507.22925.
- HMT (2025). "Hierarchical Memory Transformer." NAACL.