Persistent Agent
Multi-Level Memory Architecture for AI Agents Without Amnesia
Part of the Agent Brain Architecture
By Roman Godz & Рем (Rem)
Abstract
Current AI agents suffer from episodic amnesia. Each session begins from zero. The agent reads its memory files (facts about its past) but has no continuity of experience. It is a new entity cosplaying as its predecessor.
This paper introduces Persistent Agent, a multi-level memory architecture inspired by the human hippocampal-neocortical consolidation system. Instead of abrupt context deletion, memories flow through five tiers of decreasing fidelity (L0–L4), scored by importance using hippocampal decay functions and consolidated during idle periods analogous to sleep.
The implementation achieves 5:1 to 25:1 compression ratios, extends effective memory span from 4 hours to 7+ days, and maintains semantic search accuracy of 0.845 cosine similarity for cross-lingual queries, all for approximately $1/day.
1. The Amnesia Problem
An AI agent on any modern framework faces the same lifecycle:
The agent reads MEMORY.md at boot and learns facts. But it does not remember these events. It reads about them the way you read about Napoleon: informatively, not experientially. This is the difference between semantic memory (facts) and episodic memory (lived experience).
Files Are Prosthetics, Not Memory
| Property | Human Memory | File Storage | Persistent Agent |
|---|---|---|---|
| Continuity | ✅ Seamless | ❌ Read from scratch | ✅ State snapshots |
| Prioritization | ✅ Emotional salience | ❌ Everything equal | ✅ Hippocampus scoring |
| Gradual decay | ✅ Forgetting curve | ❌ Binary | ✅ Multi-level |
| Associative recall | ✅ Context-triggered | ❌ Keyword search | ✅ Semantic vectors |
| Consolidation | ✅ Sleep processing | ❌ Manual | ✅ Automated pipeline |
2. Neuroscience Foundation
Complementary Learning Systems
McClelland, McNaughton & O'Reilly (1995) argued that biological intelligence requires two fundamentally different learning systems:
- **Hippocampus:** fast, one-shot encoding. High fidelity, limited capacity. Vulnerable to interference.
- **Neocortex:** slow, gradual extraction. Lower fidelity, vast capacity. Resistant to interference.
Critical insight: you cannot have both fast learning and stable long-term knowledge in a single system. Biology solves this with two systems and a transfer mechanism.
Sharp-Wave Ripples
The transfer mechanism is memory consolidation during sleep via Sharp-Wave Ripples (SWRs): high-frequency oscillations (140–200 Hz) in hippocampal circuits. SWRs replay waking experiences in compressed form, selectively strengthen important memories, and transfer hippocampal traces to neocortical storage.
Crucially, this is not random replay. SWRs preferentially consolidate memories associated with reward, novelty, and emotional salience: a biological importance-scoring mechanism.
The Forgetting Curve
Ebbinghaus (1885) showed retention follows exponential decay. ~56% lost within 1 hour, stabilizing after ~1 day. Each recall increases stability S, resetting the curve higher.
Forgetting is a feature, not a bug. An agent that remembers everything equally is not intelligent; it is a database. Intelligence requires selective retention.
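The Ebbinghaus curve can be written as R = exp(−t/S). A minimal sketch, with the recall boost factor as an illustrative assumption (the paper only states that recall increases S):

```python
import math

def retention(t_hours: float, stability: float) -> float:
    """Ebbinghaus forgetting curve: R = exp(-t / S)."""
    return math.exp(-t_hours / stability)

def recall(stability: float, boost: float = 2.0) -> float:
    """Each successful recall increases stability S, resetting the
    curve higher (the x2 boost is an assumption, not from the paper)."""
    return stability * boost

# With S around 1.2 hours, roughly 56% is lost within the first hour,
# matching the figure quoted above.
```

Each recall flattens the curve: after one boost, the same memory retains far more at the one-hour mark.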
Digital Neurotransmitters
Key Insight
Time is secondary. Neurotransmitters determine memory importance at the moment of encoding, not hours later.
The most significant gap in purely time-based compression is that it treats all memories equally until a time threshold triggers compression. In biological systems, neuromodulatory tagging at the moment of encoding determines how strongly a memory trace is formed.
Neurotransmitter → Digital Signal Mapping
| Neurotransmitter | Biological Trigger | Digital Signal |
|---|---|---|
| Dopamine | Reward, achievement | Revenue, deployments, milestones |
| Norepinephrine | Urgency, danger | Crashes, security incidents, deadlines |
| Cortisol | Stress, failure | Errors, data loss → DNA reflexes |
| Acetylcholine | Focus, learning | New insights, architecture decisions |
| Oxytocin | Trust, bonding | Philosophical discussions, relationships |
| Serotonin | Satisfaction | Task completion, positive feedback |
Implementation: the neurotag.py classifier tags every incoming message at encoding time using pattern matching (~38 ms, zero AI cost). The resulting importance score directly controls compression timing.
Production result on 346 messages: 46% classified as trash (importance < 0.15) and dropped without summarization, saving both context tokens and API calls. High-importance memories (dopamine, cortisol) persist 14× longer than untagged messages.
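A minimal sketch of an encoding-time tagger in this style. The regexes, scores, and helper names are illustrative assumptions; neurotag.py's actual rules are not shown here:

```python
import re

# Illustrative pattern table: regex -> (neurotransmitter tag, importance).
PATTERNS = [
    (r"revenue|deploy|milestone|shipped", ("dopamine", 0.90)),
    (r"crash|security|deadline|urgent",   ("norepinephrine", 0.85)),
    (r"error|data loss|failed",           ("cortisol", 0.80)),
    (r"insight|architecture|decided",     ("acetylcholine", 0.75)),
    (r"\b(thanks|ok|got it)\b",           (None, 0.05)),  # ephemeral
]
COMPILED = [(re.compile(p, re.I), tag) for p, tag in PATTERNS]

TRASH_THRESHOLD = 0.15  # below this, drop without summarization

def neurotag(message: str):
    """Tag a message at encoding time with pure pattern matching
    (no model call); returns (tag, importance)."""
    for pattern, (tag, score) in COMPILED:
        if pattern.search(message):
            return tag, score
    return None, 0.30  # untagged default

def is_trash(message: str) -> bool:
    return neurotag(message)[1] < TRASH_THRESHOLD
```

Because matching is pure regex, the tagger runs in tens of microseconds per message and costs nothing per call, which is what makes tagging every message at encoding time feasible.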
3. Multi-Level Memory Compaction
- **L0:** full messages, no compression. Budget: ~200K tokens
- **L1:** 3:1 compression, key moments preserved. Budget: ~150K tokens
- **L2:** 10:1 compression, decisions and outcomes. Budget: ~100K tokens
- **L3:** 50:1 compression, topic one-liners. Budget: ~50K tokens
- **L4:** MEMORY.md + semantic search. Budget: 0 (outside context)
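The tier layout can be captured as a small config table. The age windows below are assumptions for illustration (only the 4-hour hot span and the 7-day total span are stated in the paper):

```python
# (level, compression ratio, in-context token budget, max age in hours).
# L4 lives outside the context window entirely.
TIERS = [
    ("L0", 1,    200_000, 4),      # full messages
    ("L1", 3,    150_000, 24),     # key moments
    ("L2", 10,   100_000, 72),     # decisions and outcomes
    ("L3", 50,   50_000,  168),    # topic one-liners (7 days)
    ("L4", None, 0,       None),   # MEMORY.md + semantic search
]

def tier_for_age(age_hours: float) -> str:
    """Route a memory to a tier by age; hippocampus scoring then
    sets priority within the tier."""
    for level, _ratio, _budget, max_age in TIERS:
        if max_age is None or age_hours <= max_age:
            return level
    return "L4"

# In-context budgets sum to ~500K of a 1M window, leaving the
# other half as working space.
assert sum(budget for _, _, budget, _ in TIERS) == 500_000
```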
Buffer Preservation Principle
Never fill the context to its limit. Total in-context budget: ~500K of the 1M window; the remaining ~500K is working space for reasoning.
Rate-distortion theory (Shannon, 1959) supports multi-level compression: R(D) is convex, so early compression stages give significant savings at low distortion, while later stages accept more distortion for maximum compression. Attempting single-stage 50:1 compression directly from raw messages would destroy critical structure.
Comparison with Prior Art
| System | Tiers | Scoring | Consolidation |
|---|---|---|---|
| MemGPT | 2 | None | Agent-triggered |
| Generative Agents | 2 | Recency × importance | End-of-day |
| LangChain Buffer | 2 | FIFO | On overflow |
| H-MEM | 4 | Positional | Static |
| **Persistent Agent** | 5 | Hippocampus decay | Continuous + nightly |
4. Hippocampus Scoring
Entry Classification
| Type | Importance | Decay (Ξ») | Floor | Example |
|---|---|---|---|---|
| decision | 0.90 | 0.03 | 0.50 | "Deploy to production" |
| user_intent | 0.80 | 0.05 | 0.35 | "Search for alternatives" |
| context | 0.50 | 0.12 | 0.00 | General conversation |
| tool_result | 0.30 | 0.20 | 0.00 | API responses |
| ephemeral | 0.10 | 0.35 | 0.00 | Acknowledgments |
Retention Formula
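A plausible form, assuming exponential decay toward a per-type floor as the importance, decay (λ), and floor columns of the table above suggest:

```latex
R(t) = \max\!\bigl(\text{floor},\; I_0 \, e^{-\lambda t}\bigr)
```

where \(I_0\) is the entry type's initial importance and \(t\) is its age in hours; a recall resets \(t\), the digital analog of the Ebbinghaus stability boost. This reconstruction is an assumption, not the paper's exact formula.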
First Scoring Run Results
136 messages scored: 49% kept hot (decisions avg 0.87), 43% compressed (context avg 0.40), 7% dropped (ephemeral avg 0.05).
Time determines level. Hippocampus determines priority within level. A critical decision from 6 hours ago stays in L0. An acknowledgment from 2 minutes ago gets compressed immediately.
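A sketch of scoring with the table's parameters. The decay form and the 0.60 keep threshold are illustrative assumptions; 0.15 is the stated trash cutoff:

```python
import math

# (importance, lambda, floor) per entry type, from the table above.
PARAMS = {
    "decision":    (0.90, 0.03, 0.50),
    "user_intent": (0.80, 0.05, 0.35),
    "context":     (0.50, 0.12, 0.00),
    "tool_result": (0.30, 0.20, 0.00),
    "ephemeral":   (0.10, 0.35, 0.00),
}

def score(entry_type: str, age_hours: float) -> float:
    """Exponential decay toward a per-type floor (assumed form)."""
    importance, lam, floor = PARAMS[entry_type]
    return max(floor, importance * math.exp(-lam * age_hours))

def disposition(entry_type: str, age_hours: float) -> str:
    """Keep hot / compress / drop; 0.60 is an illustrative keep
    threshold, 0.15 is the stated trash cutoff."""
    s = score(entry_type, age_hours)
    if s >= 0.60:
        return "keep"
    return "compress" if s >= 0.15 else "drop"

# A critical decision from 6 hours ago still scores ~0.75 and stays
# hot; a fresh acknowledgment starts below the keep threshold.
```

The per-type floor is what keeps decisions from ever decaying out of reach: no matter how old, a decision never scores below 0.50.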
5. Myelination: Memory → Reflex
In neurobiology, myelination wraps frequently used neural pathways in an insulating sheath, increasing signal speed from ~2 m/s to ~120 m/s (roughly 60× faster). This is how skills become automatic: a pianist plays without thinking about each note.
1. Normal decay, normal search. Biological analog: new neural pathway.
2. Resist decay, stay warm. Biological analog: practiced skill.
3. Pre-loaded at boot, no search needed. Biological analog: automatic recall.
4. Written to agent DNA, becomes behavior. Biological analog: muscle memory.
Complete pipeline: experience → memory → recall → myelination → reflex.
Myelinated memories generate two outputs: a preload file injected at session boot (the agent "just knows" without searching), and a DNA file of reflexes (behaviors, not memories) that persist indefinitely.
No existing agent memory system implements this progression from memory to behavior.
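The progression can be sketched as a recall-count state machine. The stage thresholds and helper names below are illustrative assumptions, not the paper's values:

```python
# Myelination stages and the recall counts that unlock them
# (thresholds are assumptions for illustration).
STAGES = [
    (0,  "memory"),     # normal decay, normal search
    (3,  "warm"),       # resists decay, stays warm
    (7,  "preloaded"),  # injected at session boot
    (15, "reflex"),     # written to agent DNA as behavior
]

def myelination_stage(recall_count: int) -> str:
    """Map how often a memory was recalled to its current stage."""
    stage = STAGES[0][1]
    for threshold, name in STAGES:
        if recall_count >= threshold:
            stage = name
    return stage

def boot_preload(memories) -> list:
    """Memories at 'preloaded' or beyond are injected at boot, so
    the agent knows them without a search; takes (memory, count) pairs."""
    return [m for m, count in memories
            if myelination_stage(count) in ("preloaded", "reflex")]
```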
6. Cognitive Snapshots
The most profound gap in agent memory is not factual; it is cognitive. An agent reading MEMORY.md knows what happened but not how it was thinking. Human episodic memory encodes not just events but the cognitive context: goals, emotional state, hypotheses, reasoning chains.
- **State Snapshot:** resume your train of thought, not just your facts.
- **Agent DNA:** permanent behavioral patterns; persist indefinitely.
7. Hybrid Storage
Human-readable, editable, transparent, portable. Source of truth.
β οΈ Keyword search only
Semantic search by meaning. Scalable to millions of entries.
β οΈ Opaque, non-editable
Hybrid approach: MD files (source of truth) → sync → ChromaDB (semantic search index) → results point back to files with source + line references.
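A sketch of the file-to-index side of the sync, using a plain dataclass in place of a ChromaDB record; the chunk size and ID scheme are assumptions:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Chunk:
    """One indexed unit; metadata points back to the source file,
    so search results can cite file + line."""
    id: str
    text: str
    source: str
    line: int

def chunk_file(path: Path, lines_per_chunk: int = 8) -> list:
    """Split a memory file into chunks for the vector index;
    the MD file remains the source of truth."""
    lines = path.read_text(encoding="utf-8").splitlines()
    chunks = []
    for start in range(0, len(lines), lines_per_chunk):
        text = "\n".join(lines[start:start + lines_per_chunk]).strip()
        if text:
            chunks.append(Chunk(
                id=f"{path.name}:{start + 1}",
                text=text,
                source=str(path),
                line=start + 1,
            ))
    return chunks
```

Because each chunk carries its source path and starting line, the vector store never becomes authoritative: a hit is always resolvable back to an editable line in a markdown file.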
Search Accuracy
117 chunks from 18 files. Local multilingual-e5-large (1.1 GB, free). ~200 ms latency. Finds semantically related content with zero keyword overlap.
8. Pipeline Architecture
Continuous Compaction (every 30 min)
Nightly Defrag (2:30 AM)
Full consolidation cycle, the agent's "sleep": L2→L3 promotion, L3→MEMORY.md transfer, recall analysis, daily-notes archival, vector re-index, and self-reflection.
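The cycle can be sketched as an ordered runner in which one failing step does not abort the rest; the step names follow the cycle described above, and the stub bodies are placeholders:

```python
import logging

# Ordered consolidation steps; lambdas stand in for the real jobs.
NIGHTLY_STEPS = [
    ("l2_to_l3_promotion",   lambda: None),
    ("l3_to_memory_md",      lambda: None),
    ("recall_analysis",      lambda: None),
    ("daily_notes_archival", lambda: None),
    ("vector_reindex",       lambda: None),
    ("self_reflection",      lambda: None),
]

def run_defrag(steps) -> list:
    """Run consolidation steps in order and return the names of the
    steps that completed; a failure is logged, not fatal."""
    completed = []
    for name, step in steps:
        try:
            step()
            completed.append(name)
        except Exception:
            logging.exception("defrag step failed: %s", name)
    return completed
```

Making each step independent matters for a nightly job: a transient failure in, say, re-indexing should not prevent the memory transfer that precedes sleep-like self-reflection.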
Cost Analysis
| Component | Provider | Daily Cost |
|---|---|---|
| L1 summarization | Gemini Flash | $0.96 |
| L2 summarization | Gemini Flash | $0.12 |
| L3 promotion | Gemini Flash | $0.01 |
| Vector embeddings | Local (free) | $0.00 |
| **Total** | | **~$1.09** |
9. Brain Suite Integration
Persistent Agent is the runtime orchestrator that unifies four memory protocols:
10. Future Work
References
- Park et al. (2023). "Generative Agents: Interactive Simulacra of Human Behavior." ACM UIST.
- Packer et al. (2023). "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560.
- McClelland et al. (1995). "Why There Are Complementary Learning Systems." Psychological Review, 102(3).
- Kumaran et al. (2016). "What Learning Systems do Intelligent Agents Need?" Trends in Cognitive Sciences.
- Buzsáki (2015). "Hippocampal Sharp Wave-Ripple: A Cognitive Biomarker." Hippocampus, 25(10).
- Ebbinghaus (1885). Über das Gedächtnis. Replicated: Murre & Dros (2015).
- Shannon (1959). "Coding Theorems for a Discrete Source with a Fidelity Criterion."
- Baddeley (2000). "The episodic buffer." Trends in Cognitive Sciences, 4(11).
- H-MEM (2025). "Hierarchical Memory for LLM Agents." arXiv:2507.22925.
- HMT (2025). "Hierarchical Memory Transformer." NAACL.