Every LLM you deploy has the same condition: anterograde amnesia. It can't form new long-term memories. Every conversation starts from zero. Whatever it learned during pre-training is all it will ever know — unless you retrain it.
Behrouz et al. made this observation precise in their Nested Learning paper (NeurIPS 2025). They showed that neural networks aren't monolithic function approximators — they're hierarchies of associative memories, each operating at a different timescale. Training builds these memories across multiple frequency bands, from fast per-token adaptation down to slow foundational weight updates. When you deploy a model, you freeze all of those timescales at once. The model can still think. It just can't remember.
The brain solved this problem a long time ago. Memory consolidation doesn't happen in the cortex — it happens in the hippocampus, a separate structure that receives experiences, organizes them across timescales, and feeds consolidated knowledge back into cortical processing. The cortex reasons. The hippocampus remembers. They're architecturally separate.
We built the same separation for LLMs.
The Harness Layer as External Hippocampus
The software that surrounds a deployed model — the harness layer — can provide exactly the multi-timescale memory that deployment froze. Not RAG (which operates at a single timescale with no consolidation). Not fine-tuning (which is expensive, slow, and in regulated environments, a compliance hazard). A full memory hierarchy:
- Conversations (fast, ephemeral) — the current session
- Working summaries (days to weeks) — what you're working on, what you've tried
- Lessons (weeks to months) — transferable knowledge extracted from experience
- Institutional knowledge (months to years) — organized domain understanding
- Domain ontologies (persistent) — foundational schemas that change only when the domain does
Each layer compresses from the one above it. Not every conversation becomes a summary. Not every summary yields a lesson. The compression is lossy by design — just like biological memory consolidation.
The model never knows it's been augmented. It receives a prompt that happens to contain the right context at the right time, injected through hooks that fire at session start, before each turn, and at session end. The harness handles learning. The model handles reasoning. And the separation is what makes it work.
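A minimal sketch of such a harness, in Python, shows the shape of the mechanism. Every name here (MemoryStore, Harness, the hook methods, the character-count budget) is an illustrative assumption rather than our production code, and the end-of-session consolidation is a trivial stand-in for real summarization:

```python
from dataclasses import dataclass, field
from typing import Callable

# Tier names mirror the hierarchy above, fastest to slowest.
TIERS = ["conversation", "working_summary", "lesson",
         "institutional", "ontology"]

@dataclass
class MemoryStore:
    """One store per timescale; slower tiers change less often."""
    tiers: dict[str, list[str]] = field(
        default_factory=lambda: {t: [] for t in TIERS})

    def recall(self, budget: int = 4000) -> str:
        """Assemble context slowest-tier first, until the budget
        (a crude character count standing in for tokens) is spent."""
        picked, used = [], 0
        for tier in reversed(TIERS):              # ontology -> conversation
            for item in self.tiers[tier]:
                if used + len(item) > budget:
                    return "\n".join(picked)
                picked.append(item)
                used += len(item)
        return "\n".join(picked)

class Harness:
    """Wraps a frozen model; the model only ever sees a prompt string."""
    def __init__(self, model: Callable[[str], str], memory: MemoryStore):
        self.model, self.memory = model, memory

    def on_session_start(self, task: str) -> None:
        # Fast tier: note what this session is about.
        self.memory.tiers["conversation"].append(f"Task: {task}")

    def on_turn(self, user_msg: str) -> str:
        # Before each turn, inject recalled context into the prompt.
        prompt = f"{self.memory.recall()}\n\nUser: {user_msg}"
        reply = self.model(prompt)
        self.memory.tiers["conversation"].append(f"{user_msg} -> {reply}")
        return reply

    def on_session_end(self) -> None:
        # Lossy consolidation: the session collapses to a short summary.
        turns = self.memory.tiers["conversation"]
        if turns:
            self.memory.tiers["working_summary"].append(
                f"Summary of {len(turns)} turns: {turns[-1][:200]}")
            turns.clear()
```

In a real harness the consolidation step would itself call a model to summarize, and further steps would promote summaries into lessons. The sketch only fixes the interface: hooks on the outside, a frozen model on the inside.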
The Evidence: Fine-Tuning Hit a Wall. External Memory Broke Through It.
We tested this on a real problem: automated CMMC Level 2 compliance assessment against Microsoft 365 configurations. Four rounds of LoRA fine-tuning on Gemma 3 12B produced a hard accuracy ceiling:
- Round 1: Aggressive learning rate. Complete mode collapse — 0% useful accuracy.
- Round 2: Fixed. 62% accuracy on verdict classification.
- Round 3: More training data. 84% accuracy.
- Round 4: Even more data. Still 84%. Some previously correct items regressed.
Fine-tuning plateaued. More data didn't help. The model had learned the process of compliance assessment but couldn't internalize the specific knowledge of what each CMMC objective requires.
Then we added a 640KB static context map — per-objective decision guides injected at inference time. Same fine-tuned model. No additional training.
Result: 100% accuracy. Every item correct.
The fine-tuned model provides the reasoning pattern. External memory provides the domain facts. Different timescales, different mechanisms, same system. This is exactly what Nested Learning predicts: low-frequency learning captures general patterns, higher-frequency mechanisms capture specific, context-dependent information.
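The injection mechanism itself is simple. Here is a sketch of what per-objective injection might look like; the file name and map schema are assumptions for illustration, since the actual 640KB context map's format isn't reproduced here:

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical layout: one decision guide per CMMC objective ID.
CONTEXT_MAP: dict[str, str] = json.loads(
    Path("cmmc_context_map.json").read_text())

def assess(model: Callable[[str], str],
           objective_id: str, m365_config: str) -> str:
    """Inject the per-objective decision guide at inference time.
    The fine-tuned model supplies the assessment procedure; the guide
    supplies the objective-specific facts it never internalized."""
    guide = CONTEXT_MAP.get(objective_id, "")
    prompt = (
        f"Decision guide for {objective_id}:\n{guide}\n\n"
        f"Tenant configuration:\n{m365_config}\n\n"
        "Verdict:"
    )
    return model(prompt)
```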
Five Systems, One Architecture
This isn't a one-off result. We've deployed external nested learning across five production systems:
- Nellie — persistent memory server for AI coding assistants. Cuts context recovery from ~10 minutes to ~30 seconds and reduces token consumption by ~50%.
- Praxis — expert system framework with a three-layer knowledge flywheel. Month-one accuracy is ~85%; by month six it approaches expert-level accuracy through accumulated corrections.
- SallyPort — the compliance platform described above. 84% fine-tuned, 100% with external memory.
- Bottery — multi-agent orchestration where each agent maintains its own memory hierarchy with cross-agent knowledge sharing (sketched after this list).
- AI Gateway — organizational-scale audit capture that feeds the Praxis flywheel without changing user workflow.
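For the multi-agent case, the pattern is the same hierarchy replicated per agent, plus one shared slow tier. A toy sketch follows; the class names and sharing policy are illustrative assumptions, not Bottery's implementation:

```python
from collections import defaultdict

class SharedLessonPool:
    """Slow tier visible to every agent."""
    def __init__(self) -> None:
        self.lessons: list[str] = []

    def publish(self, lesson: str) -> None:
        self.lessons.append(lesson)

class Agent:
    """Each agent keeps private fast tiers and reads the shared pool."""
    def __init__(self, name: str, pool: SharedLessonPool) -> None:
        self.name = name
        self.pool = pool
        self.private: dict[str, list[str]] = defaultdict(list)

    def learn(self, lesson: str, share: bool = False) -> None:
        self.private["lesson"].append(lesson)
        if share:   # promotion to the shared tier is selective, not automatic
            self.pool.publish(f"[{self.name}] {lesson}")

    def context(self) -> str:
        # Prompt context mixes shared lessons with the agent's own.
        return "\n".join(self.pool.lessons + self.private["lesson"])
```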
Each arose from engineering needs. The theoretical connection to Nested Learning was established after the fact — Behrouz et al. gave us the formal framework for what we'd already built.
Why This Matters
If continual learning requires multi-timescale memory consolidation, and a harness layer can provide that externally, then the model can stay frozen. That means:
- Auditable. Every piece of injected knowledge has provenance (see the sketch after this list). An auditor can reconstruct exactly what the model "knew" for any given response. Try that with fine-tuned weights.
- Replaceable. New model drops? Swap it in. Your accumulated knowledge transfers instantly. Fine-tuned weights are locked to one architecture.
- Compliance-safe. In CMMC, FedRAMP, and HIPAA environments, sensitive data touching model weights raises serious data-boundary questions. External memory keeps domain knowledge in inspectable, deletable storage — not baked into parameters.
- Fast. A new lesson is available to every agent in seconds. A fine-tuning round takes hours to days.
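The auditability claim reduces to keeping a record for every injection. A sketch of what such a provenance record might contain, with field names as assumptions rather than the production schema:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import hashlib

@dataclass
class InjectionRecord:
    """One audit-log entry per piece of knowledge injected into a prompt."""
    response_id: str     # which model response this context shaped
    tier: str            # e.g. "lesson" or "ontology"
    source: str          # where the knowledge came from
    content_sha256: str  # hash of the injected text
    injected_at: str     # UTC timestamp

def record_injection(response_id: str, tier: str,
                     source: str, text: str) -> dict:
    rec = InjectionRecord(
        response_id=response_id,
        tier=tier,
        source=source,
        content_sha256=hashlib.sha256(text.encode()).hexdigest(),
        injected_at=datetime.now(timezone.utc).isoformat(),
    )
    return asdict(rec)  # append to an append-only audit log

# Replaying the log for one response_id reconstructs exactly what the
# model "knew" when it produced that answer.
```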
The pursuit of continual learning may not require modifying model weights at all. The model reasons. The harness remembers. And the separation is the point.