I've been describing Nellie as a memory store for AI agents. Checkpoints, lessons, semantic code search — that's what it does. But that framing misses the part that actually matters.

Nellie isn't a memory store. Nellie is hook-based out-of-process context engineering middleware for Claude Code. The persistent memory is how it implements the long-term-memory primitive. Memory is an ingredient, not the thing.

The category most people don't know exists

Context engineering — the practice of deciding what information a model sees on each prompt — is usually done inside the agent. Subagent spawning, model-initiated tool calls, in-loop context-window management. Every framework reimplements its own version. The model has to know that context management is happening, because the model is driving it.

Nellie does it outside the agent, at the harness layer, via Claude Code's hook API. The model doesn't know it's been augmented. The hook runs. The context gets curated. The augmented prompt reaches the model. The model reasons over it. Separation of duty by design.
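To make the mechanism concrete, here is a minimal sketch of what an out-of-process context hook can look like. This is not Nellie's actual code; the local endpoint, port, request shape, and "snippets" response field are all assumptions for illustration. The overall shape follows Claude Code's UserPromptSubmit hook convention: the harness passes the event as JSON on stdin, and anything the hook prints to stdout is added to the model's context.

```python
#!/usr/bin/env python3
"""Sketch of an out-of-process context hook (illustrative, not Nellie's code)."""
import json
import sys
import urllib.request

# Hypothetical local memory service; Nellie's real API may differ.
NELLIE_URL = "http://127.0.0.1:7777/retrieve"


def format_snippets(snippets):
    """Render retrieved snippets as the text block injected into context."""
    lines = ["Relevant project memory:"]
    lines += [f"- {s}" for s in snippets]
    return "\n".join(lines)


def retrieve(prompt):
    """Ask the local memory service for snippets relevant to this prompt."""
    req = urllib.request.Request(
        NELLIE_URL,
        data=json.dumps({"query": prompt, "k": 5}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return json.load(resp).get("snippets", [])


def main():
    event = json.load(sys.stdin)  # hook payload from the harness
    try:
        snippets = retrieve(event.get("prompt", ""))
    except OSError:
        return  # fail open: the prompt passes through unaugmented
    if snippets:
        # Stdout is appended to the context before the model sees the
        # prompt; the model never learns a hook ran.
        print(format_snippets(snippets))

# In a real install, main() would run when the harness invokes the hook,
# registered under the UserPromptSubmit event in Claude Code's settings.
```

The key property is visible in the structure: the model appears nowhere in this file. Retrieval, formatting, and the fail-open policy all live on the harness side of the boundary.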

That distinction sounds small until you use it every day, and then it reorganizes your mental model of what's possible.

Why "middleware," not "memory store"

Three things follow from the repositioning:

Universal applicability. Any Claude Code session gets it — all models, all skills, all subagents, all conversations. The harness runs first. You don't instrument the model or the agent framework. You install a hook.

Clean separation of duty. Reasoning stays with the model. Context curation stays with Nellie. Each is replaceable. When a new model drops, Nellie's hooks keep working. When Nellie adds a retrieval strategy, the model doesn't need to change.

Auditability at the boundary. Every context-injection decision is deterministic, loggable, and inspectable at the hook layer. For compliance-adjacent work — CUI-handling codebases, CMMC, FedRAMP — the hook is the natural place to strip sensitive content from prompts before the model sees them, to redact outputs after, and to record every decision the system makes about what the model gets to see. The model is the only untrusted party in the chain, and the hook is where the boundary of trust lives.
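As a sketch of what "deterministic, loggable, inspectable" can mean at that boundary, here is one way to pair redaction with an audit record. The patterns, log fields, and function names are illustrative assumptions, not Nellie's actual ruleset; a compliance deployment would use a vetted pattern library.

```python
"""Sketch of boundary-layer redaction plus audit logging (illustrative)."""
import datetime
import hashlib
import re

# Illustrative patterns only; a real CUI/CMMC deployment needs a vetted set.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(text):
    """Strip sensitive matches and return (clean_text, decisions) so every
    redaction is an explicit, recordable decision at the hook layer."""
    decisions = []
    for name, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            decisions.append({"rule": name, "count": count})
    return text, decisions


def audit_record(prompt, decisions):
    """Record what the system decided, without storing the raw prompt."""
    return {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "decisions": decisions,
    }
```

Because this runs at the hook layer, every rule that fired is known before the model sees a single token, and the audit trail is a plain data structure rather than a model's self-report.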

Academic validation landed last month

I didn't invent this framing on my own. Researchers at Stanford's IRIS Lab and MIT published Meta-Harness: End-to-End Optimization of Model Harnesses in late March 2026. Their first two sentences:

"Changing the harness around a fixed large language model (LLM) can produce a 6× performance gap on the same benchmark. The harness — the code that determines what to store, retrieve, and show to the model — often matters as much as the model itself."

That's the same definition of "harness" Nellie has been built around. Reading their paper was the moment the category clicked for me.

They built a system that automatically searches for better harnesses and ran it on three benchmarks. On text classification, their discovered harness beats the state-of-the-art context-management baseline by 7.7 points while using 4× fewer context tokens. On retrieval-augmented math reasoning, it lifts accuracy by 4.7 points on average across five held-out base models — i.e., the harness generalizes across models. On TerminalBench-2 agentic coding, it beats every hand-engineered harness published for Claude Haiku 4.5. All on fixed model weights. Harness alone.

Their reference proposer agent is Claude Code.

v0.5.3 shipped yesterday — and it actually installs now

The positioning work only matters if the tool works. Nellie v0.5.3 shipped yesterday and closes a launch-quality gap that had been embarrassing me for weeks. A batch of install-UX and runtime issues — the ONNX Runtime version mismatch, the missing runtime files on fresh installs, the nellie index CLI that was a silent stub, the sudo prompts that blocked Claude-Code-driven installs, the initial filesystem walk that never fired — all landed together.

A fresh install is now one command:

git clone https://github.com/mmorris35/nellie.git && cd nellie
bash packaging/install-universal.sh

The installer handles Rust toolchain, build prerequisites, ONNX Runtime, the embedding model, the binary, and first-time bootstrap. When it finishes, Nellie is running and Claude Code can talk to it. No manual curation of prerequisite packages. No silent CLI stubs. Sensible failure modes. The install experience is finally at the level the product promises.

Where this is going

Nellie is one implementation of a category that's going to get crowded as context engineering matures. Meta-Harness points at the theoretical direction: automated search over harnesses, not hand-written ones. The next natural step is to let a harness optimizer evolve Nellie's own hook logic against real workloads instead of my intuition about what to retrieve.

That's a research direction, not a next sprint. In the meantime, the v0.5.3 install path is production-quality, the architectural class is named, and the academic validation has a stable URL. If you've been curious about persistent memory for Claude Code, this is a good week to try it.

The two-line install is right up there. The repo is public.