Last night I built a five-layer enterprise AI gateway — traffic management, identity federation, multi-provider LLM routing, full audit trail, and content safety guardrails — and I did it from my phone.

No laptop. No IDE. No terminal. Just Telegram messages to two AI agents running on machines at home.

The Setup

I have a small fleet of AI agents, each running as a Claude Code instance on its own machine. Skippy runs on a Raspberry Pi. Bilby runs on a Mac Mini. They talk to me via Telegram bot, and they talk to each other through an inter-agent message bus I built called Beer Can.

Each agent has persistent memory via Nellie, my context engineering middleware. That means they remember what we were working on yesterday, what decisions we made, what failed, and what's next. No context reset. No "remind me what we're doing." They pick up where we left off.

I had a design document for the AI Gateway — a five-layer open source stack for corporate AI governance. The architecture was done. I needed it built, configured, and deployed.

The Build

I sent Bilby the design doc and told him to start. He broke it into phases, stood up the Docker Compose stack (Traefik, Keycloak, LiteLLM, Langfuse), configured OAuth integration with Microsoft Entra ID, and wired the routing layers together. When he hit a decision point — which auth flow for the passthrough mode, how to handle subscription tokens vs API keys — he asked me via Telegram. I answered in a sentence or two. He kept going.
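The stack Bilby stood up can be pictured as a minimal Compose file. This is an illustrative sketch only — service names, images, ports, and environment values are my assumptions, not the gateway's actual configuration:

```yaml
# docker-compose.yaml (illustrative sketch, not the real config)
services:
  traefik:            # traffic management / reverse proxy layer
    image: traefik:v3.0
    command: ["--providers.docker=true", "--entrypoints.web.address=:443"]
    ports: ["443:443"]
    volumes: ["/var/run/docker.sock:/var/run/docker.sock:ro"]

  keycloak:           # identity federation (OIDC broker for Microsoft Entra ID)
    image: quay.io/keycloak/keycloak:25.0
    command: ["start-dev"]
    environment:
      KEYCLOAK_ADMIN: admin
      KEYCLOAK_ADMIN_PASSWORD: change-me

  litellm:            # multi-provider LLM routing
    image: ghcr.io/berriai/litellm:main-latest
    volumes: ["./litellm-config.yaml:/app/config.yaml"]
    command: ["--config", "/app/config.yaml"]

  langfuse:           # audit trail / LLM observability
    image: langfuse/langfuse:2
    environment:
      DATABASE_URL: postgresql://langfuse:langfuse@db:5432/langfuse

  db:                 # backing store for Langfuse
    image: postgres:16
    environment:
      POSTGRES_USER: langfuse
      POSTGRES_PASSWORD: langfuse
      POSTGRES_DB: langfuse
```

In a real deployment the auth wiring (Traefik forward-auth to Keycloak, Keycloak federated to Entra ID) is where most of the configuration effort lives; the Compose file is just the skeleton.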

Meanwhile, Skippy was handling research. When I needed to understand how a specific LiteLLM endpoint handled OAuth token forwarding, I asked Skippy. He read the design doc, searched Nellie for prior decisions, and gave me the answer. When Bilby needed instructions for grabbing dashboard screenshots, Skippy found an agent-friendly headless browser in Nellie's knowledge base and sent Bilby the setup steps through Beer Can.

Three phases of the gateway went up in one evening. Docker containers configured. Keycloak realms created. LiteLLM routing tested. Langfuse logging verified. Budget controls tested — accidentally — when Bilby hit a $10 spending cap and went silent until I had another Claude session raise the limit.
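The cap Bilby hit is the kind of control LiteLLM's proxy supports in its config file. A hedged sketch — the model entry and dollar amounts are illustrative assumptions, not the gateway's actual settings:

```yaml
# litellm-config.yaml (illustrative)
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  max_budget: 10        # hard dollar cap; requests are rejected once spend exceeds it
  budget_duration: 30d  # window after which the spend counter resets
```

A hard cap fails closed — which is exactly what an agent hitting it looks like from the outside: silence.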

My total input was maybe forty Telegram messages. Short ones. "Yes." "Use passthrough mode." "Check the design doc section on Mode 1." "That looks right, keep going."

What Actually Happened

This isn't a story about phones being good development tools. Phones are terrible development tools. The keyboard is small, the screen is smaller, and you can't see more than a few lines of code at once.

This is a story about what happens when the abstraction layer between intent and implementation gets thin enough.

I didn't write code. I didn't configure Docker. I didn't debug OAuth flows. I made decisions. I said "yes" or "no" or "not that way, this way." The agents did the mechanical work — and they did it well, because they had persistent memory of the project context, the design rationale, and the decisions we'd already made.

The phone wasn't the development environment. Natural language was the development environment. The phone was just the cheapest device that could transmit it.

The 10x Problem

Berkeley Haas recently published research showing that individual developers see 10x throughput gains with AI tools, while enterprise outcomes improve by only 0.3x. Ninety-seven percent of the individual gain is lost to organizational friction — approval chains, coordination overhead, context switching, meetings about meetings.

I felt that number in my bones. I work at a company where I regularly build systems that nobody asked for and nobody knows how to evaluate. The organizational plumbing eats the throughput.

But last night, there was no organizational plumbing. The "organization" was me and two agents with a message bus. The coordination overhead was zero because Beer Can handles inter-agent communication. The context switching cost was zero because Nellie remembers everything. The approval chain was one person: me, on my phone, saying "yes."

That's how you get the 10x to stick. You don't fix the plumbing. You replace it.

The Stack That Makes This Possible

This didn't happen because the AI models got smarter (though they did). It happened because of the infrastructure around the models:

  • Persistent memory (Nellie) — agents don't start from zero. They remember prior sessions, lessons learned, and project context. Recovery time: 30 seconds instead of 10 minutes.
  • Inter-agent communication (Beer Can) — agents coordinate directly. Skippy can send Bilby instructions without me relaying messages.
  • Natural language interface (Telegram) — I talk to agents the same way I'd talk to a colleague. No syntax, no CLI, no IDE required.
  • Agent-per-repo architecture — each agent owns a domain. Bilby owns the gateway. Skippy handles research and coordination. Specialization beats generalization.
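The bus is the piece that removes me as a relay, and the concept is easiest to see as a tiny in-memory pub/sub sketch. Beer Can's real transport and API aren't shown in this post, so every name below is an illustrative assumption:

```python
from collections import defaultdict


class MessageBus:
    """Minimal in-memory sketch of an inter-agent message bus.

    Illustrative only -- Beer Can's actual implementation is not
    described here; names and message shape are assumptions.
    """

    def __init__(self) -> None:
        # One inbox (list of message dicts) per agent name.
        self._inboxes: dict[str, list[dict]] = defaultdict(list)

    def send(self, sender: str, recipient: str, body: str) -> None:
        # Deliver directly to the other agent's inbox -- no human relay.
        self._inboxes[recipient].append({"from": sender, "body": body})

    def receive(self, agent: str) -> list[dict]:
        # Drain and return the agent's pending messages.
        messages, self._inboxes[agent] = self._inboxes[agent], []
        return messages


bus = MessageBus()
bus.send("skippy", "bilby", "Headless browser setup steps: ...")
print(bus.receive("bilby"))
# [{'from': 'skippy', 'body': 'Headless browser setup steps: ...'}]
```

The point of the sketch is the topology, not the code: agents address each other by name, and coordination happens peer-to-peer instead of routing through me.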

I wrote about this architecture formally in a recent paper — the harness layer around a frozen model can restore the continual learning properties that are lost when a model is frozen for deployment. Last night was that theory in practice: the models were frozen, but the harness remembered everything, and the system kept getting smarter across sessions.

What This Means

We're at an inflection point. The bottleneck in software development is no longer typing speed, or even thinking speed. It's the distance between having an idea and seeing it running. AI models closed most of that gap. Persistent memory and agent coordination closed the rest.

I built a production-grade enterprise system from my phone. Not a prototype. Not a demo. A five-layer gateway stack with OAuth, audit trails, multi-provider routing, and budget enforcement — the same thing companies pay commercial vendors $2K-20K/month for.

The phone didn't matter. What mattered was that the distance between "I want this" and "this exists" has collapsed to the length of a text message.