Companies are adopting AI fast. What they're not doing is watching what happens when they do.

Who's using it? What are they asking? Which models are answering? What is it costing? Did a model leak sensitive data? Did it hallucinate a compliance violation? Nobody knows, because every query goes straight from the employee's laptop to the cloud provider with zero corporate visibility.

Commercial AI gateways promise to solve this — for $2,000 to $20,000 a month, plus vendor lock-in. I wanted to see if I could build the same thing with open source, running on a laptop, for about $200 a month in infrastructure.

Turns out, I can.

The Problem

When a company gives employees access to Claude, ChatGPT, or any LLM, they're essentially saying: "Go talk to this extremely capable system. We have no idea what you're saying to it, what it's saying back, whether it's leaking our IP, or what it costs. Good luck."

For regulated industries — defense contractors, financial services, healthcare, energy — that's not just uncomfortable, it's a compliance risk. CMMC, SOX, HIPAA, and FedRAMP all have audit trail requirements. "We don't know what our AI is doing" is not an acceptable answer.

The Architecture

The AI Gateway is a five-layer open source stack that sits between every user and every LLM provider:

Layer 1 — Traefik handles TLS, rate limiting, and routing. One entry point for all traffic.

Layer 2 — Keycloak handles identity. SSO with your existing corporate IdP (Entra ID, Okta, Google Workspace), MFA, and fine-grained RBAC. Users sign in with their Microsoft credentials. Agents get service accounts. Everyone gets a role that determines which models they can use and how much they can spend.

Layer 3 — LiteLLM is the gateway core. Unified OpenAI-compatible API across every major provider — Anthropic, OpenAI, Azure OpenAI, AWS Bedrock, Google Vertex, and local models running on your own hardware. Model routing, fallback chains, budget controls, and virtual keys so users never see raw provider credentials.

Layer 4 — Langfuse + OpenTelemetry captures everything. Every prompt, every response, every tool call, every token count, every dollar spent. Full traces with user attribution, session threading, and cost analytics. This is the layer that answers "what is our AI doing?"

Layer 5 — Guardrails scans inputs and outputs for PII, credentials, policy violations, and content safety issues. Configurable per role and department.
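To make Layer 5 concrete, here is a deliberately tiny sketch of a pattern-based scanner. The patterns are illustrative toys, not the gateway's actual detectors; production guardrails use far richer machinery (NER models, secret scanners, policy engines) layered on top of this kind of check.

```python
import re

# Toy detection patterns for illustration only. A real deployment would
# use dedicated PII/secret-detection libraries and per-role policy config.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan(text: str) -> list[str]:
    """Return the names of every pattern that fires on an input or output.
    The gateway can block, redact, or just log depending on role policy."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]
```

The same scan runs on both directions of traffic: what the user sends up, and what the model sends back.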

The entire stack runs in Docker Compose. On a laptop. Using about 3 GB of RAM.
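From a client's point of view, the whole stack fronts a single OpenAI-compatible endpoint. A request through the gateway reduces to one HTTP call; the hostname, virtual key, and model alias below are hypothetical placeholders for whatever your deployment configures.

```python
import json
import urllib.request

# Hypothetical gateway endpoint and virtual key; substitute your own.
# The virtual key is issued by the gateway -- users never hold raw
# provider credentials.
GATEWAY_URL = "https://ai-gateway.internal.example.com/v1/chat/completions"
VIRTUAL_KEY = "sk-gw-example"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat request aimed at the gateway.
    LiteLLM maps the model alias to the right provider behind the scenes."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {VIRTUAL_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_request("claude-sonnet", "Summarize our Q3 incident reports.")
# urllib.request.urlopen(req) would send it; omitted because the
# endpoint above is a placeholder.
```

Because the surface is OpenAI-compatible, any existing SDK or tool that can take a custom base URL works unmodified.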

Zero Workflow Change

The part I'm most proud of: users don't have to change anything.

Claude Code supports ANTHROPIC_BASE_URL. Set it to point at the gateway instead of directly at Anthropic, and every request flows through the gateway transparently. The user's existing subscription works — their OAuth token passes straight through. The gateway logs everything but doesn't get in the way.

For corporate IT, that's one environment variable pushed via Group Policy or Intune. Users never know the gateway exists. Their tools work exactly the same. But now every interaction has a full audit trail.
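The pushed setting itself is tiny. Assuming a hypothetical internal hostname, the entire client-side change is:

```shell
# Hypothetical internal hostname; the one client-side change required.
export ANTHROPIC_BASE_URL="https://ai-gateway.internal.example.com"
```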

The demo scenario: I open VS Code on my laptop, write code with Claude Code exactly the way I always do, then alt-tab to the admin dashboard and show every query I just made — which model answered, what tools it called, what it cost, and whether any safety rules fired. Real-time. Full chain.

Agents Are Users Too

Here's where it gets interesting. AI agents operating on behalf of the company are corporate actors. They need the same identity, audit, and governance treatment as human employees.

In the gateway, each agent is a Keycloak service account with the same RBAC, budget controls, and audit trail as a human. Agent activity appears in the same SIEM event stream — SOC analysts see agents in their existing dashboards. No separate tooling, no blind spots.

If an agent starts behaving anomalously at 3 AM — unusual query volume, accessing tools outside its scope, cost spike — the same alerting rules fire as they would for a human employee.
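A sketch of what such a rule might look like, with one schema shared by humans and agents. The thresholds and field names here are illustrative assumptions; a real deployment would tune them per role and evaluate them in the SIEM rather than hard-code them.

```python
from dataclasses import dataclass, field

# Illustrative thresholds -- real deployments tune these per role.
MAX_QUERIES_PER_HOUR = 500
MAX_HOURLY_COST_USD = 25.0

@dataclass
class UsageWindow:
    """One hour of activity for any principal: human user or agent
    service account, same schema either way."""
    principal: str
    queries: int
    cost_usd: float
    tools_called: set = field(default_factory=set)
    allowed_tools: set = field(default_factory=set)

def alerts(w: UsageWindow) -> list[str]:
    """Same three checks for everyone: volume, cost, and tool scope."""
    out = []
    if w.queries > MAX_QUERIES_PER_HOUR:
        out.append(f"{w.principal}: query volume {w.queries}/h")
    if w.cost_usd > MAX_HOURLY_COST_USD:
        out.append(f"{w.principal}: cost spike ${w.cost_usd:.2f}/h")
    out_of_scope = w.tools_called - w.allowed_tools
    if out_of_scope:
        out.append(f"{w.principal}: out-of-scope tools {sorted(out_of_scope)}")
    return out
```

The point is not the specific checks but that the principal field is the only thing that distinguishes an agent from an employee.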

The Praxis Flywheel

Most gateways are dumb pipes. Data flows through and that's it. This one gets smarter.

Every interaction feeds a three-layer improvement loop I call the Praxis architecture:

Layer 1 — Foundation: Base system prompts and safety guardrails, refined based on aggregate quality scores across all users.

Layer 2 — Institutional Knowledge: Company-specific context — acronyms, approved workflows, compliance requirements, lessons from incidents. "When someone asks about X, always include Y." This layer is what turns a generic AI into one that knows how your organization works.

Layer 3 — Refinements: Per-user and per-department customization. Role context, style preferences, model routing based on usage patterns.
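A minimal sketch of how the three layers might compose into a single system prompt. The function name, section labels, and example content are all illustrative assumptions, not the Praxis implementation; the idea being shown is just that later layers specialize earlier ones.

```python
def compose_system_prompt(foundation: str,
                          institutional: list[str],
                          refinements: list[str]) -> str:
    """Praxis-style layering: base rules first, then organizational
    knowledge, then per-user refinements appended last."""
    sections = [foundation]
    if institutional:
        sections.append("Organizational context:\n- " + "\n- ".join(institutional))
    if refinements:
        sections.append("User/department preferences:\n- " + "\n- ".join(refinements))
    return "\n\n".join(sections)

prompt = compose_system_prompt(
    "You are a helpful assistant. Never reveal credentials.",
    ["'TDP' means Technical Data Package.",
     "When asked about export control, cite the internal ITAR policy page."],
    ["Prefer concise answers with code examples."],
)
```

Layer 2 entries accumulate over time, which is what the flywheel below is measuring.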

Month one, the gateway is a smart proxy. Month six, responses that used to take three clarification rounds land on the first try because the system has learned your organization's vocabulary and workflows. Month twelve, new hires get day-one quality because institutional knowledge lives in the system, not in someone's head.

Secrets Done Right

An AI gateway handles API keys for potentially dozens of users and providers. If an LLM agent can read its own environment variables — which most can — it can exfiltrate every credential in the system. Standard .env files are a liability.

The gateway uses with-secrets for privilege-separated credential injection. Secrets are injected at runtime at a privilege level the LLM context cannot reach. docker inspect shows nothing. Container logs show nothing. The agent can use the API key but can't read it.
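How with-secrets does this internally is out of scope here, but the underlying idea, keeping credentials out of anything the agent process can enumerate, can be illustrated with a simple environment scrub. This is a complement to runtime injection, not a reimplementation of it: the real key stays with a privileged parent process that adds it to outbound requests itself.

```python
# Naive credential-shaped name markers; a real tool is more precise.
CREDENTIAL_MARKERS = ("KEY", "SECRET", "TOKEN", "PASSWORD")

def scrub_env(env: dict) -> dict:
    """Return a copy of the environment with anything credential-shaped
    removed, suitable for subprocess.run(["agent", ...], env=scrubbed).
    The agent can still call the API -- the privileged parent injects
    the Authorization header -- but can never read the key itself."""
    return {k: v for k, v in env.items()
            if not any(m in k.upper() for m in CREDENTIAL_MARKERS)}
```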

What It Costs

The entire stack is open source: LiteLLM, Keycloak, Langfuse, Traefik, PostgreSQL, Redis. No proprietary dependencies. No per-seat SaaS fees.

Infrastructure for a small deployment: about $200/month for a VM, plus whatever you spend on LLM API credits (or $0 if you route to local models). Compare that to commercial gateway SaaS at $2,000 to $20,000 per month.

For the proof of concept, it runs on a Dell XPS 13 with 16 GB of RAM. Total hardware cost for a full demo: whatever the laptop cost.

Where This Goes

The proof of concept demonstrates that a single person can build a fully auditable AI gateway with open source tools in a few weeks. The architecture scales from a laptop demo to a production Kubernetes deployment without changing the stack.

The next steps: compliance report templates (CMMC, SOX, HIPAA) that generate directly from the audit trail, a multi-tenant SaaS edition for companies that don't want to self-host, and deeper integration with the Praxis flywheel so the system continuously improves based on measured response quality.

If you're running AI in a regulated environment and your current governance strategy is "hope nobody asks," let's talk.