ProxyLLM
The Layer 2 for AI

Use your $20/mo Codex sub in production workflows.

ProxyLLM routes every request to the cheapest model that can do the job. Across your API credits and your ChatGPT subscription. One endpoint. Drop in for OPENAI_BASE_URL.

No credit card. Your keys stay encrypted in your account.

Integrates with
OpenAI
Anthropic
Google Gemini
Meta
Node.js
Python
TypeScript
n8n
LangChain
Cursor
Vercel
terminal
# point your existing OpenAI SDK at ProxyLLM
export OPENAI_BASE_URL="https://app.proxyllm.ericslab.ai/v1"

# every call now gets routing, Blitz, cache, and cost analytics.
# nothing else changes.

One endpoint sits on top of every model you already pay for.

OpenAI, Anthropic, OpenRouter, and Codex are L1. Slow to switch between, expensive when picked wrong. ProxyLLM is L2. Routes, batches, caches, and reports on top.

Your code one OPENAI_BASE_URL change
ProxyLLM · L2 routing · Blitz · cache · sub-keys · replay · evals
OpenAI
API + Codex
Anthropic
API
OpenRouter
any model
Codex CLI
your subscription

Three reasons your bill drops on Monday.

Paid

Routing

One endpoint. Picks the right model.

Send a prompt. ProxyLLM classifies intent and difficulty, then dispatches to the cheapest model that can answer it. Simple edits route to MiniMax. Heavy refactors route to GPT-5.5. You write zero glue code.

"rename this var" minimax $0.0001
"add JSDoc" gpt-4o-mini $0.0004
"refactor auth" gpt-5.5 $0.082
Free

Blitz

100 prompts in the time of one.

Fan out a batch across providers and your Codex subscription in parallel. Rate-limit aware. Cost-capped. Partial failures handled. Replaces every for-loop you've written around an OpenAI client.

blitz · 8 prompts 0.94s total
Sequential would take 7.5s.
Paid

Codex Hosted

Your ChatGPT subscription. Programmatic.

Connect your Codex subscription with one click. ProxyLLM spins up an isolated container that runs codex exec on your behalf. Flat rate for the workloads it covers, API fallback for the ones it can't.

codex exec connected
this month $20.00
would have cost via API $147.32

Change one env var. Keep your code.

ProxyLLM is OpenAI-compatible. Point your existing SDK at our base URL and you're routing, Blitzing, and caching by default.

1

Drop in your keys

Bring OpenAI, OpenRouter, and Codex. Stored AES-256-GCM. Never logged.

2

Point your SDK

Set OPENAI_BASE_URL to https://app.proxyllm.ericslab.ai/v1. Your existing code keeps working.

3

Watch the bill drop

Routing picks cheap models for easy calls. Codex absorbs the rest. Cache catches repeats.

Free, always

And every request still shows the cache hit rate.

We hash every prompt server-side. The moment you send the same system + user message twice, you see it counted, and the dollar amount the cache saved you. Free on every account.

  • Per-request USD cost, per-day rollup, per-model breakdown
  • Repeating prompt detection out of the box
  • 30+ days of full request history
Cache hit rate
42.1%
last 7d
Cost saved
$1.07
vs no cache
Repeats caught
14
this week
Models tested
9
this week
Request log now
gpt-4o-mini
1,204 in 824 cached $0.0003
claude-3.5-sonnet
1,204 in 0 $0.0058
gpt-4o-mini repeat
1,204 in 1,204 cached $0.0002

Pricing built like the product.

Cheap when easy, paid when worth it.

Free
$0 forever
  • Bring your own OpenAI and OpenRouter keys
  • Cache analytics, cost rollup, repeating-prompt detection
  • Blitz: parallel inference across providers
  • Drop-in for OPENAI_BASE_URL
  • 30+ days of request history
Start free
Paid
$8 per user, per month
  • Everything in Free
  • Routing: one endpoint, picks the model
  • Codex Hosted: connect your ChatGPT subscription
  • Scoped sub-keys with per-app budget caps
  • Replay and diff: re-run any past request
  • Schema-enforced outputs across providers
Get the router

Questions you'd be right to ask.

What happens to my API keys?

Encrypted with AES-256-GCM in our database. Decrypted only inside the serverless function that calls the provider on your behalf. We never log them and never transmit them to anyone other than the provider you pointed us at.

How does the router pick a model?

A cheap classifier call inspects intent and difficulty, then matches against your routing config. You can build the config in the visual editor (drag nodes, set thresholds) or write JSON. The chosen model and the routing reason come back in every response so you can trust it.

Is Codex Hosted using my account legitimately?

It runs your OAuth-authenticated codex exec inside an isolated container that you alone control. You log in with your ChatGPT account through OpenAI's official device-code flow. We hold the session for your container only. If OpenAI changes policy we will pull the feature without resistance, and the rest of the product carries on.

Does Blitz work over Codex?

Yes. Fan out a batch and ProxyLLM distributes calls across whatever credentials you have configured, including your Codex container, with rate-limit-aware backpressure.