The Layer 2 for AI

Use your $20/mo Codex sub in production workflows.

ProxyLLM routes every request to the cheapest model that can do the job, across your API credits and your ChatGPT subscription. One endpoint, dropped in as your OPENAI_BASE_URL.


No credit card. Your keys stay encrypted in your account.


```bash
# point your existing OpenAI SDK at ProxyLLM
export OPENAI_BASE_URL="https://app.proxyllm.ericslab.ai/v1"

# every call now gets routing, Blitz, cache, and cost analytics.
# nothing else changes.
```
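For scripts that don't go through an SDK, the same endpoint can presumably be called with plain curl. A minimal sketch, assuming ProxyLLM serves the standard OpenAI path /chat/completions under the base URL and accepts your ProxyLLM-issued key as a Bearer token in OPENAI_API_KEY (neither is stated on this page). chat_url and proxyllm_chat are hypothetical helper names, and the model string is a placeholder, since this page doesn't say which value hands the choice to the router.

```bash
#!/usr/bin/env bash
# Minimal sketch: one chat request through ProxyLLM with curl.
# ASSUMPTIONS: standard OpenAI path /chat/completions, and OPENAI_API_KEY
# holds whatever key ProxyLLM issues you. Helper names are hypothetical.
: "${OPENAI_BASE_URL:=https://app.proxyllm.ericslab.ai/v1}"

chat_url() {
  # Strip a trailing slash so we never build ".../v1//chat/completions".
  printf '%s/chat/completions\n' "${OPENAI_BASE_URL%/}"
}

# Usage: proxyllm_chat MODEL PROMPT   (needs curl and jq)
proxyllm_chat() {
  local body
  # jq builds the JSON so quotes and newlines in the prompt are escaped safely.
  body=$(jq -n --arg m "$1" --arg p "$2" \
    '{model: $m, messages: [{role: "user", content: $p}]}')
  # ${VAR:?msg} aborts the script with an error if the key is unset.
  curl -sS "$(chat_url)" \
    -H "Authorization: Bearer ${OPENAI_API_KEY:?set OPENAI_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

Example: proxyllm_chat gpt-4o-mini "add JSDoc to this function". Routing metadata, if any, would come back in the JSON response body.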

One endpoint sits on top of every model you already pay for.

OpenAI, Anthropic, OpenRouter, and Codex are L1: slow to switch between, expensive when you pick the wrong one. ProxyLLM is L2. It routes, batches, caches, and reports on top.

Your code → one OPENAI_BASE_URL change → ProxyLLM (L2: routing · Blitz · cache · sub-keys · replay · evals)

Providers underneath: OpenAI (API + Codex) · Anthropic (API) · OpenRouter (any model) · Codex CLI (your subscription)

Three reasons your bill drops on Monday.

Routing (Paid)

One endpoint. Picks the right model.

Send a prompt. ProxyLLM classifies intent and difficulty, then dispatches to the cheapest model that can answer it. Simple edits route to MiniMax. Heavy refactors route to GPT-5.5. You write zero glue code.
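The routing rule can be sketched as: score the prompt's difficulty, then walk the models from cheapest to priciest and take the first one whose ceiling covers that score. This is a toy under loud assumptions: the keyword-based difficulty function is a hypothetical stand-in for ProxyLLM's classifier call, and the three tiers simply mirror the example prompts on this card.

```bash
#!/usr/bin/env bash
# Toy sketch of "cheapest model that can do the job".
# ASSUMPTION: difficulty() is a hypothetical keyword heuristic standing in
# for ProxyLLM's classifier; the tiers mirror this card's three examples.

# Candidate models, cheapest first, as "name max_difficulty".
MODELS=(
  "minimax 1"
  "gpt-4o-mini 2"
  "gpt-5.5 3"
)

# Hypothetical scoring: 1 = trivial edit, 2 = small task, 3 = heavy work.
difficulty() {
  case "$1" in
    *refactor*|*migrate*) echo 3 ;;
    *rename*|*typo*)      echo 1 ;;
    *)                    echo 2 ;;
  esac
}

# Print the first (cheapest) model whose ceiling covers the prompt's difficulty.
route_model() {
  local need entry name max
  need=$(difficulty "$1")
  for entry in "${MODELS[@]}"; do
    read -r name max <<< "$entry"
    if (( max >= need )); then
      echo "$name"
      return 0
    fi
  done
  return 1   # no configured model is capable enough
}
```

Keeping the list sorted by price is what makes "first match" equal "cheapest capable"; adding a model means inserting one line in the right position.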

"rename this var" minimax $0.0001

"add JSDoc" gpt-4o-mini $0.0004

"refactor auth" gpt-5.5 $0.082

Blitz (Free)

100 prompts in the time of one.

Fan out a batch across providers and your Codex subscription in parallel. Rate-limit aware. Cost-capped. Partial failures handled. Replaces every for-loop you've written around an OpenAI client.

blitz · 8 prompts · 0.94s total

Sequential would take 7.5s.
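Here is a minimal sketch of the fan-out pattern that replaces a for-loop around a client: launch requests in parallel, cap how many are in flight, keep answers in submission order, and count failures instead of aborting the batch. call_model is a hypothetical stub that sleeps to fake latency, and the wave-based cap is a crude stand-in for real rate-limit awareness; this is not ProxyLLM's actual implementation.

```bash
#!/usr/bin/env bash
# Sketch of a Blitz-style fan-out. ASSUMPTION: call_model is a hypothetical
# stub standing in for a real provider request.

call_model() {
  sleep 0.1                          # pretend network latency
  [[ "$1" != *FAIL* ]] || return 1   # simulate a provider error
  echo "answer to: $1"
}

# Usage: blitz MAX_JOBS PROMPT...
blitz() {
  local max_jobs=$1; shift
  local outdir prompt pid i=0 j failed=0
  local -a pids=()
  outdir=$(mktemp -d)

  for prompt in "$@"; do
    call_model "$prompt" > "$outdir/$i" 2>/dev/null &
    pids+=("$!")
    i=$((i + 1))
    # Crude rate-limit cap: once a wave is full, wait for all of it.
    if (( ${#pids[@]} >= max_jobs )); then
      for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
      done
      pids=()
    fi
  done

  # Wait for the final, partially filled wave.
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
  done

  # Emit answers in submission order; failed prompts produce no output.
  for ((j = 0; j < i; j++)); do
    cat "$outdir/$j"
  done
  rm -rf "$outdir"
  echo "failed=$failed" >&2   # partial failures are reported, not fatal
}
```

Running blitz 4 "rename this var" "add JSDoc" "refactor auth" starts all three stub calls at once, so the batch finishes in roughly one stub delay instead of three and reports failed=0 on stderr.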

Codex Hosted (Paid)

Your ChatGPT subscription. Programmatic.

Connect your Codex subscription with one click. ProxyLLM spins up an isolated container that runs codex exec on your behalf. Workloads the subscription covers run at its flat rate; the ones it can't cover fall back to the API.

codex exec · connected

This month: $20.00

Would have cost via API: $147.32
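The "flat rate where it covers, API fallback where it can't" rule boils down to try-then-fall-back. A minimal sketch: subscription_call and api_call are hypothetical stubs (in ProxyLLM the first would be codex exec in your container, the second a metered API request), and the 40-character limit is an arbitrary stand-in for whatever the subscription can't handle.

```bash
#!/usr/bin/env bash
# Sketch of "subscription first, API fallback". ASSUMPTION: both backends are
# hypothetical stubs, not ProxyLLM's real code paths.

# Stub: pretend the subscription can't handle prompts over 40 characters.
subscription_call() {
  (( ${#1} <= 40 )) || return 1
  echo "codex: $1"
}

# Stub: the metered API path always answers.
api_call() {
  echo "api: $1"
}

run_with_fallback() {
  local out
  # Try the flat-rate path first; only pay per call when it fails.
  if out=$(subscription_call "$1"); then
    echo "$out"
  else
    api_call "$1"
  fi
}
```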

Change one env var. Keep your code.

ProxyLLM is OpenAI-compatible. Point your existing SDK at our base URL and you're routing, Blitzing, and caching by default.

Drop in your keys

Bring your OpenAI and OpenRouter keys and your Codex subscription. Keys are stored with AES-256-GCM encryption and never logged.

Point your SDK

Set OPENAI_BASE_URL to https://app.proxyllm.ericslab.ai/v1. Your existing code keeps working.

Watch the bill drop

Routing picks cheap models for easy calls. Codex absorbs the rest. Cache catches repeats.

Free, always

And every request still shows the cache hit rate.

We hash every prompt server-side. The moment you send the same system + user message twice, the repeat is counted, along with the dollar amount the cache saved you. Free on every account.

Per-request USD cost, per-day rollup, per-model breakdown
Repeating prompt detection out of the box
30+ days of full request history
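Repeat detection of this kind can be sketched as hashing the system + user pair and counting keys already seen. Assumptions: the key is SHA-256 over a length-prefixed concatenation of the two messages (ProxyLLM's exact key isn't documented here), and the counters live in one shell session rather than a server-side store. It needs bash 4+ for associative arrays and GNU coreutils' sha256sum (on macOS, shasum -a 256 is the usual substitute).

```bash
#!/usr/bin/env bash
# Sketch of repeat detection. ASSUMPTION: the cache key is SHA-256 of the
# system and user messages; ProxyLLM's real key may differ.

declare -A SEEN=()   # cache key -> times seen
TOTAL=0
REPEATS=0

prompt_key() {
  # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
  printf '%d:%s|%d:%s' "${#1}" "$1" "${#2}" "$2" | sha256sum | cut -d' ' -f1
}

# Usage: record_request SYSTEM USER
# Call it directly, not inside $(...), so the counters update in this shell.
record_request() {
  local key
  key=$(prompt_key "$1" "$2")
  TOTAL=$((TOTAL + 1))
  if [[ -n "${SEEN[$key]:-}" ]]; then
    REPEATS=$((REPEATS + 1))
    echo "repeat"
  else
    echo "miss"
  fi
  SEEN[$key]=$(( ${SEEN[$key]:-0} + 1 ))
}

# Share of requests that were repeats, e.g. "33.3%".
hit_rate() {
  awk -v r="$REPEATS" -v t="$TOTAL" \
    'BEGIN { printf "%.1f%%\n", (t > 0 ? 100 * r / t : 0) }'
}
```

After sending the same system + user pair twice and a new pair once, hit_rate prints 33.3%.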

Cache hit rate: 42.1% (last 7d)

Cost saved: $1.07 (vs no cache)


Request log

gpt-4o-mini · 1,204 tokens in · 824 cached · $0.0003

claude-3.5-sonnet · 1,204 tokens in · 0 cached · $0.0058

gpt-4o-mini (repeat) · 1,204 tokens in · 1,204 cached · $0.0002

Pricing built like the product.

Cheap when easy, paid when worth it.

Free

$0 forever

Bring your own OpenAI and OpenRouter keys
Cache analytics, cost rollup, repeating-prompt detection
Blitz: parallel inference across providers
Drop-in for OPENAI_BASE_URL
30+ days of request history


Paid

$8 per user, per month

Everything in Free
Routing: one endpoint, picks the model
Codex Hosted: connect your ChatGPT subscription
Scoped sub-keys with per-app budget caps
Replay and diff: re-run any past request
Schema-enforced outputs across providers


Questions you'd be right to ask.

What happens to my API keys?

Encrypted with AES-256-GCM in our database. Decrypted only inside the serverless function that calls the provider on your behalf. We never log them and never transmit them to anyone other than the provider you pointed us at.

How does the router pick a model?

A cheap classifier call inspects intent and difficulty, then matches the result against your routing config. You can build the config in the visual editor (drag nodes, set thresholds) or write JSON. The chosen model and the routing reason come back in every response, so you can check every decision.

Is Codex Hosted using my account legitimately?

It runs your OAuth-authenticated codex exec inside an isolated container that you alone control. You log in with your ChatGPT account through OpenAI's official device-code flow. We hold the session for your container only. If OpenAI changes policy we will pull the feature without resistance, and the rest of the product carries on.

Does Blitz work over Codex?

Yes. Fan out a batch and ProxyLLM distributes calls across whatever credentials you have configured, including your Codex container, with rate-limit-aware backpressure.
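The distribution step can be sketched as round-robin over the configured credentials. Round-robin is our simplification, the credential names are hypothetical, and the rate-limit-aware backpressure described in the answer is not modeled here.

```bash
#!/usr/bin/env bash
# Sketch: spread a batch over every configured credential, round-robin.
# ASSUMPTION: round-robin and these credential names are hypothetical;
# backpressure is not modeled.
CREDENTIALS=(openai-key openrouter-key codex-container)

# Usage: assign_credentials PROMPT...  -> prints "credential<TAB>prompt" lines
assign_credentials() {
  local i=0 prompt
  for prompt in "$@"; do
    printf '%s\t%s\n' "${CREDENTIALS[i % ${#CREDENTIALS[@]}]}" "$prompt"
    i=$((i + 1))
  done
}
```

With four prompts and three credentials, the fourth prompt wraps back to openai-key.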