Use your $20/mo Codex sub in production workflows.
ProxyLLM routes every request to the cheapest model that can do the job. Across your API credits and your ChatGPT subscription. One endpoint. Drop in for OPENAI_BASE_URL.
No credit card. Your keys stay encrypted in your account.
# point your existing OpenAI SDK at ProxyLLM
export OPENAI_BASE_URL="https://app.proxyllm.ericslab.ai/v1"
# every call now gets routing, Blitz, cache, and cost analytics.
# nothing else changes. One endpoint sits on top of every model you already pay for.
OpenAI, Anthropic, OpenRouter, and Codex are L1. Slow to switch between, expensive when picked wrong. ProxyLLM is L2. Routes, batches, caches, and reports on top.
Three reasons your bill drops on Monday.
Routing
One endpoint. Picks the right model.
Send a prompt. ProxyLLM classifies intent and difficulty, then dispatches to the cheapest model that can answer it. Simple edits route to MiniMax. Heavy refactors route to GPT-5.5. You write zero glue code.
Blitz
100 prompts in the time of one.
Fan out a batch across providers and your Codex subscription in parallel. Rate-limit aware. Cost-capped. Partial failures handled. Replaces every for-loop you've written around an OpenAI client.
Codex Hosted
Your ChatGPT subscription. Programmatic.
Connect your Codex subscription with one click. ProxyLLM spins up an isolated container that runs codex exec on your behalf. Flat rate for the workloads it covers, API fallback for the ones it can't.
Change one env var. Keep your code.
ProxyLLM is OpenAI-compatible. Point your existing SDK at our base URL and you're routing, Blitzing, and caching by default.
Drop in your keys
Bring OpenAI, OpenRouter, and Codex. Stored AES-256-GCM. Never logged.
Point your SDK
Set OPENAI_BASE_URL to https://app.proxyllm.ericslab.ai/v1. Your existing code keeps working.
Watch the bill drop
Routing picks cheap models for easy calls. Codex absorbs the rest. Cache catches repeats.
And every request still shows the cache hit rate.
We hash every prompt server-side. The moment you send the same system + user message twice, you see it counted, and the dollar amount the cache saved you. Free on every account.
- Per-request USD cost, per-day rollup, per-model breakdown
- Repeating prompt detection out of the box
- 30+ days of full request history
Pricing built like the product.
Cheap when easy, paid when worth it.
- Bring your own OpenAI and OpenRouter keys
- Cache analytics, cost rollup, repeating-prompt detection
- Blitz: parallel inference across providers
- Drop-in for OPENAI_BASE_URL
- 30+ days of request history
- Everything in Free
- Routing: one endpoint, picks the model
- Codex Hosted: connect your ChatGPT subscription
- Scoped sub-keys with per-app budget caps
- Replay and diff: re-run any past request
- Schema-enforced outputs across providers
Questions you'd be right to ask.
What happens to my API keys?
Encrypted with AES-256-GCM in our database. Decrypted only inside the serverless function that calls the provider on your behalf. We never log them and never transmit them to anyone other than the provider you pointed us at.
How does the router pick a model?
A cheap classifier call inspects intent and difficulty, then matches against your routing config. You can build the config in the visual editor (drag nodes, set thresholds) or write JSON. The chosen model and the routing reason come back in every response so you can trust it.
Is Codex Hosted using my account legitimately?
It runs your OAuth-authenticated codex exec inside an isolated container that you alone control. You log in with your ChatGPT account through OpenAI's official device-code flow. We hold the session for your container only. If OpenAI changes policy we will pull the feature without resistance, and the rest of the product carries on.
Does Blitz work over Codex?
Yes. Fan out a batch and ProxyLLM distributes calls across whatever credentials you have configured, including your Codex container, with rate-limit-aware backpressure.