Documentation

Build smarter.
Pay less.

Kyrion is a drop-in OpenAI-compatible proxy that automatically routes every prompt to the cheapest capable model, caches repeated queries in Redis, and fails over between providers — with zero changes to your existing SDK code.

Quick Start

Change two lines of code. That's all.

Kyrion is fully compatible with the OpenAI SDK. Replace the api_key and base_url — every other parameter, model name, and response format works identically.

python
import openai

client = openai.OpenAI(
    api_key="kyr_live_...",           # Your Kyrion key
    base_url="https://api.kyrion.dev/v1",
)

response = client.chat.completions.create(
    model="kyrion-auto",              # Kyrion picks the optimal model
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)

# Inspect what Kyrion did (the X-Kyrion-* headers are on the raw HTTP
# response; in openai-python, use client.chat.completions.with_raw_response):
#   X-Kyrion-Model     → "llama-3-8b-8192"
#   X-Kyrion-Provider  → "groq"
#   X-Kyrion-Saved     → "$0.0031"
Tip: Use model="kyrion-auto" to let Kyrion pick the optimal model for every request. You can also pass any specific model name (e.g. gpt-4o) and Kyrion will route that request accordingly while still applying caching and failover.

Authentication

API keys and environments.

Every request must include your Kyrion API key in the Authorization header.

Key prefix | Environment | Behaviour
kyr_live_... | Production | Charges apply. Real provider calls.
kyr_test_... | Sandbox | Free. Returns synthetic responses. No provider calls.

Note: Your Kyrion key is scoped to your account and can be rotated at any time from the dashboard. Never commit it to source control — use environment variables.
shell
export KYRION_API_KEY="kyr_live_..."
export OPENAI_API_KEY="$KYRION_API_KEY"
export OPENAI_BASE_URL="https://api.kyrion.dev/v1"

How routing works

Every prompt gets a complexity score from 0 to 1.

Before forwarding a request to any provider, Kyrion runs the prompt through a lightweight semantic scorer. The scorer outputs a value between 0.00 (trivial) and 1.00 (highly complex), which determines the routing tier.

Scoring factors

Factor | Weight | Effect
Prompt length | High | Longer prompts tend to be more complex tasks.
Instruction verbs | High | "Analyze", "design", "refactor", "evaluate" push the score up; "what", "when", "define" pull it down.
Code presence | Medium | Code blocks or technical syntax raise the score.
Context length | Medium | Multi-turn conversations with long history score higher.
Structured output | Low | Requests for JSON/markdown formatting add a small bonus.

Tip: You can inspect the exact score Kyrion assigned to any request via the X-Kyrion-Score response header.
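Kyrion's production scorer is internal, but the factors above can be sketched as a toy heuristic. Everything here (the weights, the word lists, the 400-word length normalizer) is illustrative, not Kyrion's actual model:

```python
# Toy complexity scorer illustrating the factors above.
# Weights and keyword lists are illustrative, not Kyrion's real model.

COMPLEX_VERBS = {"analyze", "design", "refactor", "evaluate"}
SIMPLE_WORDS = {"what", "when", "define"}

def complexity_score(prompt: str, history_turns: int = 0) -> float:
    words = prompt.lower().split()
    score = 0.0
    score += min(len(words) / 400, 0.35)                  # prompt length (high)
    if any(w.strip(".,?") in COMPLEX_VERBS for w in words):
        score += 0.25                                     # instruction verbs (high)
    if any(w.strip(".,?") in SIMPLE_WORDS for w in words):
        score -= 0.10                                     # simple question words
    if "```" in prompt or "def " in prompt:
        score += 0.15                                     # code presence (medium)
    score += min(history_turns * 0.02, 0.10)              # context length (medium)
    if "json" in prompt.lower() or "markdown" in prompt.lower():
        score += 0.05                                     # structured output (low)
    return max(0.0, min(1.0, score))
```

A short factual question lands in the Simple band (≤ 0.35), while a refactoring request with a code block crosses into Medium or higher.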

Models & tiers

Three tiers covering 99% of real-world use cases.

Tier | Score | Default model | Provider | Cost / 1K tokens | vs GPT-4
Simple | 0.00–0.35 | Llama 3 8B | Groq | $0.00009 | 97% cheaper
Medium | 0.35–0.65 | GPT-4o Mini | OpenAI | $0.00038 | 85% cheaper
Complex | 0.65–1.00 | Claude 3.5 Sonnet | Anthropic | $0.00300 | 60% cheaper

Note: Default models are chosen for the best cost/quality ratio at each tier. You can override them per-request by passing a specific model name, or globally from your dashboard under Settings → Routing.
python
# Force a specific model (bypass auto-routing)
response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",   # exact provider model name
    messages=[{"role": "user", "content": "..."}],
)

Semantic caching

Identical queries served instantly at $0.00.

Kyrion maintains a Redis cache keyed on a semantic hash of each prompt. On a cache hit, the stored response is returned in under 2ms with zero provider cost. The cache is shared across your entire team — a query answered once benefits everyone.

~52% | Average hit rate | Across mixed production workloads
<2ms | Cache latency | Redis lookup on hit
24h | Default TTL | Configurable in dashboard

To bypass the cache for a specific request (e.g. time-sensitive queries), add the X-Kyrion-No-Cache: true header:

python
# Disable cache for a specific request
response = client.chat.completions.create(
    model="kyrion-auto",
    messages=[{"role": "user", "content": "What time is it?"}],
    extra_headers={"X-Kyrion-No-Cache": "true"},
)

Cache behaviour

Outcome | Behaviour
Hit | Response served from Redis. Cost: $0.00.
Miss | Request forwarded to provider. Response stored for next time.
Bypass | X-Kyrion-No-Cache header was set. Cache not read or written.
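The idea behind a semantic cache key can be sketched as follows. This is a simplified illustration that normalizes whitespace and case before hashing; a production semantic cache would typically match on embeddings instead. The key format and normalization here are assumptions, not Kyrion's implementation:

```python
import hashlib
import json

def cache_key(model: str, messages: list[dict]) -> str:
    # Normalize case and whitespace so superficially different phrasings
    # of the same request hash to the same key.
    normalized = [
        {"role": m["role"], "content": " ".join(m["content"].lower().split())}
        for m in messages
    ]
    payload = json.dumps({"model": model, "messages": normalized}, sort_keys=True)
    return "kyrion:cache:" + hashlib.sha256(payload.encode()).hexdigest()
```

With this scheme, "Hello   World" and "hello world" produce the same key and hit the same Redis entry, while any real difference in content yields a different key.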

Circuit breaker & failover

Your app keeps running even when a provider goes down.

Kyrion monitors the health of every provider in real time. If a provider returns errors or exceeds latency thresholds, the circuit breaker opens and traffic is automatically rerouted to the next capable provider — in milliseconds, without any action on your part.

Fallback chain — Complex tier example
1st | Claude 3.5 Sonnet | Anthropic | Primary
2nd | GPT-4o | OpenAI | Fallback
3rd | GPT-4o Mini | OpenAI | Degraded fallback

Warning: Failover to a degraded model (e.g. falling back to GPT-4o Mini for a complex request) will be flagged in the X-Kyrion-Degraded: true response header so your app can handle it if needed.
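For intuition, a minimal circuit breaker tracks consecutive failures per provider and skips any provider whose breaker is open until a cooldown elapses. The threshold and cooldown values below are illustrative, not Kyrion's actual tuning:

```python
import time

class CircuitBreaker:
    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold   # consecutive failures before opening
        self.cooldown = cooldown     # seconds before a probe request is allowed
        self.failures = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        if self.failures < self.threshold:
            return True
        # Open: permit one probe request once the cooldown has elapsed.
        return time.monotonic() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures == self.threshold:
            self.opened_at = time.monotonic()

def pick_provider(chain, breakers):
    # Walk the fallback chain and return the first provider whose
    # breaker allows traffic; None means everything is unhealthy (503).
    for provider in chain:
        if breakers[provider].allow():
            return provider
    return None
```

After three consecutive Anthropic failures, the breaker opens and the complex-tier chain above would route to GPT-4o until Anthropic's probe succeeds again.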

POST /v1/chat/completions

The core endpoint. Fully OpenAI-compatible.

shell
curl https://api.kyrion.dev/v1/chat/completions \
  -H "Authorization: Bearer kyr_live_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"kyrion-auto","messages":[{"role":"user","content":"Hello"}]}'

Request body

Parameter | Type | Required | Description
model | string | Yes | "kyrion-auto" for smart routing, or any exact model name (gpt-4o, claude-3-5-sonnet-20241022, etc.)
messages | array | Yes | Array of {role, content} objects. Same format as OpenAI.
temperature | number | No | 0–2. Passed through to the provider.
max_tokens | integer | No | Maximum tokens in the response.
stream | boolean | No | Enable SSE streaming. Default false.
top_p | number | No | Nucleus sampling. Passed through.
stop | string[] | No | Stop sequences. Passed through.
response_format | object | No | { type: "json_object" } for structured output.
tools | array | No | Tool/function definitions. Routed to a capable model automatically.

GET /v1/models

List all models available via your account.

Returns an OpenAI-compatible model list including all Kyrion routing aliases and underlying provider models your account has access to.

shell
curl https://api.kyrion.dev/v1/models \
  -H "Authorization: Bearer kyr_live_..."

GET /health

Check gateway and provider status.

Returns JSON with current system health, provider statuses, and cache uptime. Useful for monitoring integrations.

json
{
  "status": "ok",
  "cache": "healthy",
  "providers": {
    "openai":    "healthy",
    "anthropic": "healthy",
    "groq":      "healthy"
  },
  "uptime_seconds": 2847392
}

Response headers

Every response tells you exactly what Kyrion did.

Header | Example value | Description
X-Kyrion-Model | llama-3-8b-8192 | Exact model ID that handled the request
X-Kyrion-Provider | groq | Provider used: openai, anthropic, or groq
X-Kyrion-Tier | simple | Routing tier: simple, medium, or complex
X-Kyrion-Score | 0.082 | Complexity score, 0.000–1.000
X-Kyrion-Cached | false | true if served from the Redis cache
X-Kyrion-Saved | $0.0031 | Cost saved vs a direct GPT-4 call
X-Kyrion-Latency | 89ms | Total round trip including Kyrion overhead
X-Kyrion-Overhead | 11ms | Kyrion scoring + routing overhead only
X-Kyrion-Degraded | false | true if a fallback model was used
X-Kyrion-RequestId | req_abc123 | Unique request ID for support and tracing
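In openai-python, the parsed ChatCompletion object does not expose HTTP headers; the SDK's with_raw_response wrapper does. The small helper below is hypothetical (not part of any SDK) and just collects the diagnostics from a header mapping:

```python
def kyrion_headers(headers) -> dict:
    # Collect the X-Kyrion-* diagnostics from any response header mapping.
    # Hypothetical helper for illustration; not part of openai-python.
    return {k: v for k, v in headers.items() if k.lower().startswith("x-kyrion-")}

# With openai-python, pair it with the raw-response wrapper:
#   raw = client.chat.completions.with_raw_response.create(
#       model="kyrion-auto",
#       messages=[{"role": "user", "content": "Hello"}],
#   )
#   completion = raw.parse()            # the usual ChatCompletion object
#   print(kyrion_headers(raw.headers))
```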

Streaming

Server-sent events work exactly as with OpenAI.

Kyrion fully supports SSE streaming. Set stream: true (or use the stream helper in your SDK) and responses will be streamed back token-by-token from the selected provider.

python
import openai

client = openai.OpenAI(
    api_key="kyr_live_...",
    base_url="https://api.kyrion.dev/v1",
)

stream = client.chat.completions.create(
    model="kyrion-auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Note: Kyrion response headers are sent with the final SSE data: [DONE] chunk when streaming.

Errors & retries

Standard HTTP status codes with machine-readable error bodies.

Status | Code | Meaning
400 | invalid_request | Malformed request body or missing required field.
401 | invalid_api_key | API key missing, invalid, or revoked.
402 | insufficient_credits | Account has run out of credits.
429 | rate_limit_exceeded | Too many requests. See the X-RateLimit-* headers.
500 | provider_error | All providers returned an error. Retry with backoff.
503 | no_models_available | No capable model is healthy right now.

Warning: On 500 errors, Kyrion has already exhausted its internal failover chain. Implement exponential backoff before retrying in your application layer.
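A minimal application-layer backoff wrapper might look like this. It catches a generic Exception for brevity; in practice, catch your SDK's rate-limit and server-error types (e.g. openai.APIStatusError) and respect the X-RateLimit-Reset header:

```python
import random
import time

def with_backoff(call, max_attempts: int = 5, base: float = 0.5):
    # Retry a callable with exponential backoff plus jitter:
    # 0.5s, 1s, 2s, 4s ... with up to 100ms of added jitter.
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base * (2 ** attempt) + random.uniform(0, 0.1))
```

Usage: wrap the request in a closure, e.g. with_backoff(lambda: client.chat.completions.create(model="kyrion-auto", messages=msgs)).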

Provider setup

Add your API keys once. Kyrion handles the rest.

Provider keys are stored AES-256 encrypted in Supabase Vault and are never exposed to your application. Kyrion uses them server-side to call providers on your behalf.

Provider | Key prefix | Used for
OpenAI | sk-... | Medium, Complex
Anthropic | sk-ant-... | Complex
Groq | gsk_... | Simple

Add keys in Dashboard → Providers. You only need the providers for the tiers you use — Kyrion skips providers with no key configured.

Tip: You can use your own OpenAI key (billing goes to your OpenAI account) or let Kyrion proxy through its own keys (billing via Kyrion credits).

Rate limits

Per-account limits applied before provider limits.

Plan | Requests / month | Requests / minute | Concurrent
Hobby | 5,000 | 20 | 5
Startup | 200,000 | 200 | 50
Pro | Unlimited | 600 | 200

Rate limit status is returned in every response via X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers.

SDK compatibility

If it works with OpenAI, it works with Kyrion.

Kyrion implements the full OpenAI REST API surface. Any library that can set a custom base_url works without modification.

Library | Language | Status
openai-python | Python | ✓ Tested
openai-node | Node.js | ✓ Tested
openai-go | Go | ✓ Tested
openai-java | Java | ✓ Tested
LangChain | Python/JS | ✓ Tested
LlamaIndex | Python | ✓ Tested
Vercel AI SDK | Node.js | ✓ Tested
instructor | Python | ✓ Tested

Self-hosting

Deploy the routing engine on your own infrastructure.

The Kyrion routing engine is written in Go and available as a Docker image. You can deploy it on Fly.io, Railway, or any container-capable host. Self-hosted instances use the same API surface but bill directly to your provider accounts.

shell
# Pull the image
docker pull kyrion/gateway:latest

# Run with your provider keys
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e GROQ_API_KEY=gsk_... \
  -e REDIS_URL=redis://localhost:6379 \
  kyrion/gateway:latest

Environment variables

Variable | Description
OPENAI_API_KEY | Required if using OpenAI models
ANTHROPIC_API_KEY | Required if using Anthropic models
GROQ_API_KEY | Required if using Groq models
REDIS_URL | Redis connection string for caching
PORT | Listen port (default: 8080)
LOG_LEVEL | debug, info, warn, or error (default: info)

Note: Self-hosting is available on the Pro plan. Contact hello@kyrion.dev for the Docker image access token and deployment guide.

FAQ

Answers to the most common questions.

Does Kyrion store my prompts?

Only the first 80 characters of each prompt are stored as a preview in your usage logs — never the full content. Cached responses are stored in Redis with your configured TTL (default 24h), then deleted. Provider keys are encrypted with AES-256-GCM and never logged.

What happens if a provider is down?

The circuit breaker detects failures within seconds and automatically reroutes to the next healthy provider in the fallback chain. Your request completes — just via a different model. The X-Kyrion-Degraded: true header tells you when this happens.

Can I use my own OpenAI/Anthropic API keys?

Yes. Add your own keys in Dashboard → Providers. Kyrion will use them for calls to those providers — billing goes directly to your provider accounts. If you don't add a key for a provider, Kyrion skips that provider entirely.

Is kyrion-auto just a random router?

No. Each prompt gets a complexity score (0–1) based on action verbs, subjects, bigrams, and length. The score determines the tier, and the tier maps to the cheapest model that can handle it reliably. You can inspect the score on every response via X-Kyrion-Score.

Will switching to Kyrion break my existing code?

No. Kyrion is fully OpenAI-compatible. Change api_key to your Kyrion key and base_url to https://api.kyrion.dev/v1 — every other parameter, model name, streaming flag, and response format works identically. You can revert in seconds.

What is the overhead added by Kyrion?

Scoring and routing adds ~8–15ms per request. On a cache hit, the entire response is returned in under 2ms with zero provider latency. The X-Kyrion-Overhead header shows the exact overhead for every request.

Does caching work across my whole team?

Yes. The Redis cache is shared across all API keys in your account. If two team members send identical prompts, the second request is served from cache at $0.00 cost regardless of which key they use.

Can I force a specific model instead of auto-routing?

Yes. Pass any exact model name (e.g. gpt-4o, claude-3-5-sonnet-20241022) instead of kyrion-auto and Kyrion will route only to that model, while still applying caching and failover.

How do I migrate from direct OpenAI to Kyrion?

Two environment variable changes: set OPENAI_API_KEY to your Kyrion key and OPENAI_BASE_URL to https://api.kyrion.dev/v1. No code changes required. Your existing SDK calls work as-is.

Is there a sandbox / test mode?

Yes. Keys prefixed with kyr_test_ return synthetic responses without calling any provider. Use them in CI/CD pipelines and automated tests to avoid provider costs.

Changelog

What's new in Kyrion.

v1.0.0 | Initial release | May 2026
New: OpenAI-compatible routing API — change two lines of code, done.
New: Three-tier complexity scoring engine (Simple / Medium / Complex).
New: Semantic Redis cache — repeated queries served at $0.00.
New: Circuit breaker with automatic failover across providers.
New: Support for OpenAI, Anthropic, Google, Groq, Mistral, and 10 more providers.
New: X-Kyrion-* response headers on every request (model, provider, score, saved, latency).
New: Dashboard with real-time usage logs, provider key vault, API key management.
New: Sandbox mode (kyr_test_ keys) for zero-cost CI/CD testing.
New: Streaming (SSE) support — compatible with all OpenAI SDK stream helpers.
New: Self-hosting via Docker image for Pro plan users.

Older versions will appear here as Kyrion evolves.