Documentation
Build smarter.
Pay less.
Kyrion is a drop-in OpenAI-compatible proxy that automatically routes every prompt to the cheapest capable model, caches repeated queries in Redis, and fails over between providers — with zero changes to your existing SDK code.
Quick Start
Change two lines of code. That's all.
Kyrion is fully compatible with the OpenAI SDK. Replace the api_key and base_url — every other parameter, model name, and response format works identically.
```python
import openai

client = openai.OpenAI(
    api_key="kyr_live_...",  # Your Kyrion key
    base_url="https://api.kyrion.dev/v1",
)

response = client.chat.completions.create(
    model="kyrion-auto",  # Kyrion picks the optimal model
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)

# Inspect what Kyrion did (response headers):
#   X-Kyrion-Model    → "llama-3-8b-8192"
#   X-Kyrion-Provider → "groq"
#   X-Kyrion-Saved    → "$0.0031"
```

Use model="kyrion-auto" to let Kyrion pick the optimal model for every request. You can also pass any specific model name (e.g. gpt-4o) and Kyrion will route that request accordingly while still applying caching and failover.

Authentication
API keys and environments.
Every request must include your Kyrion API key in the Authorization header.
| Key prefix | Environment | Behaviour |
|---|---|---|
| kyr_live_... | Production | Charges apply. Real provider calls. |
| kyr_test_... | Sandbox | Free. Returns synthetic responses. No provider calls. |

```shell
export KYRION_API_KEY="kyr_live_..."
export OPENAI_API_KEY="$KYRION_API_KEY"
export OPENAI_BASE_URL="https://api.kyrion.dev/v1"
```
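As an illustration of the two key prefixes, a client could detect which environment a key targets before sending traffic. This is a hypothetical helper, not part of any Kyrion SDK:

```python
def key_environment(api_key: str) -> str:
    """Classify a Kyrion key by its prefix (hypothetical helper)."""
    if api_key.startswith("kyr_live_"):
        return "production"  # charges apply, real provider calls
    if api_key.startswith("kyr_test_"):
        return "sandbox"     # free, synthetic responses
    raise ValueError("not a Kyrion API key")
```

A check like this works well as a startup guard, e.g. so CI pipelines never accidentally run against a live key.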
How routing works
Every prompt gets a complexity score from 0 to 1.
Before forwarding a request to any provider, Kyrion runs the prompt through a lightweight semantic scorer. The scorer outputs a value between 0.00 (trivial) and 1.00 (highly complex), which determines the routing tier.
Scoring factors
The scorer weighs signals such as action verbs, subjects, bigrams, and overall prompt length. The computed score is returned on every response in the X-Kyrion-Score response header.

Models & tiers
Three tiers covering 99% of real-world use cases.
| Tier | Score | Default model | Provider | Cost / 1K tokens | vs GPT-4 |
|---|---|---|---|---|---|
| Simple | 0.00–0.35 | Llama 3 8B | Groq | $0.00009 | 97% cheaper |
| Medium | 0.35–0.65 | GPT-4o Mini | OpenAI | $0.00038 | 85% cheaper |
| Complex | 0.65–1.00 | Claude 3.5 Sonnet | Anthropic | $0.00300 | 60% cheaper |
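The score-to-tier boundaries in the table can be sketched as a simple mapping. The actual scorer runs server-side; this only illustrates the thresholds:

```python
def tier_for_score(score: float) -> str:
    # Thresholds from the tier table: <0.35 simple, 0.35–0.65 medium, else complex.
    if score < 0.35:
        return "simple"
    if score < 0.65:
        return "medium"
    return "complex"
```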
```python
# Force a specific model (bypass auto-routing)
response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",  # exact provider model name
    messages=[{"role": "user", "content": "..."}],
)
```

Semantic caching
Identical queries served instantly at $0.00.
Kyrion maintains a Redis cache keyed on a semantic hash of each prompt. On a cache hit, the stored response is returned in under 2ms with zero provider cost. The cache is shared across your entire team — a query answered once benefits everyone.
- ~52% average hit rate across mixed production workloads
- <2ms cache latency on a hit (Redis lookup)
- 24h default TTL (configurable in the dashboard)
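Conceptually, the cache lookup works like the sketch below. This is an illustration only: the key scheme and the normalisation are hypothetical, and Kyrion's real semantic hash is more sophisticated than a whitespace- and case-insensitive digest:

```python
import hashlib
import json

def cache_key(model: str, messages: list) -> str:
    # Normalise case and whitespace so trivially different phrasings
    # of the same prompt collide on the same Redis key.
    normalised = [
        {"role": m["role"], "content": " ".join(m["content"].lower().split())}
        for m in messages
    ]
    payload = json.dumps({"model": model, "messages": normalised}, sort_keys=True)
    return "kyrion:cache:" + hashlib.sha256(payload.encode()).hexdigest()
```

Because the key covers the whole normalised conversation plus the model, any team member repeating an equivalent request hits the same entry.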
To bypass the cache for a specific request (e.g. time-sensitive queries), add the X-Kyrion-No-Cache: true header:
```python
# Disable cache for a specific request
response = client.chat.completions.create(
    model="kyrion-auto",
    messages=[{"role": "user", "content": "What time is it?"}],
    extra_headers={"X-Kyrion-No-Cache": "true"},
)
```
Circuit breaker & failover
Your app keeps running even when a provider goes down.
Kyrion monitors the health of every provider in real time. If a provider returns errors or exceeds latency thresholds, the circuit breaker opens and traffic is automatically rerouted to the next capable provider — in milliseconds, without any action on your part.
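The pattern can be sketched in a few lines. This illustrates the circuit-breaker idea only; the thresholds and implementation are not Kyrion's actual values:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `cooldown` seconds."""

    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self):
        # Closed circuit: let traffic through.
        if self.opened_at is None:
            return True
        # Open circuit: allow one trial request after the cooldown (half-open).
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()
```

A router would hold one breaker per provider and simply skip any provider whose allow() returns False, falling through to the next tier-capable model.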
When failover occurs, Kyrion sets the X-Kyrion-Degraded: true response header so your app can handle it if needed.

POST /v1/chat/completions
The core endpoint. Fully OpenAI-compatible.
```shell
curl https://api.kyrion.dev/v1/chat/completions \
  -H "Authorization: Bearer kyr_live_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"kyrion-auto","messages":[{"role":"user","content":"Hello"}]}'
```

Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | "kyrion-auto" for smart routing, or any exact model name (gpt-4o, claude-3-5-sonnet-20241022, etc.) |
| messages | array | Yes | Array of {role, content} objects. Same format as OpenAI. |
| temperature | number | No | 0–2. Passed through to the provider. |
| max_tokens | integer | No | Maximum tokens in the response. |
| stream | boolean | No | Enable SSE streaming. Default false. |
| top_p | number | No | Nucleus sampling. Passed through. |
| stop | string[] | No | Stop sequences. Passed through. |
| response_format | object | No | { type: "json_object" } for structured output. |
| tools | array | No | Tool/function definitions. Routes to a capable model automatically. |
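As a sketch of the schema above, a client could assemble and sanity-check a request body before sending it. This is a hypothetical helper; the gateway performs its own validation:

```python
def build_request(model, messages, **opts):
    # Optional parameters accepted by /v1/chat/completions (per the table above).
    allowed = {"temperature", "max_tokens", "stream", "top_p",
               "stop", "response_format", "tools"}
    unknown = set(opts) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    if not model or not messages:
        raise ValueError("model and messages are required")
    return {"model": model, "messages": messages, **opts}
```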
GET /v1/models
List all models available via your account.
Returns an OpenAI-compatible model list including all Kyrion routing aliases and underlying provider models your account has access to.
```shell
curl https://api.kyrion.dev/v1/models \
  -H "Authorization: Bearer kyr_live_..."
```
GET /health
Check gateway and provider status.
Returns JSON with current system health, provider statuses, and cache uptime. Useful for monitoring integrations.
```json
{
  "status": "ok",
  "cache": "healthy",
  "providers": {
    "openai": "healthy",
    "anthropic": "healthy",
    "groq": "healthy"
  },
  "uptime_seconds": 2847392
}
```

Response headers
Every response tells you exactly what Kyrion did.
| Header | Example value | Description |
|---|---|---|
| X-Kyrion-Model | llama-3-8b-8192 | Exact model ID that handled the request |
| X-Kyrion-Provider | groq | Provider used: openai | anthropic | groq |
| X-Kyrion-Tier | simple | Routing tier: simple | medium | complex |
| X-Kyrion-Score | 0.082 | Complexity score 0.000–1.000 |
| X-Kyrion-Cached | false | true if served from Redis cache |
| X-Kyrion-Saved | $0.0031 | Cost saved vs direct GPT-4 call |
| X-Kyrion-Latency | 89ms | Total round-trip including Kyrion overhead |
| X-Kyrion-Overhead | 11ms | Kyrion scoring + routing overhead only |
| X-Kyrion-Degraded | false | true if a fallback model was used |
| X-Kyrion-RequestId | req_abc123 | Unique request ID for support & tracing |
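These headers make it easy to aggregate what the router did across many calls. A sketch, with header values passed in as plain dicts (collecting them from responses is up to your HTTP client):

```python
from collections import Counter

def summarise(header_sets):
    # header_sets: one dict of X-Kyrion-* headers per completed request.
    tiers = Counter(h["X-Kyrion-Tier"] for h in header_sets)
    cached = sum(h["X-Kyrion-Cached"] == "true" for h in header_sets)
    saved = sum(float(h["X-Kyrion-Saved"].lstrip("$")) for h in header_sets)
    return {"tiers": dict(tiers), "cache_hits": cached,
            "total_saved_usd": round(saved, 4)}
```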
Streaming
Server-sent events work exactly as with OpenAI.
Kyrion fully supports SSE streaming. Set stream: true (or use the stream helper in your SDK) and responses will be streamed back token-by-token from the selected provider.
```python
import openai

client = openai.OpenAI(
    api_key="kyr_live_...",
    base_url="https://api.kyrion.dev/v1",
)

stream = client.chat.completions.create(
    model="kyrion-auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)
```

Kyrion sends the standard data: [DONE] chunk when streaming.

Errors & retries
Standard HTTP status codes with machine-readable error bodies.
| Status | Code | Meaning |
|---|---|---|
| 400 | invalid_request | Malformed request body or missing required field. |
| 401 | invalid_api_key | API key missing, invalid, or revoked. |
| 402 | insufficient_credits | Account has run out of credits. |
| 429 | rate_limit_exceeded | Too many requests. See X-RateLimit-* headers. |
| 500 | provider_error | All providers returned an error. Retry with backoff. |
| 503 | no_models_available | No capable model is healthy right now. |
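For the retryable 500-class errors above, a simple exponential-backoff wrapper is usually enough. A sketch; tune the attempt count and delays to your latency budget:

```python
import time

def with_backoff(call, max_attempts=4, base_delay=0.5):
    # Retries `call` (any zero-argument callable) on exception,
    # doubling the delay between attempts.
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

In production you would catch only retryable failures (e.g. HTTP 500/503) and add jitter to the delay so many clients do not retry in lockstep.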
Provider setup
Add your API keys once. Kyrion handles the rest.
Provider keys are stored AES-256 encrypted in Supabase Vault and are never exposed to your application. Kyrion uses them server-side to call providers on your behalf.
| Provider | Key prefix | Used for tiers |
|---|---|---|
| OpenAI | sk-... | Medium, Complex |
| Anthropic | sk-ant-... | Complex |
| Groq | gsk_... | Simple |

Add keys in Dashboard → Providers. You only need the providers for the tiers you use — Kyrion skips providers with no key configured.
Rate limits
Per-account limits applied before provider limits.
| Plan | Requests / month | Requests / minute | Concurrent |
|---|---|---|---|
| Hobby | 5,000 | 20 | 5 |
| Startup | 200,000 | 200 | 50 |
| Pro | Unlimited | 600 | 200 |
Rate limit status is returned in every response via X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers.
SDK compatibility
If it works with OpenAI, it works with Kyrion.
Kyrion implements the full OpenAI REST API surface. Any library that can set a custom base_url works without modification.
Self-hosting
Deploy the routing engine on your own infrastructure.
The Kyrion routing engine is written in Go and available as a Docker image. You can deploy it on Fly.io, Railway, or any container-capable host. Self-hosted instances use the same API surface but bill directly to your provider accounts.
```shell
# Pull the image
docker pull kyrion/gateway:latest

# Run with your provider keys
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e GROQ_API_KEY=gsk_... \
  -e REDIS_URL=redis://localhost:6379 \
  kyrion/gateway:latest
```
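Before starting the container you can sanity-check its environment with something like the following. This is a hypothetical helper, not shipped with the image, and it assumes at least one provider key must be present:

```python
import os

def check_env(env=os.environ):
    # Providers are optional individually, but at least one key is assumed required.
    providers = [k for k in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY")
                 if env.get(k)]
    if not providers:
        raise RuntimeError("configure at least one provider key")
    return {
        "providers": providers,
        "redis": env.get("REDIS_URL", ""),
        "port": int(env.get("PORT", "8080")),
        "log_level": env.get("LOG_LEVEL", "info"),
    }
```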
Environment variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | Required if using OpenAI models |
| ANTHROPIC_API_KEY | Required if using Anthropic models |
| GROQ_API_KEY | Required if using Groq models |
| REDIS_URL | Redis connection string for caching |
| PORT | Listen port (default: 8080) |
| LOG_LEVEL | debug \| info \| warn \| error (default: info) |

FAQ
Answers to the most common questions.
Does Kyrion store my prompts?
Only the first 80 characters of each prompt are stored as a preview in your usage logs — never the full content. Cached responses are stored in Redis with your configured TTL (default 24h), then deleted. Provider keys are encrypted with AES-256-GCM and never logged.
What happens if a provider is down?
The circuit breaker detects failures within seconds and automatically reroutes to the next healthy provider in the fallback chain. Your request completes — just via a different model. The X-Kyrion-Degraded: true header tells you when this happens.
Can I use my own OpenAI/Anthropic API keys?
Yes. Add your own keys in Dashboard → Providers. Kyrion will use them for calls to those providers — billing goes directly to your provider accounts. If you don't add a key for a provider, Kyrion skips that provider entirely.
Is kyrion-auto just a random router?
No. Each prompt gets a complexity score (0–1) based on action verbs, subjects, bigrams, and length. The score determines the tier, and the tier maps to the cheapest model that can handle it reliably. You can inspect the score on every response via X-Kyrion-Score.
Will switching to Kyrion break my existing code?
No. Kyrion is fully OpenAI-compatible. Change api_key to your Kyrion key and base_url to https://api.kyrion.dev/v1 — every other parameter, model name, streaming flag, and response format works identically. You can revert in seconds.
What is the overhead added by Kyrion?
Scoring and routing adds ~8–15ms per request. On a cache hit, the entire response is returned in under 2ms with zero provider latency. The X-Kyrion-Overhead header shows the exact overhead for every request.
Does caching work across my whole team?
Yes. The Redis cache is shared across all API keys in your account. If two team members send identical prompts, the second request is served from cache at $0.00 cost regardless of which key they use.
Can I force a specific model instead of auto-routing?
Yes. Pass any exact model name (e.g. gpt-4o, claude-3-5-sonnet-20241022) instead of kyrion-auto and Kyrion will route only to that model, while still applying caching and failover.
How do I migrate from direct OpenAI to Kyrion?
Two environment variable changes: set OPENAI_API_KEY to your Kyrion key and OPENAI_BASE_URL to https://api.kyrion.dev/v1. No code changes required. Your existing SDK calls work as-is.
Is there a sandbox / test mode?
Yes. Keys prefixed with kyr_test_ return synthetic responses without calling any provider. Use them in CI/CD pipelines and automated tests to avoid provider costs.
Changelog
What's new in Kyrion.
Older versions will appear here as Kyrion evolves.
