Documentation
Build smarter.
Pay less.
Kyrion is a drop-in OpenAI-compatible proxy that automatically routes every prompt to the cheapest capable model, caches repeated queries in Redis, and fails over between providers — with zero changes to your existing SDK code.
Quick Start
Change two lines of code. That's all.
Kyrion is fully compatible with the OpenAI SDK. Replace the api_key and base_url — every other parameter, model name, and response format works identically.
```python
import openai

client = openai.OpenAI(
    api_key="kyr_live_...",  # Your Kyrion key
    base_url="https://api.kyrion.dev/v1",
)

response = client.chat.completions.create(
    model="kyrion-auto",  # Kyrion picks the optimal model
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)

# Inspect what Kyrion did (response headers):
#   X-Kyrion-Model    → "llama-3-8b-8192"
#   X-Kyrion-Provider → "groq"
#   X-Kyrion-Saved    → "$0.0031"
```

Use model="kyrion-auto" to let Kyrion pick the optimal model for every request. You can also pass any specific model name (e.g. gpt-4o) and Kyrion will route that request accordingly while still applying caching and failover.

Authentication
API keys and environments.
Every request must include your Kyrion API key in the Authorization header.
| Key prefix | Environment | Behaviour |
|---|---|---|
| kyr_live_... | Production | Charges apply. Real provider calls. |
| kyr_test_... | Sandbox | Free. Returns synthetic responses. No provider calls. |

```shell
export KYRION_API_KEY="kyr_live_..."
export OPENAI_API_KEY="$KYRION_API_KEY"
export OPENAI_BASE_URL="https://api.kyrion.dev/v1"
```
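As an illustration of the two key prefixes, a client could detect which environment a key targets before sending traffic. This is a hypothetical helper, not part of any Kyrion SDK:

```python
def key_environment(api_key: str) -> str:
    """Classify a Kyrion key by its prefix (hypothetical helper)."""
    if api_key.startswith("kyr_live_"):
        return "production"  # charges apply, real provider calls
    if api_key.startswith("kyr_test_"):
        return "sandbox"     # free, synthetic responses
    raise ValueError("not a Kyrion API key")
```

A check like this works well as a startup guard, e.g. so CI pipelines never accidentally run against a live key.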
How routing works
Every prompt gets a complexity score from 0 to 1.
Before forwarding a request to any provider, Kyrion runs the prompt through a lightweight semantic scorer. The scorer outputs a value between 0.00 (trivial) and 1.00 (highly complex), which determines the routing tier.
Scoring factors
The scorer weighs signals such as action verbs, subjects, bigrams, and overall prompt length. The computed score is returned on every response in the X-Kyrion-Score response header.

Models & tiers
Three tiers covering 99% of real-world use cases.
| Tier | Score | Default model | Provider | Cost / 1K tokens | vs GPT-4 |
|---|---|---|---|---|---|
| Simple | 0.00–0.35 | Llama 3 8B | Groq | $0.00009 | 97% cheaper |
| Medium | 0.35–0.65 | GPT-4o Mini | OpenAI | $0.00038 | 85% cheaper |
| Complex | 0.65–1.00 | Claude 3.5 Sonnet | Anthropic | $0.00300 | 60% cheaper |
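The score-to-tier boundaries in the table can be sketched as a simple mapping. The actual scorer runs server-side; this only illustrates the thresholds:

```python
def tier_for_score(score: float) -> str:
    # Thresholds from the tier table: <0.35 simple, 0.35–0.65 medium, else complex.
    if score < 0.35:
        return "simple"
    if score < 0.65:
        return "medium"
    return "complex"
```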
```python
# Force a specific model (bypass auto-routing)
response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",  # exact provider model name
    messages=[{"role": "user", "content": "..."}],
)
```

Semantic caching
Identical queries served instantly at $0.00.
Kyrion maintains a Redis cache keyed on a semantic hash of each prompt. On a cache hit, the stored response is returned in under 2ms with zero provider cost. The cache is shared across your entire team — a query answered once benefits everyone.
- ~52% average hit rate across mixed production workloads
- <2ms cache latency on a hit (Redis lookup)
- 24h default TTL (configurable in the dashboard)
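Conceptually, the cache lookup works like the sketch below. This is an illustration only: the key scheme and the normalisation are hypothetical, and Kyrion's real semantic hash is more sophisticated than a whitespace- and case-insensitive digest:

```python
import hashlib
import json

def cache_key(model: str, messages: list) -> str:
    # Normalise case and whitespace so trivially different phrasings
    # of the same prompt collide on the same Redis key.
    normalised = [
        {"role": m["role"], "content": " ".join(m["content"].lower().split())}
        for m in messages
    ]
    payload = json.dumps({"model": model, "messages": normalised}, sort_keys=True)
    return "kyrion:cache:" + hashlib.sha256(payload.encode()).hexdigest()
```

Because the key covers the whole normalised conversation plus the model, any team member repeating an equivalent request hits the same entry.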
To bypass the cache for a specific request (e.g. time-sensitive queries), add the X-Kyrion-No-Cache: true header:
```python
# Disable cache for a specific request
response = client.chat.completions.create(
    model="kyrion-auto",
    messages=[{"role": "user", "content": "What time is it?"}],
    extra_headers={"X-Kyrion-No-Cache": "true"},
)
```
Circuit breaker & failover
Your app keeps running even when a provider goes down.
Kyrion monitors the health of every provider in real time. If a provider returns errors or exceeds latency thresholds, the circuit breaker opens and traffic is automatically rerouted to the next capable provider — in milliseconds, without any action on your part.
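The pattern can be sketched in a few lines. This illustrates the circuit-breaker idea only; the thresholds and implementation are not Kyrion's actual values:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `cooldown` seconds."""

    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self):
        # Closed circuit: let traffic through.
        if self.opened_at is None:
            return True
        # Open circuit: allow one trial request after the cooldown (half-open).
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()
```

A router would hold one breaker per provider and simply skip any provider whose allow() returns False, falling through to the next tier-capable model.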
When failover occurs, Kyrion sets the X-Kyrion-Degraded: true response header so your app can handle it if needed.

POST /v1/chat/completions
The core endpoint. Fully OpenAI-compatible.
```shell
curl https://api.kyrion.dev/v1/chat/completions \
  -H "Authorization: Bearer kyr_live_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"kyrion-auto","messages":[{"role":"user","content":"Hello"}]}'
```

Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | "kyrion-auto" for smart routing, or any exact model name (gpt-4o, claude-3-5-sonnet-20241022, etc.) |
| messages | array | Yes | Array of {role, content} objects. Same format as OpenAI. |
| temperature | number | No | 0–2. Passed through to the provider. |
| max_tokens | integer | No | Maximum tokens in the response. |
| stream | boolean | No | Enable SSE streaming. Default false. |
| top_p | number | No | Nucleus sampling. Passed through. |
| stop | string[] | No | Stop sequences. Passed through. |
| response_format | object | No | { type: "json_object" } for structured output. |
| tools | array | No | Tool/function definitions. Routes to a capable model automatically. |
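As a sketch of the schema above, a client could assemble and sanity-check a request body before sending it. This is a hypothetical helper; the gateway performs its own validation:

```python
def build_request(model, messages, **opts):
    # Optional parameters accepted by /v1/chat/completions (per the table above).
    allowed = {"temperature", "max_tokens", "stream", "top_p",
               "stop", "response_format", "tools"}
    unknown = set(opts) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    if not model or not messages:
        raise ValueError("model and messages are required")
    return {"model": model, "messages": messages, **opts}
```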
GET /v1/models
List all models available via your account.
Returns an OpenAI-compatible model list including all Kyrion routing aliases and underlying provider models your account has access to.
```shell
curl https://api.kyrion.dev/v1/models \
  -H "Authorization: Bearer kyr_live_..."
```
GET /health
Check gateway and provider status.
Returns JSON with current system health, provider statuses, and cache uptime. Useful for monitoring integrations.
```json
{
  "status": "ok",
  "cache": "healthy",
  "providers": {
    "openai": "healthy",
    "anthropic": "healthy",
    "groq": "healthy"
  },
  "uptime_seconds": 2847392
}
```

Response headers
Every response tells you exactly what Kyrion did.
| Header | Example value | Description |
|---|---|---|
| X-Kyrion-Model | llama-3-8b-8192 | Exact model ID that handled the request |
| X-Kyrion-Provider | groq | Provider used: openai | anthropic | groq |
| X-Kyrion-Tier | simple | Routing tier: simple | medium | complex |
| X-Kyrion-Score | 0.082 | Complexity score 0.000–1.000 |
| X-Kyrion-Cached | false | true if served from Redis cache |
| X-Kyrion-Saved | $0.0031 | Cost saved vs direct GPT-4 call |
| X-Kyrion-Latency | 89ms | Total round-trip including Kyrion overhead |
| X-Kyrion-Overhead | 11ms | Kyrion scoring + routing overhead only |
| X-Kyrion-Degraded | false | true if a fallback model was used |
| X-Kyrion-RequestId | req_abc123 | Unique request ID for support & tracing |
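These headers make it easy to aggregate what the router did across many calls. A sketch, with header values passed in as plain dicts (collecting them from responses is up to your HTTP client):

```python
from collections import Counter

def summarise(header_sets):
    # header_sets: one dict of X-Kyrion-* headers per completed request.
    tiers = Counter(h["X-Kyrion-Tier"] for h in header_sets)
    cached = sum(h["X-Kyrion-Cached"] == "true" for h in header_sets)
    saved = sum(float(h["X-Kyrion-Saved"].lstrip("$")) for h in header_sets)
    return {"tiers": dict(tiers), "cache_hits": cached,
            "total_saved_usd": round(saved, 4)}
```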
Streaming
Server-sent events work exactly as with OpenAI.
Kyrion fully supports SSE streaming. Set stream: true (or use the stream helper in your SDK) and responses will be streamed back token-by-token from the selected provider.
```python
import openai

client = openai.OpenAI(
    api_key="kyr_live_...",
    base_url="https://api.kyrion.dev/v1",
)

stream = client.chat.completions.create(
    model="kyrion-auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)
```

Kyrion sends the standard data: [DONE] chunk when streaming.

Errors & retries
Standard HTTP status codes with machine-readable error bodies.
| Status | Code | Meaning |
|---|---|---|
| 400 | invalid_request | Malformed request body or missing required field. |
| 401 | invalid_api_key | API key missing, invalid, or revoked. |
| 402 | insufficient_credits | Account has run out of credits. |
| 429 | rate_limit_exceeded | Too many requests. See X-RateLimit-* headers. |
| 500 | provider_error | All providers returned an error. Retry with backoff. |
| 503 | no_models_available | No capable model is healthy right now. |
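For the retryable 500-class errors above, a simple exponential-backoff wrapper is usually enough. A sketch; tune the attempt count and delays to your latency budget:

```python
import time

def with_backoff(call, max_attempts=4, base_delay=0.5):
    # Retries `call` (any zero-argument callable) on exception,
    # doubling the delay between attempts.
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

In production you would catch only retryable failures (e.g. HTTP 500/503) and add jitter to the delay so many clients do not retry in lockstep.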
Provider setup
Add your API keys once. Kyrion handles the rest.
Provider keys are stored AES-256 encrypted in Supabase Vault and are never exposed to your application. Kyrion uses them server-side to call providers on your behalf.
| Provider | Key prefix | Used for tiers |
|---|---|---|
| OpenAI | sk-... | Medium, Complex |
| Anthropic | sk-ant-... | Complex |
| Groq | gsk_... | Simple |

Add keys in Dashboard → Providers. You only need the providers for the tiers you use — Kyrion skips providers with no key configured.
Rate limits
Per-account limits applied before provider limits.
| Plan | Requests / month | Requests / minute | Concurrent |
|---|---|---|---|
| Hobby | 5,000 | 20 | 5 |
| Startup | 200,000 | 200 | 50 |
| Pro | Unlimited | 600 | 200 |
Rate limit status is returned in every response via X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers.
SDK compatibility
If it works with OpenAI, it works with Kyrion.
Kyrion implements the full OpenAI REST API surface. Any library that can set a custom base_url works without modification.
Self-hosting
Deploy the routing engine on your own infrastructure.
The Kyrion routing engine is written in Go and available as a Docker image. You can deploy it on Fly.io, Railway, or any container-capable host. Self-hosted instances use the same API surface but bill directly to your provider accounts.
```shell
# Pull the image
docker pull kyrion/gateway:latest

# Run with your provider keys
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e GROQ_API_KEY=gsk_... \
  -e REDIS_URL=redis://localhost:6379 \
  kyrion/gateway:latest
```
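Before starting the container you can sanity-check its environment with something like the following. This is a hypothetical helper, not shipped with the image, and it assumes at least one provider key must be present:

```python
import os

def check_env(env=os.environ):
    # Providers are optional individually, but at least one key is assumed required.
    providers = [k for k in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY")
                 if env.get(k)]
    if not providers:
        raise RuntimeError("configure at least one provider key")
    return {
        "providers": providers,
        "redis": env.get("REDIS_URL", ""),
        "port": int(env.get("PORT", "8080")),
        "log_level": env.get("LOG_LEVEL", "info"),
    }
```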
Environment variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | Required if using OpenAI models |
| ANTHROPIC_API_KEY | Required if using Anthropic models |
| GROQ_API_KEY | Required if using Groq models |
| REDIS_URL | Redis connection string for caching |
| PORT | Listen port (default: 8080) |
| LOG_LEVEL | debug \| info \| warn \| error (default: info) |

FAQ
Answers to the most common questions.
Does Kyrion store my prompts?
Only the first 80 characters of each prompt are stored as a preview in your usage logs — never the full content. Cached responses are stored in Redis with your configured TTL (default 24h), then deleted. Provider keys are encrypted with AES-256-GCM and never logged.
What happens if a provider is down?
The circuit breaker detects failures within seconds and automatically reroutes to the next healthy provider in the fallback chain. Your request completes — just via a different model. The X-Kyrion-Degraded: true header tells you when this happens.
Can I use my own OpenAI/Anthropic API keys?
Yes. Add your own keys in Dashboard → Providers. Kyrion will use them for calls to those providers — billing goes directly to your provider accounts. If you don't add a key for a provider, Kyrion skips that provider entirely.
Is kyrion-auto just a random router?
No. Each prompt gets a complexity score (0–1) based on action verbs, subjects, bigrams, and length. The score determines the tier, and the tier maps to the cheapest model that can handle it reliably. You can inspect the score on every response via X-Kyrion-Score.
Will switching to Kyrion break my existing code?
No. Kyrion is fully OpenAI-compatible. Change api_key to your Kyrion key and base_url to https://api.kyrion.dev/v1 — every other parameter, model name, streaming flag, and response format works identically. You can revert in seconds.
What is the overhead added by Kyrion?
Scoring and routing adds ~8–15ms per request. On a cache hit, the entire response is returned in under 2ms with zero provider latency. The X-Kyrion-Overhead header shows the exact overhead for every request.
Does caching work across my whole team?
Yes. The Redis cache is shared across all API keys in your account. If two team members send identical prompts, the second request is served from cache at $0.00 cost regardless of which key they use.
Can I force a specific model instead of auto-routing?
Yes. Pass any exact model name (e.g. gpt-4o, claude-3-5-sonnet-20241022) instead of kyrion-auto and Kyrion will route only to that model, while still applying caching and failover.
How do I migrate from direct OpenAI to Kyrion?
Two environment variable changes: set OPENAI_API_KEY to your Kyrion key and OPENAI_BASE_URL to https://api.kyrion.dev/v1. No code changes required. Your existing SDK calls work as-is.
Is there a sandbox / test mode?
Yes. Keys prefixed with kyr_test_ return synthetic responses without calling any provider. Use them in CI/CD pipelines and automated tests to avoid provider costs.
Changelog
What's new in Kyrion.
Older versions will appear here as Kyrion evolves.
