Understand Our Enterprise AI Router in 3 Minutes โ Architecture and Quick Start
Posted October 5, 2025ย โย 3ย min read
You probably care about two things: 1) what this is, 2) can I use it now. This post first shows how to run it in 60 seconds, then explains our architecture and security in simple words.
If you prefer using official accounts (not API keys) with tools, see:
โข Claude Code via official account โ Use Claude Code with XAI Control
โข OpenAI Codex via official account โ Proxy OpenAI Codex via XAI Control
โข Claude Code via official account โ Use Claude Code with XAI Control
โข OpenAI Codex via official account โ Proxy OpenAI Codex via XAI Control
What it is โ one sentence
An enterpriseโgrade AI API entry. Change your base URL and use your own provider keys (BYOK) to call OpenAI / Anthropic / OpenAI Codex / Claude Code โ with builtโin rate limits, allowlists, smart routing, and auditable usage.
What it supports (out of the box)
- OpenAI compatible:
/v1/chat/completions
,/v1/responses
,/v1/embeddings
,/v1/audio/*
, etc. - Anthropic compatible:
/v1/messages
- OpenAI Codex / gptโ*-codex for code completion
- Claude Code (IDE coding assistant): see โClaude Codeโ
- Others: Gemini, Mistral, Cohere, DeepSeek, Grok, Perplexityโฆ (plug in your provider keys)
60โsecond quick start
Get an API key (your provider key)
Just change the base URL:
export XAI_API_KEY="sk-Xvs..."
curl https://api.xaixapi.com/v1/chat/completions \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'
Node.js:
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.xaixapi.com/v1" });
const res = await client.chat.completions.create({ model: "gpt-4o-mini", messages: [{ role: "user", content: "hello" }] });
Python:
from openai import OpenAI
client = OpenAI(api_key="sk-Xvs...", base_url="https://api.xaixapi.com/v1")
resp = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user","content":"hello"}])
No SDK rewrites โ just change the base URL.
Architecture (plainโEnglish)
We talk about three planes. The names sound academic; the ideas are simple:
Control plane (rules)
- Model Mapper โ alias old names to new models; switch without app changes.
- Level Mapper โ send different models to different pools by cost/latency/stability.
- Switch Over โ explicit โprimaryโbackupโ relations (e.g., 1โ2) with clear triggers.
- Resources allowlist โ restrict which API paths are callable.
- Model Limits โ hard perโmodel RPM/TPM caps to prevent cost spikes.
Routing plane (fast path)
- Sameโlevel roundโrobin with atomic indices and fineโgrained locks.
- Sleep on errors โ bad keys cool down without punishing healthy keys.
- Explicit switch over โ only after sameโlevel retries fail; no blackโbox fallbacks.
- Transport tuning โ hot pools, HTTP/2, large buffers, streaming with backpressure.
Usage plane (see clearly)
- Perโuser/model metering with daily/monthly rollups (requests/tokens/cost).
- Governance in one place: IP/Model/Resource allowlists + dashboards/logs.
Security & compliance (four lines)
- BYOKโfirst: always your keys; no silent fallbacks; clear billing/audit boundaries.
- Key encryption: sanitize + encrypt at rest; decrypt justโinโtime; never log secrets.
- Allowlists enforced: IP / Models / Resources with inheritance that never exceeds the parent.
- Fail fast: exceed limits or violate policy โ explicit failure, not invisible overspend.
Cost & governance (straightforward)
- Direct billing to providers; your discounts apply.
- Hard caps per user/model; protect first, then serve.
- Zero markup in selfโhosted mode; transparent policies, predictable totals.
Next steps
- 60โsecond โQuick Startโ and โProxy APIโ.
- Ops & users: โAdmin Consoleโ, โManage Consoleโ.
- IDE coding: โClaude Codeโ.