Understand Our Enterprise AI Router in 3 Minutes — Architecture and Quick Start

Posted October 5, 2025 ‐ 3 min read

You probably care about two things: what this is, and whether you can use it now. This post first shows how to run it in 60 seconds, then explains our architecture and security in plain terms.

If you would rather use these tools with an official account than with API keys, see:
• Claude Code via official account → Use Claude Code with XAI Control
• OpenAI Codex via official account → Proxy OpenAI Codex via XAI Control

What it is — one sentence

An enterprise‑grade entry point for AI APIs. Change your base URL and use your own provider keys (BYOK) to call OpenAI / Anthropic / OpenAI Codex / Claude Code, with built‑in rate limits, allowlists, smart routing, and auditable usage.

What it supports (out of the box)

OpenAI compatible: /v1/chat/completions, /v1/responses, /v1/embeddings, /v1/audio/*, etc.
Anthropic compatible: /v1/messages
OpenAI Codex / gpt‑*-codex for code completion
Claude Code (IDE coding assistant): see “Claude Code”
Others: Gemini, Mistral, Cohere, DeepSeek, Grok, Perplexity… (plug in your provider keys)
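
To make the Anthropic‑compatible route concrete, here is a minimal sketch that builds a request for /v1/messages using only the Python standard library. The host comes from the quick start; everything else is an assumption for illustration: that this route accepts the same Bearer key as /v1/chat/completions, that the anthropic-version header used by Anthropic's own API is accepted, and the helper names build_messages_request and send, which are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.xaixapi.com"  # host from the quick start


def build_messages_request(api_key: str, model: str, prompt: str,
                           max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST to the Anthropic-compatible route."""
    body = {
        "model": model,
        "max_tokens": max_tokens,  # required by the Anthropic Messages format
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        # Assumption: same Bearer scheme as the /v1/chat/completions example.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Version header from Anthropic's own API; assumed to be accepted here.
        "anthropic-version": "2023-06-01",
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs network and a valid key)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling send(build_messages_request(key, model, "hello")) with your own key and a Claude model that key can access would perform the actual call. Building and sending are kept separate so the request can be inspected first.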

60‑second quick start

Get an API key (your provider key)
Just change the base URL:

export XAI_API_KEY="sk-Xvs..."
curl https://api.xaixapi.com/v1/chat/completions \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'

Node.js:

import OpenAI from "openai";
// Point the official SDK at the router; nothing else changes.
const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.xaixapi.com/v1" });
const res = await client.chat.completions.create({ model: "gpt-4o-mini", messages: [{ role: "user", content: "hello" }] });
console.log(res.choices[0].message.content);

Python:

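import os
from openai import OpenAI
# Read the key from the environment (exported in the curl example) instead of hard-coding it.
client = OpenAI(api_key=os.environ["XAI_API_KEY"], base_url="https://api.xaixapi.com/v1")
resp = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "hello"}])
print(resp.choices[0].message.content)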

No SDK rewrites — just change the base URL.

Architecture (plain‑English)

We talk about three planes. The names sound academic; the ideas are simple:

Control plane (rules)
- Model Mapper — alias old names to new models; switch without app changes.
- Level Mapper — send different models to different pools by cost/latency/stability.
- Switch Over — explicit “primary→backup” relations (e.g., 1→2) with clear triggers.
- Resources allowlist — restrict which API paths are callable.
- Model Limits — hard per‑model RPM/TPM caps to prevent cost spikes.
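
The control‑plane rules above can be pictured as one admission check that runs before any request leaves the router. The sketch below is illustrative only, not our actual implementation: the class ControlPlane, the exception PolicyError, and the sliding 60‑second window are assumptions, and it covers only the Model Mapper, the Resources allowlist, and an RPM cap (no TPM, Level Mapper, or Switch Over).

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


class PolicyError(Exception):
    """Raised when a request violates a control-plane rule."""


@dataclass
class ControlPlane:
    model_aliases: dict = field(default_factory=dict)  # Model Mapper: old name -> new model
    allowed_paths: set = field(default_factory=set)    # Resources allowlist (glob patterns)
    rpm_limits: dict = field(default_factory=dict)     # Model Limits: model -> requests/minute
    _hits: dict = field(default_factory=lambda: defaultdict(deque))

    def admit(self, path: str, model: str, now: Optional[float] = None) -> str:
        """Return the resolved model name, or raise PolicyError."""
        now = time.monotonic() if now is None else now

        # 1. Resources allowlist: the path must match at least one pattern.
        if not any(fnmatchcase(path, pattern) for pattern in self.allowed_paths):
            raise PolicyError(f"path not allowed: {path}")

        # 2. Model Mapper: resolve an alias; unknown names pass through unchanged.
        target = self.model_aliases.get(model, model)

        # 3. Model Limits: hard RPM cap over a sliding 60-second window.
        limit = self.rpm_limits.get(target)
        if limit is not None:
            window = self._hits[target]
            while window and now - window[0] >= 60.0:
                window.popleft()
            if len(window) >= limit:
                raise PolicyError(f"RPM limit reached for {target}")
            window.append(now)
        return target
```

For example, with model_aliases={"gpt-4": "gpt-4o-mini"}, an app that still sends "gpt-4" is served by gpt-4o-mini without a code change, and a request over the cap fails explicitly instead of being queued.
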
Routing plane (fast path)
- Same‑level round‑robin with atomic indices and fine‑grained locks.
- Sleep on errors — bad keys cool down without punishing healthy keys.
- Explicit switch over — only after same‑level retries fail; no black‑box fallbacks.
- Transport tuning — hot pools, HTTP/2, large buffers, streaming with backpressure.
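
A rough sketch of the routing ideas above: round‑robin within a level, a cooldown for keys that error, and a switch over that happens only after same‑level retries fail. This is a simplified illustration under stated assumptions, not the production code. It uses one lock per pool rather than atomic indices and fine‑grained locks, and the names KeyPool, NoHealthyKey, route, cooldown_s, and retries_per_level are hypothetical.

```python
import threading
import time


class NoHealthyKey(Exception):
    """Every key in the pool is currently cooling down (or the pool is empty)."""


class KeyPool:
    """One level: round-robin over keys, skipping keys that are cooling down."""

    def __init__(self, keys, cooldown_s=30.0, clock=time.monotonic):
        self._keys = list(keys)
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._next = 0            # round-robin cursor
        self._sleep_until = {}    # key -> time at which it may be used again
        self._lock = threading.Lock()

    def pick(self):
        with self._lock:
            now = self._clock()
            for _ in range(len(self._keys)):
                key = self._keys[self._next % len(self._keys)]
                self._next += 1
                if self._sleep_until.get(key, 0.0) <= now:
                    return key
        raise NoHealthyKey()

    def report_error(self, key):
        # Only the failing key sleeps; healthy keys keep serving.
        with self._lock:
            self._sleep_until[key] = self._clock() + self._cooldown_s


def route(levels, call, retries_per_level=2):
    """Try levels in order (primary first); move on only after retries fail."""
    last_error = None
    for pool in levels:
        for _ in range(retries_per_level):
            try:
                key = pool.pick()
            except NoHealthyKey as exc:
                last_error = exc
                break  # nothing left to try at this level
            try:
                return call(key)
            except Exception as exc:
                pool.report_error(key)
                last_error = exc
    raise RuntimeError("all levels exhausted") from last_error
```

Here call is whatever function sends the upstream request with a given key. The order of levels is the explicit primary→backup relation, so there is no hidden fallback: if every level fails, the caller gets an error.
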
Usage plane (see clearly)
- Per‑user/model metering with daily/monthly rollups (requests/tokens/cost).
- Governance in one place: IP/Model/Resource allowlists + dashboards/logs.
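
Per‑user/model metering with daily and monthly rollups boils down to incrementing two counters per request. A minimal in‑memory sketch, assuming hypothetical names (Usage, Meter, record) and float dollars for brevity; a real system would persist the rollups and likely use integer or decimal money:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Usage:
    requests: int = 0
    tokens: int = 0
    cost_usd: float = 0.0  # float only for illustration

    def add(self, tokens: int, cost_usd: float) -> None:
        self.requests += 1
        self.tokens += tokens
        self.cost_usd += cost_usd


class Meter:
    """Keeps daily and monthly rollups keyed by (user, model, period)."""

    def __init__(self):
        self.daily = defaultdict(Usage)    # period like "2025-10-05"
        self.monthly = defaultdict(Usage)  # period like "2025-10"

    def record(self, user: str, model: str, day: date,
               tokens: int, cost_usd: float) -> None:
        # Every request updates both rollups, so reads never need to re-aggregate.
        self.daily[(user, model, day.isoformat())].add(tokens, cost_usd)
        self.monthly[(user, model, day.strftime("%Y-%m"))].add(tokens, cost_usd)
```

Writing both rollups at record time trades a little write work for cheap dashboard reads.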

Security & compliance (four lines)

BYOK‑first: always your keys; no silent fallbacks; clear billing/audit boundaries.
Key encryption: sanitize + encrypt at rest; decrypt just‑in‑time; never log secrets.
Allowlists enforced: IP / Models / Resources with inheritance that never exceeds the parent.
Fail fast: exceed limits or violate policy → explicit failure, not invisible overspend.
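
One way to read "inheritance that never exceeds the parent" is as a set intersection: a child can narrow what it inherits but cannot add to it. The sketch below illustrates that reading together with fail‑fast enforcement. The names Allowlist, derive, enforce, and PolicyViolation are hypothetical, and IPs are matched as exact strings here rather than CIDR ranges.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


class PolicyViolation(Exception):
    """Explicit failure: the request is rejected, never silently rerouted."""


@dataclass(frozen=True)
class Allowlist:
    ips: frozenset
    models: frozenset
    resources: frozenset


def _narrow(parent: frozenset, requested: Optional[Iterable[str]]) -> frozenset:
    # None means "inherit as-is"; otherwise keep only what the parent also allows.
    return parent if requested is None else parent & frozenset(requested)


def derive(parent: Allowlist, ips=None, models=None, resources=None) -> Allowlist:
    """Build a child allowlist that is always a subset of its parent."""
    return Allowlist(
        ips=_narrow(parent.ips, ips),
        models=_narrow(parent.models, models),
        resources=_narrow(parent.resources, resources),
    )


def enforce(policy: Allowlist, ip: str, model: str, resource: str) -> None:
    """Raise on the first violation instead of serving the request."""
    checks = (("ip", ip, policy.ips),
              ("model", model, policy.models),
              ("resource", resource, policy.resources))
    for name, value, allowed in checks:
        if value not in allowed:
            raise PolicyViolation(f"{name} not allowed: {value}")
```

If a team asks for a model the organization has not allowed, derive simply drops it, and a later request for that model fails with PolicyViolation.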

Cost & governance (straightforward)

Direct billing to providers; your discounts apply.
Hard caps per user/model; protect first, then serve.
Zero markup in self‑hosted mode; transparent policies, predictable totals.

Next steps

Getting started: “Quick Start” and “Proxy API”.
Ops & users: “Admin Console”, “Manage Console”.
IDE coding: “Claude Code”.