Understand Our Enterprise AI Router in 3 Minutes: Architecture and Quick Start

Posted October 5, 2025 - 3 min read

You probably care about two things: (1) what this is and (2) whether you can use it now. This post first shows how to run it in 60 seconds, then explains the architecture and security model in plain terms.

If you prefer using official accounts (not API keys) with tools, see:
• Claude Code via official account → Use Claude Code with XAI Control
• OpenAI Codex via official account → Proxy OpenAI Codex via XAI Control

What it is, in one sentence

An enterprise-grade entry point for AI APIs: change your base URL and use your own provider keys (BYOK) to call OpenAI / Anthropic / OpenAI Codex / Claude Code, with built-in rate limits, allowlists, smart routing, and auditable usage.

What it supports (out of the box)

  • OpenAI compatible: /v1/chat/completions, /v1/responses, /v1/embeddings, /v1/audio/*, etc.
  • Anthropic compatible: /v1/messages
  • OpenAI Codex / gpt-*-codex for code completion
  • Claude Code (IDE coding assistant): see "Claude Code"
  • Others: Gemini, Mistral, Cohere, DeepSeek, Grok, Perplexity... (plug in your provider keys)
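
For the Anthropic-compatible path, a request might be assembled as in the sketch below. It only builds the request and does not send it. The Bearer header is an assumption carried over from the OpenAI-style quick start in this post; Anthropic's own API uses `x-api-key` and `anthropic-version` headers, so check which form the router expects. The model name is a placeholder and `build_messages_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://api.xaixapi.com"  # host taken from the quick start in this post


def build_messages_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the Anthropic-compatible /v1/messages path."""
    body = {
        "model": "your-claude-model",  # placeholder: use a model your key can access
        "max_tokens": 256,             # the Messages format requires max_tokens
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: same Bearer auth as the OpenAI-compatible endpoints.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen` and read the JSON response.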

60-second quick start

  1. Get an API key (your provider key)

  2. Just change the base URL:

export XAI_API_KEY="sk-Xvs..."
curl https://api.xaixapi.com/v1/chat/completions \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'

Node.js:

import OpenAI from "openai";
// Reads the key exported in the curl example above.
const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.xaixapi.com/v1" });
const res = await client.chat.completions.create({ model: "gpt-4o-mini", messages: [{ role: "user", content: "hello" }] });
console.log(res.choices[0].message.content);

Python:

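import os
from openai import OpenAI
# Read the key exported in the curl example instead of hard-coding it.
client = OpenAI(api_key=os.environ["XAI_API_KEY"], base_url="https://api.xaixapi.com/v1")
resp = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "hello"}])
print(resp.choices[0].message.content)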

No SDK rewrites: just change the base URL.

Architecture (in plain English)

We talk about three planes. The names sound academic; the ideas are simple:

  1. Control plane (rules)

    • Model Mapper: alias old names to new models; switch without app changes.
    • Level Mapper: send different models to different pools by cost/latency/stability.
    • Switch Over: explicit "primary→backup" relations (e.g., 1→2) with clear triggers.
    • Resources allowlist: restrict which API paths are callable.
    • Model Limits: hard per-model RPM/TPM caps to prevent cost spikes.
  2. Routing plane (fast path)

    • Same-level round-robin with atomic indices and fine-grained locks.
    • Sleep on errors: bad keys cool down without punishing healthy keys.
    • Explicit switch over: only after same-level retries fail; no black-box fallbacks.
    • Transport tuning: hot pools, HTTP/2, large buffers, streaming with backpressure.
  3. Usage plane (see clearly)

    • Per-user/model metering with daily/monthly rollups (requests/tokens/cost).
    • Governance in one place: IP/Model/Resource allowlists + dashboards/logs.
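
The routing-plane ideas (round-robin within a level, sleeping failed keys, and switching over only after same-level retries fail) can be modeled in a few lines of Python. This is an illustrative sketch, not the router's actual implementation: the names `Key`, `Level`, `Router`, and `UpstreamError`, the 30-second cooldown, and the injected clock are all assumptions made for the example.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class UpstreamError(Exception):
    """Hypothetical error raised by `send` when a provider call fails."""


@dataclass
class Key:
    name: str
    sleep_until: float = 0.0  # clock value until which this key is cooling down


@dataclass
class Level:
    keys: List[Key]
    cursor: int = 0  # round-robin position within this level
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def next_healthy(self, now: float) -> Optional[Key]:
        """Advance round-robin and return the next key that is not sleeping."""
        with self.lock:
            for _ in range(len(self.keys)):
                key = self.keys[self.cursor % len(self.keys)]
                self.cursor += 1
                if key.sleep_until <= now:
                    return key
        return None


class Router:
    def __init__(
        self,
        levels: Dict[int, Level],
        switch_over: Dict[int, int],  # explicit primary -> backup relations
        cooldown_s: float = 30.0,     # assumed cooldown; not from the post
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.levels = levels
        self.switch_over = switch_over
        self.cooldown_s = cooldown_s
        self.clock = clock

    def call(self, level_id: int, send: Callable[[Key], str]) -> str:
        current: Optional[int] = level_id
        visited = set()  # guards against cycles in the switch-over map
        while current is not None and current not in visited:
            visited.add(current)
            level = self.levels[current]
            for _ in range(len(level.keys)):  # same-level retries come first
                key = level.next_healthy(self.clock())
                if key is None:
                    break  # every key at this level is sleeping
                try:
                    return send(key)
                except UpstreamError:
                    # Only the failing key cools down; healthy keys are untouched.
                    key.sleep_until = self.clock() + self.cooldown_s
            current = self.switch_over.get(current)  # no implicit fallback
        raise RuntimeError("all configured levels failed")
```

With `switch_over={1: 2}`, a request to level 1 tries each healthy level-1 key once, then moves to level 2. A level without a switch-over entry fails explicitly instead of falling back somewhere unexpected.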

Security & compliance (four lines)

  • BYOK-first: always your keys; no silent fallbacks; clear billing/audit boundaries.
  • Key encryption: sanitize + encrypt at rest; decrypt just-in-time; never log secrets.
  • Allowlists enforced: IP / Models / Resources with inheritance that never exceeds the parent.
  • Fail fast: exceed limits or violate policy → explicit failure, not invisible overspend.
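
One way to read "inheritance that never exceeds the parent" is as a set intersection: whatever a child account requests, its effective allowlist is clipped to what the parent already permits. The sketch below assumes that reading and covers only models and resource paths with exact matching (IP ranges and wildcard paths are left out). `Allowlist` and `PolicyViolation` are hypothetical names, not the product's API.

```python
from dataclasses import dataclass
from typing import FrozenSet


class PolicyViolation(Exception):
    """Raised explicitly instead of silently serving a disallowed request."""


@dataclass(frozen=True)
class Allowlist:
    models: FrozenSet[str]
    resources: FrozenSet[str]  # API paths, matched exactly in this sketch

    def inherit(self, requested: "Allowlist") -> "Allowlist":
        """Effective child allowlist: the requested sets clipped to this parent."""
        return Allowlist(
            models=self.models & requested.models,
            resources=self.resources & requested.resources,
        )

    def check(self, model: str, resource: str) -> None:
        """Fail fast: raise on the first policy violation, return None if allowed."""
        if model not in self.models:
            raise PolicyViolation(f"model not allowed: {model}")
        if resource not in self.resources:
            raise PolicyViolation(f"resource not allowed: {resource}")
```

Because the child is always an intersection with its parent, granting a child a model the parent lacks has no effect, which keeps the audit boundary easy to reason about.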

Cost & governance (straightforward)

  • Direct billing to providers; your discounts apply.
  • Hard caps per user/model; protect first, then serve.
  • Zero markup in self-hosted mode; transparent policies, predictable totals.
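
A hard cap per user and model can be pictured as a sliding-window counter that rejects the request before it is forwarded. The sketch below is a minimal in-memory illustration under assumptions of our own: a 60-second window, requests-per-minute only (no token accounting), and the hypothetical names `RpmCap` and `LimitExceeded`. It is not a description of how the product stores or enforces limits.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class LimitExceeded(Exception):
    """Raised when a request would exceed the configured cap."""


class RpmCap:
    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.clock = clock
        # Timestamps of admitted requests, kept separately per (user, model).
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def admit(self, user: str, model: str) -> None:
        """Record the request if it fits under the cap, otherwise raise."""
        now = self.clock()
        hits = self._hits[(user, model)]
        while hits and now - hits[0] >= 60.0:  # drop entries older than the window
            hits.popleft()
        if len(hits) >= self.rpm:
            raise LimitExceeded(f"{user}/{model} is over {self.rpm} requests per minute")
        hits.append(now)
```

Calling `admit` before forwarding a request is what "protect first, then serve" looks like in code: a rejected request never reaches the provider, so it never shows up on the bill.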

Next steps