Under the Hood of XAI Router

Posted July 3, 2025 ‐ 5 min read

In today's AI-driven world, simply accessing a large language model (LLM) is not enough. Businesses require a router that is not only fast and reliable but also intelligent, secure, and cost-effective. At XAI, we've built a proxy that does exactly that.

This post pulls back the curtain on the architecture of the XAI Router, revealing how we deliver enterprise-grade performance, unparalleled reliability, and a suite of powerful features.

The Big Picture: Our Architecture

At its core, the XAI Router is a cluster of horizontally scalable applications built on a Rust-native async runtime, sitting between your services and the various upstream AI providers like OpenAI, Anthropic, and Google.

This architecture is deliberately designed around four key pillars: High Performance, High Availability, Enhanced Features, and Robust Security.

1. High Performance & Scalability

Speed is critical. We've engineered our system to minimize latency and handle massive request volumes.

Asynchronous Processing: We don't let heavy tasks slow down your requests. Usage calculation, logging, and database updates are offloaded to background workers via high-throughput async queues. This ensures the request-response cycle remains lightning-fast.
Multi-Layered Caching: The system uses two cache layers. Hot data like user credentials and rate-limit counters are stored in a distributed Redis cache for cluster-wide access, with an additional in-memory cache on each instance for near-instantaneous lookups.
Horizontal Scalability: Our proxy instances are stateless. All shared state is managed in Redis and PostgreSQL. This design means we can scale out by adding more proxy instances behind a load balancer, without downtime, as demand grows.
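
To make the two cache layers concrete, here is a minimal sketch in Python (chosen for brevity; it is not the router's Rust code). It assumes a read-through design in which a per-instance cache with a short TTL sits in front of the shared store. The class name TwoLayerCache, the local_ttl parameter, and the plain dict standing in for Redis are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TwoLayerCache:
    """Per-instance in-memory cache in front of a shared store.

    `shared` is a plain dict standing in for the distributed Redis cache
    (an assumption for this sketch); `local_ttl` is a hypothetical
    freshness window, in seconds, for locally cached entries.
    """

    def __init__(self, shared: Dict[str, Any], local_ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.shared = shared
        self.local_ttl = local_ttl
        self.clock = clock
        self.local: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)
        self.local_hits = 0
        self.shared_hits = 0

    def get(self, key: str) -> Optional[Any]:
        now = self.clock()
        entry = self.local.get(key)
        if entry is not None and entry[0] > now:
            # Fresh local entry: no trip to the shared layer.
            self.local_hits += 1
            return entry[1]
        # Local miss or stale entry: fall back to the shared layer.
        if key in self.shared:
            value = self.shared[key]
            self.local[key] = (now + self.local_ttl, value)
            self.shared_hits += 1
            return value
        # Unknown key: drop any stale local copy.
        self.local.pop(key, None)
        return None
```

The short local TTL is the trade-off in this sketch: a longer window saves more round trips to the shared layer, but an instance may serve a slightly stale value until its local entry expires.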

2. Unmatched Reliability & High Availability

An API router cannot be a single point of failure. Our system is built for resilience from the ground up.

Round-Robin Key Pooling: We don't rely on a single API key. We maintain pools of upstream API keys, categorized by performance tiers or "levels." Our smart round-robin scheduler distributes requests across these keys, which spreads load evenly and reduces the chance that any single key hits its rate limit.
Automatic Failover & Retry Logic: This is our secret sauce. If a request to an upstream API fails with a retryable error (for example, an HTTP 429 rate-limit response or a transient 5xx server error), the XAI Router automatically and transparently retries the request with the next available key in the pool. As long as another key can serve the request, your application never sees the intermittent failure.
Cross-Tier Failover: For maximum reliability, the system can even fail over to a different tier of keys if an entire level becomes unresponsive, so critical requests still get through.
Real-time Configuration Sync: Any change made by an administrator (such as adding a new key, updating a user's plan, or changing a routing rule) is instantly broadcast to all proxy instances in the cluster. This ensures immediate, cluster-wide consistency without needing to restart services.
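
The key pooling and failover behavior described above can be sketched roughly as follows. This is an illustration in Python, not the router's actual Rust implementation. The names KeyPool and send_with_failover, the exact set of retryable status codes, and the policy of falling back to the remaining levels in descending order are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Status codes treated as retryable in this sketch (an assumption).
RETRYABLE = {429, 500, 502, 503, 504}


class KeyPool:
    """Round-robin over upstream keys grouped by level."""

    def __init__(self, keys_by_level: Dict[int, List[str]]) -> None:
        self.keys_by_level = keys_by_level
        self.cursor = {level: 0 for level in keys_by_level}

    def rotation(self, level: int) -> List[str]:
        """Return the level's keys starting at the cursor, then advance it."""
        keys = self.keys_by_level.get(level, [])
        if not keys:
            return []
        start = self.cursor[level] % len(keys)
        self.cursor[level] = (start + 1) % len(keys)
        return keys[start:] + keys[:start]


def send_with_failover(pool: KeyPool, level: int,
                       send: Callable[[str], int]) -> Tuple[int, str]:
    """Try every key in the requested level, then the other levels.

    `send` is a hypothetical callable that performs the upstream request
    with the given key and returns the HTTP status code.
    Returns (status, key_used); key_used is "" if every key failed.
    """
    others = sorted((l for l in pool.keys_by_level if l != level), reverse=True)
    last_status = 503
    for lvl in [level] + others:
        for key in pool.rotation(lvl):
            status = send(key)
            if status not in RETRYABLE:
                return status, key
            last_status = status  # retry with the next key
    return last_status, ""
```

In this sketch the caller only sees a failure once every key in every level has returned a retryable error. A production version would also need per-key cooldowns and a retry budget, which are omitted here.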

3. Enhanced Features & Intelligence

Our proxy is more than just a pipe; it's an intelligent control plane for your AI operations.

Dynamic Model Mapping: You can request a generic model name like "gpt-4-best", and the proxy can intelligently map it to a specific, fine-tuned, or cost-effective backend model like "gpt-4o-mini" based on system-wide or user-specific rules. This simplifies client-side logic and allows for seamless model upgrades on the backend.
Intelligent Tiering (Key Levels): By grouping keys into levels, we can create sophisticated routing. For example, high-priority users can be routed to premium, high-rate-limit keys (Level 100), while background tasks can use more economical keys (Level 1).
Dynamic Key Discovery: In a unique and powerful feature, our proxy can analyze traffic to discover and validate new, working API keys, automatically adding them to the available pools. This self-healing and self-expanding capability further enhances system resilience.
Comprehensive & Precise Usage Tracking: We parse every response to accurately calculate token usage (prompt, completion, reasoning, etc.) and associated costs for a wide variety of models, including chat, image, and audio. This provides you with precise, real-time billing and budget control.
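
As a rough illustration of model mapping and usage costing, the Python sketch below resolves a requested model name through user-specific rules first and system-wide rules second, then prices a usage record. The rule tables, the precedence order, the function names, and the prices are all placeholders invented for this example; the post does not describe the real rule format or price list.

```python
from decimal import Decimal
from typing import Dict

# Hypothetical rule tables; the real rule format is not described here.
SYSTEM_RULES: Dict[str, str] = {"gpt-4-best": "gpt-4o-mini"}
USER_RULES: Dict[str, Dict[str, str]] = {"user-42": {"gpt-4-best": "gpt-4o"}}

# Placeholder prices in USD per million tokens, keyed by token category.
PRICES: Dict[str, Dict[str, Decimal]] = {
    "gpt-4o-mini": {"prompt": Decimal("1.00"), "completion": Decimal("4.00")},
}


def resolve_model(requested: str, user_id: str) -> str:
    """Map a requested model name to a backend model.

    User-specific rules win over system-wide rules (an assumed precedence);
    names with no rule pass through unchanged.
    """
    per_user = USER_RULES.get(user_id, {})
    if requested in per_user:
        return per_user[requested]
    return SYSTEM_RULES.get(requested, requested)


def usage_cost(model: str, usage: Dict[str, int]) -> Decimal:
    """Price a usage record such as {"prompt": 1000, "completion": 500}.

    Raises KeyError for an unpriced model or token category, so unknown
    usage is never silently billed at zero.
    """
    prices = PRICES[model]
    total = Decimal(0)
    for kind, tokens in usage.items():
        total += prices[kind] * tokens
    return total / Decimal(1_000_000)
```

Decimal is used instead of float so that per-token prices add up exactly when many small charges are summed.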

4. Robust Security

Security is non-negotiable. We've implemented a layered security model to protect your service and your data.

Multi-Layered Access Control (ACL): Every incoming request passes through a rigorous pipeline:
1. Authentication: Validates the user's API key.
2. IP allowlisting: Ensures requests originate from authorized IP addresses or CIDR ranges.
3. User-Level Policies: Enforces status checks (e.g., active, suspended) and spending limits.
4. Model & Resource ACL: Granularly controls which users can access which models and API endpoints.
Per-User, Per-Model Rate Limiting: Go beyond simple global limits. You can define precise Requests-Per-Minute (RPM) and Tokens-Per-Minute (TPM) limits for each user, and even for specific models used by that user.
Secure Credential Management: All sensitive data, such as upstream API keys and user credentials, is encrypted at rest in our persistent database.
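
The four-step access control pipeline can be pictured as a chain of checks that stops at the first failure. The Python sketch below is one possible shape, not the router's real code: the User fields, the reason strings, and the check_request function are hypothetical, and the CIDR matching uses Python's standard ipaddress module.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class User:
    """Hypothetical per-user policy record."""
    status: str = "active"
    allowed_cidrs: List[str] = field(default_factory=list)
    spend_limit: float = 0.0
    spent: float = 0.0
    allowed_models: Set[str] = field(default_factory=set)


def check_request(users: Dict[str, User], api_key: str,
                  client_ip: str, model: str) -> Tuple[bool, str]:
    """Run the checks in order and return (allowed, reason)."""
    # 1. Authentication: the API key must map to a known user.
    user = users.get(api_key)
    if user is None:
        return False, "invalid_api_key"
    # 2. IP allowlisting: the client IP must fall inside an allowed range.
    ip = ipaddress.ip_address(client_ip)
    if not any(ip in ipaddress.ip_network(cidr) for cidr in user.allowed_cidrs):
        return False, "ip_not_allowed"
    # 3. User-level policies: account status and spending limit.
    if user.status != "active":
        return False, "user_" + user.status
    if user.spent >= user.spend_limit:
        return False, "spend_limit_reached"
    # 4. Model ACL: the user must be allowed to call this model.
    if model not in user.allowed_models:
        return False, "model_not_allowed"
    return True, "ok"
```

Ordering the cheap checks first means a request with a bad key or an unlisted IP is rejected before any policy or model lookup runs.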

The XAI Router is an intelligent, resilient, and highly performant router engineered to solve the real-world challenges of building and scaling AI-powered applications. By combining intelligent routing, automatic failover, and robust security, we provide a solid foundation you can build your business on with confidence.