Under the Hood of XAI Gateway
Posted July 3, 2025 ‐ 5 min read
In today's AI-driven world, simply accessing a large language model (LLM) is not enough. Businesses require a gateway that is not only fast and reliable but also intelligent, secure, and cost-effective. At XAI, we've built a proxy that does exactly that.
This post pulls back the curtain on the architecture of the XAI Gateway, revealing how we deliver enterprise-grade performance, unparalleled reliability, and a suite of powerful features.
The Big Picture: Our Architecture
At its core, the XAI Gateway is a cluster of horizontally scalable Golang applications sitting between your services and the various upstream AI providers like OpenAI, Anthropic, and Google.
This architecture is deliberately designed around four key pillars: High Performance, High Availability, Enhanced Features, and Robust Security.
1. High Performance & Scalability
Speed is critical. We've engineered our system to minimize latency and handle massive request volumes.
- Built with Go: We chose Golang for its high concurrency, compiled performance, and efficient memory management, making it perfect for I/O-bound tasks like proxying API requests.
- Asynchronous Processing: We don't let heavy tasks slow down your requests. Usage calculation, logging, and database updates are offloaded to background workers via high-throughput channels (`usageChan`, `logChan`), keeping the request-response cycle lightning-fast (see the sketch after this list).
- Multi-Layered Caching: The system uses a layered caching strategy. Hot data like user credentials and rate-limit counters live in a distributed Redis cache for cluster-wide access, with an additional in-memory cache on each instance for near-instantaneous lookups.
- Horizontal Scalability: Our proxy instances are stateless. All shared state is managed in Redis and PostgreSQL. This design means we can instantly scale out by adding more proxy instances behind a load balancer to meet any level of demand without downtime.
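To make the asynchronous hand-off concrete, here is a minimal Go sketch of the pattern: the request handler pushes events onto buffered channels and background goroutines drain them. Only the channel names `usageChan` and `logChan` come from the description above; the event structs, fields, and handler are illustrative assumptions, not the gateway's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// UsageEvent and LogEvent are illustrative placeholders for whatever the
// gateway actually records; the field names here are assumptions.
type UsageEvent struct {
	UserID string
	Tokens int
}

type LogEvent struct {
	Path    string
	Status  int
	Latency time.Duration
}

// Buffered channels let the request handler hand off work without blocking.
var (
	usageChan = make(chan UsageEvent, 10_000)
	logChan   = make(chan LogEvent, 10_000)
)

// startWorkers drains the channels in the background (e.g., batching writes
// to a database) so the hot path never waits on persistence.
func startWorkers() {
	go func() {
		for ev := range usageChan {
			fmt.Printf("persist usage: %+v\n", ev) // stand-in for a DB write
		}
	}()
	go func() {
		for ev := range logChan {
			fmt.Printf("persist log: %+v\n", ev) // stand-in for a DB write
		}
	}()
}

// handleRequest sketches the hot path: proxy the call, enqueue the
// bookkeeping, and return immediately.
func handleRequest(userID string) {
	start := time.Now()
	// ... proxy the upstream call here ...
	usageChan <- UsageEvent{UserID: userID, Tokens: 1234}
	logChan <- LogEvent{Path: "/v1/chat/completions", Status: 200, Latency: time.Since(start)}
}

func main() {
	startWorkers()
	handleRequest("user-42")
	time.Sleep(100 * time.Millisecond) // give the demo workers a moment to print
}
```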
2. Unmatched Reliability & High Availability
An API gateway cannot be a single point of failure. Our system is built for resilience from the ground up.
- Round-Robin Key Pooling: We don't rely on a single API key. We maintain pools of upstream API keys, categorized by performance tiers or "levels." Our smart round-robin scheduler distributes requests across these keys, preventing any single key from being rate-limited.
- Automatic Failover & Retry Logic: This is our secret sauce. If a request to an upstream API fails (e.g., due to a rate limit
429
or a temporary server error5xx
), the XAI Gateway automatically and transparently retries the request with the next available key in the pool. Your application never sees the intermittent failure. - Cross-Tier Failover: For maximum reliability, the system can even failover to a different tier of keys if an entire level becomes unresponsive, ensuring critical requests always get through.
- Real-time Configuration Sync: Any change made by an administrator, such as adding a new key, updating a user's plan, or changing a routing rule, is instantly broadcast to all proxy instances in the cluster. This ensures immediate, cluster-wide consistency without needing to restart services.
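The Go sketch below illustrates the round-robin-with-failover idea under simplified assumptions: a `KeyPool` type, an atomic counter for rotation, and a retry loop that moves to the next key on a `429` or `5xx`. The real gateway keeps its pools and counters in shared state (Redis), so treat the types and function names here as hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync/atomic"
)

// KeyPool is a hypothetical round-robin pool for one key level; the real
// gateway's storage and types will differ.
type KeyPool struct {
	keys []string
	next uint64
}

// Next returns the next key in round-robin order.
func (p *KeyPool) Next() string {
	n := atomic.AddUint64(&p.next, 1)
	return p.keys[(n-1)%uint64(len(p.keys))]
}

// retryable reports whether a status code should trigger rotation to
// another key (rate limits and transient server errors).
func retryable(status int) bool {
	return status == http.StatusTooManyRequests || status >= 500
}

// doWithFailover tries the request with successive keys until one succeeds
// or every key in this level has been attempted.
func doWithFailover(pool *KeyPool, call func(apiKey string) (int, error)) error {
	for i := 0; i < len(pool.keys); i++ {
		key := pool.Next()
		status, err := call(key)
		if err == nil && !retryable(status) {
			return nil // success: the client never sees the intermittent failure
		}
		fmt.Printf("key %s failed (status=%d, err=%v), rotating to next key\n", key, status, err)
	}
	return errors.New("all keys in this level exhausted; a real system could fail over to another tier")
}

func main() {
	pool := &KeyPool{keys: []string{"sk-a", "sk-b", "sk-c"}}
	// Fake upstream: the first key is rate-limited, the second succeeds.
	calls := 0
	_ = doWithFailover(pool, func(apiKey string) (int, error) {
		calls++
		if calls == 1 {
			return http.StatusTooManyRequests, nil
		}
		return http.StatusOK, nil
	})
}
```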
3. Enhanced Features & Intelligence
Our proxy is more than just a pipe; it's an intelligent control plane for your AI operations.
- Dynamic Model Mapping: You can request a generic model name like `"gpt-4-best"`, and the proxy can intelligently map it to a specific, fine-tuned, or cost-effective backend model like `"gpt-4o-mini"` based on system-wide or user-specific rules. This simplifies client-side logic and allows for seamless model upgrades on the backend (a sketch of this lookup follows the list).
- Intelligent Tiering (Key Levels): By grouping keys into levels, we can create sophisticated routing. For example, high-priority users can be routed to premium, high-rate-limit keys (Level 100), while background tasks can use more economical keys (Level 1).
- Dynamic Key Discovery: Our proxy can analyze traffic to discover and validate new, working API keys and automatically add them to the available pools. This self-healing, self-expanding capability further enhances system resilience.
- Comprehensive & Precise Usage Tracking: We parse every response to accurately calculate token usage (prompt, completion, reasoning, etc.) and associated costs for a wide variety of models, including chat, image, and audio. This provides you with precise, real-time billing and budget control.
4. Robust Security
Security is non-negotiable. We've implemented a layered security model to protect your service and your data.
- Multi-Layered Access Control (ACL): Every incoming request passes through a rigorous pipeline:
- Authentication: Validates the user's API key.
- IP Whitelisting: Ensures requests originate from authorized IP addresses or CIDR ranges.
- User-Level Policies: Enforces status checks (e.g., active, suspended) and spending limits.
- Model & Resource ACL: Granularly controls which users can access which models and API endpoints.
- Per-User, Per-Model Rate Limiting: We go beyond simple global limits. You can define precise Requests-Per-Minute (RPM) and Tokens-Per-Minute (TPM) limits for each user, and even for specific models used by that user (a minimal sketch follows this list).
- Secure Credential Management: All sensitive data, such as upstream API keys and user credentials, is encrypted at rest in our persistent database.
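As a rough illustration of per-user, per-model RPM limiting, here is a minimal fixed-window limiter in Go. It is an in-memory stand-in: the gateway described above keeps these counters in Redis for cluster-wide enforcement, and TPM limits would follow the same keying scheme. All names here are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// rpmLimiter is a minimal in-memory, fixed-window RPM limiter keyed by
// user+model. It only demonstrates the per-user, per-model shape of the check.
type rpmLimiter struct {
	mu      sync.Mutex
	limits  map[string]int       // "userID|model" -> allowed requests per minute
	counts  map[string]int       // requests seen in the current window
	windows map[string]time.Time // start of the current window
}

func newRPMLimiter() *rpmLimiter {
	return &rpmLimiter{
		limits:  make(map[string]int),
		counts:  make(map[string]int),
		windows: make(map[string]time.Time),
	}
}

func (l *rpmLimiter) SetLimit(userID, model string, rpm int) {
	l.limits[userID+"|"+model] = rpm
}

// Allow reports whether the user may make another request to this model
// within the current one-minute window.
func (l *rpmLimiter) Allow(userID, model string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	key := userID + "|" + model
	limit, ok := l.limits[key]
	if !ok {
		return true // no per-user, per-model limit configured
	}
	now := time.Now()
	if now.Sub(l.windows[key]) >= time.Minute {
		l.windows[key] = now
		l.counts[key] = 0
	}
	if l.counts[key] >= limit {
		return false
	}
	l.counts[key]++
	return true
}

func main() {
	limiter := newRPMLimiter()
	limiter.SetLimit("user-42", "gpt-4o", 2)

	for i := 0; i < 3; i++ {
		fmt.Println("allowed:", limiter.Allow("user-42", "gpt-4o"))
	}
	// Prints true, true, false: the third call exceeds the 2 RPM limit.
}
```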
The XAI Gateway is an intelligent, resilient, and highly performant gateway engineered to solve the real-world challenges of building and scaling AI-powered applications. By combining intelligent routing, automatic failover, and robust security, we provide a solid foundation you can build your business on with confidence.