Under the Hood of XAI Gateway
Posted July 3, 2025 ‐ 5 min read
In today's AI-driven world, simply accessing a large language model (LLM) is not enough. Businesses require a gateway that is not only fast and reliable but also intelligent, secure, and cost-effective. At XAI, we've built a proxy that does exactly that.
This post pulls back the curtain on the architecture of the XAI Gateway, revealing how we deliver enterprise-grade performance, unparalleled reliability, and a suite of powerful features.
The Big Picture: Our Architecture
At its core, the XAI Gateway is a cluster of horizontally scalable Golang applications sitting between your services and the various upstream AI providers like OpenAI, Anthropic, and Google.
This architecture is deliberately designed around four key pillars: High Performance, High Availability, Enhanced Features, and Robust Security.
1. High Performance & Scalability
Speed is critical. We've engineered our system to minimize latency and handle massive request volumes.
- Built with Go: We chose Golang for its high concurrency, compiled performance, and efficient memory management, making it perfect for I/O-bound tasks like proxying API requests.
- Asynchronous Processing: We don't let heavy tasks slow down your requests. Usage calculation, logging, and database updates are offloaded to background workers via high-throughput channels (`usageChan`, `logChan`), keeping the request-response cycle lightning-fast (see the sketch after this list).
- Multi-Layered Caching: The system uses a layered caching strategy. Hot data like user credentials and rate-limit counters live in a distributed Redis cache for cluster-wide access, with an additional in-memory cache on each instance for near-instantaneous lookups.
- Horizontal Scalability: Our proxy instances are stateless. All shared state is managed in Redis and PostgreSQL. This design means we can instantly scale out by adding more proxy instances behind a load balancer to meet any level of demand without downtime.
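To make the asynchronous hand-off concrete, here is a minimal Go sketch of the pattern: the request handler pushes events onto buffered channels and background goroutines drain them. Only the channel names `usageChan` and `logChan` come from the description above; the event structs, fields, and handler are illustrative assumptions, not the gateway's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// UsageEvent and LogEvent are illustrative placeholders for whatever the
// gateway actually records; the field names here are assumptions.
type UsageEvent struct {
	UserID string
	Tokens int
}

type LogEvent struct {
	Path    string
	Status  int
	Latency time.Duration
}

// Buffered channels let the request handler hand off work without blocking.
var (
	usageChan = make(chan UsageEvent, 10_000)
	logChan   = make(chan LogEvent, 10_000)
)

// startWorkers drains the channels in the background (e.g., batching writes
// to a database) so the hot path never waits on persistence.
func startWorkers() {
	go func() {
		for ev := range usageChan {
			fmt.Printf("persist usage: %+v\n", ev) // stand-in for a DB write
		}
	}()
	go func() {
		for ev := range logChan {
			fmt.Printf("persist log: %+v\n", ev) // stand-in for a DB write
		}
	}()
}

// handleRequest sketches the hot path: proxy the call, enqueue the
// bookkeeping, and return immediately.
func handleRequest(userID string) {
	start := time.Now()
	// ... proxy the upstream call here ...
	usageChan <- UsageEvent{UserID: userID, Tokens: 1234}
	logChan <- LogEvent{Path: "/v1/chat/completions", Status: 200, Latency: time.Since(start)}
}

func main() {
	startWorkers()
	handleRequest("user-42")
	time.Sleep(100 * time.Millisecond) // give the demo workers a moment to print
}
```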
2. Unmatched Reliability & High Availability
An API gateway cannot be a single point of failure. Our system is built for resilience from the ground up.
- Round-Robin Key Pooling: We don't rely on a single API key. We maintain pools of upstream API keys, categorized by performance tiers or "levels." Our smart round-robin scheduler distributes requests across these keys, preventing any single key from being rate-limited.
- Automatic Failover & Retry Logic: This is our secret sauce. If a request to an upstream API fails (e.g., due to a rate limit
429
or a temporary server error5xx
), the XAI Gateway automatically and transparently retries the request with the next available key in the pool. Your application never sees the intermittent failure. - Cross-Tier Failover: For maximum reliability, the system can even failover to a different tier of keys if an entire level becomes unresponsive, ensuring critical requests always get through.
- Real-time Configuration Sync: Any change made by an administrator, such as adding a new key, updating a user's plan, or changing a routing rule, is instantly broadcast to all proxy instances in the cluster. This ensures immediate, cluster-wide consistency without needing to restart services.
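The Go sketch below illustrates the round-robin-with-failover idea under simplified assumptions: a `KeyPool` type, an atomic counter for rotation, and a retry loop that moves to the next key on a `429` or `5xx`. The real gateway keeps its pools and counters in shared state (Redis), so treat the types and function names here as hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync/atomic"
)

// KeyPool is a hypothetical round-robin pool for one key level; the real
// gateway's storage and types will differ.
type KeyPool struct {
	keys []string
	next uint64
}

// Next returns the next key in round-robin order.
func (p *KeyPool) Next() string {
	n := atomic.AddUint64(&p.next, 1)
	return p.keys[(n-1)%uint64(len(p.keys))]
}

// retryable reports whether a status code should trigger rotation to
// another key (rate limits and transient server errors).
func retryable(status int) bool {
	return status == http.StatusTooManyRequests || status >= 500
}

// doWithFailover tries the request with successive keys until one succeeds
// or every key in this level has been attempted.
func doWithFailover(pool *KeyPool, call func(apiKey string) (int, error)) error {
	for i := 0; i < len(pool.keys); i++ {
		key := pool.Next()
		status, err := call(key)
		if err == nil && !retryable(status) {
			return nil // success: the client never sees the intermittent failure
		}
		fmt.Printf("key %s failed (status=%d, err=%v), rotating to next key\n", key, status, err)
	}
	return errors.New("all keys in this level exhausted; a real system could fail over to another tier")
}

func main() {
	pool := &KeyPool{keys: []string{"sk-a", "sk-b", "sk-c"}}
	// Fake upstream: the first key is rate-limited, the second succeeds.
	calls := 0
	_ = doWithFailover(pool, func(apiKey string) (int, error) {
		calls++
		if calls == 1 {
			return http.StatusTooManyRequests, nil
		}
		return http.StatusOK, nil
	})
}
```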
3. Enhanced Features & Intelligence
Our proxy is more than just a pipe; it's an intelligent control plane for your AI operations.
- Dynamic Model Mapping: You can request a generic model name like `"gpt-4-best"`, and the proxy can intelligently map it to a specific, fine-tuned, or cost-effective backend model like `"gpt-4o-mini"` based on system-wide or user-specific rules. This simplifies client-side logic and allows for seamless model upgrades on the backend (a sketch of this lookup follows the list).
- Intelligent Tiering (Key Levels): By grouping keys into levels, we can create sophisticated routing. For example, high-priority users can be routed to premium, high-rate-limit keys (Level 100), while background tasks can use more economical keys (Level 1).
- Dynamic Key Discovery: Our proxy can analyze traffic to discover and validate new, working API keys and automatically add them to the available pools. This self-healing, self-expanding capability further enhances system resilience.
- Comprehensive & Precise Usage Tracking: We parse every response to accurately calculate token usage (prompt, completion, reasoning, etc.) and associated costs for a wide variety of models, including chat, image, and audio. This provides you with precise, real-time billing and budget control.
4. Robust Security
Security is non-negotiable. We've implemented a layered security model to protect your service and your data.
- Multi-Layered Access Control (ACL): Every incoming request passes through a rigorous pipeline:
- Authentication: Validates the user's API key.
- IP Whitelisting: Ensures requests originate from authorized IP addresses or CIDR ranges.
- User-Level Policies: Enforces status checks (e.g., active, suspended) and spending limits.
- Model & Resource ACL: Granularly controls which users can access which models and API endpoints.
- Per-User, Per-Model Rate Limiting: We go beyond simple global limits. You can define precise Requests-Per-Minute (RPM) and Tokens-Per-Minute (TPM) limits for each user, and even for specific models used by that user (a minimal sketch follows this list).
- Secure Credential Management: All sensitive data, such as upstream API keys and user credentials, is encrypted at rest in our persistent database.
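As a rough illustration of per-user, per-model RPM limiting, here is a minimal fixed-window limiter in Go. It is an in-memory stand-in: the gateway described above keeps these counters in Redis for cluster-wide enforcement, and TPM limits would follow the same keying scheme. All names here are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// rpmLimiter is a minimal in-memory, fixed-window RPM limiter keyed by
// user+model. It only demonstrates the per-user, per-model shape of the check.
type rpmLimiter struct {
	mu      sync.Mutex
	limits  map[string]int       // "userID|model" -> allowed requests per minute
	counts  map[string]int       // requests seen in the current window
	windows map[string]time.Time // start of the current window
}

func newRPMLimiter() *rpmLimiter {
	return &rpmLimiter{
		limits:  make(map[string]int),
		counts:  make(map[string]int),
		windows: make(map[string]time.Time),
	}
}

func (l *rpmLimiter) SetLimit(userID, model string, rpm int) {
	l.limits[userID+"|"+model] = rpm
}

// Allow reports whether the user may make another request to this model
// within the current one-minute window.
func (l *rpmLimiter) Allow(userID, model string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	key := userID + "|" + model
	limit, ok := l.limits[key]
	if !ok {
		return true // no per-user, per-model limit configured
	}
	now := time.Now()
	if now.Sub(l.windows[key]) >= time.Minute {
		l.windows[key] = now
		l.counts[key] = 0
	}
	if l.counts[key] >= limit {
		return false
	}
	l.counts[key]++
	return true
}

func main() {
	limiter := newRPMLimiter()
	limiter.SetLimit("user-42", "gpt-4o", 2)

	for i := 0; i < 3; i++ {
		fmt.Println("allowed:", limiter.Allow("user-42", "gpt-4o"))
	}
	// Prints true, true, false: the third call exceeds the 2 RPM limit.
}
```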
The XAI Gateway is an intelligent, resilient, and highly performant gateway engineered to solve the real-world challenges of building and scaling AI-powered applications. By combining intelligent routing, automatic failover, and robust security, we provide a solid foundation you can build your business on with confidence.