Swift on the Server — Hummingbird 2 and a Local LLM Gateway

Local LLM inference, compatible with both the Anthropic and OpenAI APIs, shipped as a 12 MB binary on macOS and Linux: three years ago that would have sounded like a weekend hack with ten moving parts. In 2026, with Swift and Hummingbird, it is a clean, self-contained service. In this series we build exactly that: an LLM gateway running against a local MLX model on Apple Silicon, serving both the Anthropic Messages API and the OpenAI-compatible interface, with streaming, API-key auth, rate limiting, observability, and a path to Linux deployment. This first article explains why Swift makes sense for this job, why we chose Hummingbird over Vapor, and what the series delivers in the end.

Swift on the Server, Status 2026

Anyone who last looked at the Swift-on-server ecosystem three or four years ago will remember a situation best summed up as “interesting, but not yet production-ready.” That picture has changed.

Two migration reports cited in Swift.org’s server ecosystem overview illustrate the shift concretely: Apple’s internal password monitoring team migrated its Java stack to Swift and documented a 40% increase in throughput, 50% reduction in hardware requirements, and a 90% decrease in memory usage. Cultured Code migrated the backend for “Things” from Python to Swift and reports 4× performance at two thirds of the original operating cost.

These are not marketing benchmarks; both are internal production migrations, measured in real workloads. In both cases Swift was not chosen for its ecosystem or tooling maturity, but for a concrete performance and memory profile: no garbage-collector pauses, no JVM overhead, minimal binary footprint.

The ecosystem has caught up over the past two years. The Swift Server Workgroup now coordinates a package ecosystem listed on the Swift Package Index with Linux build verification; every search result shows whether a package builds on Ubuntu 22.04. Areweserveryet.org still exists, but the question of whether Swift is production-ready is rarely asked anymore. ServerSide.swift ran for the fifth time in October 2025, in London.

For iOS developers, there is an additional argument: Swift 6 and Structured Concurrency are now the same language on macOS, on iOS, and on the server. Anyone comfortable with async/await and Sendable constraints can read a Hummingbird backend. Vapor reports zero data-race crashes since adopting structured concurrency, a number that would be unthinkable in Node or Python.

Hummingbird in Relation to Vapor

Vapor is the older of the two frameworks, broader in scope and with a significantly larger user base. The team positions Vapor as a “one-stop shop”: HTTP/2, TLS, authentication, caching, compression, validation, WebSockets and queues are all bundled in the framework or in official Vapor packages. Anyone who wants to move fast and prefers to source all infrastructure from a single place is well served by Vapor.

Hummingbird, built by Adam Fowler, takes the opposite approach. The core covers SwiftNIO-based routing, the application lifecycle, and the middleware protocol. Authentication, database integration, WebSockets and compression come as separate packages, added only when needed. This keeps the build smaller and runtime overhead lower, at the cost of more deliberate composition decisions.
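
As a sketch of what that composition looks like in Package.swift (repository URLs as published under the hummingbird-project organization; version numbers and target names here are illustrative): the core stays small, and each capability arrives as an explicit dependency.

    // swift-tools-version:6.0
    import PackageDescription

    let package = Package(
        name: "swift-mlx-gateway",
        platforms: [.macOS(.v14)],
        dependencies: [
            // Core: router, application lifecycle, middleware protocol
            .package(url: "https://github.com/hummingbird-project/hummingbird.git", from: "2.0.0"),
            // Opt-in module, added only because the gateway needs auth
            .package(url: "https://github.com/hummingbird-project/hummingbird-auth.git", from: "2.0.0"),
        ],
        targets: [
            .executableTarget(name: "Gateway", dependencies: [
                .product(name: "Hummingbird", package: "hummingbird"),
                .product(name: "HummingbirdAuth", package: "hummingbird-auth"),
            ])
        ]
    )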

The most telling statement on this question comes from the Vapor core team itself: in the Vapor 5 roadmap post, the team announces that the new HTTP server foundation will be based on “Adam’s great work from Hummingbird.” The opposite of competition. The two frameworks are converging at the infrastructure level; the ecosystem becomes more coherent, not more fragmented.

In the Web Frameworks Benchmark (Swift 6.2, May 2026), Hummingbird 2.17 achieves 11,215 requests/sec, Vapor 4.119 achieves 8,859 requests/sec. Both numbers are more than sufficient for server-side workloads; the difference is not a decision criterion, but it shows what the thinner abstraction layer buys.

We use Hummingbird because the scope matches. We are not building a production web framework for a team of twenty developers, but a tightly scoped gateway service: HTTP routing, middleware for auth and rate limiting, streaming responses, and a background service for the MLX process. Hummingbird’s minimal core with targeted module additions fits that scope better.

What Hummingbird 2 Does Differently

Hummingbird 2, released September 2024, is a complete rewrite. Adam Fowler writes: “This is the version of Hummingbird I wanted to write initially but wasn’t able to because the language features weren’t ready.” That sentence names what makes HB2 different: not an evolutionary step, but a framework that only became possible after Swift 6 and Structured Concurrency were finished.

Concretely: Hummingbird 1 layered Swift Concurrency over an EventLoopFuture base. That was practical but created structural limits. Structured Concurrency, task locals and task cancellation remained only half-accessible. HB2 has removed EventLoopFutures entirely. All route handlers are async, all middleware types @Sendable, every NIO reference replaced by native concurrency primitives.
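
A minimal sketch of the resulting shape, using the documented Hummingbird 2 API (the /healthz route is this project's; everything else is framework surface):

    import Hummingbird

    // Every handler is a plain async closure; no EventLoopFuture anywhere.
    let router = Router()
    router.get("/healthz") { _, _ in
        HTTPResponse.Status.ok
    }

    let app = Application(
        router: router,
        configuration: .init(address: .hostname("127.0.0.1", port: 8080))
    )
    try await app.runService()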

The most important change in the API design is the generic RequestContext. In HB1, everything was tied to a central Request type; carrying custom values across the request lifecycle meant either subclassing the request type or falling back on global state containers. HB2 cleanly separates context and request: you define your own type conforming to the RequestContext protocol, carry arbitrary properties on it, and the framework code stays generic over that type. For our LLM gateway, that means API key, tenant ID and rate-limit budget land in a GatewayRequestContext, without any framework patches.
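
A sketch of that context type under the Hummingbird 2 protocol requirements (the gateway-specific properties are working names from this series, not framework API):

    import Hummingbird

    struct GatewayRequestContext: RequestContext {
        // Framework storage required by the RequestContext protocol.
        var coreContext: CoreRequestContextStorage

        // Gateway-specific values: set by middleware, read by handlers.
        var apiKey: String?
        var tenantID: String?

        init(source: Source) {
            self.coreContext = .init(source: source)
        }
    }

    // The router, and every middleware attached to it, is generic over this type.
    let router = Router(context: GatewayRequestContext.self)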

Further technical points relevant to later articles: the HummingbirdCore layer has been merged into the main repository, simplifying the dependency graph. The router uses an optimised trie algorithm alongside the new Swift HTTP Types library. Service lifecycle follows Swift Service Lifecycle v2: graceful shutdown means that in-flight requests finish before the process exits.
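
For the MLX process supervisor that arrives in article 3, this lifecycle model reduces to a Service conformance; a sketch under Swift Service Lifecycle v2 (MLXProcessService is a working name; the actual process handling is elided):

    import ServiceLifecycle

    struct MLXProcessService: Service {
        func run() async throws {
            // Launch the backend process here (elided), then suspend until
            // the service group triggers graceful shutdown.
            try await gracefulShutdown()
            // Shutdown was triggered: terminate the child process cleanly.
        }
    }

Registered via app.addServices(MLXProcessService()), the supervisor shares its start-up order and shutdown semantics with the HTTP server in the same service group.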

The three-layer architecture from the documentation is a useful mental model: HummingbirdCore as a pure HTTP server on SwiftNIO, Hummingbird as the web application framework above it, and separate extension modules pulled in as needed. Each layer can be used independently, or in the combination the service actually requires.

Why a Local LLM Gateway as a Demo

Anyone running a local model on Apple Silicon, whether Qwen3, Mistral, Llama or another from the MLX Community Hub, has two options: use mlx_lm.server, the OpenAI-compatible server that ships with MLX-LM, wrapping Python/FastAPI around it for anything the stock server does not cover. Or build the gateway in Swift.

The argument for Swift is not purism, it is stack coherence. Apple Silicon, MLX and Swift share the same native stack: the same language runtime, the same memory model, the same build toolchain. No Python interpreter on a machine set up for Swift development, no language boundary between gateway code and deployment artifact. The end result of this series, a fully statically linked binary built via the Swift Static Linux SDK (SE-0387, available since Swift 5.9), is 12 MB and runs without dependencies on any Linux host.
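
Under a current toolchain with the Static Linux SDK installed, that build is a single invocation; the SDK identifier below follows the swift.org naming for the x86_64 static SDK, and the bundle URL is elided:

    swift sdk install <static-linux-sdk-bundle-url>
    swift build --swift-sdk x86_64-swift-linux-musl -c release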

The second reason is dual protocol compatibility. The gateway serves two interfaces: the Anthropic Messages API (/v1/messages) for Claude Code, and the OpenAI-compatible interface (/v1/chat/completions) for Cursor, Aider, and other tools. The local gateway becomes a drop-in replacement for both cloud APIs: no API key, no network latency, no privacy concerns for internal documents.
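
In router terms, dual protocol compatibility is simply two routes in front of one backend; a sketch (the handlers stay stubs until article 2, and GatewayRequestContext is the context type sketched above):

    // Both endpoints decode their own wire format, then call the same
    // inference backend.
    let router = Router(context: GatewayRequestContext.self)
    router.post("/v1/messages") { _, _ in
        // Article 2: decode the Anthropic Messages format, run inference,
        // re-encode the response.
        HTTPResponse.Status.notImplemented
    }
    router.post("/v1/chat/completions") { _, _ in
        // Article 2: same backend, OpenAI wire format.
        HTTPResponse.Status.notImplemented
    }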

The third reason is didactic. The demo project moves naturally through Hummingbird’s key features. Routing comes in article 1. In article 2 both protocol implementations follow (Anthropic first, then OpenAI), in article 3 asynchronous backend calls. Article 4 brings streaming via Server-Sent Events, the most technically demanding piece in the series and the point where Hummingbird’s AsyncSequence integration stands out. In article 5 the Custom RequestContext for multi-tenancy follows, in article 6 observability and cross-compilation.

No Hummingbird feature is added to this project because the series needs it. Every feature appears because the gateway needs it.

Architecture of swift-mlx-gateway
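
In code, the architecture is the composition of the pieces introduced above. A condensed sketch (APIKeyMiddleware, RateLimitMiddleware, registerRoutes and MLXProcessService are working names from this series, filled in across articles 1 to 5):

    import Hummingbird

    let router = Router(context: GatewayRequestContext.self)
    // Middleware chain: API-key auth first, then rate limiting (article 5).
    router.middlewares.add(APIKeyMiddleware())
    router.middlewares.add(RateLimitMiddleware())
    // Health check, model listing and both protocol endpoints (articles 1-3).
    registerRoutes(on: router)

    var app = Application(router: router)
    // The MLX supervisor runs in the same service group as the HTTP server
    // and shares its graceful-shutdown semantics (article 3).
    app.addServices(MLXProcessService())
    try await app.runService()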

How the Series Is Structured

The series covers six articles, plus an optional bonus:

  • Article 1 — Hello Hummingbird: swift package init, Package.swift, minimal router with /healthz and /v1/models, application lifecycle, logger setup
  • Article 2 — Anthropic Messages API & OpenAI Spec: both protocols, one backend — /v1/messages for Claude Code, /v1/chat/completions for Cursor and Aider
  • Article 3 — Connecting the MLX Backend: MLXClient as async wrapper, model routing from configuration, timeout handling, first real inference with a quantized model on Apple Silicon
  • Article 4 — Streaming with Server-Sent Events: SSE-compliant streaming via AsyncSequence, connection cancellation, token-by-token passthrough
  • Article 5 — Auth & Generic RequestContext: API-key middleware, custom GatewayRequestContext with tenant and rate-limit budget, PostgresNIO for audit logging
  • Article 6 — Observability & Linux Deployment: Prometheus metrics, distributed tracing, cross-compilation with the Swift Static Linux SDK, container build under 50 MB
  • Article 7 (optional) — Benchmark & Outlook: performance benchmark against FastAPI and Node; bridge to Season II

The code lives at codeberg.org/rotecodefraktion/swift-mlx-gateway. Every article that introduces code gets a Git tag (article-01, article-02, …); readers joining from article 3 onward can check out the article-02 tag and continue from there.

Article 1 starts with the skeleton.

Sources