Two Protocols, One Backend — Anthropic and OpenAI with Codable
In Article 1, our server answered /healthz and a static /v1/models endpoint. Now we turn the skeleton into a gateway that speaks two real LLM API standards: the Anthropic Messages API for Claude Code, and the OpenAI Chat Completions API for Cursor, Aider, and other tools. Both endpoints return mock responses for now; the real MLX backend arrives in Article 3.
The Two Formats Compared
Both protocols follow the same basic idea: a messages array with role and content, plus control parameters like model and temperature. The differences are in the details.
| Field | Anthropic | OpenAI |
|---|---|---|
| system | Top-level string | Message with role: "system" |
| max_tokens | Required | Optional |
| content in request | String or block array | String |
| content in response | Array of content blocks | String in choices[i].message.content |
| Token fields | input_tokens, output_tokens | prompt_tokens, completion_tokens, total_tokens |
| Finish field | stop_reason | finish_reason |
| Response wrapper | Direct | choices[] |
The biggest gotcha is the system prompt: Anthropic uses a dedicated top-level field, while OpenAI expects a regular message with role: "system" in the array. The gateway needs to know both conventions.
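For example, the same instruction-plus-question pair looks like this in each dialect (abbreviated request bodies).
Anthropic, with system as a top-level field:
{
  "model": "claude-3-5-sonnet",
  "max_tokens": 256,
  "system": "You are a helpful assistant.",
  "messages": [{ "role": "user", "content": "Hello" }]
}
OpenAI, with system as a regular message:
{
  "model": "gpt-4",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello" }
  ]
}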
We create two new files: Sources/gateway/Models/Anthropic.swift and Sources/gateway/Models/OpenAI.swift.
Codable Types for Anthropic
The most interesting part of the Anthropic spec is the content field in messages: it can be a plain string or an array of content blocks. Claude Code typically sends strings; other clients sometimes send structured blocks. Our gateway needs to accept both.
We use a Swift enum with a custom init(from:):
// Models/Anthropic.swift
import Foundation
import Hummingbird
enum AnthropicRole: String, Codable {
case user
case assistant
}
enum AnthropicContent: Codable {
case text(String)
case blocks([AnthropicContentBlock])
init(from decoder: Decoder) throws {
let container = try decoder.singleValueContainer()
if let string = try? container.decode(String.self) {
self = .text(string)
return
}
if let blocks = try? container.decode([AnthropicContentBlock].self) {
self = .blocks(blocks)
return
}
throw DecodingError.typeMismatch(
AnthropicContent.self,
.init(
codingPath: decoder.codingPath,
debugDescription: "content must be a string or an array of content blocks"
)
)
}
func encode(to encoder: Encoder) throws {
var container = encoder.singleValueContainer()
switch self {
case .text(let string):
try container.encode(string)
case .blocks(let blocks):
try container.encode(blocks)
}
}
var asText: String {
switch self {
case .text(let string):
return string
case .blocks(let blocks):
return blocks.map(\.text).joined(separator: "\n")
}
}
}
struct AnthropicContentBlock: Codable {
let type: String
let text: String
}
struct AnthropicMessage: Codable {
let role: AnthropicRole
let content: AnthropicContent
}
struct MessageRequest: Codable {
let model: String
let maxTokens: Int
let system: AnthropicContent?
let messages: [AnthropicMessage]
let stream: Bool?
let temperature: Double?
let topP: Double?
let stopSequences: [String]?
enum CodingKeys: String, CodingKey {
case model
case maxTokens = "max_tokens"
case system
case messages
case stream
case temperature
case topP = "top_p"
case stopSequences = "stop_sequences"
}
}
struct MessageResponse: Codable, ResponseGenerator {
let id: String
let type: String
let role: AnthropicRole
let content: [AnthropicContentBlock]
let model: String
let stopReason: String
let stopSequence: String?
let usage: AnthropicUsage
enum CodingKeys: String, CodingKey {
case id, type, role, content, model
case stopReason = "stop_reason"
case stopSequence = "stop_sequence"
case usage
}
public func response(from request: Request, context: some RequestContext) throws -> Response {
let encoder = JSONEncoder()
let data = try encoder.encode(self)
return Response(
status: .ok,
headers: [.contentType: "application/json"],
body: .init(byteBuffer: .init(data: data))
)
}
}
struct AnthropicUsage: Codable {
let inputTokens: Int
let outputTokens: Int
enum CodingKeys: String, CodingKey {
case inputTokens = "input_tokens"
case outputTokens = "output_tokens"
}
}
Three observations about this code.
First, we use singleValueContainer rather than a keyed container because content at that position in the JSON is not a key-value pair but a bare value: either a string or an array. The try? pattern attempts the simpler case first (a string), silently falls through if it does not match, then tries the block array. Only when both attempts fail do we throw a controlled DecodingError.
Second, asText is a convenience accessor: whichever format the content arrived in, we can always extract it as plain text. The mock inference below relies on this.
Third, in the response, content is always [AnthropicContentBlock], never the request-side enum. The Anthropic spec is flexible about request format but strict about response format, and we mirror that exactly.
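A quick check makes the dual decoding concrete. This snippet is standalone (a scratch main.swift or a test will do) and uses only the types defined above; the JSON literals are illustrative:
import Foundation
let decoder = JSONDecoder()
// String form, as Claude Code typically sends it
let plainJSON = #"{"role": "user", "content": "Hello"}"#
let plain = try decoder.decode(AnthropicMessage.self, from: Data(plainJSON.utf8))
print(plain.content.asText) // Hello
// Block-array form
let blockJSON = #"{"role": "user", "content": [{"type": "text", "text": "Hello"}]}"#
let block = try decoder.decode(AnthropicMessage.self, from: Data(blockJSON.utf8))
print(block.content.asText) // Hello
Both payloads decode into the same AnthropicMessage; asText hides the difference from the rest of the gateway.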
Codable Types for OpenAI
The OpenAI types are structurally similar but consistently simpler: content is always a string, system comes as a regular message, and most parameters are optional.
// Models/OpenAI.swift
import Foundation
import Hummingbird
enum ChatRole: String, Codable {
case system
case user
case assistant
case tool
}
struct ChatMessage: Codable {
let role: ChatRole
let content: String
}
struct ChatCompletionRequest: Codable {
let model: String
let messages: [ChatMessage]
let maxTokens: Int?
let temperature: Double?
let topP: Double?
let stream: Bool?
let stop: [String]?
let presencePenalty: Double?
let frequencyPenalty: Double?
let user: String?
enum CodingKeys: String, CodingKey {
case model, messages, temperature, stream, stop, user
case maxTokens = "max_tokens"
case topP = "top_p"
case presencePenalty = "presence_penalty"
case frequencyPenalty = "frequency_penalty"
}
}
struct ChatCompletionResponse: Codable, ResponseGenerator {
let id: String
let object: String
let created: Int
let model: String
let choices: [ChatCompletionChoice]
let usage: ChatCompletionUsage
public func response(from request: Request, context: some RequestContext) throws -> Response {
let encoder = JSONEncoder()
let data = try encoder.encode(self)
return Response(
status: .ok,
headers: [.contentType: "application/json"],
body: .init(byteBuffer: .init(data: data))
)
}
}
struct ChatCompletionChoice: Codable {
let index: Int
let message: ChatMessage
let finishReason: String
enum CodingKeys: String, CodingKey {
case index, message
case finishReason = "finish_reason"
}
}
struct ChatCompletionUsage: Codable {
let promptTokens: Int
let completionTokens: Int
let totalTokens: Int
enum CodingKeys: String, CodingKey {
case promptTokens = "prompt_tokens"
case completionTokens = "completion_tokens"
case totalTokens = "total_tokens"
}
}
ChatRole includes .tool alongside system, user, and assistant. Tool calls are outside the scope of this series, but modelling the role now keeps the enum ready for function-calling extensions. maxTokens is optional because OpenAI applies a server-side default; Anthropic does not, which is why MessageRequest requires it.
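Because everything beyond model and messages is optional, a minimal client payload decodes cleanly. A small standalone check using the types above:
import Foundation
let json = #"{"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}"#
let request = try JSONDecoder().decode(ChatCompletionRequest.self, from: Data(json.utf8))
print(request.maxTokens as Any)    // nil; the backend chooses a default
print(request.temperature as Any)  // nil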
Routes and Mock Inference
The two new POST endpoints go into Router+build.swift. request.decode(as:context:) is Hummingbird’s standard path: BasicRequestContext comes with a JSON decoder that maps the request body directly into our Codable types.
// Additions in Router+build.swift
// Anthropic Messages API
router.post("v1/messages") { request, context -> MessageResponse in
let payload = try await request.decode(as: MessageRequest.self, context: context)
try validate(payload)
return mockResponse(for: payload)
}
// OpenAI Chat Completions
router.post("v1/chat/completions") { request, context -> ChatCompletionResponse in
let payload = try await request.decode(as: ChatCompletionRequest.self, context: context)
try validate(payload)
return mockResponse(for: payload)
}
Validation checks the minimum requirements of both specs:
private func validate(_ request: MessageRequest) throws {
guard !request.messages.isEmpty else {
throw HTTPError(.badRequest, message: "messages array must not be empty")
}
guard request.maxTokens > 0 else {
throw HTTPError(.badRequest, message: "max_tokens must be greater than 0")
}
}
private func validate(_ request: ChatCompletionRequest) throws {
guard !request.messages.isEmpty else {
throw HTTPError(.badRequest, message: "messages array must not be empty")
}
}
HTTPError(.badRequest, message:) serialises in Hummingbird to {"error":{"message":"..."}}. That does not match the exact Anthropic or OpenAI error format; spec-compliant error structures follow in Article 5 when we extend the RequestContext.
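For reference, the envelopes the two specs define look roughly like this (simplified).
Anthropic:
{
  "type": "error",
  "error": { "type": "invalid_request_error", "message": "max_tokens: must be greater than 0" }
}
OpenAI:
{
  "error": { "message": "...", "type": "invalid_request_error", "param": null, "code": null }
}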
The mock inference functions use Swift overloading: both are named mockResponse(for:), and the compiler distinguishes them by parameter type.
private let mockSuffix = "\n\n(Mock response from swift-mlx-gateway. Article 3 will wire up the MLX backend.)"
private func lastUserText(in messages: [AnthropicMessage]) -> String {
for message in messages.reversed() where message.role == .user {
return message.content.asText
}
return ""
}
private func lastUserText(in messages: [ChatMessage]) -> String {
for message in messages.reversed() where message.role == .user {
return message.content
}
return ""
}
private func mockResponse(for request: MessageRequest) -> MessageResponse {
let echo = lastUserText(in: request.messages)
let text = "Echo: \(echo)" + mockSuffix
let id = "msg_" + String(UUID().uuidString.replacingOccurrences(of: "-", with: "").lowercased().prefix(24))
return MessageResponse(
id: id,
type: "message",
role: .assistant,
content: [AnthropicContentBlock(type: "text", text: text)],
model: request.model,
stopReason: "end_turn",
stopSequence: nil,
usage: AnthropicUsage(
inputTokens: estimateTokens(echo),
outputTokens: estimateTokens(text)
)
)
}
private func mockResponse(for request: ChatCompletionRequest) -> ChatCompletionResponse {
let echo = lastUserText(in: request.messages)
let text = "Echo: \(echo)" + mockSuffix
let id = "chatcmpl-" + String(UUID().uuidString.prefix(8).lowercased())
let promptTokens = estimateTokens(echo)
let completionTokens = estimateTokens(text)
return ChatCompletionResponse(
id: id,
object: "chat.completion",
created: Int(Date().timeIntervalSince1970),
model: request.model,
choices: [
ChatCompletionChoice(
index: 0,
message: ChatMessage(role: .assistant, content: text),
finishReason: "stop"
)
],
usage: ChatCompletionUsage(
promptTokens: promptTokens,
completionTokens: completionTokens,
totalTokens: promptTokens + completionTokens
)
)
}
private func estimateTokens(_ text: String) -> Int {
max(1, text.count / 4)
}
estimateTokens is a crude heuristic: roughly one token per four characters. For example, "Hello World" is 11 characters and counts as max(1, 11 / 4) = 2 tokens, which is exactly the input_tokens: 2 in the curl output below. That is sufficient for the mock; Article 3 replaces it with accurate token counts from the MLX backend.
Testing with curl
Start the server:
swift run gateway
Anthropic endpoint:
curl -s -X POST localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"claude-3-5-sonnet","max_tokens":256,"messages":[{"role":"user","content":"Hello World"}]}' | jq .
{
"id": "msg_02acd2c6af7943fd8f623aa5",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Echo: Hello World\n\n(Mock response from swift-mlx-gateway. Article 3 will wire up the MLX backend.)"
}
],
"model": "claude-3-5-sonnet",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": { "input_tokens": 2, "output_tokens": 24 }
}
OpenAI endpoint:
curl -s -X POST localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello World"}]}' | jq .
{
"id": "chatcmpl-c7573297",
"object": "chat.completion",
"created": 1778786375,
"model": "gpt-4",
"choices": [
{
"index": 0,
"message": { "role": "assistant", "content": "Echo: Hello World\n\n(Mock response ...)" },
"finish_reason": "stop"
}
],
"usage": { "prompt_tokens": 2, "completion_tokens": 24, "total_tokens": 26 }
}
The block-array content format works too:
curl -s -X POST localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"x","max_tokens":256,"messages":[{"role":"user","content":[{"type":"text","text":"Block form"}]}]}' | jq .
Validation errors return 400:
# Empty messages array
curl -s -i -X POST localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"x","max_tokens":256,"messages":[]}'
# HTTP/1.1 400 Bad Request
# {"error":{"message":"messages array must not be empty"}}
# max_tokens zero
curl -s -i -X POST localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"x","max_tokens":0,"messages":[{"role":"user","content":"hi"}]}'
# HTTP/1.1 400 Bad Request
# {"error":{"message":"max_tokens must be greater than 0"}}
Connecting Claude Code
Claude Code respects ANTHROPIC_BASE_URL and routes all API requests to the given base URL:
export ANTHROPIC_BASE_URL=http://localhost:8080
claude
Once claude is running, every API request lands at our local gateway, and the mock response echoes the input back. That confirms the protocol handshake works; a useful answer has to wait for the real model in Article 3.
Commit and Tag
git add .
git commit -m "article-02: Anthropic Messages API & OpenAI Chat Completions"
git tag article-02
git push origin main --tags
Two Files, Two Protocols, One Backend
| File | Content |
|---|---|
| Sources/gateway/App.swift | unchanged |
| Sources/gateway/Application+build.swift | unchanged |
| Sources/gateway/Router+build.swift | two POST endpoints, validation, mock inference |
| Sources/gateway/Models/ModelTypes.swift | unchanged |
| Sources/gateway/Models/Anthropic.swift | request, response, content enum, usage |
| Sources/gateway/Models/OpenAI.swift | request, response, choice, usage |
The Codable types are now the contract between the gateway layer and the backend. Article 3 brings the MLX backend; the mockResponse functions will be replaced by a real MLXClient that returns the same types.
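One way to picture that contract is a small backend protocol. This sketch is purely illustrative, with hypothetical names; Article 3 defines the real interface:
// Hypothetical shape of the backend boundary; not the final Article 3 API.
protocol InferenceBackend: Sendable {
    func respond(to request: MessageRequest) async throws -> MessageResponse
    func respond(to request: ChatCompletionRequest) async throws -> ChatCompletionResponse
}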
Sources
- Anthropic Messages API Reference, docs.anthropic.com
- OpenAI Chat Completions Reference, platform.openai.com
- Hummingbird Documentation, docs.hummingbird.codes
- Claude Code: Bedrock, Vertex & Proxy Configuration, code.claude.com