Two Protocols, One Backend — Anthropic and OpenAI with Codable
In Article 1, our server answered /healthz and a static /v1/models endpoint. Now we turn the skeleton into a gateway that speaks two real LLM API standards: the Anthropic Messages API for Claude Code, and the OpenAI Chat Completions API for Cursor, Aider, and other tools. Both endpoints return mock responses for now; the real MLX backend arrives in Article 3.
The Two Formats Compared
Both protocols follow the same basic idea: a messages array with role and content, plus control parameters like model and temperature. The differences are in the details.
| Field | Anthropic | OpenAI |
|---|---|---|
| system | Top-level string | Message with role: "system" |
| max_tokens | Required | Optional |
| content in request | String or block array | String |
| content in response | Array of content blocks | String in choices[i].message.content |
| Token fields | input_tokens, output_tokens | prompt_tokens, completion_tokens, total_tokens |
| Finish field | stop_reason | finish_reason |
| Response wrapper | Direct | choices[] |
The biggest gotcha is the system prompt: Anthropic uses a dedicated top-level field, while OpenAI expects a regular message with role: "system" in the array. The gateway needs to know both conventions.
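For example, the same instruction-plus-question pair looks like this in each dialect (abbreviated request bodies).
Anthropic, with system as a top-level field:
{
  "model": "claude-3-5-sonnet",
  "max_tokens": 256,
  "system": "You are a helpful assistant.",
  "messages": [{ "role": "user", "content": "Hello" }]
}
OpenAI, with system as a regular message:
{
  "model": "gpt-4",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello" }
  ]
}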
We create two new files: Sources/gateway/Models/Anthropic.swift and Sources/gateway/Models/OpenAI.swift.
Codable Types for Anthropic
The most interesting part of the Anthropic spec is the content field in messages: it can be a plain string or an array of content blocks. Claude Code typically sends strings; other clients sometimes send structured blocks. Our gateway needs to accept both.
We use a Swift enum with a custom init(from:):
// Models/Anthropic.swift
import Foundation
import Hummingbird
enum AnthropicRole: String, Codable {
case user
case assistant
}
enum AnthropicContent: Codable {
case text(String)
case blocks([AnthropicContentBlock])
init(from decoder: Decoder) throws {
let container = try decoder.singleValueContainer()
if let string = try? container.decode(String.self) {
self = .text(string)
return
}
if let blocks = try? container.decode([AnthropicContentBlock].self) {
self = .blocks(blocks)
return
}
throw DecodingError.typeMismatch(
AnthropicContent.self,
.init(
codingPath: decoder.codingPath,
debugDescription: "content must be a string or an array of content blocks"
)
)
}
func encode(to encoder: Encoder) throws {
var container = encoder.singleValueContainer()
switch self {
case .text(let string):
try container.encode(string)
case .blocks(let blocks):
try container.encode(blocks)
}
}
var asText: String {
switch self {
case .text(let string):
return string
case .blocks(let blocks):
return blocks.map(\.text).joined(separator: "\n")
}
}
}
struct AnthropicContentBlock: Codable {
let type: String
let text: String
}
struct AnthropicMessage: Codable {
let role: AnthropicRole
let content: AnthropicContent
}
struct MessageRequest: Codable {
let model: String
let maxTokens: Int
let system: AnthropicContent?
let messages: [AnthropicMessage]
let stream: Bool?
let temperature: Double?
let topP: Double?
let stopSequences: [String]?
enum CodingKeys: String, CodingKey {
case model
case maxTokens = "max_tokens"
case system
case messages
case stream
case temperature
case topP = "top_p"
case stopSequences = "stop_sequences"
}
}
struct MessageResponse: Codable, ResponseGenerator {
let id: String
let type: String
let role: AnthropicRole
let content: [AnthropicContentBlock]
let model: String
let stopReason: String
let stopSequence: String?
let usage: AnthropicUsage
enum CodingKeys: String, CodingKey {
case id, type, role, content, model
case stopReason = "stop_reason"
case stopSequence = "stop_sequence"
case usage
}
public func response(from request: Request, context: some RequestContext) throws -> Response {
let encoder = JSONEncoder()
let data = try encoder.encode(self)
return Response(
status: .ok,
headers: [.contentType: "application/json"],
body: .init(byteBuffer: .init(data: data))
)
}
}
struct AnthropicUsage: Codable {
let inputTokens: Int
let outputTokens: Int
enum CodingKeys: String, CodingKey {
case inputTokens = "input_tokens"
case outputTokens = "output_tokens"
}
}
Three observations about this code.
First, we use singleValueContainer rather than a keyed container because content at that position in the JSON is not a key-value pair but a bare value: either a string or an array. The try? pattern attempts the simpler case first (a string), silently falls through if it does not match, then tries the block array. Only when both attempts fail do we throw a controlled DecodingError.
Second, asText is a convenience accessor: whichever format the content arrived in, we can always extract it as plain text. The mock inference below relies on this.
Third, in the response, content is always [AnthropicContentBlock], never the request-side enum. The Anthropic spec is flexible about request format but strict about response format, and we mirror that exactly.
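A quick check makes the dual decoding concrete. This snippet is standalone (a scratch main.swift or a test will do) and uses only the types defined above; the JSON literals are illustrative:
import Foundation
let decoder = JSONDecoder()
// String form, as Claude Code typically sends it
let plainJSON = #"{"role": "user", "content": "Hello"}"#
let plain = try decoder.decode(AnthropicMessage.self, from: Data(plainJSON.utf8))
print(plain.content.asText) // Hello
// Block-array form
let blockJSON = #"{"role": "user", "content": [{"type": "text", "text": "Hello"}]}"#
let block = try decoder.decode(AnthropicMessage.self, from: Data(blockJSON.utf8))
print(block.content.asText) // Hello
Both payloads decode into the same AnthropicMessage; asText hides the difference from the rest of the gateway.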
Codable Types for OpenAI
The OpenAI types are structurally similar but consistently simpler: content is always a string, system comes as a regular message, and most parameters are optional.
// Models/OpenAI.swift
import Foundation
import Hummingbird
enum ChatRole: String, Codable {
case system
case user
case assistant
case tool
}
struct ChatMessage: Codable {
let role: ChatRole
let content: String
}
struct ChatCompletionRequest: Codable {
let model: String
let messages: [ChatMessage]
let maxTokens: Int?
let temperature: Double?
let topP: Double?
let stream: Bool?
let stop: [String]?
let presencePenalty: Double?
let frequencyPenalty: Double?
let user: String?
enum CodingKeys: String, CodingKey {
case model, messages, temperature, stream, stop, user
case maxTokens = "max_tokens"
case topP = "top_p"
case presencePenalty = "presence_penalty"
case frequencyPenalty = "frequency_penalty"
}
}
struct ChatCompletionResponse: Codable, ResponseGenerator {
let id: String
let object: String
let created: Int
let model: String
let choices: [ChatCompletionChoice]
let usage: ChatCompletionUsage
public func response(from request: Request, context: some RequestContext) throws -> Response {
let encoder = JSONEncoder()
let data = try encoder.encode(self)
return Response(
status: .ok,
headers: [.contentType: "application/json"],
body: .init(byteBuffer: .init(data: data))
)
}
}
struct ChatCompletionChoice: Codable {
let index: Int
let message: ChatMessage
let finishReason: String
enum CodingKeys: String, CodingKey {
case index, message
case finishReason = "finish_reason"
}
}
struct ChatCompletionUsage: Codable {
let promptTokens: Int
let completionTokens: Int
let totalTokens: Int
enum CodingKeys: String, CodingKey {
case promptTokens = "prompt_tokens"
case completionTokens = "completion_tokens"
case totalTokens = "total_tokens"
}
}
ChatRole includes .tool alongside system, user, and assistant. Tool calls are outside the scope of this series, but modelling the role now keeps the enum ready for function-calling extensions. maxTokens is optional because OpenAI applies a server-side default; Anthropic does not, which is why MessageRequest requires it.
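Because everything beyond model and messages is optional, a minimal client payload decodes cleanly. A small standalone check using the types above:
import Foundation
let json = #"{"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}"#
let request = try JSONDecoder().decode(ChatCompletionRequest.self, from: Data(json.utf8))
print(request.maxTokens as Any)    // nil; the backend chooses a default
print(request.temperature as Any)  // nil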
Routes and Mock Inference
The two new POST endpoints go into Router+build.swift. request.decode(as:context:) is Hummingbird’s standard path: BasicRequestContext comes with a JSON decoder that maps the request body directly into our Codable types.
// Additions in Router+build.swift
// Anthropic Messages API
router.post("v1/messages") { request, context -> MessageResponse in
let payload = try await request.decode(as: MessageRequest.self, context: context)
try validate(payload)
return mockResponse(for: payload)
}
// OpenAI Chat Completions
router.post("v1/chat/completions") { request, context -> ChatCompletionResponse in
let payload = try await request.decode(as: ChatCompletionRequest.self, context: context)
try validate(payload)
return mockResponse(for: payload)
}
Validation checks the minimum requirements of both specs:
private func validate(_ request: MessageRequest) throws {
guard !request.messages.isEmpty else {
throw HTTPError(.badRequest, message: "messages array must not be empty")
}
guard request.maxTokens > 0 else {
throw HTTPError(.badRequest, message: "max_tokens must be greater than 0")
}
}
private func validate(_ request: ChatCompletionRequest) throws {
guard !request.messages.isEmpty else {
throw HTTPError(.badRequest, message: "messages array must not be empty")
}
}
HTTPError(.badRequest, message:) serialises in Hummingbird to {"error":{"message":"..."}}. That does not match the exact Anthropic or OpenAI error format; spec-compliant error structures follow in Article 5 when we extend the RequestContext.
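For reference, the envelopes the two specs define look roughly like this (simplified).
Anthropic:
{
  "type": "error",
  "error": { "type": "invalid_request_error", "message": "max_tokens: must be greater than 0" }
}
OpenAI:
{
  "error": { "message": "...", "type": "invalid_request_error", "param": null, "code": null }
}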
The mock inference functions use Swift overloading: both are named mockResponse(for:), and the compiler distinguishes them by parameter type.
private let mockSuffix = "\n\n(Mock response from swift-mlx-gateway. Article 3 will wire up the MLX backend.)"
private func lastUserText(in messages: [AnthropicMessage]) -> String {
for message in messages.reversed() where message.role == .user {
return message.content.asText
}
return ""
}
private func lastUserText(in messages: [ChatMessage]) -> String {
for message in messages.reversed() where message.role == .user {
return message.content
}
return ""
}
private func mockResponse(for request: MessageRequest) -> MessageResponse {
let echo = lastUserText(in: request.messages)
let text = "Echo: \(echo)" + mockSuffix
let id = "msg_" + String(UUID().uuidString.replacingOccurrences(of: "-", with: "").lowercased().prefix(24))
return MessageResponse(
id: id,
type: "message",
role: .assistant,
content: [AnthropicContentBlock(type: "text", text: text)],
model: request.model,
stopReason: "end_turn",
stopSequence: nil,
usage: AnthropicUsage(
inputTokens: estimateTokens(echo),
outputTokens: estimateTokens(text)
)
)
}
private func mockResponse(for request: ChatCompletionRequest) -> ChatCompletionResponse {
let echo = lastUserText(in: request.messages)
let text = "Echo: \(echo)" + mockSuffix
let id = "chatcmpl-" + String(UUID().uuidString.prefix(8).lowercased())
let promptTokens = estimateTokens(echo)
let completionTokens = estimateTokens(text)
return ChatCompletionResponse(
id: id,
object: "chat.completion",
created: Int(Date().timeIntervalSince1970),
model: request.model,
choices: [
ChatCompletionChoice(
index: 0,
message: ChatMessage(role: .assistant, content: text),
finishReason: "stop"
)
],
usage: ChatCompletionUsage(
promptTokens: promptTokens,
completionTokens: completionTokens,
totalTokens: promptTokens + completionTokens
)
)
}
private func estimateTokens(_ text: String) -> Int {
max(1, text.count / 4)
}
estimateTokens is a crude heuristic: roughly one token per four characters. For example, "Hello World" is 11 characters and counts as max(1, 11 / 4) = 2 tokens, which is exactly the input_tokens: 2 in the curl output below. That is sufficient for the mock; Article 3 replaces it with accurate token counts from the MLX backend.
Testing with curl
Start the server:
swift run gateway
Anthropic endpoint:
curl -s -X POST localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"claude-3-5-sonnet","max_tokens":256,"messages":[{"role":"user","content":"Hello World"}]}' | jq .
{
"id": "msg_02acd2c6af7943fd8f623aa5",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Echo: Hello World\n\n(Mock response from swift-mlx-gateway. Article 3 will wire up the MLX backend.)"
}
],
"model": "claude-3-5-sonnet",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": { "input_tokens": 2, "output_tokens": 24 }
}
OpenAI endpoint:
curl -s -X POST localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello World"}]}' | jq .
{
"id": "chatcmpl-c7573297",
"object": "chat.completion",
"created": 1778786375,
"model": "gpt-4",
"choices": [
{
"index": 0,
"message": { "role": "assistant", "content": "Echo: Hello World\n\n(Mock response ...)" },
"finish_reason": "stop"
}
],
"usage": { "prompt_tokens": 2, "completion_tokens": 24, "total_tokens": 26 }
}
The block-array content format works too:
curl -s -X POST localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"x","max_tokens":256,"messages":[{"role":"user","content":[{"type":"text","text":"Block form"}]}]}' | jq .
Validation errors return 400:
# Empty messages array
curl -s -i -X POST localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"x","max_tokens":256,"messages":[]}'
# HTTP/1.1 400 Bad Request
# {"error":{"message":"messages array must not be empty"}}
# max_tokens zero
curl -s -i -X POST localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"x","max_tokens":0,"messages":[{"role":"user","content":"hi"}]}'
# HTTP/1.1 400 Bad Request
# {"error":{"message":"max_tokens must be greater than 0"}}
Connecting Claude Code
Claude Code respects ANTHROPIC_BASE_URL and routes all API requests to the given base URL:
export ANTHROPIC_BASE_URL=http://localhost:8080
claude
Once claude is running, every API request lands at our local gateway, and the mock response echoes the input back. That confirms the protocol handshake works; a useful answer has to wait for the real model in Article 3.
Commit and Tag
git add .
git commit -m "article-02: Anthropic Messages API & OpenAI Chat Completions"
git tag article-02
git push origin main --tags
Two Files, Two Protocols, One Backend
| File | Content |
|---|---|
| Sources/gateway/App.swift | unchanged |
| Sources/gateway/Application+build.swift | unchanged |
| Sources/gateway/Router+build.swift | two POST endpoints, validation, mock inference |
| Sources/gateway/Models/ModelTypes.swift | unchanged |
| Sources/gateway/Models/Anthropic.swift | request, response, content enum, usage |
| Sources/gateway/Models/OpenAI.swift | request, response, choice, usage |
The Codable types are now the contract between the gateway layer and the backend. Article 3 brings the MLX backend; the mockResponse functions will be replaced by a real MLXClient that returns the same types.
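One way to picture that contract is a small backend protocol. This sketch is purely illustrative, with hypothetical names; Article 3 defines the real interface:
// Hypothetical shape of the backend boundary; not the final Article 3 API.
protocol InferenceBackend: Sendable {
    func respond(to request: MessageRequest) async throws -> MessageResponse
    func respond(to request: ChatCompletionRequest) async throws -> ChatCompletionResponse
}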
Sources
- Anthropic Messages API Reference, docs.anthropic.com
- OpenAI Chat Completions Reference, platform.openai.com
- Hummingbird Documentation, docs.hummingbird.codes
- Claude Code: Bedrock, Vertex & Proxy Configuration, code.claude.com