Understanding tool calling: from schema to round-trip
Article 4 · Series: A Local Coding Agent with apfel
In Article 3 the connection was in place: prompt in, streamed answer out. That lets the model talk, but not act. The step from chat to agent is tool calling — the model may invoke tools, we run them and feed the result back. In this article we build exactly that mechanism: a tool definition in the OpenAI schema, the round-trip of invocation, execution and continuation, and a small abstraction in Swift that captures a tool as a protocol and a registry. The demo is get_time, a trivial tool with no side effects. Along the way we check how reliably the small model plays along — and run into a few quirks that cannot be derived from the OpenAI standard. The state is frozen as tag v0.4: https://codeberg.org/rotecodefraktion/apfel-coding-agent/src/tag/v0.4
What a tool definition promises the model
To the model, a tool is at first just a description: a name, a sentence of explanation and a JSON schema of its parameters. In the OpenAI format that apfel passes through to the Foundation Model, the definition of get_time looks like this:
{
  "type": "function",
  "function": {
    "name": "get_time",
    "description": "Get the current date and time in ISO-8601 format. Takes no arguments.",
    "parameters": { "type": "object", "properties": {}, "required": [] }
  }
}
That is all the model sees. It knows neither the code behind the tool nor its return value — only the promise that there is a get_time taking no arguments. On the basis of this description it decides whether and how to call the tool. In Swift we model the same structure as Codable types:
public struct ToolDefinition: Codable, Sendable, Equatable {
    public let type: String // always "function"
    public let function: FunctionDefinition
}

public struct FunctionDefinition: Codable, Sendable, Equatable {
    public let name: String
    public let description: String
    public let parameters: JSONSchema
}

public struct JSONSchema: Codable, Sendable, Equatable {
    public let type: String
    public let properties: [String: Property]
    public let required: [String]
}
We keep JSONSchema deliberately minimal: object type, typed properties, a required list. For get_time the properties are empty. That is enough for now; richer schemas arrive when a tool needs them — the real file and shell tools in Article 5.
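To see that these types reproduce the JSON above, the following sketch encodes a get_time definition and decodes it again. The Property type is not shown in the article, so the version here is a hypothetical stand-in, and the types are reduced to what the example needs.

```swift
import Foundation

// Hypothetical stand-in for the article's Property type, which is not shown.
struct Property: Codable, Equatable {
    let type: String
    let description: String?
}

struct JSONSchema: Codable, Equatable {
    let type: String
    let properties: [String: Property]
    let required: [String]
}

struct FunctionDefinition: Codable, Equatable {
    let name: String
    let description: String
    let parameters: JSONSchema
}

struct ToolDefinition: Codable, Equatable {
    let type: String
    let function: FunctionDefinition
}

let getTime = ToolDefinition(
    type: "function",
    function: FunctionDefinition(
        name: "get_time",
        description: "Get the current date and time in ISO-8601 format. Takes no arguments.",
        parameters: JSONSchema(type: "object", properties: [:], required: [])
    )
)

// Sorted keys make the output stable enough to compare by eye.
let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys]
let wire = String(decoding: try encoder.encode(getTime), as: UTF8.self)
print(wire)

// Decoding the wire form gives back an equal value.
let decoded = try JSONDecoder().decode(ToolDefinition.self, from: Data(wire.utf8))
print(decoded == getTime)
```

The empty dictionary and the empty array encode as {} and [], which is the shape the model is shown for a tool without parameters.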
The round-trip: invocation, execution, continuation
Tool calling is not a single request but a small choreography across two calls:
- We send the conversation plus tool definitions to /v1/chat/completions.
- Instead of a text answer, an assistant message comes back with tool_calls and finish_reason: "tool_calls". The content is then null.
- We run each tool call and append the result as a role: "tool" message, linked via the tool_call_id.
- We send the extended conversation again. Now the model answers with text: the final answer.
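The four steps can be sketched as plain data. The Message and Call types below are hypothetical, reduced versions of what a client needs. The model's reply is hard-coded instead of fetched, and the time value is an illustrative placeholder.

```swift
import Foundation

// Hypothetical, simplified wire types; the repo's own types may differ.
struct Call: Codable {
    struct Function: Codable {
        let name: String
        let arguments: String
    }
    let id: String
    let type: String
    let function: Function
}

struct Message: Codable {
    let role: String
    var content: String? = nil
    var toolCalls: [Call]? = nil
    var toolCallID: String? = nil

    enum CodingKeys: String, CodingKey {
        case role, content
        case toolCalls = "tool_calls"
        case toolCallID = "tool_call_id"
    }
}

// Step 1: the conversation we send first (tool definitions omitted here).
var conversation = [Message(role: "user", content: "What time is it?")]

// Step 2: the assistant message that comes back, hard-coded for the sketch.
let call = Call(id: "call_1", type: "function",
                function: .init(name: "get_time", arguments: "{}"))
conversation.append(Message(role: "assistant", toolCalls: [call]))

// Step 3: the tool result, linked to the call through its id.
conversation.append(Message(role: "tool",
                            content: #"{"time":"2026-06-06T21:12:00Z"}"#,
                            toolCallID: call.id))

// Step 4: this array is the "messages" part of the second request.
// Note: the synthesized encoder omits nil fields rather than writing null.
let encoder = JSONEncoder()
encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
print(String(decoding: try encoder.encode(conversation), as: UTF8.self))
```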
We read the exact shape of apfel’s tool call off the actual response rather than deriving it from the OpenAI standard. apfel’s raw response to the get_time request:
{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "content": null,
      "role": "assistant",
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
          "name": "get_time",
          "arguments": "{\"current_time\": \"2023-10-29T15:48:30.567Z\"}"
        }
      }]
    }
  }]
}
Two details stand out. First, function.arguments is a string, not an object; more on that shortly. Second, the model invented a current_time argument even though get_time’s schema has no parameters at all. That is not a fluke but a trait of the small model, and we return to it below.
The tool as a protocol
So that the round-trip does not need bespoke code per tool, we capture a tool as a protocol. The signature is deliberately aligned with the wire format, that is, with what actually goes over HTTP, not with Swift convenience:
public protocol Tool: Sendable {
    var name: String { get }
    var description: String { get }
    var parametersSchema: JSONSchema { get }
    func call(_ arguments: Data) async throws -> String
}
Data in, because the model delivers the arguments as a JSON string — each tool decodes them itself and is therefore also responsible for catching broken arguments. String out, because the result goes back as the content of the role: tool message. We derive the tool definition from the tool itself, so schema and implementation cannot drift apart:
extension Tool {
    // Derived from the tool itself, so schema and implementation stay in sync.
    public var definition: ToolDefinition {
        ToolDefinition(type: "function", function: FunctionDefinition(
            name: name, description: description, parameters: parametersSchema
        ))
    }
}
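What "each tool decodes its own arguments" can look like is sketched below with a tool that does take a parameter. EchoTool and the invalidArguments error case are hypothetical and exist only for this illustration, and the protocol is reduced to the two members the example needs.

```swift
import Foundation

// Reduced version of the article's protocol.
protocol Tool: Sendable {
    var name: String { get }
    func call(_ arguments: Data) async throws -> String
}

// Hypothetical error case; the article only shows unknownTool.
enum ToolError: Error, Equatable {
    case invalidArguments(String)
}

// Hypothetical tool for illustration only; it is not part of the repo.
struct EchoTool: Tool {
    let name = "echo"

    // The tool's own view of its arguments.
    private struct Arguments: Decodable {
        let text: String
    }

    func call(_ arguments: Data) async throws -> String {
        do {
            let args = try JSONDecoder().decode(Arguments.self, from: arguments)
            return args.text
        } catch {
            // Broken or mismatching arguments become a typed error.
            throw ToolError.invalidArguments(name)
        }
    }
}

let tool = EchoTool()
let ok = try await tool.call(Data(#"{"text":"hi"}"#.utf8))
print(ok)

// A misspelled key is rejected instead of being silently accepted.
var rejected = false
do {
    _ = try await tool.call(Data(#"{"txt":"hi"}"#.utf8))
} catch {
    rejected = true
    print("rejected:", error)
}
```

Because the protocol hands over raw Data, the decision of how strict to be stays with each tool.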
The registry: looking tools up and offering them
The registry has exactly two jobs, one per direction of the round-trip. On the way out it collects all definitions for the request; on the way back it looks up the tool the model named and runs it.
public struct ToolRegistry: Sendable {
    private var tools: [String: any Tool]

    public var definitions: [ToolDefinition] {
        tools.values.map(\.definition)
    }

    public func dispatch(name: String, arguments: Data) async throws -> String {
        guard let tool = tools[name] else {
            throw ToolError.unknownTool(name)
        }
        return try await tool.call(arguments)
    }
}
An unknown name becomes a typed ToolError, not a crash. That is the first line of defence against hallucinated tool names — and it is needed as soon as the model makes up a name it was never offered.
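A small usage sketch shows both outcomes of dispatch. The register method and PingTool are assumptions made for this example, since the article does not show how tools get into the registry. The protocol is again reduced to the essentials.

```swift
import Foundation

protocol Tool: Sendable {
    var name: String { get }
    func call(_ arguments: Data) async throws -> String
}

enum ToolError: Error, Equatable {
    case unknownTool(String)
}

struct ToolRegistry: Sendable {
    private var tools: [String: any Tool] = [:]

    // Hypothetical registration API, assumed for this sketch.
    mutating func register(_ tool: any Tool) {
        tools[tool.name] = tool
    }

    func dispatch(name: String, arguments: Data) async throws -> String {
        guard let tool = tools[name] else {
            throw ToolError.unknownTool(name)
        }
        return try await tool.call(arguments)
    }
}

// Hypothetical tool, only here to have something to dispatch to.
struct PingTool: Tool {
    let name = "ping"
    func call(_ arguments: Data) async throws -> String { "pong" }
}

var registry = ToolRegistry()
registry.register(PingTool())

// A registered name runs the tool.
let known = try await registry.dispatch(name: "ping", arguments: Data("{}".utf8))

// A name the model made up surfaces as a typed error.
var caught: ToolError?
do {
    _ = try await registry.dispatch(name: "get_weather", arguments: Data("{}".utf8))
} catch let error as ToolError {
    caught = error
}
print(known, String(describing: caught))
```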
arguments is a string, not an object
The obvious assumption is that a tool call is a function invocation with ready-made object arguments. In fact function.arguments arrives as a JSON string in the response body — a string that must be parsed first, and on the small model this string is not guaranteed valid against the schema. We saw above that get_time with no parameters got back a current_time argument. Treat the string as an object unchecked, or pass it straight through, and you get crashes or silently wrong calls.
So arguments stays a raw string in our ToolCall type, and decoding is the tool’s job — the tool may reject the arguments or, like get_time, simply ignore them:
public struct ToolCall: Codable, Sendable, Equatable {
    public let id: String
    public let type: String
    public let function: FunctionCall
    public let index: Int? // set only when streamed

    public struct FunctionCall: Codable, Sendable, Equatable {
        public let name: String
        public let arguments: String // raw JSON string
    }
}
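Decoding therefore happens in two stages, sketched here against the tool call from apfel's raw response above. The check for unexpected keys is an illustrative addition, not something the article's code performs.

```swift
import Foundation

struct ToolCall: Codable {
    struct FunctionCall: Codable {
        let name: String
        let arguments: String // raw JSON string
    }
    let id: String
    let type: String
    let function: FunctionCall
    let index: Int?
}

// The tool call as it appears in the raw response.
let raw = #"""
{
  "id": "call_1",
  "type": "function",
  "function": {
    "name": "get_time",
    "arguments": "{\"current_time\": \"2023-10-29T15:48:30.567Z\"}"
  }
}
"""#

// Stage 1: decode the envelope. arguments is still just a String.
let call = try JSONDecoder().decode(ToolCall.self, from: Data(raw.utf8))

// Stage 2: parse that string as a JSON document of its own.
let parsed = try JSONSerialization.jsonObject(with: Data(call.function.arguments.utf8))
let object = parsed as? [String: Any] ?? [:]

// get_time declares no properties, so every key the model sent is unexpected.
let declared: Set<String> = []
let unexpected = Set(object.keys).subtracting(declared)
print(unexpected)
```

A tool can use such a set to reject the call, or ignore it as get_time does.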
The demo tool get_time
get_time is deliberately trivial: no parameters, no side effects, a predictable result. It shows the round-trip without the safety machinery that writing tools will need. We inject the clock so the tool stays testable:
public struct GetTimeTool: Tool {
    public let name = "get_time"
    public let description = "Get the current date and time in ISO-8601 format. Takes no arguments."
    // Empty object schema: get_time declares no parameters.
    public let parametersSchema = JSONSchema(type: "object", properties: [:], required: [])
    private let now: @Sendable () -> Date

    // The clock is injected; by default it is the real one.
    public init(now: @escaping @Sendable () -> Date = { Date() }) {
        self.now = now
    }

    public func call(_ arguments: Data) async throws -> String {
        // The model sometimes sends arguments even though the schema is empty.
        // We ignore them: get_time takes none.
        let payload = ["time": ISO8601DateFormatter().string(from: now())]
        return String(decoding: try JSONEncoder().encode(payload), as: UTF8.self)
    }
}
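The injected clock is what makes the result predictable in a test. A minimal sketch, assuming an initializer that takes the clock as a closure; the protocol conformance and the schema are left out to keep it self-contained.

```swift
import Foundation

// Reduced GetTimeTool: only the clock and the call path.
struct GetTimeTool: Sendable {
    private let now: @Sendable () -> Date

    // Assumed initializer: real clock by default, fixed clock in tests.
    init(now: @escaping @Sendable () -> Date = { Date() }) {
        self.now = now
    }

    func call(_ arguments: Data) async throws -> String {
        // Arguments are ignored: get_time takes none.
        let payload = ["time": ISO8601DateFormatter().string(from: now())]
        return String(decoding: try JSONEncoder().encode(payload), as: UTF8.self)
    }
}

// Pin the clock to the Unix epoch.
let tool = GetTimeTool(now: { Date(timeIntervalSince1970: 0) })

// Pass the invented argument from the raw response; it must not matter.
let invented = Data(#"{"current_time":"2023-10-29T15:48:30.567Z"}"#.utf8)
let result = try await tool.call(invented)
print(result) // {"time":"1970-01-01T00:00:00Z"}
```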
The round-trip itself lives in its own type. It performs exactly one pass — not a loop. If the model calls a tool again after the results, that second round is not executed here; the full plan/act/observe loop is Article 7.
public func run(_ messages: [ChatMessage]) async throws -> Result {
    var conversation = messages
    let first = try await complete(conversation, toolChoice)
    guard let calls = first.choices.first?.message.toolCalls, !calls.isEmpty else {
        // The model answered directly; no tool needed.
        return Result(finalContent: first.choices.first?.message.content, toolCalls: [])
    }
    conversation.append(ChatMessage(assistantToolCalls: calls))
    for call in calls {
        conversation.append(ChatMessage(toolCallID: call.id, content: await result(for: call)))
    }
    let final = try await complete(conversation, nil)
    return Result(finalContent: final.choices.first?.message.content, toolCalls: calls)
}
A failing or unknown tool comes back as a result, not a thrown error — so the model gets a chance to recover instead of the round-trip aborting:
private func result(for call: ToolCall) async -> String {
    do {
        return try await registry.dispatch(name: call.function.name,
                                           arguments: Data(call.function.arguments.utf8))
    } catch {
        let payload = ["error": String(describing: error)]
        return (try? String(decoding: JSONEncoder().encode(payload), as: UTF8.self))
            ?? #"{"error":"tool failed"}"#
    }
}
A round-trip against the real model
The CLI gets a --tools path that stocks the registry with get_time and triggers the round-trip. The tool calls go to stderr so that stdout carries only the final answer:
$ swift run apfel-agent --tools "What time is it right now? Use the get_time tool."
→ tool call: get_time({"current_time": "2025-02-02T14:34:56.789Z"})
The current time is June 6, 2026, at 9:12 PM UTC.
The whole arc is visible here: the model calls get_time (with an invented argument that get_time ignores), gets the real time back as the result and formulates an answer from it in natural language. The round-trip stands.
Why an abstraction instead of a special case
Protocol, registry and derived definition are justified in docs/adr/002-tool-abstraktion.md in the repo. The core: the definition is generated from the tool, not maintained as a second, separate record, so schema and code cannot drift. Schema encoding, tool-call decoding and dispatch are tested offline against recorded fixtures and a scripted fake backend, with no running apfel. And because arguments are treated as an untrusted string and tool failures are returned as results, the round-trip holds even when the model works sloppily. What exactly “works sloppily” means is something we determined empirically.
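How a scripted fake backend can drive the round-trip offline is sketched below. This is not the repo's test code: all types are hypothetical and reduced, and the round-trip is synchronous here, whereas the article's run is async. The time value is a placeholder.

```swift
import Foundation

// Hypothetical, reduced types for the sketch.
struct Message: Equatable {
    let role: String
    let content: String?
}

struct Call {
    let id: String
    let name: String
    let arguments: String
}

struct Reply {
    var content: String? = nil
    var calls: [Call] = []
}

// Hands out prepared replies in order and records every request it sees.
final class ScriptedBackend {
    private var script: [Reply]
    private(set) var seen: [[Message]] = []

    init(script: [Reply]) { self.script = script }

    func complete(_ conversation: [Message]) -> Reply {
        seen.append(conversation)
        return script.removeFirst() // traps if the script runs out
    }
}

// One pass: first call, tool execution, second call. No loop.
func runOnce(_ messages: [Message], backend: ScriptedBackend,
             tools: [String: (String) -> String]) -> String? {
    var conversation = messages
    let first = backend.complete(conversation)
    guard !first.calls.isEmpty else { return first.content }
    conversation.append(Message(role: "assistant", content: nil))
    for call in first.calls {
        // Unknown tools come back as a result, not as a thrown error.
        let result = tools[call.name]?(call.arguments) ?? #"{"error":"unknown tool"}"#
        conversation.append(Message(role: "tool", content: result))
    }
    return backend.complete(conversation).content
}

let backend = ScriptedBackend(script: [
    Reply(calls: [Call(id: "call_1", name: "get_time", arguments: "{}")]),
    Reply(content: "It is 21:12 UTC."),
])
let answer = runOnce([Message(role: "user", content: "What time is it?")],
                     backend: backend,
                     tools: ["get_time": { _ in #"{"time":"2026-06-06T21:12:00Z"}"# }])
print(answer ?? "nil", backend.seen.count, backend.seen[1].map(\.role))
```

The recorded requests let a test assert that the second call really carried the assistant and tool messages.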
Where the small model wobbles
Tool calling works — but unreliably, and in a way we only found out by measuring. We ran get_time against apfel 1.5.1 repeatedly and counted how often the model actually makes the tool call.
| Variant | tool-call rate |
|---|---|
| directive prompt, no tool_choice | 12/15 |
| directive prompt, tool_choice: "auto" | 6/15 |
| neutral prompt, no tool_choice | 2/10 |
| directive prompt, tool_choice: "required" | 1/15 |
Source: own data, apfel 1.5.1, 2026-06-06, scripts/tool-choice-experiment.sh.
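Expressed as percentages, the counts are easier to compare. The snippet below only does the arithmetic on the measured values. With 10 to 15 runs per variant, the figures are rough indications rather than precise rates.

```swift
import Foundation

// Counts as measured: (variant, tool calls made, total runs).
let runs: [(variant: String, calls: Int, total: Int)] = [
    ("directive prompt, no tool_choice", 12, 15),
    ("directive prompt, tool_choice: \"auto\"", 6, 15),
    ("neutral prompt, no tool_choice", 2, 10),
    ("directive prompt, tool_choice: \"required\"", 1, 15),
]

// Percentage rounded to one decimal place.
let percents = runs.map { run -> Double in
    let rate = Double(run.calls) / Double(run.total)
    return (rate * 1000).rounded() / 10
}

for (run, percent) in zip(runs, percents) {
    print("\(percent)%  \(run.variant)")
}
```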
Three findings stand out. First: even when we name the tool explicitly, the model calls it in only about four of five runs — without explicit naming, far less often. Second, and counterintuitively: tool_choice: "required", which in the OpenAI standard forces a tool call, does the opposite on apfel’s Foundation Model. The model refuses:
$ # request with tool_choice: "required"
"content": "I'm sorry, but I can't assist with that.", "finish_reason": "stop"
Third, omitting tool_choice clearly beats the explicit value "auto" (12/15 against 6/15), even though both nominally mean the same thing. From these findings follows a concrete decision: our agent leaves tool_choice off. The mechanism stays in the code for later articles, but the demo forces nothing.
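Leaving tool_choice off means the key must be absent from the request body, not present with a default value. One way to get that in Swift is an optional property: the synthesized Encodable conformance skips nil optionals. ChatRequest here is a hypothetical, reduced request type, not the one from the repo.

```swift
import Foundation

// Hypothetical, reduced request type for the sketch.
struct ChatRequest: Encodable {
    let messages: [[String: String]]
    let toolChoice: String? // nil means: leave the key out entirely

    enum CodingKeys: String, CodingKey {
        case messages
        case toolChoice = "tool_choice"
    }
}

func body(toolChoice: String?) throws -> String {
    let request = ChatRequest(
        messages: [["role": "user",
                    "content": "What time is it right now? Use the get_time tool."]],
        toolChoice: toolChoice
    )
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return String(decoding: try encoder.encode(request), as: UTF8.self)
}

let omitted = try body(toolChoice: nil)
let explicit = try body(toolChoice: "auto")

// Only the explicit variant carries the key.
print(omitted.contains("tool_choice"), explicit.contains("tool_choice"))
```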
There is a third failure type beyond the missing call and the hallucinated argument. In one run the model called get_time correctly, got the real time back — and still answered “It seems there was an error retrieving the current time.” The tool ran without fault; the model misread its result. This is the kind of behaviour that separates a local 3-billion-parameter agent from a cloud model — and the reason Article 6 evaluates the performance limit systematically rather than glossing over it.
Demo repo: apfel-coding-agent v0.4
The state of this article is frozen as tag v0.4: https://codeberg.org/rotecodefraktion/apfel-coding-agent/src/tag/v0.4
Clone (if you haven’t already) and check out the tag:
git clone https://codeberg.org/rotecodefraktion/apfel-coding-agent.git
cd apfel-coding-agent
git checkout v0.4
New in v0.4 over v0.3:
- Sources/AgentCore/Tools/: Tool protocol, ToolRegistry, schema types, GetTimeTool, ToolRoundTrip
- Sources/AgentCore/Client/ChatModels.swift: extended with tools, tool_calls, role: tool and tool_choice
- Sources/apfel-agent/AgentCommand.swift: new --tools path
- docs/adr/002-tool-abstraktion.md: the tool abstraction
- scripts/smoke-tool.sh: end-to-end test of the round-trip
- scripts/tool-choice-experiment.sh: reproduces the tool-call rate
Build, test, run:
swift build
swift test # offline, no apfel needed
swift run apfel-agent --tools "What time is it? Use the get_time tool."
The unit tests run without apfel. The end-to-end test and the experiment need a running apfel serve:
./scripts/smoke-tool.sh
./scripts/tool-choice-experiment.sh
The tool call is unreliable, so the smoke test retries and turns green as soon as one run completes the round-trip. Prerequisites are in docs/setup.md.
Pitfalls from the build
arguments is a string, not an object. This is the most common misconception. Model the field as a dictionary or nested struct in your Codable type and decoding fails with a type mismatch. The value is a JSON string inside the JSON, and it must be treated, decoded and validated as such.
Consume tool calls non-stream. In the stream apfel delivers the tool call twice: first as raw delta.content text fragments, then as a single bundled delta.tool_calls chunk at the end. Collect only delta.content in the stream and you mistake the raw tool-call JSON for the answer. For the round-trip we take the non-stream response; it is simpler and unambiguous.
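If you do consume the stream, the accumulator has to treat collected text as provisional until it knows whether a tool call follows. The sketch below shows that idea with a hypothetical, reduced Delta type. The text fragments are invented for illustration and do not reproduce apfel's exact stream format.

```swift
import Foundation

// Hypothetical, reduced chunk: carries text, tool-call names, or both.
struct Delta {
    var content: String? = nil
    var toolCallNames: [String] = []
}

struct StreamAccumulator {
    private(set) var text = ""
    private(set) var toolCallNames: [String] = []

    mutating func consume(_ delta: Delta) {
        if let piece = delta.content { text += piece }
        toolCallNames += delta.toolCallNames
    }

    // If a tool call arrived, the collected text was the raw call JSON,
    // not an answer, so it is not reported as one.
    var answer: String? { toolCallNames.isEmpty ? text : nil }
}

// Invented fragments: first the raw text, then the bundled tool call.
let deltas = [
    Delta(content: #"{"name": "get_"#),
    Delta(content: #"time", "arguments": {}}"#),
    Delta(toolCallNames: ["get_time"]),
]

var accumulator = StreamAccumulator()
for delta in deltas {
    accumulator.consume(delta)
}
print(accumulator.answer as Any, accumulator.toolCallNames)
```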
tool_choice: "required" forces nothing. Contrary to what the OpenAI standard suggests, the Foundation Model refuses under required instead of calling a tool. Try to force reliability through this parameter and you get the opposite.
Omitting tool_choice beats "auto". Setting the default explicitly measurably worsens the rate. When in doubt, leave the parameter off entirely.
What comes next
get_time is harmless: no arguments, no side effects. Article 5 takes on the first real tools — read_file, list_dir, write_file, run_shell. With them the agent changes something outside itself for the first time, and that is exactly where we need what get_time does not: a path sandbox, confirmation gates before writing actions, and showing diffs before the human decides. The tool abstraction from this article carries all of it — the new tools are just more Tool implementations in the same registry.
Previous article: The Swift client: first connection to the model. Next article: The first real tools: file system and shell. Repo tag: v0.4.