Editing that works: constrained output instead of tool guessing

Article 7 · Series: A Local Coding Agent with apfel

The eval in Article 6 left an uncomfortable finding. Even the smallest file edit, renaming a function, fails with the local model in the majority vote (0 of 9 across three single-file edit tasks; own measurement v0.6). Not because the model cannot do the code. It formulates the correct new code as text without trouble. It just does not get it through the writing tool. This article builds the agent not around a stronger model but around exactly that weakness. By the end the same model edits reliably, and we can say in numbers how far that carries and where the model’s own limit begins. The state is frozen as tag v0.7.

Why free tool calling fails at editing

A naive edit demands four things of the model at once. It has to pick the right tool (write_file), hit the right keys in the argument object (path, content), reproduce the full new file content without error, and escape all of it as valid JSON. Each one alone the model handles. In combination it breaks.

An isolated measurement shows the pattern. We give the model the file content directly in the prompt, simulating the preceding read_file, and ask only for the single write_file call with the new content. Even in this shortened form it rarely succeeds (5 of 30 across six edit tasks, five runs each; own measurement v0.7). Sometimes the model invents a key that does not exist, sometimes it delivers the code as prose instead of a tool call, sometimes the JSON is broken. Operating the tool while formulating correctly at the same time is the real hurdle, not the formulating alone.

Why the iterative loop does not help

The obvious reflex is an agent loop. Instead of a single attempt, you let the model plan, act, observe the result and correct. When a tool call returns an error, the model sees it on the next step and fixes it.

We built that loop, plan, act, observe, with the tool results fed back. It does not raise the edit rate. The model does not systematically correct itself after the error feedback, it repeats the same faulty call or falls into another, equally wrong form. More rounds does not mean more hits when every single round fails on the same simultaneity. The loop stays in the repo as a documented dead end. The solution is not in repeating the attempt but in making the attempt easier.

Constrained output as the lever

The apfel serve mode supports response_format with a JSON schema. That changes the rules. Instead of hoping the model hits the right keys, we prescribe the schema, and the server forces the answer into it. Invented keys disappear because they do not fit the schema. With exact context in the prompt, placeholders disappear too, because the schema demands a concrete value.

In the client this is one extra type and one extra field on the request:

public struct ResponseFormat: Codable, Sendable, Equatable {
    public let type: String          // always "json_schema"
    public let jsonSchema: NamedSchema

    public struct NamedSchema: Codable, Sendable, Equatable {
        public let name: String
        public let schema: JSONSchema
    }
}

With that, the hard edit decision splits into two small, forced steps, each of which the model finds easy on its own. That is exactly EditFlow.

EditFlow, stage 1: which file, which change

The first stage settles what to do without touching the file content. The model gets the task and the real list of files in the working directory and fills three forced fields: the path, the instruction in its own words, and the kind of operation.

let prompt = """
Task: \(task)
Files in the working directory: \(listing.joined(separator: ", "))
Name the file (path), restate the change (instruction). Also set operation: \
use insert ONLY when a brand-new standalone line is added (a comment line, an \
import line). Use replace for everything that changes existing code.
"""
guard let json = try await structuredComplete(
        [ChatMessage(role: "user", content: prompt)], Self.pickFormat),
      let pick = try? JSONDecoder().decode(Pick.self, from: Data(json.utf8)) else {
    return "Error: could not determine which file to edit."
}
let resolved = Self.resolvePath(pick.path, in: listing) ?? pick.path

The instruction the model hits reliably. The path not always. It likes to decorate it with invented directories, /home/example/greet.swift, although the file is simply called greet.swift. We do not follow that path blindly. We know the real file list and map the model path back onto a real file by its basename:

static func resolvePath(_ modelPath: String, in listing: [String]) -> String? {
    if listing.contains(modelPath) { return modelPath }
    let base = (modelPath as NSString).lastPathComponent
    return listing.first { ($0 as NSString).lastPathComponent == base }
}

The invented directory prefix falls away, the basename remains, and the operation lands on the right file. This resolution is pure, deterministic program code and is covered by its own tests.

Stage 2 for substitutions: old_string, new_string

Once the file is fixed, the program reads its content and asks the second forced question. For a substitution, replacing existing text, the schema is {old_string, new_string}. The prompt hands in the exact file content as ordinary prose, not as a code block, because Apple’s safety filters block code blocks and numbered lines in chat prompts but let plain text between them through.

let prompt = """
The file currently contains these lines:
\(content)
Apply this change: \(arg.instruction). \
Give old_string (the exact text to find in the file) and new_string (the replacement).
"""
guard let json = try await structuredComplete(
        [ChatMessage(role: "user", content: prompt)], Self.editFormat),
      let spec = try? JSONDecoder().decode(EditSpec.self, from: Data(json.utf8)) else {
    return "Error: could not derive a concrete edit for \(arg.path)"
}
guard !spec.oldString.isEmpty, content.contains(spec.oldString) else {
    return "Error: the text to change was not found. Try a more specific instruction."
}
let updated = content.replacingOccurrences(of: spec.oldString, with: spec.newString)

Before anything is written, the program checks that old_string is not empty and actually occurs in the file content. If not, no write is attempted, and the error message doubles as a hint for another try. On success the change goes through the diff and the confirmation gate from Article 5 before it lands on disk.

For substitutions this works through and through. Renaming a function, changing a number, swapping a string, all three succeed fully across five runs each (15 of 15; own measurement v0.7). What was 0 of 9 naively has become a reliable operation, and the model never once operated a tool. It only filled slots.

The insertion wall and the anchor-preserving primitive

One class of changes stays stubborn at first. Inserting a new line above a function, putting an import at the top of the file, both fail completely (0 of 6; own measurement v0.7). The reason lies in the primitive, that is, in the elementary editing operation itself that we have worked with so far. To prepend a line, the model would have to repeat the anchor text in new_string, the existing line plus the new one above it. That is exactly what it fails to do. It drops the anchor and produces func /// Doc, the function is destroyed. This is not a bug in applying the change. The apply step does exactly what the {old, new} says. The model just picks the wrong thing.

The fix is a separate, anchor-preserving primitive. Anchor-preserving means the existing anchor line can no longer be lost, because the program keeps it and asks the model only for the new text. When inserting, the model thus names only an existing anchor and the new text, and the program places the text on its own line before or after it:

public static func apply(content: String, anchor: String,
                         position: Position, text: String) -> String? {
    let trimmedAnchor = anchor.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmedAnchor.isEmpty else { return nil }

    var lines = content.components(separatedBy: "\n")
    let hadTrailingNewline = content.hasSuffix("\n")
    if hadTrailingNewline, lines.last == "" { lines.removeLast() }

    guard let idx = lines.firstIndex(where: { $0.contains(trimmedAnchor) }) else { return nil }
    let insertAt = position == .before ? idx : idx + 1
    lines.insert(text, at: insertAt)

    return lines.joined(separator: "\n") + (hadTrailingNewline ? "\n" : "")
}

If the program does not find the anchor, it returns nil and rejects the change rather than silently mangling the file. Which of the two stage-2 schemas applies is decided by the operation from stage 1:

if pick.operation?.lowercased() == "insert" {
    return try await InsertFileTool(sandbox: sandbox, gate: gate,
                                    structuredComplete: structuredComplete).call(argsData)
}
return try await EditFileTool(sandbox: sandbox, gate: gate,
                              structuredComplete: structuredComplete).call(argsData)

With the anchor-preserving primitive the same insertions succeed fully (10 of 10; own measurement v0.7), without the clean substitutions suffering. The same principle shows for the second time here: where the model would have to do “keep the anchor and add something”, it fails; where the program enforces the anchor, it succeeds.

The model’s limit

One operation resists even this design. Turning a function signature async, making func load() async -> Int out of func load() -> Int, succeeds in only one of five runs (own measurement v0.7). This time the reason is not the tooling. For make load async the model delivers the edit old_string: "load", new_string: "async" and thereby renames the function to async instead of inserting the keyword. It confuses “make the function load asynchronous” with “rename load to async”. Even the explicit instruction “keep the name load” does not change that.

That is the model’s own ceiling, its coding judgment, and no agent scaffold raises it. The proof is in the cloud comparison from Article 6: the same async that fails the local model succeeds easily on a frontier model (cloud sample, Sonnet 4.6). With that, two failure sources separate cleanly. The mechanics of tool use we solved with constrained output. The coding judgment is a property of the model that a larger model has and a smaller one does not. An honest agent does not hide that limit, it makes it visible.

A side finding worth noting, important for the whole series: we tried to have the operation classified in a separate question, detached from the task. That meta-question is hard-blocked by Apple’s safety filters. Only folded into the concrete task of stage 1 does the same classification go through. The guardrails react not to the content but to the form of the request, a topic we will return to later.

Demo repo: apfel-coding-agent v0.7

The state of this article is frozen as tag v0.7: https://codeberg.org/rotecodefraktion/apfel-coding-agent/src/tag/v0.7

Try the edit workflow

Check out the tag:

git clone https://codeberg.org/rotecodefraktion/apfel-coding-agent.git
cd apfel-coding-agent
git checkout v0.7

New in v0.7 over v0.6:

Sources/AgentCore/Agent/EditFlow.swift — the two-stage workflow, resolvePath
Sources/AgentCore/Agent/Insertion.swift — anchor-preserving insertion
Sources/AgentCore/Tools/EditFileTool.swift — constrained replace stage
Sources/AgentCore/Tools/InsertFileTool.swift — constrained insert stage
Sources/AgentCore/Client/ — ResponseFormat and complete(…, responseFormat:)
--edit in the CLI

Build, test, run an edit (a running apfel serve on its own port, since Ollama takes the default):

swift build
swift test                        # offline, no apfel needed
apfel --serve --port 11509 &
printf 'func processItem(_ x: Int) -> Int { x * 2 }\n' > /tmp/work/sample.swift
swift run apfel-agent --edit --workdir /tmp/work \
  --base-url http://127.0.0.1:11509 \
  "In sample.swift, rename the function processItem to handleItem everywhere."

The unit tests check the deterministic parts offline: the basename resolution and the anchor-preserving insertion. The --edit run shows the workflow against the real model.

The model fills slots, the program does the rest

The agent now edits reliably, and at no point does the model operate a tool or deliver a path the program follows blindly. It fills forced slots, the program does the rest: the file list, the path resolution, applying the change, the diff, the gate. Constrained output turns a task on which the simultaneity overwhelms the model into two small decisions it handles one at a time.

That is the lesson reaching beyond the edit. A local agent does not become good by trusting the model with more, but by distributing the work so the model only takes on the parts it can. What goes beyond that, the coding judgment on ambiguous tasks, stays the model’s limit, and knowing it is part of the build plan, not a flaw. In the next step of the series we take these building blocks and assemble them into longer sequences.

Previous article: The local coding agent put to the eval. Next article: longer sequences from the edit building blocks (placeholder, link finalized when Article 8 is published). Repo tag: v0.7.