Where the cloud usually sits

Where the cloud usually sits

Article 13 · Series: A Local Coding Agent with apfel

Article 12 ended with a promise: to place the agent where the cloud assistants usually sit, inside Xcode. That is the last step of the series. The occasion for it comes from Apple itself. At WWDC26 the MLX team showed that Xcode can talk to a local, OpenAI-compatible server as a coding provider (source: Apple, WWDC26 session “Run local agentic AI on the Mac using MLX”, 2026-06-08). To put it to the test, whether apfel fits into exactly this slot, we installed macOS 27 and Xcode 27 on purpose. That also gives us the chance to check the moving ground from Article 12, a model Apple can change whenever it likes. So the finale does not begin with a build, it begins with a measurement.

macOS 27 does not move the foundation

The worry from Article 12 was concrete: an agent that runs reliably today can behave differently after the next update, and you only see the movement once it has happened. A version change is the chance to check that instead of guessing. We asked apfel, after the update, what it ships (measured on 20 June 2026 on macOS 27.0, build 26A5353q).

AspectmacOS 26.3 (Article 2)macOS 27.0 (measured)
apfel version1.5.11.5.1
model idapple-foundationmodelapple-foundationmodel
context window4096 tokens4096 tokens
tool calling, constrained outputpresentpresent
serve, OpenAI-compatibleyesyes

apfel --model-info still reports apple-foundationmodel, context: 4096 tokens, framework: FoundationModels (macOS 26+). The server starts, answers, streams. Nothing on the surface our agent depends on has shifted. For our purposes the move from 26.3 to 27 changed nothing.

That is good news, but not a guarantee. Apple can swap the model next time, change the context window, retune the guardrails. We have seen in this series that both are behaviour an update can shift (Articles 7 and 8). That it did not happen this time does not mean it will not happen. It only means we can keep building on the same model we started with.

The slot where the cloud usually sits

Xcode 27 has a place for language models that was not there in earlier years. Under Settings, in the Intelligence tab, you can add chat providers. By default these are the big names, Anthropic and OpenAI, serving their models from the cloud. But Apple has opened the same mechanism for local servers. This is exactly what the WWDC26 session mentioned above demonstrated, register a local, OpenAI-compatible server in Xcode and have it fix bugs in the open project. The server in the demo was MLX-LM. What sits behind it is interchangeable, as long as it speaks the protocol.

apfel-Serve speaks that protocol. That was the decision from Article 2: not a direct framework call, but an OpenAI-compatible endpoint that any SDK expecting one can talk to. That decision pays off now. The “Add a Model Provider” dialog offers a toggle between “Internet Hosted” and “Locally Hosted”. Pick Locally Hosted, and Xcode asks for a port and a description. No API key, no token, no model name. We enter the port apfel listens on and give the entry a name.

There is a trap here that we stumbled into. apfel’s default port is 11434, and that is also Ollama’s default port. Xcode ships an Ollama integration for local models, and it speaks Ollama’s own protocol at /api/chat. You might expect the Locally Hosted slot to hit that same Ollama path on the same port, leaving apfel, which only serves OpenAI paths, with nothing to answer. It does not. The capture at the apfel server shows what Xcode actually calls: POST /v1/chat/completions, in streaming mode, with status 200 (measured, macOS 27.0 / Xcode 27.0, 2026-06-20). The Locally Hosted slot speaks OpenAI, not Ollama. apfel fits in without any change on our side, and because the dialog has no token field, we run apfel without requiring one.

With that, the sovereignty claim of the series is technically delivered. In the field where Xcode otherwise pulls Claude or ChatGPT from the cloud, there now sits a model that runs on your own Mac, costs nothing and gives nothing away. The first short prompt from Xcode’s coding chat goes through, the model answers, the answer appears in the editor.

Connected, but overwhelmed

The wiring is the easy part. The hard part shows on the first serious prompt. We open a Swift file and ask the assistant to explain it. What comes back is not an explanation but a stock phrase: the model lacks context, please provide the code. The capture shows the opposite. The request contains the file, a good six thousand characters of Swift, sent along as context (measured). The model has the code in front of it and answers as if it did not. The request came to 2578 prompt tokens, well below the 4096 window. So it was not too tight. It was too hard.

The reason lies in the system prompt Xcode sends along. It runs to nearly four thousand characters, about a thousand tokens, and demands a multi-step procedure from the model: first reason in prose about which types are missing, then search the project for them, wait for them to be supplied, and only then answer. That is the choreography a large cloud model goes along with. Our Foundation Model, with its reported three billion parameters, does not. It falls back to the generic answer. This is exactly the boundary Article 6 measured: small, local tasks yes, multi-step choreography no.

The second attempt runs into another wall. We start a fresh conversation, leave the file open and again ask for an explanation. This time Xcode sends the whole file, not just an excerpt. The request grows to a good thirty-three thousand characters, far beyond what fits in 4096 tokens. apfel rejects it cleanly, with status 400 and a clear message that Xcode passes through verbatim: “Input exceeds the 4096-token context window. Shorten the conversation history.” (measured).

We have now seen both modes of failure, and they combine into one picture.

CaseWhat Xcode sendsWhat apfel doesCause
Excerpt of the filefits the window (2578 tokens)answers, but with a stock phrasemodel capability
Whole fileexceeds the window (~33000 chars)rejects with 400context window

Neither time is the connection the problem, nor the protocol. apfel answers correctly, once with a reply, once with a clean error. The limit is the model. It is too small for the task Xcode sets a coding assistant, and its window is too narrow for the files that task involves.

Here apfel-Serve answers, not our agent

The Locally Hosted path does bypass something, though. What answers in Xcode is apfel-Serve, the bare Foundation Model behind an OpenAI endpoint. It is not the agent we built across twelve articles. The plan/act/observe loop, the tightly steered tool calling, the constrained editing from Articles 4 to 7, the ContextManager from Article 8, none of that stands between Xcode and the model in this path. Xcode talks to the model directly and brings its own agentic layer, which overwhelms our model.

The path on which our built agent would come into play is the other one, and it runs over the Model Context Protocol. There are two directions to keep apart. In one, Xcode is the MCP client and talks to an external agent that provides tools and capabilities. In the other, our agent is the MCP client and uses the tools Xcode’s own MCP server offers, project access, diagnostics, build information. Both directions exist in Xcode 27, and neither did we test here. They are the open edge of the finale, not a settled point. What we can demonstrate is the direct provider slot. What goes beyond it, the full agent in Xcode over MCP, remains the next step, not this one. One thing MCP does not move, though. Our agent uses the same Foundation Model, so over that path it hits the same ceiling we just measured. MCP changes the connection, not the strength of the model.

A closing thought

The local agent now really runs where the cloud usually sits. It costs nothing, it does not leave the Mac, and it is set up with a port and a description. For anyone who wants a sense of how a local model feels in the editor, without a key, without a provider, this is a low-threshold entry. For serious coding work in Xcode the model is not enough. We checked and tried it ourselves.

This is exactly where the circle closes back to the triangle from Article 12. apfel’s Foundation Model is the low-threshold corner, local and frugal, but small and borrowed. Anyone who wants more in Xcode ends up at a larger model, that is MLX with an open model and a bit more RAM, or back in the cloud. And the move there is not a rebuild. Xcode’s Locally Hosted slot speaks the same OpenAI protocol as apfel-Serve and as the MLX-LM server. A better local model means the same slot, a different port. What we learned in this series holds beyond the model we used. We did not understand one model, we understood how to build against a protocol, and that knowledge travels with us when the foundation shifts. This time it did not. Next time we will see it when it happens, and take the agent with us.


Previous article: Sovereignty on borrowed ground. This article is the conclusion of the series. The Xcode integration via the Locally Hosted provider is configuration, not new code; the agent stays at v1.0. The measured evidence is recorded in the integration snapshot of 2026-06-20.