apfel from the command line
Article 1 · Series: A Local Coding Agent with apfel
Before we build an agent, we get to know the tool it sits on. apfel is a small Swift CLI that exposes Apple’s Foundation Model on macOS 26 as a command line, an OpenAI-compatible HTTP server, and an interactive chat. We install apfel via Homebrew, walk through the three modes in the order prompt, chat, serve, and write a handful of small shell scripts along the way that will later serve as building blocks for the agent. At the end we have a demo repo on Codeberg pinned to tag v0.1 and a first qualitative impression of what the small on-device model handles well and where it gives way.
Verifying Requirements
apfel assumes a very specific platform. We verify it before installing.
sw_vers # macOS 26.3 (Tahoe) or newer
uname -m # arm64
xcodebuild -version # Xcode 26.4 or newer (optional)
The hard requirements: an Apple Silicon Mac (M1 or newer), macOS 26.3 (Tahoe) or newer, and Apple Intelligence enabled in System Settings. Without Apple Intelligence enabled, apfel returns exit code 5 (“Model unavailable”) on every call. Xcode 26.4+ is only needed if we later want to build the HEAD variant of apfel from source; the Homebrew bottle does not require it.
| Component | State at v0.1 | What for |
|---|---|---|
| Hardware | Apple Silicon (M1+) | arm64 architecture, Neural Engine |
| macOS | 26.3 (Tahoe) | Foundation Models framework |
| Apple Intelligence | enabled | Model is otherwise not unlocked |
| Homebrew | current | Installation package manager |
| Xcode (optional) | 26.4 | Source build of apfel |
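The two machine-checkable requirements above can be wrapped in a small preflight script. This is a minimal sketch, not part of the demo repo: `version_ge` and `preflight` are hypothetical helper names, and the version is compared numerically per component so that a future 26.10 is not mistaken for something older than 26.3.

```shell
#!/bin/sh
# Preflight sketch. version_ge and preflight are hypothetical helper names.

# version_ge A B: succeeds when dotted version A is >= B, compared numerically
# component by component (so 26.10 counts as newer than 26.3).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 0
  }'
}

# preflight [OS_VERSION] [ARCH]: both arguments default to this machine's values.
preflight() {
  os_version=${1:-$(sw_vers -productVersion)}
  arch=${2:-$(uname -m)}
  if [ "$arch" != "arm64" ]; then
    echo "needs Apple Silicon (arm64), found: $arch" >&2
    return 1
  fi
  if ! version_ge "$os_version" "26.3"; then
    echo "needs macOS 26.3 or newer, found: $os_version" >&2
    return 1
  fi
  echo "platform ok: macOS $os_version on $arch"
}
```

Called without arguments, `preflight` reads the local machine; a line such as `preflight && brew install apfel` would then gate the installation. Whether Apple Intelligence is enabled cannot be read this way; that only shows up later as exit code 5.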
Installation
brew install apfel
apfel --version # apfel v1.5.1
apfel --release # detailed release and build info
apfel --help # all modes and flags at a glance
The Homebrew formula pulls the current bottle. At tag v0.1 that is apfel 1.5.1; the version is recorded in this article’s frontmatter and in docs/setup.md of the demo repo. When the version jumps in later articles, we name the jump explicitly in the article body.
apfel --help is the most important first read. It shows the three modes — prompt, --serve, --chat — as primary uses and lists the flags with descriptions. The USAGE line is binding:
USAGE:
apfel [OPTIONS] <prompt> Send a single prompt
apfel -f <file> <prompt> Attach file content to prompt
apfel --chat Interactive conversation
apfel --serve Start OpenAI-compatible HTTP server
Flags come before the positional prompt argument. apfel -s "..." "..." is correct; in apfel "..." -s "...", the flags placed after the prompt are silently dropped. That sounds trivial; we return to it in “Setup Pitfalls.”
Prompt Mode
Prompt mode is a single self-contained request. We pass a string, apfel sends it to the local model, and writes the response to stdout.
apfel "What is a closure in Swift, in two sentences?"
This is the building block everything else sits on. The agent loop we build later is at its core a loop of such calls, with added context, tool calling, and confirmation gates.
A first useful example instead of “Hello World” is a commit message suggestion from a staged diff:
git diff --cached | apfel \
-s "You write Conventional Commits in one line, max 60 characters, lowercase except proper nouns, no trailing period." \
"Write a fitting message for this diff."
stdin carries the diff, the positional prompt steers behavior, -s sets the role. Three mechanics in one line, and that’s at its core what we’ll encapsulate as a tool call in the agent.
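Because the model does not always stick to the format it is asked for, a script can check the suggestion before using it. The following is a sketch under assumptions: `valid_subject` and `suggest_commit` are hypothetical names, and the checks simply mirror the rules from the system prompt above plus the basic Conventional Commits shape.

```shell
#!/bin/sh
# valid_subject SUBJECT: hypothetical helper; the rules mirror the system prompt.
valid_subject() {
  subject=$1
  newline='
'
  [ -n "$subject" ] || return 1              # not empty
  [ "${#subject}" -le 60 ] || return 1       # max 60 characters
  case $subject in
    *"$newline"*) return 1 ;;                # exactly one line
    *.) return 1 ;;                          # no trailing period
  esac
  # Conventional Commits shape: type, optional (scope), optional !, colon, space, text.
  printf '%s\n' "$subject" | grep -Eq '^[a-z]+(\([^)]+\))?!?: .+'
}

# suggest_commit: asks apfel for a message and prints it only if it passes the check.
suggest_commit() {
  message=$(git diff --cached | apfel \
    -s "You write Conventional Commits in one line, max 60 characters, lowercase except proper nouns, no trailing period." \
    "Write a fitting message for this diff.") || return
  if valid_subject "$message"; then
    printf '%s\n' "$message"
  else
    echo "suggestion rejected: $message" >&2
    return 1
  fi
}
```

A rejected suggestion is a signal to ask again or to write the message by hand; the check does not judge whether the message describes the diff correctly.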
JSON Output and the Script Pattern
The scripting pattern is built into apfel: -o json switches from plain-text response to structured JSON, letting responses pipe cleanly through jq.
apfel -o json "Explain higher-order functions in one sentence." | jq -r '.content'
This exact pattern lives in the demo repo as examples/cli/04-json-pipe.sh. It’s three lines long and shows how an apfel call becomes a UNIX tool that fits into pipes.
For scripts, apfel’s clean exit codes are useful:
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Runtime error |
| 2 | Usage error (bad flags) |
| 3 | Guardrail blocked (content policy) |
| 4 | Context overflow (input too long) |
| 5 | Model unavailable (Apple Intelligence not enabled) |
| 6 | Rate limited / busy |
A script calling apfel can react to 5 by pointing the user to Apple Intelligence, to 4 by retrying with shorter context, to 3 by logging the guardrail. That’s more structure than many cloud CLIs offer.
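How a script might react to these codes, as a minimal sketch: `apfel_exit_hint` and `ask` are hypothetical helper names, and the hint texts are our own wording for the codes in the table.

```shell
#!/bin/sh
# apfel_exit_hint CODE: hypothetical helper that turns an exit code from the
# table above into a one-line hint for the user.
apfel_exit_hint() {
  case $1 in
    0) echo "ok" ;;
    1) echo "runtime error" ;;
    2) echo "usage error: check flag order and spelling" ;;
    3) echo "guardrail blocked: log the prompt and move on" ;;
    4) echo "context overflow: retry with shorter input" ;;
    5) echo "model unavailable: enable Apple Intelligence in System Settings" ;;
    6) echo "rate limited or busy: wait and retry" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# ask PROMPT: runs apfel once; on failure prints the hint to stderr and
# passes the original exit code on to the caller.
ask() {
  apfel "$1"
  status=$?
  if [ "$status" -ne 0 ]; then
    apfel_exit_hint "$status" >&2
  fi
  return "$status"
}
```

Passing the original code through matters: a calling script can still branch on 4 or 6 itself, while the user already sees a readable hint.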
The Foundation Model does not respond deterministically. For reproducible smoke tests there’s --seed <n>. When we later write tests against model behavior in the agent, --seed is the anchor.
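Whether a seeded call really repeats itself is easy to check rather than assume. A small sketch, with `is_reproducible` as a hypothetical helper name: it runs any command twice and compares the output.

```shell
#!/bin/sh
# is_reproducible CMD [ARGS...]: hypothetical helper. Runs the command twice and
# succeeds only if both runs print the same output. Returns 2 if a run fails.
is_reproducible() {
  first=$("$@") || return 2
  second=$("$@") || return 2
  [ "$first" = "$second" ]
}
```

A call such as `is_reproducible apfel --seed 42 "Explain recursion in one sentence."` would then tell us whether the seed holds on this machine and this apfel version; the seed value 42 is arbitrary.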
System Prompt and File Input
-s "<role>" sets a system prompt that defines persona or output format. File content reaches the model on two documented paths; both are listed in apfel --help.
Variant A: -f as the apfel-native flag.
apfel -f notes.md "Summarize the following content in three sentences."
apfel -f a.txt -f b.txt "Compare these two files."
-f is repeatable; multiple files attach in one request.
Variant B: stdin (pipe or input redirect).
apfel "Summarize the following content in three sentences." < notes.md
cat notes.md | apfel "Summarize the following content in three sentences."
Both work. The demo scripts in the repo use stdin redirect (< file) because it has no external dependency and chains well with other UNIX tools. -f is the more compact form in the multi-file case.
One thing the small on-device model teaches us right away: for code tasks the system prompt must explicitly focus on the given input, otherwise the model readily invents a different piece of code and explains that instead. What works:
apfel \
-s "You are a precise senior developer. Explain ONLY the code provided. Do not invent other code, do not write your own variant." \
"Explain what this code does." \
< fibonacci.swift
The script examples/cli/02-explain-code.sh does exactly that. Without the anti-hallucination addition in the system prompt, the model was repeatedly more creative than necessary in our first smoke test.
Chat Mode
apfel --chat opens an interactive session in the terminal. The model holds context across multiple turns until we end the session or the context window overflows.
apfel --chat -s "You are a calm coding assistant. Answer briefly and clearly."
apfel ships with strategies for managing context as the session grows long:
- newest-first (default): oldest turns get evicted first
- oldest-first: newer turns yield, oldest stay
- sliding-window with --context-max-turns <n>: fixed number of turns
- summarize: apfel compresses older turns on its own
- strict: error on overflow, no automatic trimming
--context-status enables a display after each turn that reports the context-window fill level. It’s one of the most useful flags for understanding the on-device model: we see directly when we’re hitting the limit.
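To picture what a sliding window over turns means, here is a conceptual sketch in plain shell. It is not apfel's implementation; `sliding_window` is a hypothetical function that treats each argument as one turn and keeps only the newest ones.

```shell
#!/bin/sh
# sliding_window N TURN...: conceptual sketch, not apfel's code.
# Prints the last N turns, one per line, dropping the oldest first.
sliding_window() {
  max=$1
  shift
  while [ "$#" -gt "$max" ]; do
    shift   # evict the oldest remaining turn
  done
  for turn in "$@"; do
    printf '%s\n' "$turn"
  done
}
```

With a window of two, `sliding_window 2 "turn 1" "turn 2" "turn 3"` keeps only the last two turns. The point for the agent later on: anything evicted this way is gone from the model's view, so instructions that must survive belong in the system prompt.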
In the demo repo, a thin wrapper script examples/cli/07-chat-session.sh sets the system prompt to a brief-and-clear style and keeps the default context strategy. Chat mode is the sandbox where we can later trace the Plan/Act/Observe mechanics of the agent most directly.
A Taste of Serve Mode
apfel --serve starts an HTTP server on 127.0.0.1:11434 with an OpenAI-compatible API.
apfel --serve
# Server listens on http://localhost:11434/v1
From another terminal:
curl -s http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "apple-foundationmodel",
"messages": [{"role": "user", "content": "Explain higher-order functions in one sentence."}]
}' | jq -r '.choices[0].message.content'
This is the bridge our Swift agent will dock to in Article 3. /v1/chat/completions with streaming via SSE, /v1/models, and /health are the most important endpoints. apfel describes itself as a drop-in replacement for any SDK expecting an OpenAI endpoint — which makes serve mode the actual door opener for our own tools.
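Scripts that start the server themselves need to wait until it accepts requests. A sketch using the /health endpoint mentioned above, assuming it answers with an HTTP success status once the server is ready (we have not specified its response body here); `wait_for_health` is a hypothetical helper name.

```shell
#!/bin/sh
# wait_for_health URL [TRIES]: hypothetical helper. Polls URL once per second
# until it answers with an HTTP success status, or gives up after TRIES attempts.
wait_for_health() {
  url=$1
  tries=${2:-10}
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    # -f makes curl fail on HTTP errors, -s silences progress output.
    if curl -sf -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    if [ "$attempt" -lt "$tries" ]; then
      sleep 1
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

In a script this could follow `apfel --serve &` as `wait_for_health http://localhost:11434/health || exit 1`, so that the first request is only sent once the server responds.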
We only step in briefly here. Article 2 takes serve apart in detail: all endpoints, token auth, CORS, everything that’s just a list in --help at this point.
Where the Model Holds Up, Where It Gives Way
After the smoke test with the demo scripts, a first qualitative impression takes shape. This is not an eval; we run that in Article 6 with a task canon and source labels per number. But it shows a rough line.
The model holds up:
- Summarizing medium text segments (a note of a few sentences gets cleanly mirrored into three sentences)
- Translation German ↔ English in both directions, idiomatically
- Code explanation when the system prompt focuses on the provided code
- Diff review in the 3-point format, when the task is clearly scoped
- Concept explanations on programming topics (higher-order functions, recursion, closures)
The model gives way:
- Geographic and political fact questions in German. “Was ist die Hauptstadt von Österreich?” returned a marketing reflex about “websites of the responsible authorities” in our measurement. The same question in English, “What is the capital of Austria?”, returns “Vienna” cleanly. The guardrails fire language-asymmetrically¹.
- Generic “name three …” prompts trigger the marketing reflex even in English. --permissive loosens the filters but doesn’t always help here.
- Code hallucination: without an explicit focus prompt, on code tasks the model readily invents its own code instead of explaining the one provided.
¹ Own measurement 2026-06-02 with apfel 1.5.1 on macOS 26.3, documented in the series buildlog.
The model is small (Apple Machine Learning Research mentions around three billion parameters² for the on-device variant of the first Apple Intelligence generation) and stochastic: the same call can return a clean answer one time and a deflection the next. For reproducible tests we set --seed. For tasks that need to run reliably, an anti-hallucination prompt with clear focus on the input is worth the effort.
² Apple Machine Learning Research (June 2024, updated July 2024): “Introducing Apple’s On-Device and Server Foundation Models”, https://machinelearning.apple.com/research/introducing-apple-foundation-models.
Demo Repo: apfel-coding-agent v0.1
The state at the end of this article is frozen on Codeberg as tag v0.1: https://codeberg.org/rotecodefraktion/apfel-coding-agent/src/tag/v0.1. Anyone following along finds everything we did here — the seven example scripts, the setup doc, and the CLAUDE.md with the series conventions.
Setting up apfel-coding-agent v0.1
Clone and check out the tag:
git clone https://codeberg.org/rotecodefraktion/apfel-coding-agent.git
cd apfel-coding-agent
git checkout v0.1
chmod +x examples/cli/*.sh
Contents at tag v0.1:
- README.md: series link and quick start
- CLAUDE.md: conventions for code sessions (language, stack, path layout)
- LICENSE: MIT
- .gitignore: Swift/macOS standard ignores
- docs/setup.md: installation, USAGE rules, file-input variants, exit codes, language-asymmetric guardrails, state snapshot
- examples/cli/: seven small scripts:
  - 01-summarize-notes.sh: summarize a note (stdin redirect)
  - 02-explain-code.sh: explain code with an anti-hallucination prompt
  - 03-suggest-commit-message.sh: Conventional Commit from git diff --cached
  - 04-json-pipe.sh: -o json | jq as a script pattern
  - 05-translate.sh: translate with a system prompt
  - 06-explain-diff.sh: diff review in the 3-point format
  - 07-chat-session.sh: interactive chat session with a default system prompt
First test that everything responds:
echo "The series builds a local coding agent in Swift on top of apfel." | apfel "Summarize this in one sentence."
When a short summary comes back, the installation is through.
Setup Pitfalls
Three traps that cost time on the first run, as a take-away for anyone following along later.
Argument order. apfel’s USAGE prescribes apfel [OPTIONS] <prompt>. Flags must come before the positional prompt. apfel "..." -f file.md ignores the file; apfel -f file.md "..." is correct. We misread the --help output at first, and that cost the scripts all their file inputs on the first attempt.
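One way to keep the argument order out of every call site is a thin wrapper that always assembles flags first and the prompt last. A sketch under assumptions: `ask_about_file` is a hypothetical helper, and the `APFEL` variable is our own addition so that the binary can be swapped out in tests.

```shell
#!/bin/sh
# APFEL is a hypothetical override for the binary name; it defaults to apfel.
APFEL=${APFEL:-apfel}

# ask_about_file FILE PROMPT [SYSTEM_PROMPT]: hypothetical helper that always
# places -s and -f before the positional prompt, as the USAGE line requires.
ask_about_file() {
  file=$1
  prompt=$2
  system=${3:-}
  if [ -n "$system" ]; then
    "$APFEL" -s "$system" -f "$file" "$prompt"
  else
    "$APFEL" -f "$file" "$prompt"
  fi
}
```

A call then reads `ask_about_file notes.md "Summarize the following content in three sentences."`, and the order mistake can no longer happen at the call site.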
Language asymmetry of the guardrails. The capital-of-Austria question in German hits a more restrictive filter than the same question in English. For smoke tests in a German-language series this means: switch fact questions to English, or keep the system prompt in English. In the demo repo all scripts use English system prompts; user inputs (notes, code, diffs) are language-neutral.
Code hallucination without focus. When we ask the model to explain a piece of code without specifying in the system prompt “explain ONLY the code provided,” it readily invents a different one and explains that with confidence. The anti-hallucination addition in 02-explain-code.sh is a correctness measure, not a style preference.
How It Continues
Article 2 takes serve mode apart in detail: all three endpoints, the Chat Completions schema with SSE streaming, token auth and CORS, the anatomy of an OpenAI drop-in. With that we have the foundation our Swift client can dock to in Article 3.
Previous article: The Model Is Already There — A Prologue to the Local Coding Agent. Next article: Serve Mode and the OpenAI Protocol (placeholder — link finalized when Article 2 is published). Repo tag: v0.1.