apfel from the command line

Article 1 · Series: A Local Coding Agent with apfel

Before we build an agent, we get to know the tool it sits on. apfel is a small Swift CLI that exposes Apple’s Foundation Model on macOS 26 in three ways: as a command-line prompt tool, as an interactive chat, and as an OpenAI-compatible HTTP server. We install apfel via Homebrew, walk through the three modes in the order prompt, chat, serve, and write a handful of small shell scripts along the way that will later serve as building blocks for the agent. At the end we have a demo repo on Codeberg pinned to tag v0.1 and a first qualitative impression of what the small on-device model handles well and where it gives way.

Verifying Requirements

apfel assumes a very specific platform. We verify it before installing.

sw_vers                   # macOS 26.3 (Tahoe) or newer
uname -m                  # arm64
xcodebuild -version       # Xcode 26.4 or newer (optional)

The hard requirements: an Apple Silicon Mac (M1 or newer), macOS 26.3 (Tahoe) or newer, and Apple Intelligence enabled in System Settings. Without Apple Intelligence enabled, apfel returns exit code 5 (“Model unavailable”) on every call. Xcode 26.4+ is only needed if we later want to build the HEAD variant of apfel from source; the Homebrew bottle does not require it.

Component	State at `v0.1`	What for
Hardware	Apple Silicon (M1+)	arm64 architecture, Neural Engine
macOS	26.3 (Tahoe)	Foundation Models framework
Apple Intelligence	enabled	Model is otherwise not unlocked
Homebrew	current	Installation package manager
Xcode (optional)	26.4	Source build of apfel
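
The same checks can be scripted so that later examples fail early with a readable message. The following is a minimal sketch and not part of the demo repo: the function names ver_minor, version_at_least, and check_platform are ours, and the thresholds come from the table above. It compares only the major and minor version and does not detect whether Apple Intelligence is enabled; that still shows up as exit code 5 on the first call.

```shell
# ver_minor VERSION: print the minor component of a dotted version, or 0.
ver_minor() (
  major=${1%%.*}
  rest=${1#"$major"}   # ".3.1", ".3" or ""
  rest=${rest#.}       # "3.1", "3" or ""
  minor=${rest%%.*}
  echo "${minor:-0}"
)

# version_at_least HAVE WANT: succeed if HAVE >= WANT (major.minor only).
version_at_least() (
  have_major=${1%%.*}
  want_major=${2%%.*}
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
  else
    [ "$(ver_minor "$1")" -ge "$(ver_minor "$2")" ]
  fi
)

# check_platform ARCH OS_VERSION: verify the hard requirements we can script.
check_platform() {
  if [ "$1" != "arm64" ]; then
    echo "unsupported architecture: $1 (need arm64)"
    return 1
  fi
  if ! version_at_least "$2" 26.3; then
    echo "macOS $2 is too old (need 26.3 or newer)"
    return 1
  fi
  echo "platform ok: $1, macOS $2"
}
```

On the Mac itself the call would be check_platform "$(uname -m)" "$(sw_vers -productVersion)".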

Installation

brew install apfel
apfel --version           # apfel v1.5.1
apfel --release           # detailed release and build info
apfel --help              # all modes and flags at a glance

The Homebrew formula pulls the current bottle. At tag v0.1 that is apfel 1.5.1; the version is recorded in this article’s frontmatter and in docs/setup.md of the demo repo. When the version jumps in later articles, we name the jump explicitly in the article body.

apfel --help is the most important first read. It shows the three modes — prompt, --serve, --chat — as primary uses and lists the flags with descriptions. The USAGE line is binding:

USAGE:
  apfel [OPTIONS] <prompt>       Send a single prompt
  apfel -f <file> <prompt>       Attach file content to prompt
  apfel --chat                   Interactive conversation
  apfel --serve                  Start OpenAI-compatible HTTP server

Flags come before the positional prompt argument. apfel -s "..." "..." is correct; apfel "..." -s "..." silently drops the flags that follow the prompt. That sounds trivial; we return to it in “Setup Pitfalls.”

Prompt Mode

Prompt mode is a single self-contained request. We pass a string, apfel sends it to the local model, and writes the response to stdout.

apfel "What is a closure in Swift, in two sentences?"

This is the building block everything else sits on. The agent loop we build later is at its core a loop of such calls, with added context, tool calling, and confirmation gates.

A first useful example instead of “Hello World” is a commit message suggestion from a staged diff:

git diff --cached | apfel \
  -s "You write Conventional Commits in one line, max 60 characters, lowercase except proper nouns, no trailing period." \
  "Write a fitting message for this diff."

stdin carries the diff, the positional prompt steers behavior, and -s sets the role. Three mechanics in one command, and that is essentially what we’ll encapsulate as a tool call in the agent.
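
Because a small model does not always honor format rules from the system prompt, a script can check the suggestion before using it. The sketch below is our own addition, not a script from the demo repo; valid_subject is a hypothetical helper that checks only the mechanical rules (one line, at most 60 characters, no trailing period). It cannot judge casing of proper nouns or whether the message fits the diff.

```shell
# valid_subject MESSAGE: succeed only if MESSAGE is a non-empty single line
# of at most 60 characters without a trailing period.
# Note: ${#msg} counts characters in bash but bytes in some other shells.
valid_subject() {
  msg=$1
  newline='
'
  [ -n "$msg" ] || return 1
  case "$msg" in
    *"$newline"*) return 1 ;;  # more than one line
    *.) return 1 ;;            # trailing period
  esac
  [ "${#msg}" -le 60 ]
}
```

A caller would capture the apfel output in a variable, run valid_subject on it, and only then hand it to git commit -m; on failure it can ask the model again or fall back to manual input.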

JSON Output and the Script Pattern

The scripting pattern is built into apfel: -o json switches from plain-text response to structured JSON, letting responses pipe cleanly through jq.

apfel -o json "Explain higher-order functions in one sentence." | jq -r '.content'

This exact pattern lives in the demo repo as examples/cli/04-json-pipe.sh. It’s three lines long and shows how an apfel call becomes a UNIX tool that fits into pipes.

For scripts, apfel’s clean exit codes are useful:

Code	Meaning
`0`	Success
`1`	Runtime error
`2`	Usage error (bad flags)
`3`	Guardrail blocked (content policy)
`4`	Context overflow (input too long)
`5`	Model unavailable (Apple Intelligence not enabled)
`6`	Rate limited / busy

A script calling apfel can react to 5 by pointing the user to Apple Intelligence, to 4 by retrying with shorter context, to 3 by logging the guardrail. That’s more structure than many cloud CLIs offer.
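
One way to wire the exit codes into a script is a small wrapper with a case statement. This is a sketch under our own naming (explain_exit and run_apfel are not part of apfel); the codes and their meanings are the ones from the table above, and the suggested reactions are only the ones named in this section.

```shell
# explain_exit CODE: translate an apfel exit code into a short hint.
explain_exit() {
  case "$1" in
    0) echo "ok" ;;
    1) echo "runtime error" ;;
    2) echo "usage error: check flags and their order" ;;
    3) echo "guardrail blocked: log the prompt and move on" ;;
    4) echo "context overflow: retry with shorter input" ;;
    5) echo "model unavailable: enable Apple Intelligence in System Settings" ;;
    6) echo "rate limited or busy: wait and retry" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# run_apfel ARGS...: run apfel, pass stdout through unchanged,
# print a hint on stderr if it fails, and preserve the exit code.
run_apfel() {
  apfel "$@"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "apfel: $(explain_exit "$rc")" >&2
  fi
  return "$rc"
}
```

Since the hint goes to stderr and the exit code is passed on, run_apfel can replace apfel inside a pipe without changing what downstream tools see.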

The Foundation Model does not respond deterministically. For reproducible smoke tests there’s --seed <n>. When we later write tests against model behavior in the agent, --seed is the anchor.
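
A quick way to see how far a seed pins the output on a given machine is to run the same prompt twice and compare. The helper name same_with_seed is hypothetical; only the --seed flag comes from apfel. Whether the two runs are byte-identical is what the check probes, not something the sketch takes for granted.

```shell
# same_with_seed SEED PROMPT: run PROMPT twice with the same seed.
# Returns 0 if both outputs match, 1 if they differ, 2 if apfel failed.
same_with_seed() (
  first=$(apfel --seed "$1" "$2") || exit 2
  second=$(apfel --seed "$1" "$2") || exit 2
  [ "$first" = "$second" ]
)
```

For example, same_with_seed 42 "Explain recursion in one sentence." && echo "stable" gives a first signal before tests are built on top of the seed.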

System Prompt and File Input

-s "<role>" sets a system prompt that defines persona or output format. File content reaches the model on two documented paths; both are listed in apfel --help.

Variant A: -f as the apfel-native flag.

apfel -f notes.md "Summarize the following content in three sentences."
apfel -f a.txt -f b.txt "Compare these two files."

-f is repeatable; multiple files attach in one request.

Variant B: stdin (pipe or input redirect).

apfel "Summarize the following content in three sentences." < notes.md
cat notes.md | apfel "Summarize the following content in three sentences."

Both work. The demo scripts in the repo use stdin redirect (< file) because it has no external dependency and chains well with other UNIX tools. -f is the more compact form in the multi-file case.
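
The stdin variant chains naturally into a loop over several files. The following sketch is not one of the demo scripts; summarize_dir is a hypothetical helper that sends every Markdown file in a directory through the same prompt, one request per file, and stops at the first failing call.

```shell
# summarize_dir DIR: summarize each *.md file in DIR via stdin redirect.
summarize_dir() {
  for f in "$1"/*.md; do
    [ -e "$f" ] || continue          # no match: the glob stays literal
    printf '== %s ==\n' "$f"
    apfel "Summarize the following content in three sentences." < "$f" \
      || return $?                   # pass apfel's exit code to the caller
  done
}
```

One request per file keeps each input small, which matters with a limited context window; the multi-file -f form puts all files into a single request instead.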

One thing the small on-device model teaches us right away: for code tasks the system prompt must explicitly focus on the given input, otherwise the model readily invents a different piece of code and explains that instead. What works:

apfel \
  -s "You are a precise senior developer. Explain ONLY the code provided. Do not invent other code, do not write your own variant." \
  "Explain what this code does." \
  < fibonacci.swift

The script examples/cli/02-explain-code.sh does exactly that. Without the anti-hallucination addition in the system prompt, the model was repeatedly more creative than necessary in our first smoke test.
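
Whether an explanation stayed with the provided code can be checked roughly by looking for the file's own function names in the answer. This is a crude heuristic of our own and not part of 02-explain-code.sh: the helpers swift_func_names and mentions_source_names are hypothetical, the pattern only catches plain func declarations, and a match does not prove the explanation is correct.

```shell
# swift_func_names FILE: print the names of functions declared with "func".
swift_func_names() {
  grep -oE 'func +[A-Za-z_][A-Za-z0-9_]*' "$1" | awk '{print $2}'
}

# mentions_source_names FILE EXPLANATION: succeed if EXPLANATION contains
# at least one function name from FILE. Fails if FILE declares no functions.
mentions_source_names() (
  for name in $(swift_func_names "$1"); do
    case "$2" in
      *"$name"*) exit 0 ;;
    esac
  done
  exit 1
)
```

A failing check is a hint to rerun the request or tighten the system prompt, not a verdict on its own.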

Chat Mode

apfel --chat opens an interactive session in the terminal. The model holds context across multiple turns until we end the session or the context window overflows.

apfel --chat -s "You are a calm coding assistant. Answer briefly and clearly."

apfel ships with strategies for managing context as the session grows long:

newest-first (default) — oldest turns get evicted first
oldest-first — newer turns yield, oldest stay
sliding-window with --context-max-turns <n> — fixed number of turns
summarize — apfel compresses older turns on its own
strict — error on overflow, no automatic trimming

--context-status enables a display after each turn that reports the context-window fill level. It’s one of the most useful flags for understanding the on-device model: we see directly when we’re hitting the limit.
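
Which end of the conversation a strategy gives up is easy to mix up, so here is a toy model. It is only an illustration and not apfel's implementation: each input line stands for one turn, oldest first, and the budget is counted in turns, whereas the real context window is measured in tokens. The function names are ours.

```shell
# Input: one turn per line, oldest first. N is the number of turns that fit.

# keep_newest N: evict the oldest turns (the idea behind newest-first
# and sliding-window).
keep_newest() { tail -n "$1"; }

# keep_oldest N: evict the newest turns (the idea behind oldest-first).
keep_oldest() { head -n "$1"; }
```

With four turns t1 to t4 and room for two, keep_newest 2 leaves t3 and t4, while keep_oldest 2 leaves t1 and t2.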

In the demo repo, a thin wrapper script examples/cli/07-chat-session.sh sets the system prompt to a brief-and-clear style and keeps the default context strategy. Chat mode is the sandbox where we can later trace the Plan/Act/Observe mechanics of the agent most directly.

A Taste of Serve Mode

apfel --serve starts an HTTP server on 127.0.0.1:11434 with an OpenAI-compatible API.

apfel --serve
# Server listens on http://localhost:11434/v1

From another terminal:

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple-foundationmodel",
    "messages": [{"role": "user", "content": "Explain higher-order functions in one sentence."}]
  }' | jq -r '.choices[0].message.content'

This is the bridge our Swift agent will dock to in Article 3. /v1/chat/completions with streaming via SSE, /v1/models, and /health are the most important endpoints. apfel describes itself as a drop-in replacement for any SDK expecting an OpenAI endpoint — which makes serve mode the actual door opener for our own tools.
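
Scripts that start the server themselves need to wait until it accepts requests. A minimal sketch, assuming that /health answers with a success status once the server is ready; wait_for_health is our own hypothetical helper, and the retry count and one-second interval are arbitrary choices.

```shell
# wait_for_health BASE_URL [TRIES]: poll BASE_URL/health once per second.
# Returns 0 as soon as the endpoint answers successfully, 1 after TRIES
# failed attempts (default 10).
wait_for_health() (
  base=$1
  tries=${2:-10}
  while [ "$tries" -gt 0 ]; do
    if curl -sf "$base/health" >/dev/null 2>&1; then
      exit 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "no healthy server at $base" >&2
  exit 1
)
```

A script could run apfel --serve in the background, call wait_for_health http://localhost:11434, and only then send the first request to /v1/chat/completions.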

We only step in briefly here. Article 2 takes serve apart in detail: all endpoints, token auth, CORS, everything that’s just a list in --help at this point.

Where the Model Holds Up, Where It Gives Way

After the smoke test with the demo scripts, a first qualitative impression takes shape. This is not an eval; we run that in Article 6 with a task canon and source labels per number. But it shows a rough line.

The model holds up:

Summarizing medium text segments (a note of a few sentences gets cleanly mirrored into three sentences)
Translation German ↔ English in both directions, idiomatically
Code explanation when the system prompt focuses on the provided code
Diff review in the 3-point format, when the task is clearly scoped
Concept explanations on programming topics (higher-order functions, recursion, closures)

The model gives way:

Geographic and political fact questions in German. “Was ist die Hauptstadt von Österreich?” (“What is the capital of Austria?”) returned a deflection pointing to “websites of the responsible authorities” in our measurement, which we call the marketing reflex. The same question in English returns “Vienna” cleanly. The guardrails fire language-asymmetrically¹.
Generic “name three …” prompts trigger the marketing reflex even in English. --permissive loosens the filters but doesn’t always help here.
Code hallucination: without an explicit focus prompt, on code tasks the model readily invents its own code instead of explaining the one provided.

¹ Own measurement 2026-06-02 with apfel 1.5.1 on macOS 26.3, documented in the series buildlog.

The model is small (Apple Machine Learning Research mentions around three billion parameters² for the on-device variant of the first Apple Intelligence generation) and stochastic: the same call can return a clean answer one time and a deflection the next. For reproducible tests we set --seed. For tasks that need to run reliably, an anti-hallucination prompt with clear focus on the input is worth the effort.

² Apple Machine Learning Research (June 2024, updated July 2024): “Introducing Apple’s On-Device and Server Foundation Models”, https://machinelearning.apple.com/research/introducing-apple-foundation-models.

Demo Repo: apfel-coding-agent v0.1

The state at the end of this article is frozen on Codeberg as tag v0.1: https://codeberg.org/rotecodefraktion/apfel-coding-agent/src/tag/v0.1. Anyone following along finds everything we did here — the seven example scripts, the setup doc, and the CLAUDE.md with the series conventions.

Setting up apfel-coding-agent v0.1

Clone and check out the tag:

git clone https://codeberg.org/rotecodefraktion/apfel-coding-agent.git
cd apfel-coding-agent
git checkout v0.1
chmod +x examples/cli/*.sh

Contents at tag v0.1:

README.md — series link and quick start
CLAUDE.md — conventions for code sessions (language, stack, path layout)
LICENSE — MIT
.gitignore — Swift/macOS standard ignores
docs/setup.md — installation, USAGE rules, file-input variants, exit codes, language-asymmetric guardrails, state snapshot
examples/cli/ — seven small scripts:
- 01-summarize-notes.sh — summarize a note (stdin redirect)
- 02-explain-code.sh — explain code with an anti-hallucination prompt
- 03-suggest-commit-message.sh — Conventional Commit from git diff --cached
- 04-json-pipe.sh — -o json | jq as a script pattern
- 05-translate.sh — translate with a system prompt
- 06-explain-diff.sh — diff review in the 3-point format
- 07-chat-session.sh — interactive chat session with a default system prompt

First test that everything responds:

echo "The series builds a local coding agent in Swift on top of apfel." | apfel "Summarize this in one sentence."

When a short summary comes back, the installation is through.

Setup Pitfalls

Three traps that cost time on the first run, as a take-away for anyone following along later.

Argument order. apfel’s USAGE prescribes apfel [OPTIONS] <prompt>. Flags must come before the positional prompt. apfel "..." -f file.md ignores the file; apfel -f file.md "..." is correct. We misread the --help output at first, and that cost the scripts all their file inputs on the first attempt.

Language asymmetry of the guardrails. The capital-of-Austria question in German hits a more restrictive filter than the same question in English. For smoke tests in a German-language series this means: switch fact questions to English, or keep the system prompt in English. In the demo repo all scripts use English system prompts; user inputs (notes, code, diffs) are language-neutral.

Code hallucination without focus. When we ask the model to explain a piece of code without specifying in the system prompt “explain ONLY the code provided,” it readily invents a different one and explains that with confidence. The anti-hallucination addition in 02-explain-code.sh is a correctness measure, not a style preference.
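
The argument-order trap can be taken out of scripts with a wrapper that always places the -f flags before the prompt. This is a sketch of our own, not a script from the demo repo; apfel_with_files is a hypothetical name, and the only apfel behavior it relies on is the documented form apfel -f <file> <prompt> with a repeatable -f.

```shell
# apfel_with_files PROMPT FILE...: attach each FILE with -f, prompt last.
apfel_with_files() {
  prompt=$1
  shift
  # "$@" is expanded once when the loop starts, so the positional
  # parameters can be rewritten from "a b" into "-f a -f b" inside it.
  for file in "$@"; do
    shift
    set -- "$@" -f "$file"
  done
  apfel "$@" "$prompt"
}
```

apfel_with_files "Compare these two files." a.txt b.txt then expands to apfel -f a.txt -f b.txt "Compare these two files.", and file names with spaces stay intact because every expansion is quoted.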

How It Continues

Article 2 takes serve mode apart in detail: all three endpoints, the Chat Completions schema with SSE streaming, token auth and CORS, the anatomy of an OpenAI drop-in. With that we have the foundation our Swift client can dock to in Article 3.

Previous article: The Model Is Already There — A Prologue to the Local Coding Agent. Next article: Serve Mode and the OpenAI Protocol (placeholder — link finalized when Article 2 is published). Repo tag: v0.1.