Subagent-Driven Development — Building a Data Model in Parallel
Article 5 · Series: Agentic Coding with Claude Code
v0.4 produces three JSON files: Epl01.json, epl11.json, and Epl16.json. That is not a data model. Column names exist, but types are not guaranteed, the files have different formats, and there is no single source a frontend can work with. Three tasks are now due: define the schema, normalize the JSON, build validation. A job for Subagent-Driven Development.
What Subagent-Driven Development Is — and Is Not
A subagent is not a separate process and not a distinct Claude instance. In Claude Code, a subagent is the main agent delegating part of a task, via skill or prompt, to an isolated area of responsibility. The difference from a direct prompt: the subagent receives a plan, a clear goal, a verification condition, and a limited scope. It is not responsible for the overall context.
The distinction from slash commands and skills:
- Slash command — defines what gets executed (fixed, deterministic, with an argument)
- Skill — defines how to approach a task (workflow, context, criteria)
- Subagent — executes one step of a plan, with isolated scope and its own verification criterion
Subagents are not a universal solution. They pay off when tasks are truly independent, each task has a clear definition of done, and the coordination overhead is smaller than the gain from parallel execution.
Plan File as Contract
Before a subagent touches anything, it needs a briefing. The difference between a plan file and a wish list:
Wish list: “Build a Polars schema for the budget.”
Briefing:
Task 1 — schema.py (sequential, must run first)
Input: none (purely declarative)
Output: parser/src/parser/schema.py with HAUSHALT_SCHEMA: dict[str, pl.DataType]
Verification: python -c "from parser.schema import HAUSHALT_SCHEMA; assert len(HAUSHALT_SCHEMA) >= 6"
Definition of Done: schema importable, all required columns defined.
Every task in the plan has: input, output, test command, definition of done. The test command is the crucial part: without it, the subagent’s result cannot be verified. A subagent that reports “done” but produces no verification has not fulfilled the briefing.
The plan file lives in the repo, versioned, not in a prompt:
Create plans/v0.5-datenmodell.md as a briefing for three tasks:
Task 1 (sequential): schema.py — Polars schema HAUSHALT_SCHEMA, 11 columns.
Verification: python -c "from parser.schema import HAUSHALT_SCHEMA; assert len(HAUSHALT_SCHEMA) >= 6"
Task 2a (after Task 1, parallel to 2b): normalize.py — read all output/*.json,
cast against schema, write haushalt.parquet + manifest.json.
Verification: uv run python -m parser.normalize
Task 2b (after Task 1, parallel to 2a): .claude/skills/validate-totals/SKILL.md —
document Hypothesis properties and the pytest invocation.
Verification: file exists and contains uv run pytest command.
Dependency: start 2a and 2b only after Task 1 is complete.
Claude Code writes the plan file from this prompt. The briefing becomes part of the commit, not just the context of one session.
Task 1 — schema.py: The Schema Is a Constant
The schema does not belong in the code that uses it. It belongs in its own file, importable from anywhere. That is the first subagent task, sequential: all other tasks depend on this schema.
Run Task 1 from plans/v0.5-datenmodell.md.
Claude Code reads the briefing, writes schema.py, and runs the verification:
→ Task 1: parser/src/parser/schema.py
python -c "from parser.schema import HAUSHALT_SCHEMA; assert len(HAUSHALT_SCHEMA) >= 6"
ok — 11 columns defined. Task 1 complete.
The schema itself:
# parser/src/parser/schema.py
import polars as pl

HAUSHALT_SCHEMA: dict[str, pl.DataType] = {
    "epl_nr": pl.Utf8,
    "kap_nr": pl.Utf8,
    "titel_nr": pl.Utf8,
    "fkz": pl.Utf8,
    "zweckbestimmung": pl.Utf8,
    "soll_2026_tsd": pl.Float64,
    "soll_2027_tsd": pl.Float64,
    "soll_2025_tsd": pl.Float64,
    "ist_2024_tsd": pl.Float64,
    "ist_2023_tsd": pl.Float64,
    "source_file": pl.Utf8,
}

REQUIRED_COLUMNS = frozenset({"epl_nr", "kap_nr", "titel_nr", "zweckbestimmung", "source_file"})
AMOUNT_COLUMNS = frozenset({"soll_2026_tsd", "soll_2027_tsd", "soll_2025_tsd", "ist_2024_tsd", "ist_2023_tsd"})
Historical columns (soll_2025_tsd, ist_2024_tsd, ist_2023_tsd) are still null in v0.5, but they are in the schema. The Parquet stays backward-compatible when v0.6 fills them in.
source_file is a traceability column: every row knows which JSON file it came from. This costs almost nothing in storage (Parquet dictionary-encodes the repeated strings) and is invaluable when debugging.
Verification:
python -c "from parser.schema import HAUSHALT_SCHEMA; assert len(HAUSHALT_SCHEMA) >= 6; print('ok')"
ok
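Downstream code consumes the schema by casting against it, not by redefining columns. A minimal sketch, with a hypothetical raw row for illustration; the loop and names are illustrative, not the repo’s actual normalize code:
import polars as pl
from parser.schema import HAUSHALT_SCHEMA

# Hypothetical raw row, as a parser might emit it (fields incomplete on purpose)
rows = [{"epl_nr": "01", "titel_nr": "112 01-0", "zweckbestimmung": "Gebühren",
         "soll_2026_tsd": 120.5, "source_file": "Epl01.json"}]

df = pl.DataFrame(rows)
# Add any column the schema defines but the rows lack, as typed nulls
for col, dtype in HAUSHALT_SCHEMA.items():
    if col not in df.columns:
        df = df.with_columns(pl.lit(None, dtype=dtype).alias(col))
# Fix the column order and cast everything to the schema's types
df = df.select(list(HAUSHALT_SCHEMA)).cast(HAUSHALT_SCHEMA)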
Two Parallel Subagents: normalize + validate-totals
Once schema.py exists, two tasks can run in parallel:
- Task 2a — normalize.py: reads all EPL JSONs, casts against the schema, writes Parquet + manifest
- Task 2b — validate-totals skill: defines the Hypothesis properties and the test invocation
These two have no dependency on each other: the skill does not need a Parquet, and normalize.py does not need the skill. This is exactly where parallelization pays off.
Task 1 complete. Run Task 2a and 2b from plans/v0.5-datenmodell.md.
Order does not matter; both are independent.
Claude Code handles both tasks and runs their verifications:
→ Task 2b: .claude/skills/validate-totals/SKILL.md created. ✓
→ Task 2a: parser/src/parser/normalize.py
uv run python -m parser.normalize
What normalize.py discovers
Writing normalize.py surfaces the first problem: epl11.json has a different format than the other two JSON files.
// epl11.json (v0.3 format) — dict wrapper
{
  "epl_nr": "11",
  "epl_name": "Bayerischer Oberster Rechnungshof",
  "titel_count": 81,
  "titel": [ {...}, {...} ]
}
// Epl01.json (v0.4 format) — direct list
[
  { "titel_nr": "112 01-0", "epl_nr": "01", ... },
  ...
]
The /parse-epl command from v0.4 writes lists. The v0.3 script that generated epl11.json wrote a dict wrapper. normalize.py has to handle both formats:
import json
from pathlib import Path

def _load_json(path: Path) -> list[dict]:
    """Accept both formats: v0.4 direct list or v0.3 dict wrapper with a 'titel' key."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    if isinstance(raw, list):
        return raw
    if isinstance(raw, dict) and "titel" in raw:
        return raw["titel"]
    raise ValueError(f"{path.name}: unknown JSON format")
Not an elegant solution, but a pragmatic one. The data model absorbs format inconsistency; the source stays as it is.
The second problem: on the first test run, normalize.py reads manifest.json as well, the file we just generated. It is not an EPL JSON, has no titel key, and is not a list. The function raises ValueError.
# fix: explicitly exclude manifest.json
json_files = [f for f in sorted(output_dir.glob("*.json")) if f.name != "manifest.json"]
These two lines follow a pattern: generated artifacts that land in the same directory as input files need explicit exclusion. Alternatively, EPL JSONs could live in a subdirectory (output/epls/). Cleaner, but an architecture decision for v0.6.
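Putting both fixes together, the loading step might look roughly like this. A sketch that reuses _load_json from above; load_all is a hypothetical name, the repo’s actual function may differ:
from pathlib import Path

import polars as pl

def load_all(output_dir: Path) -> pl.DataFrame:
    # Skip generated artifacts that share the directory with the input JSONs
    json_files = [f for f in sorted(output_dir.glob("*.json"))
                  if f.name != "manifest.json"]
    frames = [
        pl.DataFrame(_load_json(f)).with_columns(pl.lit(f.name).alias("source_file"))
        for f in json_files
    ]
    # how="diagonal" unions differing column sets, filling gaps with nulls
    return pl.concat(frames, how="diagonal")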
Result after the fix:
Parquet: output/haushalt.parquet (377 rows)
Manifest: output/manifest.json
Epl01.json: 138 titles, Σ=184234.3 Tsd. €
Epl16.json: 158 titles, Σ=122046.4 Tsd. €
epl11.json: 81 titles, Σ=47390.9 Tsd. €
377 rows, three EPL, one Parquet.
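A quick spot-check of the result from a Python shell — a sketch; the Parquet path is the one normalize.py reported above:
import polars as pl

df = pl.read_parquet("output/haushalt.parquet")
print(df.height)  # 377
print(df.group_by("epl_nr").agg(pl.len()).sort("epl_nr"))  # 01→138, 11→81, 16→158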
Property-Based Tests — Invariants Instead of Examples
With Parquet and skill in place, the last prompt follows:
Write parser/tests/test_normalize.py and parser/tests/test_properties.py.
test_normalize.py: smoke tests — Parquet exists, schema matches, manifest complete.
test_properties.py: Hypothesis properties from validate-totals/SKILL.md.
Mark property "Σ EPL = total budget" as xfail — only 3 of 16 EPL available.
Then run: uv run pytest -v
test_all_epls.py from v0.4 checks concrete sums against reference values. That is an example test: it checks a known input against a known output. Property-based tests with Hypothesis invert this: they describe what must hold for any possible input, and leave test data generation to the framework.
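A toy example of that inversion, deliberately unrelated to the repo’s actual tests: instead of asserting one known sum, assert a property over inputs that Hypothesis generates:
from hypothesis import given, strategies as st

@given(st.lists(st.floats(min_value=0, allow_nan=False, allow_infinity=False)))
def test_sum_of_non_negative_amounts_is_non_negative(amounts: list[float]) -> None:
    # Must hold for every list Hypothesis generates, not just one example
    assert sum(amounts) >= 0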
For the data model the invariants are clear:
- Every EPL block has at least one title with soll_2026_tsd != null
- All extracted amounts are ≥ 0
- Every row has source_file != null (traceability)
import polars as pl
import pytest

@pytest.fixture(scope="module")
def df() -> pl.DataFrame:
    # Fixture sketch — the repo's actual fixture and Parquet path may differ
    return pl.read_parquet("output/haushalt.parquet")

def test_every_epl_has_titles_with_values(df: pl.DataFrame) -> None:
    for epl_nr in df["epl_nr"].unique().to_list():
        subset = df.filter(pl.col("epl_nr") == epl_nr)
        non_null = subset["soll_2026_tsd"].drop_nulls()
        assert len(non_null) > 0, f"EPL {epl_nr}: no soll_2026_tsd values"

def test_amounts_non_negative(df: pl.DataFrame) -> None:
    vals = df["soll_2026_tsd"].drop_nulls()
    neg = vals.filter(vals < 0)
    assert len(neg) == 0, f"{len(neg)} negative amounts found"
The fourth property, Σ of all EPL expenditures == 84,647,400 Tsd. € (Bavaria 2026 total budget), is marked xfail:
@pytest.mark.xfail(
    reason="Only 3 of 16 EPL parsed — Σ EPL != total budget until v0.6",
    strict=False,
)
def test_gesamthaushalt_ausgaben(df: pl.DataFrame) -> None:
    ausgaben = df.filter(
        pl.col("soll_2026_tsd").is_not_null()
        & ~pl.col("titel_nr").str.starts_with("1")
    )["soll_2026_tsd"].sum()
    assert abs(ausgaben - 84_647_400.0) <= 1.0
strict=False means: if the test unexpectedly passes, that is not a failure. If it fails, that is known and documented. This is the difference between an xfail that forms a regression guard and one that simply gets ignored.
The Manifest — Why a Sidecar Is Required
haushalt.parquet contains 377 rows. Without a sidecar nobody knows which PDF version this Parquet corresponds to, when it was generated, or which EPL are in it. The manifest resolves that:
{
  "generated_at": "2026-05-13T18:00:00+00:00",
  "parser_version": "v0.5",
  "total_rows": 377,
  "source_files": ["Epl01.json", "Epl16.json", "epl11.json"],
  "epl_count": 3,
  "soll_2026_tsd_total": 353671.6,
  "checksum_sha256_16": "a1b2c3d4e5f60718"
}
The frontend (Article 6) reads the manifest before opening the Parquet. This way it knows whether the file is current or needs regenerating. The truncated SHA-256 checksum verifies that the Parquet has not been altered since the last normalize run.
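How the manifest might be produced at the end of a normalize run — a sketch; write_manifest and its signature are hypothetical, only the field names come from the JSON above:
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def write_manifest(parquet_path: Path, source_files: list[str],
                   total_rows: int, soll_total: float) -> None:
    # First 16 hex characters of the Parquet's SHA-256, matching checksum_sha256_16
    digest = hashlib.sha256(parquet_path.read_bytes()).hexdigest()[:16]
    manifest = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "parser_version": "v0.5",
        "total_rows": total_rows,
        "source_files": source_files,
        "epl_count": len(source_files),
        "soll_2026_tsd_total": soll_total,
        "checksum_sha256_16": digest,
    }
    (parquet_path.parent / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )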
Parquet and manifest go into .gitignore: generated artifacts, not source. The source is normalize.py plus the EPL JSONs.
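The corresponding .gitignore entries might look like this (a sketch; exact paths depend on the repo layout):
# generated by normalize.py — rebuild, don't commit
output/haushalt.parquet
output/manifest.json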
When Not to Parallelize
Task 1 (schema.py) ran sequentially. That was not arbitrary. normalize.py imports HAUSHALT_SCHEMA from schema.py. If both run in parallel, the normalize subagent writes against a schema that does not yet exist.
Parallelization pays off exactly when:
- Tasks have no import dependencies on each other
- Tasks do not write to shared state (two subagents writing the same file produce race conditions)
- The coordination overhead is smaller than the gain
A 10-line script, a single configuration field: for those, a subagent is overhead without gain. Parallelization is not a goal; it is a means.
Status at the End of This Article
git clone https://codeberg.org/rotecodefraktion/byhaushalt.git
cd byhaushalt
git checkout v0.5
Full state at byhaushalt @ v0.5.
v0.5 contains: schema.py, normalize.py, validate-totals skill, plan file, Hypothesis tests, .gitignore update. 19 tests passing, 3 xfailed (parser gaps from Epl01/Epl16 in v0.4 + total budget property).
cd parser && uv run pytest -v
# 19 passed, 3 xfailed in 1.55s
What Comes Next
Article 6 covers worktrees. The byhaushalt frontend gets three visualization variants: treemap, sunburst, and Sankey. All three are to be developed in parallel without getting in each other’s way. Git worktrees offer exactly that: three isolated working directories, one repository.
How slash commands made the parser repeatable is covered in Article 4. The custom skill and TDD loop from v0.3 are in Article 3.