End-to-End with Playwright MCP — Tests That Claude Writes Itself

Article 8 · Series: Agentic Coding with Claude Code

The component tests from v0.7 pass and the build is green, so on paper everything looks fine. What the tests do not show is how the app behaves in a real browser. Vitest runs in jsdom, a simulated DOM that differs from a real browser in one critical way: it performs no layout and paints nothing. Whether a tab click actually shows the right visualization, whether the SVG charts draw cleanly in Chromium, whether the breadcrumb navigation responds the way it should: all of that goes untested. End-to-end tests in a real browser close that gap. We do not write them by hand; instead, we let Claude generate them from a Markdown file that describes the user flow.

Playwright Library and Playwright MCP: Two Different Things

Playwright comes in two forms that play different roles.

The Playwright library (@playwright/test) is a test runner. You write TypeScript, call page.goto("/"), and make assertions with expect(); Playwright then runs that code in a real browser. The code lives in the repo, and a human writes it.

The Playwright MCP server is something different. It lets Claude Code control a real browser during a conversation. Claude can open a URL, click on elements, read the DOM snapshot, take screenshots — all at runtime, without writing tests first. The result is not test execution but test generation: Claude explores live, finds robust selectors, and then writes the Playwright test.

The combination is what makes the difference: the skill reads a Markdown spec, opens the browser via MCP, walks through the user flow step by step, and produces a .spec.ts file at the end. No guessing, no selectors written blindly.

Plan File for Article 8

Task 1 (sequential): install and configure Playwright.

Task 2 (after Task 1): write three Markdown specs.
  Sunburst, Sankey, Treemap — one spec each describing the user flow.

Task 3 (parallel to Task 2): write the e2e-spec skill.
  Workflow for the skill: read spec, explore browser via MCP,
  find selectors, write test, save screenshot.

Task 4 (after Task 2 + 3): first generation and baseline.
  Skill in action, generated tests pass, baseline screenshots saved.

Task 1: Installing Playwright

Install @playwright/test as a dev dependency in web/.
Download Chromium via npx playwright install chromium.
Create playwright.config.ts: testDir pointing to tests/e2e/generated,
baseURL http://localhost:5173, Playwright starts the dev server automatically.
Add test:e2e and test:e2e:update-baseline to package.json.

The most important part of the configuration is the webServer block. It starts the Vite dev server automatically before any test runs — no separate terminal for npm run dev needed:

webServer: {
  command: "npm run dev -- --port 5173",
  url: "http://localhost:5173",
  reuseExistingServer: !process.env.CI,
  timeout: 60_000,
}

The reuseExistingServer flag makes a practical difference locally: if the dev server is already running, Playwright simply attaches to it. In CI, where no running server can be assumed, it starts a fresh one.
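Combined with the settings from the task description, the whole file might look roughly like the sketch below. Only testDir, baseURL, and the webServer block come from the article; the projects entry and the relative form of the testDir path are assumptions about a typical single-browser setup, not the repository's actual file.

```typescript
// web/playwright.config.ts (sketch, not the repository's actual file)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  // Only the generated tests are executed; the Markdown specs are input for the skill.
  testDir: "./tests/e2e/generated",
  use: {
    // Lets tests call page.goto("/") instead of spelling out the full URL.
    baseURL: "http://localhost:5173",
  },
  // Assumption: Chromium only, since that is the one browser the task installs.
  projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
  webServer: {
    command: "npm run dev -- --port 5173",
    url: "http://localhost:5173",
    // Locally: attach to a dev server that is already running.
    // In CI (CI env variable set): always start a fresh one.
    reuseExistingServer: !process.env.CI,
    timeout: 60_000,
  },
});
```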

Task 2: Writing Markdown Specs

The specs describe user flows in a language that humans can read and Claude can translate into tests. Anyone writing a spec does not need to know Playwright or understand how selectors work — describing the intended flow is enough: starting state, action, expected result. Every spec follows the same structure:

# Spec: <Vis-Name>

## Prerequisites
- Dev server running on http://localhost:5173
- Data available via `npm run dev`

## Step 1: <Name>
**Action:** ...
**Expected:** ...

## Screenshot
**Location:** `screenshots/baseline/<vis>.png`
**When:** after step N

The sunburst spec shows what this looks like in practice:

# Spec: Sunburst Visualization

## Step 1: Sunburst is Default
**Action:** Open the page.
**Expected:**
- Header shows "Bayerischer Haushalt 2026"
- Tab "Sunburst" is active (aria-selected)
- SVG with data-testid="sunburst" is visible

## Step 2: Half-Circle Layout
**Expected:**
- Heading "Einnahmen" above the SVG
- Heading "Ausgaben" below the SVG

## Screenshot
**Location:** `screenshots/baseline/sunburst.png`
**When:** after step 2

This spec is not code and not a test — it is a statement of intent. Anyone reading it immediately understands what is being checked. Anyone writing it thinks in user flows, not in selectors.
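To make the shared structure concrete, here is a minimal sketch of how such a spec could be broken down into steps. This is an illustration only: the skill lets Claude read the Markdown directly, and the names ParsedSpec, SpecStep, and parseSpec are hypothetical, not part of the repository.

```typescript
// Hypothetical types that mirror the spec template: title, steps, screenshot.
interface SpecStep {
  index: number;
  name: string;
  action?: string; // a step may have expectations only, like Step 2 of the sunburst spec
  expected: string[];
}

interface ParsedSpec {
  title: string;
  steps: SpecStep[];
  screenshot?: { location: string; afterStep: number };
}

type Section = "step" | "screenshot" | "other";

function parseSpec(markdown: string): ParsedSpec {
  const spec: ParsedSpec = { title: "", steps: [] };
  let section: Section = "other";
  let current: SpecStep | undefined;
  let inExpected = false;
  let location = "";
  let afterStep = 0;

  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    let m: RegExpMatchArray | null;

    if ((m = line.match(/^# Spec:\s*(.+)$/))) {
      spec.title = m[1];
    } else if ((m = line.match(/^## Step (\d+):\s*(.+)$/))) {
      current = { index: Number(m[1]), name: m[2], expected: [] };
      spec.steps.push(current);
      section = "step";
      inExpected = false;
    } else if (line === "## Screenshot") {
      section = "screenshot";
      current = undefined;
    } else if (line.startsWith("## ")) {
      // Any other section (for example Prerequisites) is skipped in this sketch.
      section = "other";
      current = undefined;
    } else if (section === "step" && current) {
      if ((m = line.match(/^\*\*Action:\*\*\s*(.*)$/))) {
        current.action = m[1];
        inExpected = false;
      } else if ((m = line.match(/^\*\*Expected:\*\*\s*(.*)$/))) {
        inExpected = true;
        if (m[1]) current.expected.push(m[1]); // single-line form
      } else if (inExpected && (m = line.match(/^- (.+)$/))) {
        current.expected.push(m[1]); // bullet-list form
      }
    } else if (section === "screenshot") {
      if ((m = line.match(/^\*\*Location:\*\*\s*`(.+)`$/))) {
        location = m[1];
      } else if ((m = line.match(/^\*\*When:\*\*\s*after step (\d+)$/))) {
        afterStep = Number(m[1]);
      }
    }
  }

  if (location) spec.screenshot = { location, afterStep };
  return spec;
}
```

Fed the sunburst spec, this would yield two steps (the second without an action) and a screenshot entry that points at step 2. Each "Expected" bullet later becomes one assertion.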

Task 3: The e2e-spec Skill

The skill in .claude/skills/e2e-spec/SKILL.md describes how a Markdown spec becomes a Playwright test. It reads the spec, checks whether the data is available, then opens a real browser via Playwright MCP and walks through the user flow step by step. What it is looking for throughout: stable selectors. What it builds from that: a .spec.ts file with assertions tied directly to spec steps, and a screenshot at the designated point.

The selector strategy is particularly important. The skill establishes:

Selector priority:
1. data-testid (set deliberately in components, most stable)
2. getByRole with accessible name (getByRole("tab", { name: /Sunburst/ }))
3. getByText for unique text content
4. NO CSS selectors with nth-child, framework IDs, or class chains

A test that uses .btn-primary > span:nth-child(3) as a selector breaks at the next refactor. getByTestId("switch-sunburst") survives layout changes.
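The priority list can be read as a simple decision function. The sketch below is a hypothetical illustration of that ordering, not code from the skill: ElementFacts and pickLocator are invented names, and the function returns the locator expression as a string the way a generator might emit it.

```typescript
// Hypothetical summary of what was observed for one element in the DOM snapshot.
interface ElementFacts {
  testId?: string;
  role?: string;
  accessibleName?: string;
  text?: string;
  textIsUnique?: boolean;
}

// Returns the source text of a Playwright locator, or null if nothing stable exists.
function pickLocator(el: ElementFacts): string | null {
  // 1. data-testid: set deliberately in components, most stable.
  if (el.testId) {
    return `page.getByTestId(${JSON.stringify(el.testId)})`;
  }
  // 2. Role plus accessible name.
  if (el.role && el.accessibleName) {
    return `page.getByRole(${JSON.stringify(el.role)}, { name: ${JSON.stringify(el.accessibleName)} })`;
  }
  // 3. Text, but only when it is unique on the page.
  if (el.text && el.textIsUnique) {
    return `page.getByText(${JSON.stringify(el.text)}, { exact: true })`;
  }
  // 4. No fallback to nth-child, framework IDs, or class chains.
  return null;
}
```

Returning null instead of falling back to a CSS chain is the point of rule 4: in that case the more durable fix is to add a data-testid to the component.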

Task 4: First Generation

Triggering the skill is straightforward:

Generate the E2E test from web/tests/e2e/specs/sunburst.spec.md.

The skill does not write code blindly. It opens the browser via Playwright MCP, navigates to the app, reads the DOM snapshot to understand what is there, clicks through the user flow, and notes which selectors are stable enough. Only then does it write the test. This is what ends up in tests/e2e/generated/sunburst.spec.ts:

import { test, expect } from "@playwright/test";

test("Spec from tests/e2e/specs/sunburst.spec.md", async ({ page }) => {
  // Step 1: Sunburst is default
  await page.goto("/");
  await expect(
    page.getByRole("heading", { name: /Bayerischer Haushalt 2026/ })
  ).toBeVisible();
  await expect(page.getByTestId("switch-sunburst"))
    .toHaveAttribute("aria-selected", "true");
  await expect(page.getByTestId("sunburst")).toBeVisible();

  // Step 2: Half-circle layout
  await expect(page.locator("h3", { hasText: /^Einnahmen$/ })).toBeVisible();
  await expect(page.locator("h3", { hasText: /^Ausgaben$/ })).toBeVisible();

  // Screenshot: default half-circle layout
  await page.getByTestId("sunburst").screenshot({
    path: "tests/e2e/screenshots/baseline/sunburst.png",
  });
});

The comments in the test code reference the spec steps. Anyone reading the test can trace it back to the spec.
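How that traceability could be kept mechanically is sketched below. This is an assumption about one possible approach, since in the article Claude writes the file directly; GeneratedStep and renderTest are hypothetical names. The idea is simply that every spec step contributes exactly one comment, followed by its assertions.

```typescript
// Hypothetical input: one entry per spec step, with the assertion lines
// chosen while exploring the page.
interface GeneratedStep {
  index: number;
  name: string;
  lines: string[];
}

// Renders the source text of a Playwright test whose comments mirror the spec steps.
function renderTest(specPath: string, steps: GeneratedStep[]): string {
  const body = steps
    .map((step) => {
      const comment = `  // Step ${step.index}: ${step.name}`;
      const code = step.lines.map((line) => `  ${line}`);
      return [comment, ...code].join("\n");
    })
    .join("\n\n");

  return [
    'import { test, expect } from "@playwright/test";',
    "",
    `test(${JSON.stringify(`Spec from ${specPath}`)}, async ({ page }) => {`,
    body,
    "});",
    "",
  ].join("\n");
}
```

Because the step number and name are copied verbatim from the spec, a failing assertion can be matched to the sentence in the Markdown file that demanded it.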

Visual Regression Baseline

Alongside the three generated tests, the skill also saves one screenshot per visualization on the first run, under web/tests/e2e/screenshots/baseline/. Their purpose is deliberately limited: they are a sanity check, not a pixel-exact diff.

Pixel-exact image comparisons are a headache with dynamic charts. Anti-aliasing, font rendering, and GPU differences between a development machine and a CI server produce diffs in tests that should be green. What the screenshots offer instead is a quick visual confirmation that the visualization has the right character: the Sankey showing three columns, the sunburst as a half-circle, the treemap with two large tiles. That is enough as a baseline.

Tests Passing

cd web && npm run test:e2e

Output:

Running 3 tests using 1 worker

  ✓ sankey.spec.ts   Spec from tests/e2e/specs/sankey.spec.md   (760ms)
  ✓ sunburst.spec.ts Spec from tests/e2e/specs/sunburst.spec.md (456ms)
  ✓ treemap.spec.ts  Spec from tests/e2e/specs/treemap.spec.md  (490ms)

  3 passed (3.1s)

Vitest stays responsible for component tests and runs in seconds. Playwright has to start a real browser and accordingly takes a bit longer, which is why the two run through separate scripts: npm run test for the fast component tests, npm run test:e2e when you want to know what actually happens in the real browser.

Status at the End of This Article

git clone https://codeberg.org/rotecodefraktion/byhaushalt.git
cd byhaushalt
git checkout v0.8
cd parser && uv run python -m parser.normalize
cd ../web && npm install && npx playwright install chromium
npm run test:e2e

Full state at byhaushalt @ v0.8.

v0.8 contains the complete E2E infrastructure: playwright.config.ts, three Markdown specs in web/tests/e2e/specs/, three Playwright tests generated from them in web/tests/e2e/generated/, three baseline screenshots, and .claude/skills/e2e-spec/SKILL.md. The total test count: 3 Vitest, 3 Playwright, and 19 pytest passing, with 3 pytest xfails documented.

What Comes Next

Article 9 covers hooks — automatic actions bound to events in Claude Code: a hook that runs tests before a commit, one that logs the context state after a session, one that triggers a skill when certain files change. The foundation is in place: tests run, baseline is set. Now comes the automation.

Article 7 covers how MCP servers are configured and how Context7 is used for library docs. Article 6 covers worktrees and parallel visualizations.