test(v0): add boring PR lifecycle fixture (closes #33) #52

pdurlej merged 1 commit from `claude/patchwarden-boring-pr-lifecycle-fixture` into `main` on 2026-05-27 12:37:58 +02:00

claude (Collaborator) commented on 2026-05-27 12:16:10 +02:00

## What

One new test file: `tests/test_boring_pr_lifecycle.py` (+151 lines). Pure test, zero src changes. Proves that the deterministic PR gate, fed by the real `policies/platform.v0.toml` bundle, returns `eligible_clean` for safe docs/status PRs and `needs_human` for workflow / runtime / control-plane paths.

**No Forgejo network calls.** File-based policy load (`load_bundle`) + in-memory `evaluate_pull_request`.

## What's covered (5 tests)

### Positive (`safe_docs_status` → `eligible_clean`)

| Test | Changed files | Why |
|---|---|---|
| `test_docs_prefix_pr_is_eligible_clean` | `docs/roadmap.md`, `docs/operations/notes.md` | Primary dogfood path on `pdurlej/platform` |
| `test_status_marker_pr_is_eligible_clean` | `state/STATUS_NOW.md`, `state/cycle/W6d-2026-05-25.md` | W6d status-marker lane (second proven path per `docs/operations/platform-dogfood.md` from #51) |

### Negative (blocked classification → `needs_human`)

| Test | Changed files | Classification |
|---|---|---|
| `test_workflow_change_holds_for_human` | `.forgejo/workflows/patchwarden-client-dry-run.yml` | `workflow` |
| `test_runtime_change_holds_for_human` | `compose/docker-compose.yml` | `runtime` |
| `test_policy_governance_change_holds_for_human` | `src/patchwarden/pr_check.py` | `policy_governance` |

Each negative asserts `exit_code == 1`, `verdict == "needs_human"`, and `would_auto_merge_later == False`. The workflow case also asserts that `manual_pr_class` is among the blockers.
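The contract these five tests pin down can be illustrated with a small, self-contained toy model. This is a sketch under stated assumptions, not patchwarden's `evaluate_pull_request`. The prefix table, `ToyVerdict`, `toy_classify()` and `toy_evaluate()` are all hypothetical, and the real classes come from `policies/platform.v0.toml`. Three behaviours are assumptions rather than facts from this PR: exit code 0 for the clean case, `would_auto_merge_later=True` for the clean case, and a `manual_pr_class` blocker on every blocked class. The tests above only assert that blocker for the workflow case.

```python
from dataclasses import dataclass, field

# Hypothetical prefix -> class table; the real mapping lives in
# policies/platform.v0.toml and is applied by patchwarden's classify_files.
TOY_PREFIX_CLASSES = {
    "docs/": "safe_docs_status",
    "state/": "safe_docs_status",
    ".forgejo/workflows/": "workflow",
    "compose/": "runtime",
    "src/patchwarden/": "policy_governance",
}
SAFE_CLASSES = {"safe_docs_status"}


@dataclass
class ToyVerdict:
    verdict: str
    exit_code: int
    would_auto_merge_later: bool
    blockers: list[str] = field(default_factory=list)


def toy_classify(path: str) -> str:
    """Map one changed path to a class; longest matching prefix wins."""
    matches = [prefix for prefix in TOY_PREFIX_CLASSES if path.startswith(prefix)]
    if not matches:
        return "unclassified"  # unknown paths are never treated as safe
    return TOY_PREFIX_CLASSES[max(matches, key=len)]


def toy_evaluate(changed_files: list[str]) -> ToyVerdict:
    """Return eligible_clean only if every changed path is in a safe class."""
    classes = {toy_classify(path) for path in changed_files}
    if classes and classes <= SAFE_CLASSES:
        # Exit code 0 and would_auto_merge_later=True are assumptions here.
        return ToyVerdict("eligible_clean", 0, True)
    # Any blocked or unknown class (or an empty PR) is held for a human.
    return ToyVerdict("needs_human", 1, False, ["manual_pr_class"])
```

The toy treats unknown paths as blocked, so it fails conservatively: anything it cannot classify is held for a human rather than waved through.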

## Why this matters

This is the **calibration baseline** for the W6d-automerge-calibration lane. If these five tests fail, the dogfood loop documented in `docs/operations/platform-dogfood.md` (PR #51) cannot be trusted: a regression in `policies/platform.v0.toml`, `policy_bundle.py`, or `pr_check.py` that silently flips a verdict would slip through without these end-to-end fixtures catching it.

**Existing `tests/test_pr_check.py` uses `DEFAULT_BUNDLE` (in-memory, hardcoded).** This new file uses `load_bundle(Path("policies/platform.v0.toml"))`, the actual TOML the dry-run workflow loads in production. Different failure mode, different coverage. Complementary, not redundant.

## Spec sources (per codex issue #33)

- `tests/test_pipeline.py`: pipeline shape (consulted, not edited)
- `tests/test_pr_check.py`: its `pull_request()` helper pattern inspired the `_boring_pr()` helper here
- `policies/platform.v0.toml`: actual policy bundle (used as input)

## Atomic per ADR-0017

- 1 file, +151 lines, 0 src changes, 0 deletions.
- Re-uses existing `evaluate_pull_request`, `PullRequestInput`, `CheckStatus`, `load_bundle`; no new public surface.
- `base=main`, no stacking on #51 (already merged).

## Anti-scope (per #33 no-go)

- ❌ No model fixtures
- ❌ No giant copied real PR payloads (each fixture is <10 lines)
- ❌ No snapshot tests with unstable timestamps (deterministic SHAs only)
- ❌ No Forgejo network (file-based bundle + dataclass input only)

## Test count

**133 → 138 green** (+5 new, all passing on first run).

## Token-accounting

~2-3% weekly Opus. Could have gone to a Sonnet sub-agent, but the briefing overhead (reading 3 existing test files + the policy TOML + understanding `classify_files` semantics) would have matched the writing cost. Net wash.

Closes #33.


claude added 1 commit (`b916a236f8`) on 2026-05-27 12:16:10 +02:00

test(v0): add boring PR lifecycle fixture (closes #33)

New test file proving the deterministic PR gate, fed by the real
`policies/platform.v0.toml` bundle, returns `eligible_clean` for safe
docs/status PRs and `needs_human` for workflow / runtime / control-plane
paths. Pure file-based policy load + in-memory evaluation — zero
Forgejo network calls.

## What's covered (5 tests)

**Positive (safe_docs_status → eligible_clean):**
- `docs/` prefix — primary dogfood path on `pdurlej/platform`
- `state/` prefix — W6d status-marker lane (second proven path per
  `docs/operations/platform-dogfood.md`)

**Negative (blocked classification → needs_human):**
- `.forgejo/workflows/` → classification: `workflow`
- `compose/` → classification: `runtime`
- `src/patchwarden/` → classification: `policy_governance`

Each negative asserts:
- exit_code == 1
- verdict == "needs_human"
- would_auto_merge_later == False
- (for workflow) blockers include `manual_pr_class`

## Why this matters

This is the calibration baseline for the W6d-automerge-calibration
lane. If these five tests fail, the dogfood loop documented in
`docs/operations/platform-dogfood.md` cannot be trusted — a regression
in `policies/platform.v0.toml`, `policy_bundle.py`, or `pr_check.py`
that silently flips a verdict would slip through without these
end-to-end fixtures catching it.

The existing `tests/test_pr_check.py` uses `DEFAULT_BUNDLE` (in-memory).
This new file uses `load_bundle(Path("policies/platform.v0.toml"))` —
the actual TOML the dry-run workflow loads in production. Different
failure mode, different coverage.

## Atomic per ADR-0017

- 1 file, +131 lines, 0 src changes, 0 deletions.
- Re-uses existing `evaluate_pull_request`, `PullRequestInput`,
  `CheckStatus`, `load_bundle` — no new public surface.
- `base=main`, no stacking on #51 (already merged).

## Test impact

133 → 138 green (+5 new, all passing on first run).

## Spec sources per codex issue #33

- `tests/test_pipeline.py`: pipeline shape (consulted, not edited)
- `tests/test_pr_check.py`: PR check pattern (inspired the new helper's shape)
- `policies/platform.v0.toml`: actual policy bundle (used as input)

## Anti-scope

- ❌ No model fixtures (per #33 no-go).
- ❌ No giant copied real PR payloads (each fixture: <10 lines).
- ❌ No snapshot tests with unstable timestamps (deterministic SHAs only).
- ❌ No Forgejo network (file-based bundle load + dataclass input only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>