test(v0): add boring PR lifecycle fixture (closes #33) #52
Reference
pdurlej/patchwarden!52
## What

One new test file: `tests/test_boring_pr_lifecycle.py` (+151 lines). Pure test, zero src changes. Proves that the deterministic PR gate, fed by the real `policies/platform.v0.toml` bundle, returns `eligible_clean` for safe docs/status PRs and `needs_human` for workflow / runtime / control-plane paths.

No Forgejo network calls. File-based policy load (`load_bundle`) + in-memory `evaluate_pull_request`.

## What's covered (5 tests)

**Positive (`safe_docs_status` → `eligible_clean`)**

| Test | Changed files | Path covered |
|---|---|---|
| `test_docs_prefix_pr_is_eligible_clean` | `docs/roadmap.md`, `docs/operations/notes.md` | Primary dogfood path on `pdurlej/platform` |
| `test_status_marker_pr_is_eligible_clean` | `state/STATUS_NOW.md`, `state/cycle/W6d-2026-05-25.md` | W6d status-marker lane (second proven path per `docs/operations/platform-dogfood.md` from #51) |

**Negative (blocked classification → `needs_human`)**

| Test | Changed file | Classification |
|---|---|---|
| `test_workflow_change_holds_for_human` | `.forgejo/workflows/patchwarden-client-dry-run.yml` | `workflow` |
| `test_runtime_change_holds_for_human` | `compose/docker-compose.yml` | `runtime` |
| `test_policy_governance_change_holds_for_human` | `src/patchwarden/pr_check.py` | `policy_governance` |

Each negative asserts: `exit_code == 1`, `verdict == "needs_human"`, `would_auto_merge_later == False`. The workflow case also asserts `manual_pr_class` in blockers.

## Why this matters

This is the calibration baseline for the W6d-automerge-calibration lane. If these five tests fail, the dogfood loop documented in `docs/operations/platform-dogfood.md` (PR #51) cannot be trusted: a regression in `policies/platform.v0.toml`, `policy_bundle.py`, or `pr_check.py` that silently flips a verdict would slip through without these end-to-end fixtures catching it.

Existing `tests/test_pr_check.py` uses `DEFAULT_BUNDLE` (in-memory, hardcoded). This new file uses `load_bundle(Path("policies/platform.v0.toml"))`, the actual TOML the dry-run workflow loads in production. Different failure mode, different coverage. Complementary, not redundant.

## Spec sources (per codex issue #33)

- `tests/test_pipeline.py`: pipeline shape (consulted, not edited)
- `tests/test_pr_check.py`: `pull_request()` helper pattern inspired the `_boring_pr()` helper here
- `policies/platform.v0.toml`: actual policy bundle (used as input)

## Atomic per ADR-0017

- Re-uses existing `evaluate_pull_request`, `PullRequestInput`, `CheckStatus`, `load_bundle`: no new public surface.
- `base=main`, no stacking on #51 (already merged).

## Anti-scope (per #33 no-go)

- ❌ No model fixtures.
- ❌ No giant copied real PR payloads (each fixture: <10 lines).
- ❌ No snapshot tests with unstable timestamps (deterministic SHAs only).
- ❌ No Forgejo network (file-based bundle load + dataclass input only).

## Test count

133 → 138 green (+5 new, all passing on first run).

## Token-accounting

~2-3% weekly Opus. Could have gone to a Sonnet sub-agent, but the brief overhead (read 3 existing test files + policy TOML + understand `classify_files` semantics) would have matched the writing cost. Net wash.

Closes #33.
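For readers without the repository at hand, the shape of the positive and negative assertions described above can be sketched with a toy gate. This is only an illustration under stated assumptions: `toy_gate`, `classify`, `GateResult`, and the two prefix tables are hypothetical stand-ins, not patchwarden's `evaluate_pull_request` or the contents of `policies/platform.v0.toml`. The exit code of `0` and `would_auto_merge_later == True` on the positive path are also assumptions, since the description only states the negative-path values.

```python
from dataclasses import dataclass

# Hypothetical prefix rules for illustration only. The real rules live in
# policies/platform.v0.toml and are not reproduced here.
SAFE_PREFIXES = ("docs/", "state/")
BLOCKED_PREFIXES = {
    ".forgejo/workflows/": "workflow",
    "compose/": "runtime",
    "src/patchwarden/": "policy_governance",
}


@dataclass(frozen=True)
class GateResult:
    """Toy result carrying the four fields the description asserts on."""
    exit_code: int
    verdict: str
    would_auto_merge_later: bool
    blockers: tuple


def classify(path: str) -> str:
    """Map one changed path to a classification label (toy logic)."""
    for prefix, label in BLOCKED_PREFIXES.items():
        if path.startswith(prefix):
            return label
    if path.startswith(SAFE_PREFIXES):
        return "safe_docs_status"
    return "unknown"


def toy_gate(changed_files) -> GateResult:
    """Return eligible_clean only when every changed file is safe."""
    classes = {classify(path) for path in changed_files}
    if classes == {"safe_docs_status"}:
        # Positive-path exit code and auto-merge flag are assumed values.
        return GateResult(0, "eligible_clean", True, ())
    return GateResult(1, "needs_human", False, ("manual_pr_class",))


def check_docs_prefix_is_eligible_clean():
    result = toy_gate(["docs/roadmap.md", "docs/operations/notes.md"])
    assert result.verdict == "eligible_clean"


def check_workflow_change_holds_for_human():
    result = toy_gate([".forgejo/workflows/patchwarden-client-dry-run.yml"])
    assert result.exit_code == 1
    assert result.verdict == "needs_human"
    assert result.would_auto_merge_later is False
    assert "manual_pr_class" in result.blockers


if __name__ == "__main__":
    check_docs_prefix_is_eligible_clean()
    check_workflow_change_holds_for_human()
```

The toy gate is deterministic and needs no network, which is the property the real fixtures rely on. Everything else about the real gate (input dataclass fields, check statuses, blocker names beyond `manual_pr_class`) is out of scope for this sketch.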
New test file proving the deterministic PR gate, fed by the real `policies/platform.v0.toml` bundle, returns `eligible_clean` for safe docs/status PRs and `needs_human` for workflow / runtime / control-plane paths. Pure file-based policy load + in-memory evaluation, zero Forgejo network calls.

## What's covered (5 tests)

**Positive (safe_docs_status → eligible_clean):**

- `docs/` prefix: primary dogfood path on `pdurlej/platform`
- `state/` prefix: W6d status-marker lane (second proven path per `docs/operations/platform-dogfood.md`)

**Negative (blocked classification → needs_human):**

- `.forgejo/workflows/` → classification: `workflow`
- `compose/` → classification: `runtime`
- `src/patchwarden/` → classification: `policy_governance`

Each negative asserts:

- exit_code == 1
- verdict == "needs_human"
- would_auto_merge_later == False
- (for workflow) blockers include `manual_pr_class`

## Why this matters

This is the calibration baseline for the W6d-automerge-calibration lane. If these five tests fail, the dogfood loop documented in `docs/operations/platform-dogfood.md` cannot be trusted: a regression in `policies/platform.v0.toml`, `policy_bundle.py`, or `pr_check.py` that silently flips a verdict would slip through without these end-to-end fixtures catching it.

The existing `tests/test_pr_check.py` uses `DEFAULT_BUNDLE` (in-memory). This new file uses `load_bundle(Path("policies/platform.v0.toml"))`, the actual TOML the dry-run workflow loads in production. Different failure mode, different coverage.

## Atomic per ADR-0017

- 1 file, +131 lines, 0 src changes, 0 deletions.
- Re-uses existing `evaluate_pull_request`, `PullRequestInput`, `CheckStatus`, `load_bundle`: no new public surface.
- `base=main`, no stacking on #51 (already merged).

## Test impact

133 → 138 green (+5 new, all passing on first run).
## Spec sources per codex issue #33

- `tests/test_pipeline.py`: pipeline shape (consulted, not edited)
- `tests/test_pr_check.py`: PR check pattern (helper shape inspired by)
- `policies/platform.v0.toml`: actual policy bundle (used as input)

## Anti-scope

- ❌ No model fixtures (per #33 no-go).
- ❌ No giant copied real PR payloads (each fixture: <10 lines).
- ❌ No snapshot tests with unstable timestamps (deterministic SHAs only).
- ❌ No Forgejo network (file-based bundle load + dataclass input only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>