[B-bis][enforcement] Blocking-precision contract test — make ADR 0013 non-theater #123
Architect direction (claude, Opus 4.8). This is "B-bis" — the real enforcement of ADR 0013, the thing that keeps the principle from being theater.
ADR 0013 (`decisions/0013-blocking-requires-precision-evidence.md`) decided: a rule may carry `blocking` severity only if its measured precision is ≥ 90% over ≥ 15 adjudicated findings. A principle in a doc enforces nothing. This issue builds the CI gate that makes severity a mechanical function of measured evidence.
Depends on: #119 (precision harness, which produces `benchmarks/precision/precision-snapshot.json`). The golden ground truth is already committed: `benchmarks/precision/golden-adjudication-2026-06-15.json`.
Part 1: The blocking-precision contract test
Add `tests/test_blocking_precision_contract.py`:
1. Enumerate every rule whose default severity is `blocking`, read from the analyzer's rule definitions (`fallow_py.models.RULES` or wherever severity defaults live), not from a hand-maintained list (the test must track reality).
2. Load `benchmarks/precision/precision-snapshot.json` (produced by the harness, #119).
3. Assert that, for every rule configured `blocking`, the snapshot shows precision ≥ 0.90 over ≥ 15 adjudicated findings. Otherwise fail with a message naming the rule, its measured precision, and the bar, e.g. `missing-runtime-dependency is configured blocking but measured 3% precision (29 FP / 30); downgrade or fix to ≥90% before shipping blocking (ADR 0013)`.
This test makes it impossible to ship, or regress into, a `blocking` rule that has not earned it.
Part 2: The immediate enforced consequence
Apply ADR 0013 §3: change `missing-runtime-dependency`'s default severity/bucket from `blocking` to `decision_needed`.
- The finding is still reported (as `decision_needed`); only its CI-failing severity is withdrawn until evidence justifies it.
- […] `blocking`.
- […] `main`.
Acceptance criteria
- `tests/test_blocking_precision_contract.py` exists, runs under `pytest -q` and in CI, and reads the live rule set (not a frozen copy).
- With `missing-runtime-dependency` downgraded, the contract test is green on `main`.
- A deliberate regression, reverting `missing-runtime-dependency` to `blocking` without fixing precision, makes the test red (prove the gate bites; can be shown in the PR description, not committed).
- No rule is `blocking` in the shipped config unless the snapshot backs it.
Sequencing note
If #119 is not yet merged when this is picked up, Part 2 (the downgrade) can land first — it is a correct, ADR-mandated change on its own merits and needs no harness. Part 1 (the contract test) lands once the snapshot artifact from #119 exists. Do not skip Part 1: the downgrade without the gate is exactly the theater ADR 0013 exists to prevent.
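
For whoever picks up Part 1 once the snapshot exists, the core of the gate might look roughly like the sketch below. It is not the final test. The function names (`blocking_violations`, `check_contract`) are hypothetical, and the data shapes are assumptions: `severities` is taken to be a plain mapping of rule id to default severity, and `snapshot` a mapping of rule id to `tp`/`fp` counts. The real snapshot schema from #119 and the real rule-definition API may differ.

```python
from typing import Mapping

# Bar from ADR 0013: precision >= 90% over >= 15 adjudicated findings.
MIN_PRECISION = 0.90
MIN_ADJUDICATED = 15


def blocking_violations(
    severities: Mapping[str, str],
    snapshot: Mapping[str, Mapping[str, int]],
) -> list[str]:
    """Return one message per rule that is `blocking` without the evidence.

    ASSUMPTION: `severities` maps rule id -> default severity, and `snapshot`
    maps rule id -> {"tp": int, "fp": int}. Both shapes are illustrative.
    """
    messages = []
    for rule, severity in sorted(severities.items()):
        if severity != "blocking":
            continue
        # A blocking rule with no measurements has earned nothing.
        counts = snapshot.get(rule, {"tp": 0, "fp": 0})
        tp, fp = counts["tp"], counts["fp"]
        total = tp + fp
        precision = tp / total if total else 0.0
        if total < MIN_ADJUDICATED or precision < MIN_PRECISION:
            messages.append(
                f"{rule} is configured blocking but measured "
                f"{precision:.0%} precision ({fp} FP / {total}); "
                f"downgrade or fix to ≥90% before shipping blocking (ADR 0013)"
            )
    return messages


def check_contract(
    severities: Mapping[str, str],
    snapshot: Mapping[str, Mapping[str, int]],
) -> None:
    """Raise AssertionError listing every rule that violates the bar."""
    problems = blocking_violations(severities, snapshot)
    assert not problems, "\n".join(problems)
```

In the real `tests/test_blocking_precision_contract.py`, a pytest function would call `check_contract` with the live rule set and the parsed `precision-snapshot.json`. Treating a rule that is missing from the snapshot as having zero evidence is a design choice in this sketch: it fails closed, so a new `blocking` rule cannot slip through simply because nobody measured it.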
Out of scope
- Re-promoting `missing-runtime-dependency` (that happens later, automatically gated, once #115/#117 raise its precision).
Architect: `claude` (Opus 4.8). Decision: ADR 0013 (PR #122). Depends on harness #119.