ADR 0013: blocking severity requires measured precision evidence (+ golden seed) #122

Open
claude wants to merge 1 commit from claude/adr-0013-blocking-precision into main

Canary Context Pack

Product story

The 2026-06-15 precision audit (PR #113) measured the only blocking rule, missing-runtime-dependency, at 3% precision: it fails CI on imports guarded by TYPE_CHECKING or by try/except. A blocking rule at 3% precision violates ADR 0007's conservative-classification promise. The operator's direction: make blocking a function of measured precision, and make the enforcement real, not theater.

What changed

  • decisions/0013-blocking-requires-precision-evidence.md — ADR establishing that a rule may carry blocking severity only if its measured precision (via the precision harness) is ≥ 90% over ≥ 15 adjudicated findings, enforced by a required CI test (the blocking-precision contract). Immediate consequence: missing-runtime-dependency downgraded blocking → decision_needed until #115/#117 raise its precision past the bar.
  • benchmarks/precision/golden-adjudication-2026-06-15.json — the frozen ground truth: 236 audit verdicts (157 fp / 23 tp / 56 opinion), each keyed by stable fingerprint for cross-run matching. This is the seed the precision harness (#119) and the contract test (B-bis) consume.
  • benchmarks/precision/README.md — provenance + how the golden set is used + refresh cadence.
  • decisions/README.md — index updated.
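Because every golden verdict is keyed by fingerprint, a later analyzer run can be scored against the seed without re-adjudicating anything. Below is a minimal sketch of that join. It assumes a hypothetical schema: each golden entry has `fingerprint`, `rule`, and `verdict` (`"tp"`, `"fp"`, or `"opinion"`) keys, and each new finding carries `fingerprint` and `rule`. The function names are illustrative, and the real file layout and the harness's (#119) logic may differ. Excluding `opinion` verdicts from precision is also an assumption, not something the ADR states here.

```python
from collections import Counter
from typing import Iterable, Mapping


def score_findings(
    golden: Iterable[Mapping[str, str]],
    findings: Iterable[Mapping[str, str]],
) -> dict[str, Counter]:
    """Tally golden verdicts per rule for a new run, matching on fingerprint.

    Hypothetical schema: golden entries look like
    {"fingerprint": ..., "rule": ..., "verdict": "tp" | "fp" | "opinion"}.
    """
    verdict_by_fp = {g["fingerprint"]: g["verdict"] for g in golden}
    per_rule: dict[str, Counter] = {}
    for finding in findings:
        # Findings the golden set never saw are tallied as "unadjudicated", not guessed.
        verdict = verdict_by_fp.get(finding["fingerprint"], "unadjudicated")
        per_rule.setdefault(finding["rule"], Counter())[verdict] += 1
    return per_rule


def precision(counts: Counter) -> tuple[float, int]:
    """Return (tp / (tp + fp), tp + fp); "opinion" and "unadjudicated" are excluded (assumption)."""
    decided = counts["tp"] + counts["fp"]
    # With zero decided findings precision is undefined; 0.0 keeps such a rule from
    # looking precise, and a decided count of 0 fails any minimum-sample bar anyway.
    return (counts["tp"] / decided if decided else 0.0), decided
```

For example, a rule with one `tp` verdict, one `fp` verdict, and one never-seen finding yields `precision(...) == (0.5, 2)`.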

No analyzer/CI/config code changes in this PR — it records the decision and lands the golden seed. The enforcement implementation is tracked as the B-bis issue (depends on harness #119).

Why it changed

This is the classification layer's application of ADR 0006 (evidence over imagination): severity stops being an author's declaration and becomes a measured, mechanically enforced property. The operator explicitly named the failure mode to avoid ("a principle without enforcement is theater"), so the ADR is paired with a named CI gate rather than left as prose.
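One way such a gate could look is sketched below. Everything in it is illustrative and not the B-bis implementation: the shape of precision-snapshot.json (per-rule `precision` and `adjudicated` fields), the rule-to-severity mapping passed in, and the names `blocking_violations` and `assert_contract` are all assumptions. Only the 90% / 15-finding thresholds come from the ADR.

```python
import json
from pathlib import Path

# ADR 0013 thresholds (initial judgment calls, revisable per the ADR).
MIN_PRECISION = 0.90
MIN_ADJUDICATED = 15


def blocking_violations(
    severities: dict[str, str],
    snapshot: dict[str, dict[str, float]],
) -> list[str]:
    """List every `blocking` rule that lacks the required precision evidence.

    Hypothetical snapshot shape: {rule: {"precision": float, "adjudicated": int}}.
    """
    problems = []
    for rule, severity in sorted(severities.items()):
        if severity != "blocking":
            continue  # only blocking rules need evidence
        stats = snapshot.get(rule)
        if stats is None:
            # Treat a missing measurement as a failure rather than a skip.
            problems.append(f"{rule}: blocking but absent from the precision snapshot")
        elif stats["adjudicated"] < MIN_ADJUDICATED:
            problems.append(f"{rule}: only {stats['adjudicated']:g} adjudicated findings")
        elif stats["precision"] < MIN_PRECISION:
            problems.append(f"{rule}: precision {stats['precision']:.0%} < {MIN_PRECISION:.0%}")
    return problems


def assert_contract(snapshot_path: Path, severities: dict[str, str]) -> None:
    """Fail, as a required CI test would, if any blocking rule breaks the contract."""
    snapshot = json.loads(snapshot_path.read_text())
    problems = blocking_violations(severities, snapshot)
    assert not problems, "blocking-precision contract failed:\n" + "\n".join(problems)
```

Treating an absent snapshot entry as a failure is a deliberate choice in this sketch: a gate that silently passes unmeasured rules would be exactly the theater the ADR warns against.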

Runtime evidence

  • python3 -c "import json; json.load(open('benchmarks/precision/golden-adjudication-2026-06-15.json'))" parses cleanly (valid JSON); the file holds 236 verdicts, all fingerprinted
  • No code touched; analyzer behavior is unchanged by this PR (the downgrade will be implemented under the B-bis issue)
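The one-liner above only shows that the file parses; it does not itself check the 236-verdict count or the fingerprint coverage. A slightly stronger check could assert both. This sketch assumes a hypothetical layout (a top-level `verdicts` list whose entries carry `fingerprint` and `verdict` keys), and it assumes fingerprints are unique, since a fingerprint-keyed join depends on that. `check_golden` is an illustrative name, not an existing script in the repo.

```python
import json
from collections import Counter
from pathlib import Path


def check_golden(path: Path, expected: dict[str, int]) -> Counter:
    """Validate the golden seed: every verdict fingerprinted, fingerprints unique, counts as stated."""
    data = json.loads(path.read_text())
    verdicts = data["verdicts"]  # assumed top-level key
    fingerprints = [v.get("fingerprint") for v in verdicts]
    assert all(fingerprints), "every verdict needs a non-empty fingerprint"
    assert len(set(fingerprints)) == len(fingerprints), "fingerprints must be unique"
    counts = Counter(v["verdict"] for v in verdicts)
    assert counts == Counter(expected), f"verdict counts {dict(counts)} != {expected}"
    return counts


# Example against this PR's stated totals (157 fp + 23 tp + 56 opinion = 236):
# check_golden(Path("benchmarks/precision/golden-adjudication-2026-06-15.json"),
#              {"fp": 157, "tp": 23, "opinion": 56})
```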

Known constraints

  • The contract depends on the precision harness (#119) producing precision-snapshot.json. Until #119 + B-bis land, ADR 0013 is accepted-but-not-yet-mechanically-enforced; the golden seed here unblocks both.
  • The 90% / 15-finding thresholds are initial judgment calls (see ADR § Consequences); revisable via a superseding edit note.

Explicit out-of-scope

  • The contract test implementation + the missing-runtime-dependency downgrade (→ B-bis issue).
  • The precision harness itself (→ #119).

Requested decision

approve_merge if ADR 0013's principle + enforcement design are sound and the golden seed is correct. Block on a threshold you disagree with, or if the enforcement mechanism is not actually enforceable as described.

Merge blockers

An enforcement design that is in fact theater (no real CI gate), a wrong golden verdict that changes a headline, or a threshold the operator rejects.

Refs: ADR 0006/0007/0008/0009, audit PR #113, harness #119, precision-fix issues #115/#117.

decisions: ADR 0013 (blocking requires precision evidence) + golden seed
All checks were successful
CI / Python 3.11 (push) Successful in 54s
CI / Python 3.12 (push) Successful in 58s
CI / Python 3.13 (push) Successful in 58s
CI / Python 3.11 (pull_request) Successful in 58s (required)
CI / Python 3.12 (pull_request) Successful in 1m1s (required)
CI / Python 3.13 (pull_request) Successful in 58s (required)
cfbc993f11
Records the operator decision (option C): a rule may carry `blocking`
severity only if measured precision >= 90% over >= 15 adjudicated
findings, enforced by a required CI test (the blocking-precision
contract, tracked as B-bis). Immediate consequence: missing-runtime-
dependency downgraded blocking -> decision_needed at 3% precision until
#115/#117 raise it.

Lands the frozen ground truth the harness (#119) + contract consume:
benchmarks/precision/golden-adjudication-2026-06-15.json (236 verdicts,
fingerprint-keyed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This pull request is blocked because it's outdated.
This branch is out-of-date with the base branch