Dogfood evidence: real-world precision audit (10 repos, 9,771 findings) #113
pdurlej/fallow-py!113
Canary Context Pack
Product story
ADR 0008 gates Phase B/C on real-world dogfood evidence. The 2026-06-10 readout recorded the blocker plainly: "0 parsed report artifacts" and "do not claim validated real-world agent workflow impact yet." This PR closes that gap with the first measured precision number on real code.
What changed
docs/dogfood/real-world-precision-2026-06-15.md: a full precision audit. fallow-py 0.3.0a3 was run over the 10 pinned real-world repos in benchmarks/soak/repos.toml (4,916 py files → 9,771 findings), with a 236-finding stratified sample adjudicated true-positive / false-positive under a conservative, rule-aware lens.
docs/dogfood-evidence-status.md: appended a "2026-06-15 Readout" section plus an operator action item pointing at the four evidence-derived Phase B tickets.
No analyzer, CI, packaging, or config changes. Pure evidence documentation.
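As a rough illustration of what "stratified sample" and "precision" mean here, the sketch below groups findings by rule, draws a fixed number from each group, and computes per-rule precision from the adjudicated verdicts. It is not the procedure used in the report: the finding shape (a dict with a `rule` key), the equal `per_rule` allocation, the fixed seed, and the function names are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(findings, per_rule, seed=0):
    """Draw up to `per_rule` findings from each rule's stratum.

    `findings` is assumed to be a list of dicts with a "rule" key;
    this shape is hypothetical, not fallow-py's report schema.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_rule = defaultdict(list)
    for finding in findings:
        by_rule[finding["rule"]].append(finding)

    sample = []
    for rule in sorted(by_rule):  # sorted for a deterministic stratum order
        group = by_rule[rule]
        sample.extend(rng.sample(group, min(per_rule, len(group))))
    return sample


def precision_by_rule(adjudicated):
    """Compute TP / (TP + FP) per rule.

    `adjudicated` is an iterable of (rule, verdict) pairs where the
    verdict is the string "tp" or "fp".
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0})
    for rule, verdict in adjudicated:
        counts[rule][verdict] += 1
    return {
        rule: c["tp"] / (c["tp"] + c["fp"])
        for rule, c in counts.items()
    }
```

Sampling per rule rather than uniformly keeps low-volume rules from being drowned out by noisy ones, which matters when a single blocking rule decides whether CI fails.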
Why it changed
This is anti-AI-slop posture (ADR 0006) executed literally: the evidence is measured on real repos, not imagined. It does not work any Phase B/C ticket — it produces the data that triages them (ADR 0008 § Triage trigger).
Key findings (full detail in the report)
missing-runtime-dependency (the only blocking rule) at 3% precision: fails CI on TYPE_CHECKING / try-except-guarded imports. Adoption-breaker.
requirements*.txt-as-runtime, local-module-as-distribution.
Runtime evidence
python3 -m compileall -q src tests mcp/src mcp/tests (no code touched; sanity)
benchmarks/soak/repos.toml
Known constraints / honest limitations
Explicit out-of-scope
benchmarks/fp-cases/fixtures yet (follow-up after second-reviewer pass).
Requested decision
approve_merge if the evidence is sound and the readout is a fair, honest characterization. Block on overclaiming, methodology errors, or anything that contradicts ADR 0006 / 0008.
Merge blockers
Overclaimed precision, an FP/TP misjudgment that changes a headline, or a readout that claims more than the evidence supports.
Refs: ADR 0006 (anti-slop), ADR 0008 (evidence-gated Phase B/C).
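To make the missing-runtime-dependency finding under Key findings concrete, the sketch below shows the two guarded import patterns it mentions and one possible way to tell them apart from plain runtime imports with the standard-library `ast` module. This is an illustration only and does not describe how fallow-py analyzes imports; the labels (`type-checking`, `optional`, `runtime`), the helper names, and the sample modules are assumptions.

```python
import ast

# Sample module exhibiting both guarded patterns plus a plain import.
SOURCE = '''
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import pandas  # only evaluated by type checkers

try:
    import ujson as json  # optional accelerator
except ImportError:
    import json

import requests
'''

IMPORT_ERRORS = {"ImportError", "ModuleNotFoundError"}


def _is_type_checking(test):
    """True for `TYPE_CHECKING` or `typing.TYPE_CHECKING` conditions."""
    return (isinstance(test, ast.Name) and test.id == "TYPE_CHECKING") or (
        isinstance(test, ast.Attribute) and test.attr == "TYPE_CHECKING"
    )


def _catches_import_error(try_node):
    """True if any handler names ImportError or ModuleNotFoundError."""
    for handler in try_node.handlers:
        t = handler.type
        candidates = t.elts if isinstance(t, ast.Tuple) else [t]
        for c in candidates:
            if isinstance(c, ast.Name) and c.id in IMPORT_ERRORS:
                return True
    return False


def classify_imports(source):
    """Map each top-level module name to a guard label.

    Only module-level statements, `if` blocks and `try` blocks are
    walked; imports inside functions or classes are ignored here.
    """
    result = {}

    def visit(nodes, label):
        for node in nodes:
            if isinstance(node, ast.Import):
                for alias in node.names:
                    result.setdefault(alias.name.split(".")[0], label)
            elif isinstance(node, ast.ImportFrom):
                if node.module and node.level == 0:  # skip relative imports
                    result.setdefault(node.module.split(".")[0], label)
            elif isinstance(node, ast.If):
                guarded = _is_type_checking(node.test)
                visit(node.body, "type-checking" if guarded else label)
                visit(node.orelse, label)
            elif isinstance(node, ast.Try):
                guarded = _catches_import_error(node)
                visit(node.body, "optional" if guarded else label)
                for handler in node.handlers:
                    visit(handler.body, label)

    visit(ast.parse(source).body, "runtime")
    return result
```

For `SOURCE`, only `typing`, `json`, and `requests` come out as `runtime`; `pandas` is labeled `type-checking` and `ujson` `optional`. A checker that treats every import as a hard runtime requirement would flag those last two as well, which is the false-positive pattern the finding describes.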
Codex is inspecting this PR and related Claude-created issues. No approval/request-changes decision yet.
Correction: the previous inspection marker was posted prematurely. Treat it as a no-op status note, not a review decision.
approve_merge.
Reviewed after the factual correction commit. The previously found corpus-size mismatch is fixed consistently in the report and PR body: 4,916 Python files and 9,771 findings. Local gates passed before push: python3 -m pytest -q, python3 -m compileall -q src tests mcp/src mcp/tests, and python3 -m fallow_py analyze --root . --fail-on warning --min-confidence medium --format text.
Remaining gate is Forgejo CI completion on head 56899cab83201d7dce74fd36f3c6cb6b32c99a9c; no review blocker remains.