Make dogfood evidence classification explicit #54

Merged
pdurlej merged 2 commits from codex/dogfood-evidence-hygiene into main 2026-05-16 13:28:40 +02:00

Canary Context Pack

Product story

Dogfood logs are supposed to let us learn from real fallow-py usage. If the aggregator guesses action-policy buckets from plain JSON severity, the evidence can look cleaner or more actionable than it really is.

What changed

  • Plain JSON issues without an explicit classification, decision, or group are now counted as unclassified.
  • Explicit classification fields are still honored, including the planned decision_needed value and hyphenated aliases such as decision-needed.
  • Weekly JSON/Markdown output now reports classified, unclassified, operator-attention, and warning counts.
  • docs/dogfood.md no longer claims a TestPyPI alpha was already smoke-tested; it documents when to pin published artifacts and recommends agent-fix-plan reports for dogfood evidence.
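The classification rule described in the first two bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in scripts/dogfood/aggregate_evidence.py: the names `classify_issue`, `summarize`, and `EXPLICIT_FIELDS` are hypothetical, as is the order in which the three fields are checked. The operator-attention and warning counts are left out because the PR text does not say how they are derived.

```python
from collections import Counter
from typing import Any, Iterable, Mapping

# Assumption: these are the only fields that count as explicit classification,
# checked in this order. The real aggregator may differ.
EXPLICIT_FIELDS = ("classification", "decision", "group")
UNCLASSIFIED = "unclassified"


def classify_issue(issue: Mapping[str, Any]) -> str:
    """Return the explicit bucket for an issue, or 'unclassified'."""
    for field in EXPLICIT_FIELDS:
        value = issue.get(field)
        if isinstance(value, str) and value.strip():
            # Fold hyphenated aliases such as 'decision-needed'
            # into the underscore form 'decision_needed'.
            return value.strip().lower().replace("-", "_")
    # Severity is deliberately ignored: it is not an action policy.
    return UNCLASSIFIED


def summarize(issues: Iterable[Mapping[str, Any]]) -> dict[str, int]:
    """Count classified versus unclassified issues."""
    buckets = Counter(classify_issue(issue) for issue in issues)
    unclassified = buckets.pop(UNCLASSIFIED, 0)
    return {"classified": sum(buckets.values()), "unclassified": unclassified}


if __name__ == "__main__":
    sample = [
        {"severity": "info"},                      # no explicit field
        {"classification": "decision-needed"},     # hyphenated alias
        {"group": "auto_safe", "severity": "warning"},
    ]
    print(summarize(sample))
```

In this sketch an issue carrying only `severity` always lands in `unclassified`, so the summary cannot look more actionable than the input justifies.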

Why it changed

A GPT-5.5 Pro roadmap review correctly flagged the severity=info -> auto_safe mapping as an unsafe evidence shortcut. This PR makes uncertainty visible instead of smuggling policy into telemetry.

Files touched

  • scripts/dogfood/aggregate_evidence.py
  • tests/test_dogfood_aggregator.py
  • docs/dogfood.md

Relevant context

  • ADR 0008: evidence-bounded dogfood window
  • docs/dogfood.md: integration and evidence collection workflow
  • Follow-up to PR #52 dogfood evidence aggregator

Runtime evidence

  • python3.13 -m pytest -q tests/test_dogfood_aggregator.py
  • python3.13 scripts/dogfood/aggregate_evidence.py --repo pdurlej/fallow-py --runs-limit 3 --output /tmp/fallow-dogfood.md --json-output /tmp/fallow-dogfood.json
  • python3.13 -m compileall -q src tests mcp/src mcp/tests scripts/dogfood
  • python3.13 -m pytest -q
  • PYTHONPATH=src python3.13 -m fallow_py analyze --root . --fail-on warning --min-confidence medium
  • PYTHONPATH=src:mcp/src python3.13 -m fallow_py analyze --root mcp --fail-on warning --min-confidence medium
  • git diff --check

Known constraints

The aggregator reads both the current review_needed name and the planned decision_needed name, so it keeps working through the classification-contract migration.
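One way to keep both names readable during a migration is to normalize the raw value and test it against a small set of accepted spellings. The sketch below is hypothetical: `HUMAN_REVIEW_NAMES`, `normalize_bucket`, and `needs_human_review` are illustrative names, and treating the two names as equivalent for counting purposes is an assumption that the PR text does not confirm.

```python
# Assumption: both names mark a finding that a person has to look at.
# The real classification contract may define them differently.
HUMAN_REVIEW_NAMES = frozenset({"review_needed", "decision_needed"})


def normalize_bucket(raw: str) -> str:
    """Lower-case a bucket name and fold hyphens into underscores."""
    return raw.strip().lower().replace("-", "_")


def needs_human_review(raw: str) -> bool:
    """True for the current name, the planned name, and hyphenated aliases."""
    return normalize_bucket(raw) in HUMAN_REVIEW_NAMES


if __name__ == "__main__":
    for name in ("review_needed", "decision-needed", "auto_safe"):
        print(name, needs_human_review(name))
```

Keeping the accepted names in one set means the old name can be dropped in a single place once the migration is finished.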

Explicit out-of-scope

  • No analyzer classification changes.
  • No MCP schema changes.
  • No new dogfood outcome taxonomy.
  • No TestPyPI upload or release claim.

Requested decision

Approve if the aggregator now reflects uncertainty honestly and the docs stop overclaiming release state.

Merge blockers

Block the merge if any of the following is still true:

  • Plain JSON findings are still auto-classified by severity/confidence.
  • Markdown/JSON summaries hide unclassified evidence.
  • Docs claim that an unpublished or un-smoke-tested alpha has been tested.
Make dogfood evidence classification explicit
All checks were successful
CI / Python 3.11 (push) Successful in 57s
CI / Python 3.12 (push) Successful in 1m0s
CI / Python 3.13 (push) Successful in 1m2s
CI / Python 3.11 (pull_request) Successful in 58s
CI / Python 3.12 (pull_request) Successful in 1m0s
CI / Python 3.13 (pull_request) Successful in 59s
badec90440
Dogfood evidence aggregation now treats plain JSON findings without agent-fix-plan buckets as unclassified instead of guessing action policy from severity. Explicit classification fields are still honored, including decision-needed compatibility.

Verified:

- python3.13 -m pytest -q tests/test_dogfood_aggregator.py
- python3.13 scripts/dogfood/aggregate_evidence.py --repo pdurlej/fallow-py --runs-limit 3 --output /tmp/fallow-dogfood.md --json-output /tmp/fallow-dogfood.json
- python3.13 -m compileall -q src tests mcp/src mcp/tests scripts/dogfood
- python3.13 -m pytest -q
- PYTHONPATH=src python3.13 -m fallow_py analyze --root . --fail-on warning --min-confidence medium
- PYTHONPATH=src:mcp/src python3.13 -m fallow_py analyze --root mcp --fail-on warning --min-confidence medium
- git diff --check
pdurlej approved these changes 2026-05-16 13:18:49 +02:00
pdurlej scheduled this pull request to auto merge when all checks succeed 2026-05-16 13:18:53 +02:00
Merge branch 'main' into codex/dogfood-evidence-hygiene
All checks were successful
CI / Python 3.11 (push) Successful in 56s
CI / Python 3.12 (push) Successful in 1m1s
CI / Python 3.13 (push) Successful in 1m0s
CI / Python 3.11 (pull_request) Successful in 56s
CI / Python 3.12 (pull_request) Successful in 1m2s
CI / Python 3.13 (pull_request) Successful in 58s
2a8efd26ee