Add dogfood evidence cockpit #59

Merged
pdurlej merged 2 commits from codex/dogfood-evidence-cockpit into main 2026-05-19 00:40:50 +02:00

Canary Context Pack

Product story

Dogfood evidence should flow back to the operator without turning into a manual artifact-reading chore. The useful question is not only "did CI pass?" but "is fallow-py helping us, making noise, or revealing recurring patterns?"

What changed

  • Extended scripts/dogfood/aggregate_evidence.py with local dogfood log ingestion via --dogfood-log.
  • Added cockpit counters for run events, runs by repo/status, log categories/rules/repos, friction signals, top recurring fingerprints, and evidence-gate progress.
  • Added an Owner Action Board to Markdown and JSON output.
  • Updated dogfood docs/templates to show gitignored log ingestion and explain the cockpit summary.
  • Added regression tests for log parsing, cockpit JSON, Owner Action Board Markdown, top fingerprints, and CLI log input.
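As a rough illustration of the cockpit counters listed above, the sketch below tallies already-parsed log entries by category, repo, and rule, and reports fingerprints seen more than once. It is not the code in `scripts/dogfood/aggregate_evidence.py`: the entry shape, the function name `summarize_entries`, the output keys, and the sample values (`noise`, `helpful`, `rule-x`, `fp-1`) are all assumptions made for the example.

```python
from collections import Counter


def summarize_entries(entries, top_n=3):
    """Build cockpit-style counters from parsed log entries (hypothetical shape)."""
    by_category = Counter(e.get("category", "unknown") for e in entries)
    by_repo = Counter(e.get("repo", "unknown") for e in entries)
    by_rule = Counter(e.get("rule", "unknown") for e in entries)
    fingerprints = Counter(e["fingerprint"] for e in entries if e.get("fingerprint"))
    # Only fingerprints seen more than once are treated as "recurring".
    recurring = [(fp, n) for fp, n in fingerprints.most_common() if n > 1][:top_n]
    return {
        "log_categories": dict(by_category),
        "log_repos": dict(by_repo),
        "log_rules": dict(by_rule),
        "top_recurring_fingerprints": recurring,
    }


# Placeholder data; none of these names come from fallow-py itself.
SAMPLE_ENTRIES = [
    {"repo": "repo-a", "category": "noise", "rule": "rule-x", "fingerprint": "fp-1"},
    {"repo": "repo-a", "category": "helpful", "rule": "rule-x", "fingerprint": "fp-1"},
    {"repo": "repo-b", "category": "noise", "rule": "rule-y", "fingerprint": "fp-2"},
]

summary = summarize_entries(SAMPLE_ENTRIES)
```

Keeping the summary a plain dictionary means the same structure could feed both a Markdown renderer and a JSON dump, which is one plausible way to keep the two outputs in sync.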

Why it changed

The operator explicitly wants fallow-py to learn from local use without needing to remember every manual observation. This turns .codex/DOGFOOD-LOG.md plus Forgejo run metadata into a weekly operator-readable evidence summary.

Files touched

  • scripts/dogfood/aggregate_evidence.py
  • tests/test_dogfood_aggregator.py
  • docs/dogfood.md
  • docs/dogfood-evidence-status.md
  • docs/dogfood-log-template.md

Relevant context

  • ADR 0008: evidence-bounded dogfood window.
  • PR #52: initial dogfood evidence aggregator.
  • PR #54: classification-explicit evidence hygiene.
  • Fork pack Domain B: Evidence Cockpit.

Runtime evidence

  • python3.13 -m py_compile scripts/dogfood/aggregate_evidence.py
  • python3.13 -m pytest -q tests/test_dogfood_aggregator.py
  • python3.13 scripts/dogfood/aggregate_evidence.py --repo pdurlej/fallow-py --runs-limit 10 --dogfood-log /Users/pd/Developer/fallow-python/.codex/DOGFOOD-LOG.md --output /tmp/fallow-domain-b-cockpit.md --json-output /tmp/fallow-domain-b-cockpit.json
  • python3.13 -m compileall -q src tests mcp/src mcp/tests scripts/dogfood
  • python3.13 -m pytest -q
  • PYTHONPATH=src python3.13 -m fallow_py analyze --root . --fail-on warning --min-confidence medium
  • PYTHONPATH=src:mcp/src python3.13 -m fallow_py analyze --root mcp --fail-on warning --min-confidence medium
  • git diff --check

Known constraints

  • The aggregator reads local log metadata from Markdown headings/fields; raw log text stays private and gitignored.
  • The evidence threshold remains advisory; operator qualitative read still matters.
  • Current and future classification names remain accepted by the aggregator.
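The first and third constraints can be sketched together: keep only allowlisted fields found under Markdown headings, drop everything else, and store classification values verbatim rather than checking them against a fixed set. The log layout assumed here (`## ` entry headings followed by `- key: value` bullets), the field names, and the function name `parse_log_metadata` are hypothetical; the real log template may differ.

```python
import re

# Hypothetical field names; only these are carried into the summary.
ALLOWED_FIELDS = {"repo", "category", "rule", "fingerprint", "classification"}
FIELD_RE = re.compile(r"^\s*[-*]\s*([A-Za-z_]+)\s*:\s*(.+?)\s*$")


def parse_log_metadata(markdown_text):
    """Return one dict per '## ' entry, holding only allowlisted fields."""
    entries = []
    current = None
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            current = {"title": line[3:].strip()}
            entries.append(current)
            continue
        match = FIELD_RE.match(line)
        if current is not None and match:
            key = match.group(1).lower()
            if key in ALLOWED_FIELDS:
                # Values are stored as written, so a classification name
                # this code has never seen is still accepted.
                current[key] = match.group(2)
        # Any other line (free-form notes, pasted output) is ignored.
    return entries


SAMPLE_LOG = """\
# Dogfood log

## Entry one
- repo: repo-a
- category: noise
- notes: private free-form text that should not be kept

Some raw pasted output here.

## Entry two
- repo: repo-b
- classification: some-future-name
"""

entries = parse_log_metadata(SAMPLE_LOG)
```

An allowlist is used instead of a blocklist so that a new free-text field added to the log later is dropped by default.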

Explicit out-of-scope

  • No web dashboard.
  • No analyzer rule changes.
  • No MCP schema changes.
  • No model benchmark execution.
  • No public claims based on incomplete evidence.

Requested decision

Approve if the cockpit makes dogfood evidence easier to consume without hiding uncertainty or adding product claims.

Merge blockers

  • The aggregator misclassifies plain JSON findings by guessing a policy.
  • The Owner Action Board gives unsafe or overconfident advice.
  • Local dogfood logs must be committed for the cockpit to work.
  • The branch includes unrelated Domain D/E changes.
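One way a reviewer might check the first blocker is to confirm that findings without an explicit label are counted as such instead of being assigned a policy. The sketch below shows that behaviour under assumed names: the `classification` key, the `unclassified` bucket, and `count_by_classification` are illustrative and not taken from the aggregator.

```python
import json
from collections import Counter

UNCLASSIFIED = "unclassified"  # hypothetical bucket name


def count_by_classification(findings_json):
    """Count findings by explicit classification; never infer a missing one."""
    findings = json.loads(findings_json)
    counts = Counter()
    for finding in findings:
        label = finding.get("classification")
        # No explicit label: record that fact instead of deriving one from
        # severity, rule name, or any other field.
        if isinstance(label, str) and label:
            counts[label] += 1
        else:
            counts[UNCLASSIFIED] += 1
    return dict(counts)


# Placeholder findings; "label-a" is not a real classification name.
SAMPLE_FINDINGS = json.dumps([
    {"rule": "rule-x", "severity": "warning"},
    {"rule": "rule-y", "classification": "label-a"},
    {"rule": "rule-z", "classification": ""},
])

counts = count_by_classification(SAMPLE_FINDINGS)
```

Surfacing the unclassified count keeps the uncertainty visible in the summary, in line with the requested decision above.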
Add dogfood evidence cockpit
All checks were successful
CI / Python 3.11 (push) Successful in 1m0s
CI / Python 3.12 (push) Successful in 1m5s
CI / Python 3.13 (push) Successful in 1m2s
CI / Python 3.11 (pull_request) Successful in 58s
CI / Python 3.12 (pull_request) Successful in 1m4s
CI / Python 3.13 (pull_request) Successful in 1m3s
c679796a57
Extend the dogfood aggregator into an operator-readable cockpit: local dogfood log ingestion, run breakdowns, top recurring fingerprints, evidence-gate counters, and an Owner Action Board in Markdown/JSON outputs.

Verified:

- python3.13 -m py_compile scripts/dogfood/aggregate_evidence.py

- python3.13 -m pytest -q tests/test_dogfood_aggregator.py

- python3.13 scripts/dogfood/aggregate_evidence.py --repo pdurlej/fallow-py --runs-limit 10 --dogfood-log /Users/pd/Developer/fallow-python/.codex/DOGFOOD-LOG.md --output /tmp/fallow-domain-b-cockpit.md --json-output /tmp/fallow-domain-b-cockpit.json

- python3.13 -m compileall -q src tests mcp/src mcp/tests scripts/dogfood

- python3.13 -m pytest -q

- PYTHONPATH=src python3.13 -m fallow_py analyze --root . --fail-on warning --min-confidence medium

- PYTHONPATH=src:mcp/src python3.13 -m fallow_py analyze --root mcp --fail-on warning --min-confidence medium

- git diff --check
Merge remote-tracking branch 'origin/main' into codex/dogfood-evidence-cockpit
All checks were successful
CI / Python 3.11 (push) Successful in 58s
CI / Python 3.12 (push) Successful in 1m2s
CI / Python 3.13 (push) Successful in 1m0s
CI / Python 3.11 (pull_request) Successful in 57s
CI / Python 3.12 (pull_request) Successful in 1m0s
CI / Python 3.13 (pull_request) Successful in 59s
cf2fe78eca
pdurlej scheduled this pull request to auto merge when all checks succeed 2026-05-19 00:40:43 +02:00
pdurlej approved these changes 2026-05-19 00:40:47 +02:00