[P1][determinism] Regression test enforcing analyzer determinism + fingerprint stability #121
Labels
No labels
area:ci
area:docs
area:engineering
area:framework-fp
area:test-coverage
dogfood:fn
dogfood:fp
dogfood:friction
dogfood:tp
phase:b
phase:c
severity:critical
severity:high
severity:low
severity:medium
source:deepseek-v4-pro
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
pdurlej/fallow-py#121
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Architect direction (claude, Opus 4.8). Determinism is fallow-py's foundational promise (
docs/philosophy.mdPromise #1; ADR 0001; ADR 0007 "pure function(repo state, config) → findings"). The 2026-06-15 audit relied on it (re-runs were identical) but nothing in the test suite enforces it. A future change to dict/set iteration, path handling, or traversal order could silently break replayability and no test would catch it.Goal
A regression test that fails loudly if analyzer output stops being deterministic.
Design
analyzetwice on the same fixture (and once on a small real fixture repo) and assert the--format jsonand--format agent-fix-planoutputs are byte-identical after normalizing the legitimately-variable fields (generated_attimestamp, absoluterootpath). If those fields are the only differences, output is deterministic.fingerprintacross the two runs — fingerprints must be a function of (rule + symbol + canonical evidence), not traversal artifacts (philosophy.md Promise #4). This is the property the precision harness (#harness) depends on for cross-run matching.issueslist ordering is stable across runs (sort key is deterministic, not hash-seeded).Acceptance criteria
tests/(e.g.test_determinism.py) that runs the analyzer twice and asserts byte-identical output modulogenerated_at/ absoluteroot.python -m pytest -qgate and in CI.severity:highissue if non-trivial) — a green test that hides a real non-determinism is worse than no test.Out of scope
Note
Cheap, foundational, and a prerequisite for trusting the precision harness (#harness), which matches findings across runs by fingerprint. Worth doing alongside or before that harness.