MCP: harden cached report signatures #43

Merged
pdurlej merged 2 commits from codex/mcp-cache-signature-40 into main 2026-05-12 15:37:20 +02:00
Collaborator

Canary Context Pack

Product story

MCP agents should not receive stale pyfallow reports when a file or config changes inside the cache TTL but the filesystem metadata is too coarse to show it.

What changed

  • Added SHA-256 content digests to the existing MCP report-cache signature tuple.
  • Added regression coverage for same-size source-file changes with restored mtime.
  • Added regression coverage for same-size .pyfallow.toml changes with restored mtime.

Why it changed

DeepSeek audit triage issue #40 identified that the MCP report cache used only mtime_ns and file size. That can miss content changes on coarse filesystems or controlled metadata updates.

Files touched

  • mcp/src/pyfallow_mcp/runtime.py
  • mcp/tests/test_mcp.py

Runtime evidence

  • RED before fix: /tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests/test_mcp.py::test_cached_report_invalidates_same_size_source_content_change mcp/tests/test_mcp.py::test_cached_report_invalidates_same_size_config_content_change failed because cached_report() returned the stale cached object.
  • PASS after fix: same targeted MCP tests.
  • PASS: python3 -m compileall -q src tests mcp/src mcp/tests
  • PASS: git diff --check
  • PASS: python3 -m pytest -q
  • PASS: /tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests
  • PASS: PYTHONPATH=src python3 -m pyfallow analyze --root . --fail-on warning --min-confidence medium

Known constraints

This intentionally keeps the current cache TTL and global-cache lifecycle. The hash is computed only for paths already included in the report signature: analyzed modules, scanned source-root Python files, dependency files, and the active config file.

Explicit out-of-scope

  • Phase B issue #9 cache lifecycle, LRU, and thread-safety work.
  • Broader MCP cache architecture changes.

Requested decision

Approve if the content-aware invalidation is correct and scoped enough for #40.

Merge blockers

  • Stale report can still be returned for same-size source/config content changes under the regression tests.
  • New runtime dependency.
  • Broad cache lifecycle rewrite in this PR.

Closes #40.
Refs #35, #9.

## Canary Context Pack ### Product story MCP agents should not receive stale pyfallow reports when a file or config changes inside the cache TTL but the filesystem metadata is too coarse to show it. ### What changed - Added SHA-256 content digests to the existing MCP report-cache signature tuple. - Added regression coverage for same-size source-file changes with restored mtime. - Added regression coverage for same-size `.pyfallow.toml` changes with restored mtime. ### Why it changed DeepSeek audit triage issue #40 identified that the MCP report cache used only `mtime_ns` and file size. That can miss content changes on coarse filesystems or controlled metadata updates. ### Files touched - `mcp/src/pyfallow_mcp/runtime.py` - `mcp/tests/test_mcp.py` ### Runtime evidence - RED before fix: `/tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests/test_mcp.py::test_cached_report_invalidates_same_size_source_content_change mcp/tests/test_mcp.py::test_cached_report_invalidates_same_size_config_content_change` failed because `cached_report()` returned the stale cached object. - PASS after fix: same targeted MCP tests. - PASS: `python3 -m compileall -q src tests mcp/src mcp/tests` - PASS: `git diff --check` - PASS: `python3 -m pytest -q` - PASS: `/tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests` - PASS: `PYTHONPATH=src python3 -m pyfallow analyze --root . --fail-on warning --min-confidence medium` ### Known constraints This intentionally keeps the current cache TTL and global-cache lifecycle. The hash is computed only for paths already included in the report signature: analyzed modules, scanned source-root Python files, dependency files, and the active config file. ### Explicit out-of-scope - Phase B issue #9 cache lifecycle, LRU, and thread-safety work. - Broader MCP cache architecture changes. ### Requested decision Approve if the content-aware invalidation is correct and scoped enough for #40. ### Merge blockers - Stale report can still be returned for same-size source/config content changes under the regression tests. - New runtime dependency. - Broad cache lifecycle rewrite in this PR. Closes #40. Refs #35, #9.
MCP: harden cached report signatures
All checks were successful
CI / Python 3.11 (push) Successful in 54s
CI / Python 3.12 (push) Successful in 56s
CI / Python 3.13 (push) Successful in 57s
CI / Python 3.11 (pull_request) Successful in 53s
CI / Python 3.12 (pull_request) Successful in 54s
CI / Python 3.13 (pull_request) Successful in 56s
9aff5f3efe
Use a content-aware report-cache signature for MCP cached reports by adding a SHA-256 digest to the existing path/mtime/size tuple. This keeps the cache lifecycle unchanged while preventing stale reports when file content changes without observable metadata growth.

Verified:

- RED: /tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests/test_mcp.py::test_cached_report_invalidates_same_size_source_content_change mcp/tests/test_mcp.py::test_cached_report_invalidates_same_size_config_content_change failed before the runtime change because cached_report returned the stale cached object.

- PASS: /tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests/test_mcp.py::test_cached_report_invalidates_same_size_source_content_change mcp/tests/test_mcp.py::test_cached_report_invalidates_same_size_config_content_change

- PASS: python3 -m compileall -q src tests mcp/src mcp/tests

- PASS: git diff --check

- PASS: python3 -m pytest -q

- PASS: /tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests

- PASS: PYTHONPATH=src python3 -m pyfallow analyze --root . --fail-on warning --min-confidence medium
pdurlej scheduled this pull request to auto merge when all checks succeed 2026-05-12 11:17:17 +02:00
pdurlej approved these changes 2026-05-12 14:22:35 +02:00
Author
Collaborator

Claude review brief for #43.

Please do one bounded code review pass, not an architecture proposal. Terminal action should be one of: approve_merge, approve_with_evidence_gap, or request_changes with concrete blockers only.

Scope: focused MCP cache invalidation correctness for #40. This PR adds content-aware SHA-256 signatures to the existing cache signature and regression tests for same-size source/config changes with restored mtime.

Review focus:

  • Does cached_report() stop returning stale data when content changes but mtime and size do not?
  • Are source-file and .pyfallow.toml config invalidation both covered?
  • Is the hash limited to already-signatured paths, without expanding cache lifecycle?
  • Is it still stdlib-only?

Do not block on:

  • LRU cache, thread-safety, eviction policy, or cache lifecycle design. That is #9/B6 and still gated.
  • Broader MCP runtime restructuring.
  • Replacing SHA-256 with a more elaborate content-addressed cache.

Recommended local commands:

cd /Users/pd/Developer/fallow-python
git fetch origin codex/mcp-cache-signature-40
git checkout codex/mcp-cache-signature-40
python3 -m compileall -q src tests mcp/src mcp/tests
python3 -m pytest -q
/tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests
PYTHONPATH=src python3 -m pyfallow analyze --root . --fail-on warning --min-confidence medium

Keep review short. If the narrow invariant holds and CI is green, approve.

Claude review brief for #43. Please do one bounded code review pass, not an architecture proposal. Terminal action should be one of: `approve_merge`, `approve_with_evidence_gap`, or `request_changes` with concrete blockers only. Scope: focused MCP cache invalidation correctness for #40. This PR adds content-aware SHA-256 signatures to the existing cache signature and regression tests for same-size source/config changes with restored mtime. Review focus: - Does `cached_report()` stop returning stale data when content changes but `mtime` and size do not? - Are source-file and `.pyfallow.toml` config invalidation both covered? - Is the hash limited to already-signatured paths, without expanding cache lifecycle? - Is it still stdlib-only? Do not block on: - LRU cache, thread-safety, eviction policy, or cache lifecycle design. That is #9/B6 and still gated. - Broader MCP runtime restructuring. - Replacing SHA-256 with a more elaborate content-addressed cache. Recommended local commands: ```bash cd /Users/pd/Developer/fallow-python git fetch origin codex/mcp-cache-signature-40 git checkout codex/mcp-cache-signature-40 python3 -m compileall -q src tests mcp/src mcp/tests python3 -m pytest -q /tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests PYTHONPATH=src python3 -m pyfallow analyze --root . --fail-on warning --min-confidence medium ``` Keep review short. If the narrow invariant holds and CI is green, approve.
claude approved these changes 2026-05-12 15:03:33 +02:00
claude left a comment

Mandatory non-author review (ADR 0010 / AGENTS.md)

Terminal action: approve_merge

Reviewed by claude (Opus 4.7). PR opened by codex, non-author requirement satisfied.

Narrow invariant: does cached_report() stop returning stale data?

Question Answer Evidence
Source-file content change with restored mtime + same size invalidates? test_cached_report_invalidates_same_size_source_content_change — writes pre-parse-error source, calls cached_report(), then writes same-length malformed source, restores mtime via os.utime(ns=...), calls cached_report() again → asserts parse-error appears in second result and second is not first
.pyfallow.toml content change with restored mtime + same size invalidates? test_cached_report_invalidates_same_size_config_content_change — same shape, swaps dupes.mode = "mild""wild", mtime restored → asserts config-error appears and result is fresh
Hash limited to already-signatured paths? _path_signature is only called by _report_signature over _signature_paths(root, result) (unchanged); SHA-256 added inside that existing tuple, no new path scanning
Still stdlib-only? Only hashlib added to runtime.py imports
Cache lifecycle expanded? (correctly out-of-scope) TTL, LRU, eviction, thread-safety untouched — that's #9/B6, still gated by ADR 0008

Implementation notes

  • Cache tuple shape change: (rel, mtime_ns, size)(rel, mtime_ns, size, sha256_hex). Type annotation updated in REPORT_CACHE and both function signatures.
  • 1 MiB chunked read (iter(lambda: handle.read(1024 * 1024), b"")) — bounded memory for large files.
  • OSError → "unreadable" sentinel — degrades gracefully without crashing on permission errors etc. The sentinel differs from "missing" (FileNotFoundError → (rel, -1, -1, "missing")) so the two error modes don't collide.
  • Cost: every cache lookup now hashes every signatured file. For typical Python repos (hundreds-to-low-thousands of files at <10KB each), this is sub-100ms. Acceptable for v0.3 alpha; cache lifecycle work in #9 can later add stat-first fast-path if needed.

Local python3 -m pytest -q in worktree: 79 tests pass, exit 0.

Verdict

approve_merge. Narrow correctness invariant holds with behavioral tests as evidence. Scope discipline clean — no lifecycle/eviction creep into this PR.

## Mandatory non-author review (ADR 0010 / AGENTS.md) **Terminal action: `approve_merge`** Reviewed by `claude` (Opus 4.7). PR opened by `codex`, non-author requirement satisfied. ### Narrow invariant: does `cached_report()` stop returning stale data? | Question | Answer | Evidence | |---|---|---| | Source-file content change with restored mtime + same size invalidates? | ✅ | `test_cached_report_invalidates_same_size_source_content_change` — writes pre-parse-error source, calls `cached_report()`, then writes same-length malformed source, restores mtime via `os.utime(ns=...)`, calls `cached_report()` again → asserts `parse-error` appears in second result and `second is not first` | | `.pyfallow.toml` content change with restored mtime + same size invalidates? | ✅ | `test_cached_report_invalidates_same_size_config_content_change` — same shape, swaps `dupes.mode = "mild"` → `"wild"`, mtime restored → asserts `config-error` appears and result is fresh | | Hash limited to already-signatured paths? | ✅ | `_path_signature` is only called by `_report_signature` over `_signature_paths(root, result)` (unchanged); SHA-256 added inside that existing tuple, no new path scanning | | Still stdlib-only? | ✅ | Only `hashlib` added to runtime.py imports | | Cache lifecycle expanded? | ❌ (correctly out-of-scope) | TTL, LRU, eviction, thread-safety untouched — that's #9/B6, still gated by ADR 0008 | ### Implementation notes - Cache tuple shape change: `(rel, mtime_ns, size)` → `(rel, mtime_ns, size, sha256_hex)`. Type annotation updated in `REPORT_CACHE` and both function signatures. ✅ - 1 MiB chunked read (`iter(lambda: handle.read(1024 * 1024), b"")`) — bounded memory for large files. ✅ - `OSError → "unreadable"` sentinel — degrades gracefully without crashing on permission errors etc. The sentinel differs from `"missing"` (FileNotFoundError → `(rel, -1, -1, "missing")`) so the two error modes don't collide. ✅ - Cost: every cache lookup now hashes every signatured file. For typical Python repos (hundreds-to-low-thousands of files at <10KB each), this is sub-100ms. Acceptable for v0.3 alpha; cache lifecycle work in #9 can later add stat-first fast-path if needed. Local `python3 -m pytest -q` in worktree: 79 tests pass, exit 0. ### Verdict `approve_merge`. Narrow correctness invariant holds with behavioral tests as evidence. Scope discipline clean — no lifecycle/eviction creep into this PR.
pdurlej scheduled this pull request to auto merge when all checks succeed 2026-05-12 15:04:50 +02:00
Merge branch 'main' into codex/mcp-cache-signature-40
All checks were successful
CI / Python 3.11 (push) Successful in 54s
CI / Python 3.12 (push) Successful in 55s
CI / Python 3.13 (push) Successful in 57s
CI / Python 3.11 (pull_request) Successful in 53s
CI / Python 3.12 (pull_request) Successful in 59s
CI / Python 3.13 (pull_request) Successful in 1m1s
cb1e9f51ba
Sign in to join this conversation.
No description provided.