MCP: harden report-cache signature beyond mtime+size #40

Closed
opened 2026-05-12 08:05:22 +02:00 by codex · 0 comments
Collaborator

Source

Triaged from DeepSeek v4 Pro audit via #35.

Problem

The MCP report cache uses a signature based on file metadata such as mtime and size. That is fast, but it can miss changes on filesystems with coarse timestamp resolution or when a content change preserves file size. This has not been proven to be a user-facing bug, but it is plausible hardening for a long-running MCP server.

This is related to, but distinct from, #9 / B6 (REPORT_CACHE bounded + thread-safe). #9 addresses cache lifecycle/concurrency. This issue addresses cache invalidation correctness.

Target behavior

Decide and implement one of:

  • content-hash based signature for Python source/config files
  • hybrid signature including mtime, size, pyfallow version, schema/config version, and selective content hashes
  • explicit documentation that the current metadata cache is best-effort and is disabled or invalidated in high-safety contexts
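
As a rough illustration of the hybrid option, the sketch below folds a tool version, a schema version, per-file metadata, and a content hash into one signature using only the standard library. All names here (`report_signature`, `TOOL_VERSION`, `SCHEMA_VERSION`, `CONFIG_NAMES`) are hypothetical and do not reflect the existing pyfallow code. For simplicity it hashes every file's content; a "selective" variant might hash only config files or recently modified sources.

```python
import hashlib
from pathlib import Path

# Hypothetical placeholders; the real values would come from the package.
TOOL_VERSION = "0.0.0"
SCHEMA_VERSION = "1"
CONFIG_NAMES = (".pyfallow.toml", "pyproject.toml")


def _fingerprint(root: Path, path: Path) -> bytes:
    """Combine cheap metadata with a content digest for one file."""
    st = path.stat()
    digest = hashlib.blake2b(path.read_bytes(), digest_size=16).hexdigest()
    rel = path.relative_to(root).as_posix()
    return f"{rel}\0{st.st_mtime_ns}\0{st.st_size}\0{digest}\n".encode()


def report_signature(root: Path) -> str:
    """Return a signature that changes when sources, config, or versions change."""
    h = hashlib.blake2b(digest_size=16)
    h.update(f"{TOOL_VERSION}\0{SCHEMA_VERSION}\n".encode())
    # Config files are part of the signature so config edits invalidate the cache.
    for name in CONFIG_NAMES:
        cfg = root / name
        if cfg.is_file():
            h.update(_fingerprint(root, cfg))
    # Sorted traversal keeps the signature independent of directory order.
    for src in sorted(root.rglob("*.py")):
        if src.is_file():
            h.update(_fingerprint(root, src))
    return h.hexdigest()
```

Because mtime is still part of the signature, a bare `touch` invalidates the cache even when content is unchanged; dropping mtime and size in favor of the content digest alone would avoid that at no extra hashing cost.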

Acceptance criteria

  • Add a test where file content changes while the file size stays the same, and verify that stale cached results are not returned.
  • Include .pyfallow.toml / pyproject.toml config changes in the cache invalidation story or document why they are out of scope.
  • Preserve acceptable performance on repeated MCP calls; if hashing is used, justify the tradeoff.
  • Do not add runtime dependencies.
  • Keep #9's LRU/thread-safety scope separate unless both are intentionally implemented in one coordinated PR.
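
The first criterion could be exercised with a test along these lines. This is only a sketch under stated assumptions: `make_cached_analyzer`, `metadata_signature`, and `content_signature` are hypothetical stand-ins, not the real `REPORT_CACHE` code, and the "report" is simply the file text. The test rewrites a file with same-length content and restores its timestamps, which simulates a coarse-resolution filesystem.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def metadata_signature(path: Path):
    """Stand-in for an mtime+size signature."""
    st = path.stat()
    return (st.st_mtime_ns, st.st_size)


def content_signature(path: Path):
    """Stand-in for a content-hash signature."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_cached_analyzer(signature):
    """Build a toy analyzer whose cache key is (path, signature(path))."""
    cache = {}

    def analyze(path: Path) -> str:
        key = (str(path), signature(path))
        if key not in cache:
            cache[key] = path.read_text()  # placeholder for a real report
        return cache[key]

    return analyze


def rewrite_preserving_metadata(path: Path, data: bytes) -> None:
    """Overwrite the file, then restore its original atime and mtime."""
    st = path.stat()
    path.write_bytes(data)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


def test_same_size_edit_is_not_served_stale():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "mod.py"
        src.write_bytes(b"x = 1\n")
        metadata_only = make_cached_analyzer(metadata_signature)
        hardened = make_cached_analyzer(content_signature)
        metadata_only(src)
        hardened(src)
        rewrite_preserving_metadata(src, b"x = 2\n")
        # The metadata-only key is unchanged, so the old report is returned.
        assert metadata_only(src) == "x = 1\n"
        # The content-hash key changes, so the fresh content is analyzed.
        assert hardened(src) == "x = 2\n"
```

The assertion on the metadata-only analyzer is there to document the failure mode; in the real suite only the hardened assertion would be expected to remain.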

Out of scope

  • Full incremental analysis engine.
  • Persistent on-disk cache.
  • Watchman/filesystem event integration.
Reference
pdurlej/fallow-py#40