Discovery: replace source-root length ordering with explicit specificity policy #38

Closed
opened 2026-05-12 08:04:56 +02:00 by codex · 0 comments
Collaborator

Source

Triaged from DeepSeek v4 Pro audit via #35.

Problem

discover_source_roots() currently orders inferred candidates by raw string length. In common layouts this happens to prefer child paths such as src/ over the repository root, but the policy is implicit and brittle. The audit's specific example was overstated. Even so, the implementation is correct by accident rather than by design, which is too fragile for a codebase-intelligence tool.
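For a true parent/child pair, a normalized child's POSIX string always extends its parent's, so length ordering does get that case right. The brittleness shows up with unrelated candidates and ties. The sketch below assumes the current key is len(p.as_posix()) sorted longest first; that direction is inferred from src/ winning over the repository root. by_length is a hypothetical stand-in, not the project's code.

```python
from pathlib import PurePosixPath


def by_length(paths):
    # Hypothetical stand-in for the current key: longest POSIX string first.
    # sorted() stays stable with reverse=True, so equal lengths keep input order.
    ordered = sorted(paths, key=lambda p: len(p.as_posix()), reverse=True)
    return [p.as_posix() for p in ordered]


# Equal-length siblings: the result simply follows input order,
# i.e. whatever order directory iteration happened to produce.
print(by_length([PurePosixPath("repo/lib"), PurePosixPath("repo/src")]))
print(by_length([PurePosixPath("repo/src"), PurePosixPath("repo/lib")]))

# Unrelated siblings: the longer directory name wins, which says nothing
# about which root is more specific.
print(by_length([PurePosixPath("repo/src"), PurePosixPath("repo/python_packages")]))
```

The same two directories come out in different orders depending only on how they were discovered, and the third case ranks roots by how long their names are.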

Target behavior

Source-root ordering should be explicit and testable:

  • explicit config.roots entries preserve user order and intent after normalization/filtering
  • inferred package/source directories should be preferred over the broad repository root when both are candidates
  • parent/child candidates should be ordered by path depth/specificity, not raw string length
  • duplicate resolved paths should still collapse deterministically
  • namespace-package behavior must remain compatible with namespace_packages
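One way to make the policy above explicit is a named key plus separate handling for explicit and inferred roots. This is a minimal sketch, not the fallow-py implementation. All names (source_root_specificity_key, order_explicit_roots, order_inferred_roots) are hypothetical. It assumes that deduplication compares Path.resolve() results and that ties are broken by the POSIX string. It does not model namespace_packages.

```python
from __future__ import annotations

from pathlib import Path


def source_root_specificity_key(root: Path) -> tuple[int, str]:
    """Hypothetical named key for inferred source roots.

    More path components means more specific, so a child such as <repo>/src
    sorts before its parent <repo>. Ties fall back to the POSIX string, which
    makes the order independent of discovery order.
    """
    return (-len(root.parts), root.as_posix())


def _dedupe_resolved(roots: list[Path]) -> list[Path]:
    """Resolve each root and keep only the first occurrence of each path."""
    seen: set[Path] = set()
    unique: list[Path] = []
    for root in roots:
        resolved = root.resolve()
        if resolved not in seen:
            seen.add(resolved)
            unique.append(resolved)
    return unique


def order_explicit_roots(roots: list[Path]) -> list[Path]:
    """Explicit config.roots: normalize and dedupe, but never reorder."""
    return _dedupe_resolved(roots)


def order_inferred_roots(candidates: list[Path]) -> list[Path]:
    """Inferred candidates: dedupe, then sort deepest-first by the named key."""
    return sorted(_dedupe_resolved(candidates), key=source_root_specificity_key)
```

With this split, explicit roots [repo, repo/src] stay in that order, while the same two paths as inferred candidates come out as [repo/src, repo]. Keeping the key as a named function gives regression tests a single place to pin the policy.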

Acceptance criteria

  • Add regression tests for parent/child source roots and mixed inferred candidates.
  • Add regression tests for explicit roots to ensure user order/intent is not unexpectedly rewritten.
  • Replace raw len(p.as_posix()) ordering with a named helper or clearly documented key.
  • Existing demo project and false-positive corpus continue to analyze with the same module names.
  • No analyzer category changes.

Out of scope

  • Full source-root auto-detection redesign.
  • New config knobs.
  • Namespace package architecture changes beyond preserving current behavior.
Reference
pdurlej/fallow-py#38