Performance: benchmark before parallel AST indexing #41

Closed
opened 2026-05-12 08:05:37 +02:00 by codex · 0 comments
Collaborator

Source

Triaged from DeepSeek v4 Pro audit via #35.

Problem

The audit suggested parallelizing AST parsing/indexing. That may help on large repositories, but adding concurrency before measuring risks unnecessary complexity in a deterministic analyzer whose main value today is predictability and low-dependency execution.

Target behavior

Before adding parallel analysis, measure where time is actually spent on representative repositories:

  • source discovery
  • file reading/parsing/indexing
  • module resolution
  • graph analysis
  • dependency analysis
  • dead-code analysis
  • duplicate detection
  • format serialization
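
One way to get per-phase numbers with stdlib tooling only is a small timing helper built on time.perf_counter. This is a minimal sketch, not the project's actual instrumentation: the PhaseTimer class and its method names are hypothetical, and the phase names are simply the labels from the list above.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Hypothetical helper: accumulates wall-clock seconds per named phase."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        # perf_counter is monotonic and intended for measuring short durations.
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate, so a phase entered several times is summed.
            self.totals[name] += time.perf_counter() - start

    def report(self):
        # Guard against division by zero when nothing measurable was recorded.
        total = sum(self.totals.values()) or 1.0
        ranked = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return "\n".join(
            f"{name:<28}{seconds:>10.4f}s{seconds / total:>8.1%}"
            for name, seconds in ranked
        )
```

A benchmark script could wrap each stage in `with timer.phase("module resolution"): ...` and print `timer.report()` at the end. The slowest phases come first, which shows directly whether parsing dominates the run at all.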

Acceptance criteria

  • Add or document a reproducible benchmark command/script using existing stdlib tooling where practical.
  • Capture results for at least one small repo and one larger Python repo or generated fixture.
  • Decide whether parallel AST indexing is worth a follow-up implementation issue.
  • If not worth it yet, document the evidence and close this as defer/no action.
  • No production behavior changes in the benchmark-only PR.
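
For the "generated fixture" option, a stdlib-only script along these lines could produce a reproducible synthetic repository and time a baseline read-and-parse pass. It is a sketch under stated assumptions: generate_fixture and time_parse are hypothetical names, the module contents are arbitrary, and the timing covers only file reading plus ast.parse, not the analyzer's own indexing.

```python
import ast
import tempfile
import time
from pathlib import Path


def generate_fixture(root: Path, modules: int, functions_per_module: int) -> None:
    """Write `modules` synthetic .py files, each defining simple functions."""
    for i in range(modules):
        parts = ["import os\n"]
        for j in range(functions_per_module):
            parts.append(f"def func_{i}_{j}(x):\n    return x + {j}\n")
        (root / f"mod_{i}.py").write_text("\n".join(parts), encoding="utf-8")


def time_parse(root: Path) -> tuple[int, float]:
    """Read and parse every .py file under root; return (file count, seconds)."""
    start = time.perf_counter()
    count = 0
    # Sorted traversal keeps the run order deterministic across machines.
    for path in sorted(root.rglob("*.py")):
        ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fixture_root = Path(tmp)
        # Fixture size is an arbitrary assumption; adjust to match a "larger" repo.
        generate_fixture(fixture_root, modules=500, functions_per_module=20)
        files, seconds = time_parse(fixture_root)
        print(f"parsed {files} files in {seconds:.3f}s")
```

Running the same script against a small checkout and the generated fixture gives two comparable data points. The absolute numbers depend on the machine, so any recorded results should state the hardware and Python version.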

Out of scope

  • Implementing concurrent.futures parsing in this issue.
  • Adding runtime dependencies.
  • Optimizing duplicate detection or graph algorithms without measurements.

Reference
pdurlej/fallow-py#41