Performance: benchmark before parallel AST indexing #41


Closed

opened 2026-05-12 08:05:37 +02:00 by codex · 0 comments

codex commented 2026-05-12 08:05:37 +02:00 (Collaborator)

Source

Triaged from DeepSeek v4 Pro audit via #35.

Problem

The audit suggested parallelizing AST parsing/indexing. That may help on large repositories, but adding concurrency before measuring risks needless complexity in a deterministic analyzer whose main strengths today are predictability and low-dependency execution.

Target behavior

Before adding parallel analysis, measure where time is actually spent on representative repositories:

- source discovery
- file reading/parsing/indexing
- module resolution
- graph analysis
- dependency analysis
- dead-code analysis
- duplicate detection
- format serialization
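
One possible way to attribute time to the phases listed above, using only the standard library, is a small accumulating timer wrapped around each stage. This is a sketch, not the analyzer's actual code: `PhaseTimer` and the stand-in workloads are hypothetical, and the real entry points for each phase would need to be substituted in.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Hypothetical helper: accumulates wall-clock seconds per named phase."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate so a phase entered several times is summed.
            self.totals[name] += time.perf_counter() - start

    def report(self):
        """Return (name, seconds, percent of total) rows, slowest first."""
        total = sum(self.totals.values()) or 1.0
        rows = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, secs, 100.0 * secs / total) for name, secs in rows]


timer = PhaseTimer()

# Stand-in workloads; replace with calls into the real analyzer phases.
with timer.phase("source discovery"):
    paths = [f"pkg/mod_{i}.py" for i in range(1000)]
with timer.phase("file reading/parsing/indexing"):
    index = {p: len(p) for p in paths}

for name, secs, pct in timer.report():
    print(f"{name:32s} {secs:10.6f}s {pct:5.1f}%")
```

If parsing/indexing turns out to be a small share of the total, that alone would be evidence for deferring parallel AST indexing.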

Acceptance criteria

- Add or document a reproducible benchmark command/script using existing stdlib tooling where practical.
- Capture results for at least one small repo and one larger Python repo or generated fixture.
- Decide whether parallel AST indexing is worth a follow-up implementation issue.
- If not worth it yet, document the evidence and close this as `defer`/`no action`.
- No production behavior changes in the benchmark-only PR.
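
For the "generated fixture" option, a reproducible stdlib-only script might look roughly like the following. It is a minimal sketch under stated assumptions: `generate_fixture` and `time_parse` are hypothetical names, the synthetic modules are far simpler than real code, and the file counts (20 and 500) are arbitrary placeholders rather than agreed sizes. It times only sequential read plus `ast.parse`, not the analyzer's own indexing.

```python
import ast
import tempfile
import time
from pathlib import Path


def generate_fixture(root: Path, n_modules: int, funcs_per_module: int = 20) -> list[Path]:
    """Write n_modules synthetic Python files under root and return their paths."""
    paths = []
    for i in range(n_modules):
        lines = ["import os", ""]
        for j in range(funcs_per_module):
            lines += [f"def func_{i}_{j}(x):", f"    return x + {j}", ""]
        path = root / f"mod_{i:04d}.py"
        path.write_text("\n".join(lines), encoding="utf-8")
        paths.append(path)
    return paths


def time_parse(paths: list[Path]) -> float:
    """Return wall-clock seconds to read and ast.parse every file sequentially."""
    start = time.perf_counter()
    for path in paths:
        ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    return time.perf_counter() - start


results = {}
with tempfile.TemporaryDirectory() as tmp:
    # Sizes are placeholders (assumption); pick values representative of real repos.
    for label, n in (("small", 20), ("larger", 500)):
        root = Path(tmp) / label
        root.mkdir()
        paths = generate_fixture(root, n)
        results[label] = (len(paths), time_parse(paths))

for label, (count, secs) in results.items():
    print(f"{label}: {count} files parsed in {secs:.4f}s")
```

Because the fixture is generated deterministically, the same command can be rerun before and after any follow-up change without touching production code paths.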

Out of scope

- Implementing `concurrent.futures` parsing in this issue.
- Adding runtime dependencies.
- Optimizing duplicate detection or graph algorithms without measurements.


codex added the severity:low, area:engineering, and source:deepseek-v4-pro labels 2026-05-12 08:05:37 +02:00

codex referenced this issue 2026-05-12 08:06:04 +02:00: Triage DeepSeek v4 Pro audit into actionable backlog #35

codex referenced this issue 2026-05-12 08:37:36 +02:00: Execute DeepSeek audit triage backlog first wave #42