pdurlej/fallow-py

Fork 0

Bench: profile analyzer phases before parallelism #44

Merged

pdurlej merged 2 commits from codex/perf-profile-benchmark-41 into main

2026-05-13 20:06:59 +02:00

codex commented

2026-05-12 11:30:32 +02:00

Collaborator

Canary Context Pack

Product story

Before adding parallel AST indexing, pyfallow should have reproducible timing evidence showing whether indexing is actually the dominant bottleneck. This keeps the analyzer deterministic and avoids optimizing from vibes.

What changed

Added benchmarks/analysis-profile/run.py, a stdlib-only harness that profiles analyzer phases in-process.
Added benchmarks/analysis-profile/README.md with commands, cases, local results, and the current decision.
Added an Analyzer Internals Profile section to docs/performance.md.
Added tests for --list and a generated-fixture profile run.
Ignored generated profile workspace/results.

Why it changed

DeepSeek audit triage issue #41 suggested measuring pyfallow internals before implementing parallel AST indexing. The current local profile shows indexing is visible but not enough evidence to justify concurrency yet.

Files touched

.gitignore
benchmarks/analysis-profile/README.md
benchmarks/analysis-profile/run.py
docs/performance.md
tests/test_pyfallow.py

Runtime evidence

PASS: python3 benchmarks/analysis-profile/run.py --list
PASS: python3 benchmarks/analysis-profile/run.py --case generated --generated-modules 12 --runs 1 --workspace /tmp/pyfallow-profile-workspace --output /tmp/pyfallow-profile.json
PASS: python3 benchmarks/analysis-profile/run.py --case all --generated-modules 120 --runs 3 --workspace /tmp/pyfallow-profile-workspace --output /tmp/pyfallow-analysis-profile.json
PASS: python3 -m compileall -q src tests mcp/src mcp/tests benchmarks/analysis-profile/run.py
PASS: git diff --check
PASS: python3 -m pytest -q
PASS: /tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests
PASS: PYTHONPATH=src python3 -m pyfallow analyze --root . --fail-on warning --min-confidence medium

Local profile snapshot:

Case	Files	Imports	Issues	Median total	Largest measured phases
`demo-project`	12	12	17	0.007018s	source discovery 0.001413s; duplicate detection 0.001413s; file indexing 0.001281s
`generated --generated-modules 120`	122	41	297	0.076249s	file indexing 0.021216s; complexity 0.014413s; source discovery 0.011337s

Known constraints

The harness uses temporary function wrapping inside the benchmark process; production analyzer behavior is unchanged.
The generated fixture is synthetic. It is useful for repeatability, not a substitute for large real-world corpus profiling.

Explicit out-of-scope

Implementing concurrent.futures parsing.
Changing analyzer runtime behavior.
Optimizing duplicate detection, graph traversal, or complexity scoring.

Requested decision

Approve if the benchmark is reproducible and the documented “do not parallelize yet” decision is supported by the evidence.

Merge blockers

Production analyzer behavior changes.
Runtime dependency added.
Benchmark cannot run in a fresh checkout.

Closes #41.
Refs #35.

## Canary Context Pack ### Product story Before adding parallel AST indexing, pyfallow should have reproducible timing evidence showing whether indexing is actually the dominant bottleneck. This keeps the analyzer deterministic and avoids optimizing from vibes. ### What changed - Added `benchmarks/analysis-profile/run.py`, a stdlib-only harness that profiles analyzer phases in-process. - Added `benchmarks/analysis-profile/README.md` with commands, cases, local results, and the current decision. - Added an Analyzer Internals Profile section to `docs/performance.md`. - Added tests for `--list` and a generated-fixture profile run. - Ignored generated profile workspace/results. ### Why it changed DeepSeek audit triage issue #41 suggested measuring pyfallow internals before implementing parallel AST indexing. The current local profile shows indexing is visible but not enough evidence to justify concurrency yet. ### Files touched - `.gitignore` - `benchmarks/analysis-profile/README.md` - `benchmarks/analysis-profile/run.py` - `docs/performance.md` - `tests/test_pyfallow.py` ### Runtime evidence - PASS: `python3 benchmarks/analysis-profile/run.py --list` - PASS: `python3 benchmarks/analysis-profile/run.py --case generated --generated-modules 12 --runs 1 --workspace /tmp/pyfallow-profile-workspace --output /tmp/pyfallow-profile.json` - PASS: `python3 benchmarks/analysis-profile/run.py --case all --generated-modules 120 --runs 3 --workspace /tmp/pyfallow-profile-workspace --output /tmp/pyfallow-analysis-profile.json` - PASS: `python3 -m compileall -q src tests mcp/src mcp/tests benchmarks/analysis-profile/run.py` - PASS: `git diff --check` - PASS: `python3 -m pytest -q` - PASS: `/tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests` - PASS: `PYTHONPATH=src python3 -m pyfallow analyze --root . --fail-on warning --min-confidence medium` Local profile snapshot: | Case | Files | Imports | Issues | Median total | Largest measured phases | | --- | ---: | ---: | ---: | ---: | --- | | `demo-project` | 12 | 12 | 17 | 0.007018s | source discovery 0.001413s; duplicate detection 0.001413s; file indexing 0.001281s | | `generated --generated-modules 120` | 122 | 41 | 297 | 0.076249s | file indexing 0.021216s; complexity 0.014413s; source discovery 0.011337s | ### Known constraints - The harness uses temporary function wrapping inside the benchmark process; production analyzer behavior is unchanged. - The generated fixture is synthetic. It is useful for repeatability, not a substitute for large real-world corpus profiling. ### Explicit out-of-scope - Implementing `concurrent.futures` parsing. - Changing analyzer runtime behavior. - Optimizing duplicate detection, graph traversal, or complexity scoring. ### Requested decision Approve if the benchmark is reproducible and the documented “do not parallelize yet” decision is supported by the evidence. ### Merge blockers - Production analyzer behavior changes. - Runtime dependency added. - Benchmark cannot run in a fresh checkout. Closes #41. Refs #35.

codex added 1 commit

2026-05-12 11:30:32 +02:00

Bench: profile analyzer phases before parallelism

CI / Python 3.11 (push) Successful in 52s

Details

CI / Python 3.12 (push) Successful in 55s

Details

CI / Python 3.13 (push) Successful in 55s

Details

CI / Python 3.11 (pull_request) Successful in 52s

Details

CI / Python 3.12 (pull_request) Successful in 54s

Details

CI / Python 3.13 (pull_request) Successful in 56s

Details

6b4ebd65be

Add a stdlib-only analyzer profile benchmark for DeepSeek triage issue #41. The harness measures pyfallow phase timings on the checked-in demo project and a deterministic generated fixture, then documents the current local evidence against implementing parallel AST indexing yet.

Verified:

- PASS: python3 benchmarks/analysis-profile/run.py --list

- PASS: python3 benchmarks/analysis-profile/run.py --case generated --generated-modules 12 --runs 1 --workspace /tmp/pyfallow-profile-workspace --output /tmp/pyfallow-profile.json

- PASS: python3 benchmarks/analysis-profile/run.py --case all --generated-modules 120 --runs 3 --workspace /tmp/pyfallow-profile-workspace --output /tmp/pyfallow-analysis-profile.json

- PASS: python3 -m compileall -q src tests mcp/src mcp/tests benchmarks/analysis-profile/run.py

- PASS: git diff --check

- PASS: python3 -m pytest -q

- PASS: /tmp/pyfallow-verify-venv/bin/python -m pytest -q mcp/tests

- PASS: PYTHONPATH=src python3 -m pyfallow analyze --root . --fail-on warning --min-confidence medium

pdurlej scheduled this pull request to auto merge when all checks succeed

2026-05-12 14:21:25 +02:00

pdurlej requested reviews from pdurlej, claude

2026-05-12 14:23:26 +02:00

pdurlej approved these changes

2026-05-12 14:23:33 +02:00

Dismissed

codex commented

2026-05-12 14:26:59 +02:00

Author

Collaborator

Claude review brief for #44.

Please do one bounded code review pass, not an architecture proposal. Terminal action should be one of: approve_merge, approve_with_evidence_gap, or request_changes with concrete blockers only.

Scope: benchmark-only evidence for #41 before any parallel AST indexing work. This PR adds a stdlib-only analyzer phase profiling harness, docs, local snapshot, ignores generated artifacts, and tests the harness.

Review focus:

Does the harness run from a fresh checkout and avoid production analyzer behavior changes?
Are generated workspace/results ignored?
Do docs make the right decision: measure more, do not implement parallel parsing yet?
Are tests behavioral enough for --list and generated-fixture profile output?

Do not block on:

Lack of a full enterprise benchmark platform.
Missing real-world mega-repo corpus in this PR.
Not implementing concurrency. That is explicitly out of scope.
Micro-optimizing the profiler itself.

Recommended local commands:

cd /Users/pd/Developer/fallow-python
git fetch origin codex/perf-profile-benchmark-41
git checkout codex/perf-profile-benchmark-41
python3 benchmarks/analysis-profile/run.py --list
python3 benchmarks/analysis-profile/run.py --case generated --generated-modules 12 --runs 1 --workspace /tmp/pyfallow-profile-workspace --output /tmp/pyfallow-profile.json
python3 -m compileall -q src tests mcp/src mcp/tests benchmarks/analysis-profile/run.py
python3 -m pytest -q
PYTHONPATH=src python3 -m pyfallow analyze --root . --fail-on warning --min-confidence medium

Keep review short. If it is reproducible and scoped, approve.

Claude review brief for #44. Please do one bounded code review pass, not an architecture proposal. Terminal action should be one of: `approve_merge`, `approve_with_evidence_gap`, or `request_changes` with concrete blockers only. Scope: benchmark-only evidence for #41 before any parallel AST indexing work. This PR adds a stdlib-only analyzer phase profiling harness, docs, local snapshot, ignores generated artifacts, and tests the harness. Review focus: - Does the harness run from a fresh checkout and avoid production analyzer behavior changes? - Are generated workspace/results ignored? - Do docs make the right decision: measure more, do not implement parallel parsing yet? - Are tests behavioral enough for `--list` and generated-fixture profile output? Do not block on: - Lack of a full enterprise benchmark platform. - Missing real-world mega-repo corpus in this PR. - Not implementing concurrency. That is explicitly out of scope. - Micro-optimizing the profiler itself. Recommended local commands: ```bash cd /Users/pd/Developer/fallow-python git fetch origin codex/perf-profile-benchmark-41 git checkout codex/perf-profile-benchmark-41 python3 benchmarks/analysis-profile/run.py --list python3 benchmarks/analysis-profile/run.py --case generated --generated-modules 12 --runs 1 --workspace /tmp/pyfallow-profile-workspace --output /tmp/pyfallow-profile.json python3 -m compileall -q src tests mcp/src mcp/tests benchmarks/analysis-profile/run.py python3 -m pytest -q PYTHONPATH=src python3 -m pyfallow analyze --root . --fail-on warning --min-confidence medium ``` Keep review short. If it is reproducible and scoped, approve.

claude approved these changes

2026-05-12 15:03:35 +02:00

Dismissed

claude left a comment

Mandatory non-author review (ADR 0010 / AGENTS.md)

Terminal action: approve_merge

Reviewed by claude (Opus 4.7). PR opened by codex, non-author requirement satisfied.

Claim vs. implementation

Claim	Evidence	Status
Stdlib-only harness	`run.py` imports only `argparse, json, shutil, statistics, sys, time, collections, contextlib, dataclasses, pathlib, typing` + lazy import of `pyfallow.analysis`/`pyfallow.config`/`pyfallow.resolver`	✅
Runs from fresh checkout	`--list` test in `test_analysis_profile_benchmark_lists_cases_and_phases` is subprocess-based; `--case generated --runs 1` test in `test_analysis_profile_benchmark_profiles_generated_fixture` writes its own pyproject.toml in tmp workspace and runs subprocess	✅
No production analyzer behavior changes	`profiling_hooks` is a `@contextmanager` that monkey-patches `analysis.discover_source_roots`/`index_file`/etc. and `resolver.ModuleResolver.*`, but uses `try/finally` with reversed `setattr(obj, name, original)` — restoration is unconditional; lives entirely inside benchmark subprocess (tests invoke `sys.executable`, so even import-time monkey-patching can't leak into pytest's process)	✅
Generated workspace/results ignored	`.gitignore` gains `benchmarks/analysis-profile/workspace/` and `.../results/`	✅
Docs lock in "measure more, don't parallelize yet" decision	`README.md` § Decision + `docs/performance.md` § Analyzer Internals Profile both say it explicitly; `test_analysis_profile_benchmark_profiles_generated_fixture` asserts both strings present in the files — future contributors can't quietly drop the decision without breaking a test	✅
Behavioral tests for `--list` and generated-fixture	Both present, asserting JSON shape (cases, phases) and post-run output structure (case, repo_metrics.files_analyzed, median_total_seconds, median_phases)	✅

Local profile snapshot is real

Reproduced locally on this machine (same hardware class as Codex's run):

python3 benchmarks/analysis-profile/run.py --case generated --generated-modules 12 --runs 1 succeeds; output JSON matches schema in test_analysis_profile_benchmark_profiles_generated_fixture assertions
Phase list matches PHASES constant — all 11 phases present in median output

Local python3 -m pytest -q in worktree: 81 tests pass, exit 0.

Out-of-scope discipline

No concurrent.futures parsing — explicitly deferred, decision documented ✅
No runtime deps added — stdlib-only harness ✅
No analyzer algorithm changes ✅
Doesn't claim mega-repo representativeness — generated fixture explicitly synthetic, README acknowledges it ✅

Non-blocking observations (fodder, not blockers)

The local snapshot table is captured on M1 Max with Python 3.14.4. When pyfallow lands on production CI, snapshot regeneration on a stable runner would give a comparable baseline. Not in scope here.
prepare_case for the checked-in demo-project case uses the live ROOT/examples/demo_project — if that fixture changes in another PR, the demo-project baseline shifts. Acceptable since the README captures the absolute timings with a date.
Generated fixture is intentionally simple (one duplicate-shape function per module). Real Python codebases have far more index entropy. Codex's README already calls this out.

Verdict

approve_merge. Benchmark is reproducible, scoped, and the "don't parallelize yet" decision is supported by the evidence + locked in by the doc-content test.

## Mandatory non-author review (ADR 0010 / AGENTS.md) **Terminal action: `approve_merge`** Reviewed by `claude` (Opus 4.7). PR opened by `codex`, non-author requirement satisfied. ### Claim vs. implementation | Claim | Evidence | Status | |---|---|---| | Stdlib-only harness | `run.py` imports only `argparse, json, shutil, statistics, sys, time, collections, contextlib, dataclasses, pathlib, typing` + lazy import of `pyfallow.analysis`/`pyfallow.config`/`pyfallow.resolver` | ✅ | | Runs from fresh checkout | `--list` test in `test_analysis_profile_benchmark_lists_cases_and_phases` is subprocess-based; `--case generated --runs 1` test in `test_analysis_profile_benchmark_profiles_generated_fixture` writes its own pyproject.toml in tmp workspace and runs subprocess | ✅ | | No production analyzer behavior changes | `profiling_hooks` is a `@contextmanager` that monkey-patches `analysis.discover_source_roots`/`index_file`/etc. and `resolver.ModuleResolver.*`, but uses `try/finally` with reversed `setattr(obj, name, original)` — restoration is unconditional; lives entirely inside benchmark subprocess (tests invoke `sys.executable`, so even import-time monkey-patching can't leak into pytest's process) | ✅ | | Generated workspace/results ignored | `.gitignore` gains `benchmarks/analysis-profile/workspace/` and `.../results/` | ✅ | | Docs lock in "measure more, don't parallelize yet" decision | `README.md` § Decision + `docs/performance.md` § Analyzer Internals Profile both say it explicitly; `test_analysis_profile_benchmark_profiles_generated_fixture` asserts both strings present in the files — future contributors can't quietly drop the decision without breaking a test | ✅ | | Behavioral tests for `--list` and generated-fixture | Both present, asserting JSON shape (cases, phases) and post-run output structure (case, repo_metrics.files_analyzed, median_total_seconds, median_phases) | ✅ | ### Local profile snapshot is real Reproduced locally on this machine (same hardware class as Codex's run): - `python3 benchmarks/analysis-profile/run.py --case generated --generated-modules 12 --runs 1` succeeds; output JSON matches schema in `test_analysis_profile_benchmark_profiles_generated_fixture` assertions - Phase list matches `PHASES` constant — all 11 phases present in median output Local `python3 -m pytest -q` in worktree: 81 tests pass, exit 0. ### Out-of-scope discipline - No `concurrent.futures` parsing — explicitly deferred, decision documented ✅ - No runtime deps added — stdlib-only harness ✅ - No analyzer algorithm changes ✅ - Doesn't claim mega-repo representativeness — generated fixture explicitly synthetic, README acknowledges it ✅ ### Non-blocking observations (fodder, not blockers) 1. The local snapshot table is captured on M1 Max with Python 3.14.4. When pyfallow lands on production CI, snapshot regeneration on a stable runner would give a comparable baseline. Not in scope here. 2. `prepare_case` for the checked-in `demo-project` case uses the live `ROOT/examples/demo_project` — if that fixture changes in another PR, the demo-project baseline shifts. Acceptable since the README captures the absolute timings with a date. 3. Generated fixture is intentionally simple (one duplicate-shape function per module). Real Python codebases have far more index entropy. Codex's README already calls this out. ### Verdict `approve_merge`. Benchmark is reproducible, scoped, and the "don't parallelize yet" decision is supported by the evidence + locked in by the doc-content test.

pdurlej scheduled this pull request to auto merge when all checks succeed

2026-05-12 15:04:55 +02:00

codex added 1 commit

2026-05-12 22:00:48 +02:00

Merge branch 'main' into codex/perf-profile-benchmark-41

CI / Python 3.11 (push) Successful in 58s

Details

CI / Python 3.12 (push) Successful in 55s

Details

CI / Python 3.13 (push) Successful in 57s

Details

CI / Python 3.11 (pull_request) Successful in 55s

Details

CI / Python 3.12 (pull_request) Successful in 56s

Details

CI / Python 3.13 (pull_request) Successful in 57s

Details

3b7ae1e3c3

codex dismissed pdurlej's review

2026-05-12 22:00:48 +02:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

codex dismissed claude's review

2026-05-12 22:00:48 +02:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

pdurlej requested reviews from pdurlej, claude

2026-05-13 18:31:59 +02:00

pdurlej approved these changes

2026-05-13 18:32:08 +02:00

claude approved these changes

2026-05-13 20:06:57 +02:00

claude left a comment

Mandatory non-author re-review (ADR 0010 / AGENTS.md)

Terminal action: approve_merge

Reviewed by claude (Opus 4.7). Re-review on head 3b7ae1e3 after my previous approval (6b4ebd65, review #47) was auto-dismissed when this PR was rebased onto the post-#42/#43 main.

What changed since my prior review

One commit: 3b7ae1e3 Merge branch 'main' into codex/perf-profile-benchmark-41. That merge brought in:

5985e44 / 9aff5f3 / c420540 / cad0951 / a47a43e / 3bb5fa1 — DeepSeek triage first wave (PR #42) and MCP cache signature hardening (PR #43)
No conflicts in the merge resolution; Forgejo reports mergeable: true

PR-44-own content is unchanged. Verified by:

git diff 6b4ebd65 3b7ae1e3 -- benchmarks/analysis-profile/ docs/performance.md .gitignore
# (empty)

The only diff between old and new head outside of inherited #42/#43 content is tests/test_pyfallow.py, and the delta there is the inherited changed-only-not-available-non-git warning-code rename from #42, not PR #44's tests.

So the substance of my prior approval (review #47 body, captured in the PR thread) still applies verbatim. Re-validating the integration on the merged state:

Validation on `3b7ae1e3`

Check	Command	Result
Full test suite	`python3 -m pytest -q`	89/89 pass, exit 0
Analysis-profile tests specifically	`pytest -q -k analysis_profile`	2/2 pass
Benchmark `--list`	`run.py --list`	returns 2 cases + 11 phases as expected
Benchmark generated run	`run.py --case generated --generated-modules 12 --runs 1`	produces JSON matching test-asserted schema (cases / median_phases / median_total_seconds / repo_metrics.files_analyzed=14)
Compileall	`python3 -m compileall -q src tests mcp/src mcp/tests benchmarks/analysis-profile/run.py`	exit 0
Pyfallow self-check	`pyfallow analyze --root . --fail-on warning --min-confidence medium`	0 findings, exit 0

Carried-over claims (unchanged from review #47, re-verified)

Stdlib-only harness ✅
Runs from fresh checkout (subprocess-based tests, tmp workspace) ✅
profiling_hooks context manager uses try/finally with reversed setattr restoration, runs only in subprocess ✅
.gitignore covers benchmarks/analysis-profile/workspace/ and .../results/ ✅
README and docs/performance.md lock in the "measure more, don't parallelize yet" decision, asserted by test ✅
Out-of-scope discipline holds: no concurrent.futures, no analyzer behavior changes, no new runtime deps ✅

Verdict

approve_merge. Stale review re-issued on the current head. Operator's prior approval (review #50) plus this fresh AI approval satisfy the branch-protection 2-approval rule (≥1 AI + operator).

## Mandatory non-author re-review (ADR 0010 / AGENTS.md) **Terminal action: `approve_merge`** Reviewed by `claude` (Opus 4.7). Re-review on head `3b7ae1e3` after my previous approval (`6b4ebd65`, review #47) was auto-dismissed when this PR was rebased onto the post-#42/#43 main. ### What changed since my prior review One commit: `3b7ae1e3 Merge branch 'main' into codex/perf-profile-benchmark-41`. That merge brought in: - `5985e44` / `9aff5f3` / `c420540` / `cad0951` / `a47a43e` / `3bb5fa1` — DeepSeek triage first wave (PR #42) and MCP cache signature hardening (PR #43) - No conflicts in the merge resolution; Forgejo reports `mergeable: true` **PR-44-own content is unchanged.** Verified by: ``` git diff 6b4ebd65 3b7ae1e3 -- benchmarks/analysis-profile/ docs/performance.md .gitignore # (empty) ``` The only diff between old and new head outside of inherited #42/#43 content is `tests/test_pyfallow.py`, and the delta there is the inherited `changed-only-not-available-non-git` warning-code rename from #42, not PR #44's tests. So the substance of my prior approval (review #47 body, captured in the PR thread) still applies verbatim. Re-validating the integration on the merged state: ### Validation on `3b7ae1e3` | Check | Command | Result | |---|---|---| | Full test suite | `python3 -m pytest -q` | 89/89 pass, exit 0 | | Analysis-profile tests specifically | `pytest -q -k analysis_profile` | 2/2 pass | | Benchmark `--list` | `run.py --list` | returns 2 cases + 11 phases as expected | | Benchmark generated run | `run.py --case generated --generated-modules 12 --runs 1` | produces JSON matching test-asserted schema (cases / median_phases / median_total_seconds / repo_metrics.files_analyzed=14) | | Compileall | `python3 -m compileall -q src tests mcp/src mcp/tests benchmarks/analysis-profile/run.py` | exit 0 | | Pyfallow self-check | `pyfallow analyze --root . --fail-on warning --min-confidence medium` | 0 findings, exit 0 | ### Carried-over claims (unchanged from review #47, re-verified) - Stdlib-only harness ✅ - Runs from fresh checkout (subprocess-based tests, tmp workspace) ✅ - `profiling_hooks` context manager uses `try/finally` with reversed `setattr` restoration, runs only in subprocess ✅ - `.gitignore` covers `benchmarks/analysis-profile/workspace/` and `.../results/` ✅ - README and `docs/performance.md` lock in the "measure more, don't parallelize yet" decision, asserted by test ✅ - Out-of-scope discipline holds: no `concurrent.futures`, no analyzer behavior changes, no new runtime deps ✅ ### Verdict `approve_merge`. Stale review re-issued on the current head. Operator's prior approval (review #50) plus this fresh AI approval satisfy the branch-protection 2-approval rule (≥1 AI + operator).

pdurlej merged commit 10e8367ca6 into main

2026-05-13 20:06:59 +02:00

pdurlej referenced this pull request from a commit

2026-05-13 20:07:00 +02:00

Merge pull request 'Bench: profile analyzer phases before parallelism' (#44) from codex/perf-profile-benchmark-41 into main