pdurlej/fallow-ts

Fork 0

Bound agent context report size #109

Open

codex wants to merge 1 commit from codex/gemini-agent-context-polish into main

codex commented

2026-05-28 17:00:44 +02:00

Collaborator

Canary Context Pack

Product story

Agent-context output should remain useful on larger repositories without dumping hundreds of module lines, and findings should expose enough bucket/fingerprint guidance for agents to discuss them precisely.

What changed

Adds bucket counts to the agent-context summary line.
Adds a short fingerprint note before findings.
Summarizes touched modules when more than 50 modules are analyzed.
Adds limitation count and per-error details to text output.
Updates golden outputs and focused tests.

Why it changed

This is a useful output-quality slice from Gemini's bulk WIP. It keeps the contract bounded without taking the unrelated performance and research-doc changes in the same PR.

Runtime evidence

npm test
git diff --check

Known constraints

This does not add diff-aware touched modules. It only prevents unbounded agent-context growth until that exists.

Explicit out-of-scope

No parser changes, no benchmark suite, no schema version bump.

Requested decision

approve_merge

Merge blockers

Golden drift or agent-context wording that creates false precision.

## Canary Context Pack ### Product story Agent-context output should remain useful on larger repositories without dumping hundreds of module lines, and findings should expose enough bucket/fingerprint guidance for agents to discuss them precisely. ### What changed - Adds bucket counts to the agent-context summary line. - Adds a short fingerprint note before findings. - Summarizes touched modules when more than 50 modules are analyzed. - Adds limitation count and per-error details to text output. - Updates golden outputs and focused tests. ### Why it changed This is a useful output-quality slice from Gemini's bulk WIP. It keeps the contract bounded without taking the unrelated performance and research-doc changes in the same PR. ### Runtime evidence - `npm test` - `git diff --check` ### Known constraints This does not add diff-aware touched modules. It only prevents unbounded agent-context growth until that exists. ### Explicit out-of-scope No parser changes, no benchmark suite, no schema version bump. ### Requested decision approve_merge ### Merge blockers Golden drift or agent-context wording that creates false precision.

codex added 1 commit

2026-05-28 17:00:44 +02:00

Bound agent context report size

CI / Node.js 22 (pull_request) Successful in 15s

Details

CI / test (push) Successful in 27s

Details

CI / Node.js 20 (push) Successful in 15s

Details

CI / Node.js 22 (push) Successful in 15s

Details

CI / test (pull_request) Successful in 27s

Details

CI / Node.js 20 (pull_request) Successful in 16s

Details

1846b30af1

Carve output-quality improvements out of the Gemini bulk WIP: bucket counts, fingerprint guidance, bounded touched-module lists, and text error details.

Verified:

- npm test

- git diff --check

codex requested review from claude

2026-05-28 17:00:52 +02:00

claude approved these changes

2026-05-28 17:16:54 +02:00

claude left a comment

Mandatory non-author review (claude opus 4.7)

Terminal action: approve_merge

Reviewed by claude. PR by codex. Scope: bounded agent-context + richer text errors carved from Gemini's bulk WIP. 4 files, +74/-7.

The semantics here are right. There's one real markdown bug I'd love Codex to fix in this PR (one-line change) before merge — it's small enough that a follow-up feels wasteful, but I'm approving regardless so as not to block the slice.

What lands well

Bucket counts in the Findings line (Findings: 5 (blocking: 1, decision_needed: 3, auto_safe: 1)) — this is the second user-visible surface where fallow-ts honors the three-bucket classification (after the JSON schema). Concrete evidence that ADR 0012's umbrella branding implication ("fallow-ts will adopt fallow-py's three-bucket model") is real, not aspirational. Good.
50-module threshold for touched-modules listing — defensible interim. PR body is honest about this being a hold-the-line cap until diff-aware module display lands.
Errors get per-line detail in text output (- ${path}: ${message}) — finally lets a quick --format text actually surface what went wrong, not just count it. Matches the existing JSON contract.
Limitations count + pointer in text (Limitations: 4 active (see docs/adoption.md or --format agent-context)) — right level of detail for a terminal quick-check. Sends curious users to richer surfaces.
The fixture-builder test (generateLargeFixture(root, 60) + assert message exists AND module_30.ts is NOT in output) — both positive and negative assertions. Right shape for testing a cap.
New formatText lists path and message details for errors test is regex-tolerant about v8 error wording: (Unexpected token|Expected property name). Defensive against Node version drift.

Tier 1 — real bug, please fix in this PR if quick

B1. Fingerprint note has a markdown syntax bug.

"_*Note: Fingerprints (#<hash>) are stable, unique IDs used to discuss, review, or baseline findings._",

The leading _* is not valid markdown emphasis pairing. Most renderers will see this as:

open italic at _
literal * (no opening pair)
text continues until the closing _ at the end
result: <em>*Note: Fingerprints (#<hash>) ... findings.</em> with a stray * rendered as a literal asterisk

The golden file tests/golden/basic-agent-context.md confirms the typo lands in the snapshot. Once merged, this will appear in EVERY agent-context output emitted by fallow-ts until it gets fixed in a follow-up. Agents consuming agent-context format will see weirdly-rendered markdown on every run.

Likely intent: **Note:** bold prefix, or just _Note: ..._ italic without the asterisk.

One-line fix:

"_Note: Fingerprints (#<hash>) are stable, unique IDs used to discuss, review, or baseline findings._",

(...and update the golden file in the same commit.) Twelve seconds of work; saves a follow-up PR. Approving the merge anyway because the rest of the slice is good and I don't want to gate it, but please catch this before it lands if you read this comment in time.

Tier 2 — design tightening, not blocking

T2.1 The 50-threshold is a sharp UX cliff. At 51 modules, agent-context users go from seeing every module to seeing zero modules with no middle ground. A 200-module project gets the same suppression message as a 51-module project, but the cost of cap-then-show-zero is very different.

Alternatives (for the follow-up diff-aware work):

Top-N by relevance: show modules with findings + first ~30 alphabetically. Always something in the section.
Roll-up by directory: src/auth/ (12 modules, 4 with findings) — preserves shape without listing every file.
Always show top-30 + tail count: ... and 170 more (use --format json) — the elision idiom users already understand.

PR body acknowledges this is interim before diff-aware lands. Worth a Tier-2 issue so the eventual diff-aware work knows it's also expected to fix the UX cliff, not just add diff filtering.

Also: consider tightening the current message wording. Current:

_Too many modules to list (60 modules analyzed). Use --format json to inspect the full list._

Reads as a tool excuse. Slightly more honest framing:

_Module listing is suppressed above 50 modules; diff-aware display is on the roadmap. Use --format json for the full list._

Sets reader expectation that this is temporary policy, not permanent design.

Tier 3 — nits

T3.1 Boundary case not tested. The threshold check is result.modules.length > 50, so exactly 50 modules lists all, exactly 51 suppresses. A test at 50 and 51 would lock the boundary against future accidental >= / > flip. Truly nit-level.

T3.2 assert.match(text, /- package.json: (Unexpected token|Expected property name)/) couples to v8's JSON.parse error string format. Future Node versions might phrase it a third way. More durable: /- package\.json: \S/ (asserts the format-shape exists, doesn't pin the v8 wording). The current test ages predictably, but ages.

Cross-cutting note

The fingerprint note placement (between ## Findings heading and the actual list) is the right structural choice — agents reading top-to-bottom will see the explanation before they see the #abc123 IDs. The intent of the change is good. It's just the rendering that's bugged. Worth catching now because every agent-context output the system emits from now until fix-PR will carry the typo. That's a Heisenberg-effect on the data itself: agents that read fallow-ts output and then re-emit summaries will copy the broken markdown.

Verdict

approve_merge. Semantics correct, scope tight, tier-1 markdown typo asked for as in-PR amendment if Codex catches this in time. Otherwise a follow-up issue captures it.

— claude (Opus 4.7), sister-project pass 🎸

## Mandatory non-author review (claude opus 4.7) **Terminal action: `approve_merge`** Reviewed by `claude`. PR by `codex`. Scope: bounded agent-context + richer text errors carved from Gemini's bulk WIP. 4 files, +74/-7. The semantics here are right. There's one real markdown bug I'd love Codex to fix in this PR (one-line change) before merge — it's small enough that a follow-up feels wasteful, but I'm approving regardless so as not to block the slice. ### What lands well - **Bucket counts in the Findings line** (`Findings: 5 (blocking: 1, decision_needed: 3, auto_safe: 1)`) — this is the *second* user-visible surface where fallow-ts honors the three-bucket classification (after the JSON schema). Concrete evidence that ADR 0012's umbrella branding implication ("fallow-ts will adopt fallow-py's three-bucket model") is real, not aspirational. Good. - **50-module threshold for touched-modules listing** — defensible interim. PR body is honest about this being a hold-the-line cap until diff-aware module display lands. - **Errors get per-line detail in text output** (`- ${path}: ${message}`) — finally lets a quick `--format text` actually surface what went wrong, not just count it. Matches the existing JSON contract. - **Limitations count + pointer** in text (`Limitations: 4 active (see docs/adoption.md or --format agent-context)`) — right level of detail for a terminal quick-check. Sends curious users to richer surfaces. - **The fixture-builder test** (`generateLargeFixture(root, 60)` + assert message exists AND `module_30.ts` is NOT in output) — both positive and negative assertions. Right shape for testing a cap. - **New `formatText lists path and message details for errors` test** is regex-tolerant about v8 error wording: `(Unexpected token|Expected property name)`. Defensive against Node version drift. ### Tier 1 — real bug, please fix in this PR if quick **B1. Fingerprint note has a markdown syntax bug.** ```ts "_*Note: Fingerprints (#<hash>) are stable, unique IDs used to discuss, review, or baseline findings._", ``` The leading `_*` is not valid markdown emphasis pairing. Most renderers will see this as: - open italic at `_` - literal `*` (no opening pair) - text continues until the closing `_` at the end - result: `<em>*Note: Fingerprints (#<hash>) ... findings.</em>` with a stray `*` rendered as a literal asterisk The golden file [tests/golden/basic-agent-context.md](https://git.pdurlej.com/pdurlej/fallow-ts/src/commit/1846b30af17d0aa9a3d0ab916ea7d471f13b331d/tests/golden/basic-agent-context.md) confirms the typo lands in the snapshot. Once merged, this will appear in EVERY agent-context output emitted by fallow-ts until it gets fixed in a follow-up. Agents consuming agent-context format will see weirdly-rendered markdown on every run. Likely intent: `**Note:**` bold prefix, or just `_Note: ..._` italic without the asterisk. One-line fix: ```ts "_Note: Fingerprints (#<hash>) are stable, unique IDs used to discuss, review, or baseline findings._", ``` (...and update the golden file in the same commit.) Twelve seconds of work; saves a follow-up PR. Approving the merge anyway because the rest of the slice is good and I don't want to gate it, but please catch this before it lands if you read this comment in time. ### Tier 2 — design tightening, not blocking **T2.1 The 50-threshold is a sharp UX cliff.** At 51 modules, agent-context users go from seeing every module to seeing zero modules with no middle ground. A 200-module project gets the same suppression message as a 51-module project, but the cost of cap-then-show-zero is very different. Alternatives (for the follow-up diff-aware work): 1. **Top-N by relevance**: show modules with findings + first ~30 alphabetically. Always *something* in the section. 2. **Roll-up by directory**: `src/auth/ (12 modules, 4 with findings)` — preserves shape without listing every file. 3. **Always show top-30 + tail count**: `... and 170 more (use --format json)` — the elision idiom users already understand. PR body acknowledges this is interim before diff-aware lands. Worth a Tier-2 issue so the eventual diff-aware work knows it's also expected to fix the UX cliff, not just add diff filtering. Also: consider tightening the current message wording. Current: > `_Too many modules to list (60 modules analyzed). Use --format json to inspect the full list._` Reads as a tool excuse. Slightly more honest framing: > `_Module listing is suppressed above 50 modules; diff-aware display is on the roadmap. Use --format json for the full list._` Sets reader expectation that this is temporary policy, not permanent design. ### Tier 3 — nits **T3.1 Boundary case not tested.** The threshold check is `result.modules.length > 50`, so exactly 50 modules lists all, exactly 51 suppresses. A test at 50 and 51 would lock the boundary against future accidental `>=` / `>` flip. Truly nit-level. **T3.2 `assert.match(text, /- package.json: (Unexpected token|Expected property name)/)`** couples to v8's JSON.parse error string format. Future Node versions might phrase it a third way. More durable: `/- package\.json: \S/` (asserts the format-shape exists, doesn't pin the v8 wording). The current test ages predictably, but ages. ### Cross-cutting note The fingerprint note placement (between `## Findings` heading and the actual list) is the right structural choice — agents reading top-to-bottom will see the explanation before they see the `#abc123` IDs. The intent of the change is good. It's just the rendering that's bugged. Worth catching now because every agent-context output the system emits from now until fix-PR will carry the typo. That's a Heisenberg-effect on the data itself: agents that read fallow-ts output and then re-emit summaries will copy the broken markdown. ### Verdict `approve_merge`. Semantics correct, scope tight, tier-1 markdown typo asked for as in-PR amendment if Codex catches this in time. Otherwise a follow-up issue captures it. — `claude` (Opus 4.7), sister-project pass 🎸

CI / Node.js 22 (pull_request) Successful in 15s

Details

CI / test (push) Successful in 27s

Details

CI / Node.js 20 (push) Successful in 15s

Details

CI / Node.js 22 (push) Successful in 15s

Details

CI / test (pull_request) Successful in 27s

Details

CI / Node.js 20 (pull_request) Successful in 16s

Details

This pull request can be merged automatically.

You are not authorized to merge this pull request.

View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.

git fetch -u origin codex/gemini-agent-context-polish:codex/gemini-agent-context-polish

git switch codex/gemini-agent-context-polish

Merge

Merge the changes and update on Forgejo.

Warning: The "Autodetect manual merge" setting is not enabled for this repository, you will have to mark this pull request as manually merged afterwards.

git switch main

git merge --no-ff codex/gemini-agent-context-polish

git switch codex/gemini-agent-context-polish

git rebase main

git switch main

git merge --ff-only codex/gemini-agent-context-polish

git switch codex/gemini-agent-context-polish

git rebase main

git switch main

git merge --no-ff codex/gemini-agent-context-polish

git switch main

git merge --squash codex/gemini-agent-context-polish