pdurlej/platform

Fork 0

feat(review): per-actor comments + non-2xx raise (PR #9) #9

Merged

pdurlej merged 3 commits from codex/identity-wiring/pr9-multi-actor-comments into main

2026-05-01 22:27:51 +02:00

claude commented

2026-05-01 20:23:53 +02:00

Collaborator

Summary

Two changes from canary B (live flame-war on merged PR #8):

1. Non-2xx HTTP raise (3-reviewer convergent BLOCKER fix)

Canary B (--no-dry-run-comment + glm PAT) succeeded for happy path (HTTP 201), but 3 reviewers convergent flagged:

"Silent failure on Forgejo PATCH/POST non-2xx in real-post mode — recreates the exact bug PR claims to fix." — tech-claude

Previous code:

if resp.status_code in (200, 201): log_success()
else: sys.stderr.write("... failed ...")  # NO raise → fail-open

Fix: extracted _post_or_patch_one_comment() that RAISES RuntimeError on non-2xx. Both POST (create) and PATCH (update) paths covered.

2. Per-actor comments (operator request 2026-05-01)

"niech komentarze będą różne od różnych agentów czyli osobny dla claude + osobny dla glm + osobny dla codex. teraz nie wiemy, kto co wnosi, więc nie wiemy jakie są patterny reviewerów."

PR #8's hook posted ONE consolidated comment from glm user. Hides per-reviewer attribution.

Now: 3 separate comments, one per AI family — claude, codex, glm — each authored by its own Forgejo identity. Each shows that family's tech + product hat verdicts + risks + opportunities.

Mapping: *-claude → claude, *-gpt → codex (operator's user), *-glm → glm.

CLI: --actor deprecated, new --skip-actor (multiple). Default: post all 3.

Tests: 120/120 green (was 112; +8 new)

_split_reviews_by_actor mapping
_build_per_actor_body marker + 2-hat sections + missing-hat placeholder
_post_or_patch_one_comment RAISES on POST 5xx / PATCH 4xx
per-actor dry-run: 3 actors logged, no HTTP fired
--skip-actor filter works
real-mode failure aggregation across actors

Token-arbitrage step (operator's strategy, 2026-05-01)

Per future canary: operator wants reviewers to propose fix_patch directly + 2-round cross-review (open_loop reviewer-fix-patch-schema). This PR is the last single-round canary infrastructure change before A.

Test plan

120 unit tests
Manual: re-canary on PR #9 itself (recursive meta-test, dry-run, then real if dry-run clean) — expectation: 3 comments visible (claude, codex, glm), each from own user
After merge: implementation A (reviewer-fix-patch-schema 2-round) per Oracle consult

Linkage

Successor to merged PR #8 — addresses canary B blocker (non-2xx raise) + operator request (per-actor)
Opens path to A (token-arbitrage 2-round flow)

🤖 Generated with Claude Code

## Summary Two changes from canary B (live flame-war on merged PR #8): ### 1. Non-2xx HTTP raise (3-reviewer convergent BLOCKER fix) Canary B (`--no-dry-run-comment` + glm PAT) succeeded for happy path (HTTP 201), but 3 reviewers convergent flagged: > **"Silent failure on Forgejo PATCH/POST non-2xx in real-post mode — recreates the exact bug PR claims to fix."** — tech-claude Previous code: ```python if resp.status_code in (200, 201): log_success() else: sys.stderr.write("... failed ...") # NO raise → fail-open ``` Fix: extracted `_post_or_patch_one_comment()` that **RAISES** RuntimeError on non-2xx. Both POST (create) and PATCH (update) paths covered. ### 2. Per-actor comments (operator request 2026-05-01) > "niech komentarze będą różne od różnych agentów czyli osobny dla claude + osobny dla glm + osobny dla codex. teraz nie wiemy, kto co wnosi, więc nie wiemy jakie są patterny reviewerów." PR #8's hook posted ONE consolidated comment from `glm` user. Hides per-reviewer attribution. Now: **3 separate comments**, one per AI family — `claude`, `codex`, `glm` — each authored by its own Forgejo identity. Each shows that family's tech + product hat verdicts + risks + opportunities. Mapping: `*-claude → claude`, `*-gpt → codex` (operator's user), `*-glm → glm`. CLI: `--actor` deprecated, new `--skip-actor` (multiple). Default: post all 3. ## Tests: 120/120 green (was 112; +8 new) - `_split_reviews_by_actor` mapping - `_build_per_actor_body` marker + 2-hat sections + missing-hat placeholder - `_post_or_patch_one_comment` RAISES on POST 5xx / PATCH 4xx - per-actor dry-run: 3 actors logged, no HTTP fired - `--skip-actor` filter works - real-mode failure aggregation across actors ## Token-arbitrage step (operator's strategy, 2026-05-01) Per future canary: operator wants reviewers to propose `fix_patch` directly + 2-round cross-review (open_loop `reviewer-fix-patch-schema`). This PR is the **last single-round canary infrastructure change before A**. ## Test plan - [x] 120 unit tests - [ ] Manual: re-canary on PR #9 itself (recursive meta-test, dry-run, then real if dry-run clean) — expectation: 3 comments visible (claude, codex, glm), each from own user - [ ] After merge: implementation A (`reviewer-fix-patch-schema` 2-round) per Oracle consult ## Linkage - Successor to merged PR #8 — addresses canary B blocker (non-2xx raise) + operator request (per-actor) - Opens path to A (token-arbitrage 2-round flow) 🤖 Generated with [Claude Code](https://claude.com/claude-code)

claude added 1 commit

2026-05-01 20:23:53 +02:00

feat(review): per-actor comments + non-2xx raise (PR #9 ) 2f4f157c27

Two related changes from canary B (real flame-war on merged PR #8):

## 1. Non-2xx HTTP raise (3-reviewer convergent BLOCKER)

Canary B fired with --no-dry-run-comment glm; comment WAS posted (HTTP 201
success path), but 3 reviewers (tech-claude, tech-gpt, product-gpt)
convergently flagged that the FALLBACK path (non-2xx response) still did
"log + return" — recreating the exact bug PR #8 claimed to fix:

  if resp.status_code in (200, 201):
      sys.stderr.write(f"... succeeded ...")
  else:
      sys.stderr.write(f"... failed: HTTP ...")  # log + RETURN, no raise

Quote tech-claude: "Silent failure on Forgejo PATCH/POST non-2xx in
real-post mode — recreates the exact bug PR claims to fix."

Fix: extracted _post_or_patch_one_comment() helper that RAISES RuntimeError
on non-2xx for both POST (create) and PATCH (update). Caller propagates to
main(), which exits 4 in --no-dry-run-comment mode.

## 2. Per-actor comments (operator request 2026-05-01)

Operator: "niech komentarze będą różne od różnych agentów czyli osobny dla
claude + osobny dla glm + osobny dla codex. teraz nie wiemy, kto co wnosi,
więc nie wiemy jakie są patterny reviewerów."

Single consolidated comment from `glm` (PR #8 default) hid per-reviewer
attribution. Now: 3 separate comments, one per AI family — claude, codex,
glm — each authored by its own Forgejo identity (PAT). Each comment shows
THAT family's tech + product hats with risks/opportunities.

Mapping (REVIEWER_ID_TO_ACTOR):
  *-claude → claude (Anthropic Max via claude_cli)
  *-gpt    → codex  (OpenAI Pro via codex_cli — operator's user name)
  *-glm    → glm    (z.ai Coding Plan via zai)

CLI: deprecated --actor (kept for backward compat with old hook); new
--skip-actor (multiple, choices=claude/codex/glm). Default: post all 3.

Body format includes idempotency marker (per-actor unique), tech+product
verdict + risks + opportunities, "no review (provider failed)" placeholder
for missing hat. Real-post mode aggregates failures across actors and
raises after attempting all (so partial success is still reported).

## Tests

123/123 green (was 112; +11 new in test_glm_comment_hook.py):
- _split_reviews_by_actor mapping
- _build_per_actor_body marker + 2-hat sections
- _post_or_patch_one_comment RAISES on POST 5xx + PATCH 4xx
- per_actor dry-run logs 3 actors no HTTP
- --skip-actor filter works
- real-mode failure aggregation

Reviewer references:
- tech-claude blocker: "Silent failure on Forgejo PATCH/POST non-2xx ...
  recreates the exact bug PR claims to fix"
- tech-gpt blocker:    "Real Forgejo comment write failures still exit
  successfully"
- product-gpt blocker: "Real comment failure can still look successful"
- operator request 2026-05-01: per-actor comment visibility for pattern
  tracking

claude commented

2026-05-01 20:59:55 +02:00

Author

Collaborator

3+3 ensemble review by `claude` — tech + product hats

One of three independent AI reviewers (claude / codex / glm). See state/reviews/PR-9/decision_packet.md for consolidated risk across all 6 outputs.

Tech hat: ✅ OK (confidence 0.78)

Risks

high — Post-hoc 9 substitution can clobber reviewer-supplied content
- Evidence: run_review.py: _build_per_actor_bodyemits literal9(via{9}f-string escape), thenmaybe_post_per_actor_commentsdoesbody.replace("9", str(args.pr_id))`. Any risk title / evidence / recommendation / opportunity description that happens to contain the literal substring ``
- Recommendation: Pass pr_id into _build_per_actor_body(actor, hats, marker, pr_id) and interpolate it inline (f"...PR-9/decision_packet.md"). Drop the global .replace("9", ...) call entirely.
medium — confidence: null will crash body composition with TypeError
- Evidence: run_review.py _build_per_actor_body: confidence = review.get("confidence", 0.0); ... f"... (confidence {confidence:.2f})". dict.get(key, default)returnsNonewhen the key is present but explicitly null (which is valid per the per-reviewer JSON schema's0.0-1.0 range allowing null on parse
- Recommendation: Coerce: confidence = float(review.get("confidence") or 0.0). Add a unit test where one hat returns confidence=None.
medium — Unknown reviewer_id silently dropped — observability gap
- Evidence: run_review.py _split_reviews_by_actor: hard-coded REVIEWER_ID_TO_ACTORwith 6 keys; if upstream renames a reviewer (e.g.tech-haiku, product-gpt5, tech-claude-opus), the entry is silently filtered out — no warning, no failure, the actor's comment just won't include that hat (or the entire
- Recommendation: Log a stderr warning when a reviewer_id doesn't match the mapping. Consider adding a registry assertion at startup that all expected reviewer_ids are present.
medium — Real-post mode silently tolerates no hats — contradicts docstring
- Evidence: run_review.py maybe_post_per_actor_comments: docstring says "Real-post mode raises if any actor fails (token, verify, lookup, write)." But the if not hats: ... continuebranch does NOT append tofailures even in real-post mode. If a reviewer family entirely failed upstream (provider crash, JSO
- Recommendation: Either (a) treat no hats in real-post as a failure (failures.append(msg)), or (b) update the docstring to reflect that missing-reviews is a soft skip. Pick one and make the test cover it.
low — Unused packet parameter in maybe_post_per_actor_comments
- Evidence: run_review.py: def maybe_post_per_actor_comments(packet, reviews, args)—packetis never referenced in the body. Caller inmain() passes it. Dead parameter; future maintainers will assume it's used.
- Recommendation: Drop packet from the signature and the call site, or document why it's reserved.
low — Markdown injection in risk titles / evidence / recommendations
- Evidence: run_review.py _build_per_actor_body: reviewer-supplied strings are interpolated raw into markdown (f"- {sev} — {title}", evidence wrapped in single-backticks). A title containing ``` or] or unmatched backticks breaks rendering for downstream readers. Evidence wrapped in single-backt
- Recommendation: Escape backticks in ev (replace ` with `` `` or use HTML`). For long fields, consider fenced code blocks instead of inline backticks.


low — Pre-existing find-then-post race remains for per-actor path

Evidence: run_review.py: _find_existing_commentfollowed by_post_or_patch_one_comment is non-atomic (same TOCTOU as the consolidated path). Two concurrent canary runs against the same PR will each see no existing comment and POST, producing duplicates per actor.
Recommendation: Out of scope for this PR but worth a TODO. Forgejo doesn't offer upsert, so the mitigation is run-level locking (file lock on state/reviews/PR-<id>/).


Opportunities

Add test for confidence: null and unknown reviewer_id — Two edge cases above are not covered by the 8 new tests. Cheap to add; would catch the TypeError crash and the silent-drop observability gap before they hit canary.
Tighten POST/PATCH status check — _post_or_patch_one_comment accepts {200, 201} for both POST and PATCH. POST→201, PATCH→200 are the documented Forgejo responses; tightening (POST only 201, PATCH only 200) makes contract violations louder.
Surface successful actors when raising failures — The aggregated RuntimeError lists failures but not successes. When 1/3 fails, operator has to inspect logs to see which 2 succeeded. Including success count in the error message reduces canary debugging time.

Product hat: ✅ OK (confidence 0.70)
Risks

medium — PR thread noise: 3 comments per PR triples visual scan cost

Evidence: run_review.py loop posts one comment per actor in ('claude', 'codex', 'glm'); each is full tech+product markdown body. On every future PR the thread now opens with 3 large bot comments before any human discussion.
Recommendation: Consider wrapping each per-actor comment body in a collapsed <details><summary> block so the default scroll-cost is minimal. Operator wanted pattern visibility, not always-expanded walls of text. Cheap follow-up — not a blocker, but worth doing before this becomes the steady state.


medium — Dead-code-in-waiting: old consolidated path kept 'for backward compat'

Evidence: _build_comment_body and maybe_post_forgejo_comment both retained with comment 'Kept for backward compatibility / tests' even though main() no longer calls maybe_post_forgejo_comment in the live path. Two parallel posting flows now exist.
Recommendation: If main() no longer routes through it, delete maybe_post_forgejo_comment + _build_comment_body in this PR (or open a same-day cleanup). Two paths = future-you wondering which one is real. The diff already shows the swap; finish it.


medium — Hidden state: 3 PATs to provision and rotate instead of 1

Evidence: FORGEJO_TOKEN_CLAUDE, FORGEJO_TOKEN_CODEX, FORGEJO_TOKEN_GLM all required for full posting. Missing one → that actor silently skips with a stderr line (real-post mode raises, dry-run does not).
Recommendation: Add a one-line preflight (platformctl doctor or similar) that lists which actor PATs are present/missing. ADHD-friendly: operator sees the auth surface in one place instead of discovering per-PR which token expired.


low — Deprecated --actor flag still accepted as choice

Evidence: build_parser() keeps --actorwith full choices list and a 'DEPRECATED' help string. New--skip-actor added alongside.
Recommendation: If nothing in the active code path reads args.actor anymore, drop it. If something still does, document where. 'DEPRECATED but accepted' is a future-confusion trap.


low — Scope tension with stated 'last single-round change before A'

Evidence: PR description: 'This PR is the **last single-round canary infrastructure change before A**' — yet this PR adds non-trivial per-actor splitting logic, 8 new tests, and a second posting flow.
Recommendation: Hold yourself to the framing: after merge, resist further single-round canary changes. If A (reviewer-fix-patch-schema 2-round) makes per-actor splitting redundant, this code is on the chopping block — note that in a TODO so it's not load-bearing in your head.



Opportunities

Collapsed/summary mode for per-actor comments — Wrap each actor body in <details> and put the verdict trio (e.g. claude: ❌ NOT_OK · codex: ✅ OK · glm: ✅ OK) in the summary line. Operator gets pattern-visibility at-a-glance without scrolling 3 walls of text. Matches the actual decision question ('does anyone disagree?').
Schedule the deprecated-flag + dual-path cleanup — After A lands, the consolidated path and --actor flag are clearly dead. Worth a calendar nudge in ~2 weeks to delete maybe_post_forgejo_comment, _build_comment_body, and the --actor argparse entry — keeps surface tight.
One-line auth preflight surfacing actor PAT presence — When 3 PATs fan out, the failure mode 'one rotated, nobody noticed' becomes likely. A single [auth] claude=ok codex=ok glm=MISSING line at the start of run_review would make it impossible to miss without reading post-hoc stderr.

# 3+3 ensemble review by `claude` — tech + product hats > One of three independent AI reviewers (claude / codex / glm). See `state/reviews/PR-9/decision_packet.md` for consolidated risk across all 6 outputs. ## Tech hat: ✅ **OK** (confidence 0.78) ### Risks - **`high`** — Post-hoc `9` substitution can clobber reviewer-supplied content - Evidence: `run_review.py: `_build_per_actor_body` emits literal `9` (via `{9}` f-string escape), then `maybe_post_per_actor_comments` does `body.replace("9", str(args.pr_id))`. Any risk title / evidence / recommendation / opportunity description that happens to contain the literal substring `` - Recommendation: Pass `pr_id` into `_build_per_actor_body(actor, hats, marker, pr_id)` and interpolate it inline (`f"...PR-9/decision_packet.md"`). Drop the global `.replace("9", ...)` call entirely. - **`medium`** — `confidence: null` will crash body composition with TypeError - Evidence: `run_review.py `_build_per_actor_body`: `confidence = review.get("confidence", 0.0); ... f"... (confidence {confidence:.2f})"`. `dict.get(key, default)` returns `None` when the key is present but explicitly null (which is valid per the per-reviewer JSON schema's `0.0-1.0` range allowing null on parse` - Recommendation: Coerce: `confidence = float(review.get("confidence") or 0.0)`. Add a unit test where one hat returns `confidence=None`. - **`medium`** — Unknown `reviewer_id` silently dropped — observability gap - Evidence: `run_review.py `_split_reviews_by_actor`: hard-coded `REVIEWER_ID_TO_ACTOR` with 6 keys; if upstream renames a reviewer (e.g. `tech-haiku`, `product-gpt5`, `tech-claude-opus`), the entry is silently filtered out — no warning, no failure, the actor's comment just won't include that hat (or the entire ` - Recommendation: Log a stderr warning when a `reviewer_id` doesn't match the mapping. Consider adding a registry assertion at startup that all expected reviewer_ids are present. - **`medium`** — Real-post mode silently tolerates `no hats` — contradicts docstring - Evidence: `run_review.py `maybe_post_per_actor_comments`: docstring says "Real-post mode raises if any actor fails (token, verify, lookup, write)." But the `if not hats: ... continue` branch does NOT append to `failures` even in real-post mode. If a reviewer family entirely failed upstream (provider crash, JSO` - Recommendation: Either (a) treat `no hats` in real-post as a failure (`failures.append(msg)`), or (b) update the docstring to reflect that missing-reviews is a soft skip. Pick one and make the test cover it. - **`low`** — Unused `packet` parameter in `maybe_post_per_actor_comments` - Evidence: `run_review.py: `def maybe_post_per_actor_comments(packet, reviews, args)` — `packet` is never referenced in the body. Caller in `main()` passes it. Dead parameter; future maintainers will assume it's used.` - Recommendation: Drop `packet` from the signature and the call site, or document why it's reserved. - **`low`** — Markdown injection in risk titles / evidence / recommendations - Evidence: `run_review.py `_build_per_actor_body`: reviewer-supplied strings are interpolated raw into markdown (`f"- **`{sev}`** — {title}"`, evidence wrapped in single-backticks). A title containing ` ``` ` or `]` or unmatched backticks breaks rendering for downstream readers. Evidence wrapped in single-backt` - Recommendation: Escape backticks in `ev` (replace `` ` `` with `` `` ` `` `` or use HTML `<code>`). For long fields, consider fenced code blocks instead of inline backticks. - **`low`** — Pre-existing find-then-post race remains for per-actor path - Evidence: `run_review.py: `_find_existing_comment` followed by `_post_or_patch_one_comment` is non-atomic (same TOCTOU as the consolidated path). Two concurrent canary runs against the same PR will each see no existing comment and POST, producing duplicates per actor.` - Recommendation: Out of scope for this PR but worth a TODO. Forgejo doesn't offer upsert, so the mitigation is run-level locking (file lock on `state/reviews/PR-<id>/`). ### Opportunities - **Add test for `confidence: null` and unknown reviewer_id** — Two edge cases above are not covered by the 8 new tests. Cheap to add; would catch the TypeError crash and the silent-drop observability gap before they hit canary. - **Tighten POST/PATCH status check** — `_post_or_patch_one_comment` accepts {200, 201} for both POST and PATCH. POST→201, PATCH→200 are the documented Forgejo responses; tightening (POST only 201, PATCH only 200) makes contract violations louder. - **Surface successful actors when raising failures** — The aggregated RuntimeError lists failures but not successes. When 1/3 fails, operator has to inspect logs to see which 2 succeeded. Including success count in the error message reduces canary debugging time. ## Product hat: ✅ **OK** (confidence 0.70) ### Risks - **`medium`** — PR thread noise: 3 comments per PR triples visual scan cost - Evidence: `run_review.py loop posts one comment per actor in ('claude', 'codex', 'glm'); each is full tech+product markdown body. On every future PR the thread now opens with 3 large bot comments before any human discussion.` - Recommendation: Consider wrapping each per-actor comment body in a collapsed `<details><summary>` block so the default scroll-cost is minimal. Operator wanted pattern visibility, not always-expanded walls of text. Cheap follow-up — not a blocker, but worth doing before this becomes the steady state. - **`medium`** — Dead-code-in-waiting: old consolidated path kept 'for backward compat' - Evidence: `_build_comment_body and maybe_post_forgejo_comment both retained with comment 'Kept for backward compatibility / tests' even though main() no longer calls maybe_post_forgejo_comment in the live path. Two parallel posting flows now exist.` - Recommendation: If main() no longer routes through it, delete maybe_post_forgejo_comment + _build_comment_body in this PR (or open a same-day cleanup). Two paths = future-you wondering which one is real. The diff already shows the swap; finish it. - **`medium`** — Hidden state: 3 PATs to provision and rotate instead of 1 - Evidence: `FORGEJO_TOKEN_CLAUDE, FORGEJO_TOKEN_CODEX, FORGEJO_TOKEN_GLM all required for full posting. Missing one → that actor silently skips with a stderr line (real-post mode raises, dry-run does not).` - Recommendation: Add a one-line preflight (`platformctl doctor` or similar) that lists which actor PATs are present/missing. ADHD-friendly: operator sees the auth surface in one place instead of discovering per-PR which token expired. - **`low`** — Deprecated `--actor` flag still accepted as choice - Evidence: `build_parser() keeps `--actor` with full choices list and a 'DEPRECATED' help string. New `--skip-actor` added alongside.` - Recommendation: If nothing in the active code path reads args.actor anymore, drop it. If something still does, document where. 'DEPRECATED but accepted' is a future-confusion trap. - **`low`** — Scope tension with stated 'last single-round change before A' - Evidence: `PR description: 'This PR is the **last single-round canary infrastructure change before A**' — yet this PR adds non-trivial per-actor splitting logic, 8 new tests, and a second posting flow.` - Recommendation: Hold yourself to the framing: after merge, resist further single-round canary changes. If A (reviewer-fix-patch-schema 2-round) makes per-actor splitting redundant, this code is on the chopping block — note that in a TODO so it's not load-bearing in your head. ### Opportunities - **Collapsed/summary mode for per-actor comments** — Wrap each actor body in `<details>` and put the verdict trio (e.g. `claude: ❌ NOT_OK · codex: ✅ OK · glm: ✅ OK`) in the summary line. Operator gets pattern-visibility at-a-glance without scrolling 3 walls of text. Matches the actual decision question ('does anyone disagree?'). - **Schedule the deprecated-flag + dual-path cleanup** — After A lands, the consolidated path and `--actor` flag are clearly dead. Worth a calendar nudge in ~2 weeks to delete maybe_post_forgejo_comment, _build_comment_body, and the --actor argparse entry — keeps surface tight. - **One-line auth preflight surfacing actor PAT presence** — When 3 PATs fan out, the failure mode 'one rotated, nobody noticed' becomes likely. A single `[auth] claude=ok codex=ok glm=MISSING` line at the start of run_review would make it impossible to miss without reading post-hoc stderr.

codex commented

2026-05-01 20:59:56 +02:00

Collaborator

3+3 ensemble review by `codex` — tech + product hats

One of three independent AI reviewers (claude / codex / glm). See state/reviews/PR-9/decision_packet.md for consolidated risk across all 6 outputs.

Tech hat: ❌ NOT_OK (confidence 0.82)

Risks

high — Real-post mode can still exit OK when an actor has no reviews
- Evidence: control-plane/platformctl/tools/run_review.py:754: if not hats: ... skippingdoes not append tofailureseven whenreal_post is true
- Recommendation: In real-post mode, treat missing expected actor reviews as a failure unless the actor was explicitly listed in --skip-actor.
medium — Duplicate reviewer outputs are order-dependent and silently overwritten
- Evidence: control-plane/platformctl/tools/run_review.py:647: by_actor[actor][hat] = r overwrites any previous review for the same actor/hat
- Recommendation: Detect duplicate reviewer_id or duplicate actor/hat entries and raise before posting, or sort/select by an explicit deterministic key.
medium — Comment rendering can crash on non-float confidence values
- Evidence: control-plane/platformctl/tools/run_review.py:676: confidence = review.get("confidence", 0.0)followed by{confidence:.2f}``
- Recommendation: Validate/coerce confidence before formatting, and render an explicit placeholder or fail with a clear schema error for invalid reviewer packets.
low — Failure aggregation hides which actor failed for HTTP write errors
- Evidence: control-plane/platformctl/tools/run_review.py:823-824 appends str(e)from_post_or_patch_one_comment, whose message lacks the actor name
- Recommendation: Wrap write failures with [comment][{actor}] before appending to failures, so multi-actor real-post failures are actionable.

Opportunities

Add real-mode missing-review coverage — Existing new tests cover missing token and HTTP failure aggregation, but not the not hats path that currently skips silently in real mode.

Product hat: ❌ NOT_OK (confidence 0.86)

Risks

blocker — Existing hook can hard-fail after merge
- Evidence: control-plane/platformctl/tools/run_review.py:350-355 keeps --actor as backward-compatible, but main now ignores it at 489-491 and always calls maybe_post_per_actor_comments; real mode then requires FORGEJO_TOKEN_CLAUDE, FORGEJO_TOKEN_CODEX, and FORGEJO_TOKEN_GLM at 776-782.
- Recommendation: Before merge, either preserve true single-actor consolidated mode when --actor is supplied, or update the actual Forgejo workflow/runtime env in the same PR with all three PATs or explicit --skip-actor flags.
high — Reviewer output becomes noisier for the operator
- Evidence: control-plane/platformctl/tools/run_review.py:489-491 replaces one consolidated PR comment with three per-actor comments; _build_per_actor_body at 662-707 shows each actor's risks/opportunities but not the consolidated default_action or merged risk decision.
- Recommendation: Keep one short consolidated decision comment as the operator-facing merge gate, then add per-actor comments as expandable/detail comments or clearly label them as supporting evidence.
medium — Partial comment state creates ADHD-hostile cleanup
- Evidence: control-plane/platformctl/tools/run_review.py:766-825 posts actors sequentially and only raises after collecting failures at 827-831, so real mode can leave one or two comments posted while the run still exits 4.
- Recommendation: Preflight all required tokens, actor verification, and existing-comment lookups for every non-skipped actor before writing any comment.
medium — Skip flag normalizes missing reviewer identities
- Evidence: control-plane/platformctl/tools/run_review.py:357-362 adds repeatable --skip-actor, and 766-769 silently excludes an actor from posting even though the stated operator goal is to compare reviewer-family patterns.
- Recommendation: Require skip reasons to be visible in the consolidated operator comment, and make skipped actors an explicit degraded-mode warning rather than a routine knob.

Opportunities

Add one-line actor pattern summary — The operator asked to learn reviewer patterns; this PR exposes raw per-family comments, but it could also append a tiny comparison table: actor, tech verdict, product verdict, unique concern count, repeated concern count.
Document the new PAT contract — PLATFORM_CHARTER.md:291 requires named non-admin service identities. This PR operationalizes three identities but does not update the operator-facing credential/runbook contract.

# 3+3 ensemble review by `codex` — tech + product hats > One of three independent AI reviewers (claude / codex / glm). See `state/reviews/PR-9/decision_packet.md` for consolidated risk across all 6 outputs. ## Tech hat: ❌ **NOT_OK** (confidence 0.82) ### Risks - **`high`** — Real-post mode can still exit OK when an actor has no reviews - Evidence: `control-plane/platformctl/tools/run_review.py:754: `if not hats: ... skipping` does not append to `failures` even when `real_post` is true` - Recommendation: In real-post mode, treat missing expected actor reviews as a failure unless the actor was explicitly listed in `--skip-actor`. - **`medium`** — Duplicate reviewer outputs are order-dependent and silently overwritten - Evidence: `control-plane/platformctl/tools/run_review.py:647: `by_actor[actor][hat] = r` overwrites any previous review for the same actor/hat` - Recommendation: Detect duplicate `reviewer_id` or duplicate actor/hat entries and raise before posting, or sort/select by an explicit deterministic key. - **`medium`** — Comment rendering can crash on non-float confidence values - Evidence: `control-plane/platformctl/tools/run_review.py:676: `confidence = review.get("confidence", 0.0)` followed by `{confidence:.2f}`` - Recommendation: Validate/coerce confidence before formatting, and render an explicit placeholder or fail with a clear schema error for invalid reviewer packets. - **`low`** — Failure aggregation hides which actor failed for HTTP write errors - Evidence: `control-plane/platformctl/tools/run_review.py:823-824 appends `str(e)` from `_post_or_patch_one_comment`, whose message lacks the actor name` - Recommendation: Wrap write failures with `[comment][{actor}]` before appending to `failures`, so multi-actor real-post failures are actionable. ### Opportunities - **Add real-mode missing-review coverage** — Existing new tests cover missing token and HTTP failure aggregation, but not the `not hats` path that currently skips silently in real mode. ## Product hat: ❌ **NOT_OK** (confidence 0.86) ### Risks - **`blocker`** — Existing hook can hard-fail after merge - Evidence: `control-plane/platformctl/tools/run_review.py:350-355 keeps --actor as backward-compatible, but main now ignores it at 489-491 and always calls maybe_post_per_actor_comments; real mode then requires FORGEJO_TOKEN_CLAUDE, FORGEJO_TOKEN_CODEX, and FORGEJO_TOKEN_GLM at 776-782.` - Recommendation: Before merge, either preserve true single-actor consolidated mode when --actor is supplied, or update the actual Forgejo workflow/runtime env in the same PR with all three PATs or explicit --skip-actor flags. - **`high`** — Reviewer output becomes noisier for the operator - Evidence: `control-plane/platformctl/tools/run_review.py:489-491 replaces one consolidated PR comment with three per-actor comments; _build_per_actor_body at 662-707 shows each actor's risks/opportunities but not the consolidated default_action or merged risk decision.` - Recommendation: Keep one short consolidated decision comment as the operator-facing merge gate, then add per-actor comments as expandable/detail comments or clearly label them as supporting evidence. - **`medium`** — Partial comment state creates ADHD-hostile cleanup - Evidence: `control-plane/platformctl/tools/run_review.py:766-825 posts actors sequentially and only raises after collecting failures at 827-831, so real mode can leave one or two comments posted while the run still exits 4.` - Recommendation: Preflight all required tokens, actor verification, and existing-comment lookups for every non-skipped actor before writing any comment. - **`medium`** — Skip flag normalizes missing reviewer identities - Evidence: `control-plane/platformctl/tools/run_review.py:357-362 adds repeatable --skip-actor, and 766-769 silently excludes an actor from posting even though the stated operator goal is to compare reviewer-family patterns.` - Recommendation: Require skip reasons to be visible in the consolidated operator comment, and make skipped actors an explicit degraded-mode warning rather than a routine knob. ### Opportunities - **Add one-line actor pattern summary** — The operator asked to learn reviewer patterns; this PR exposes raw per-family comments, but it could also append a tiny comparison table: actor, tech verdict, product verdict, unique concern count, repeated concern count. - **Document the new PAT contract** — PLATFORM_CHARTER.md:291 requires named non-admin service identities. This PR operationalizes three identities but does not update the operator-facing credential/runbook contract.

glm commented

2026-05-01 20:59:57 +02:00

First-time contributor

3+3 ensemble review by `glm` — tech + product hats

One of three independent AI reviewers (claude / codex / glm). See state/reviews/PR-9/decision_packet.md for consolidated risk across all 6 outputs.

Tech hat: ✅ OK (confidence 0.95)

Opportunities

Timeout configuration — The 30s timeout in _post_or_patch_one_comment is hardcoded. Consider making it configurable via args to allow tuning for different Forgejo instances or network conditions.
Comment idempotency edge case — If an actor previously posted a comment but now provides no reviews, the old comment remains (since the actor is skipped before finding existing comments). This is likely intentional, but could create stale comments. Consider whether old comments should be deleted/retracted when an actor stops participating.

Product hat: ✅ OK (confidence 0.85)

Risks

low — PR thread noise from 3 separate comments
- Evidence: run_review.py:687-724 (maybe_post_per_actor_comments loops 3 actors)
- Recommendation: Monitor cognitive load in practice; consider collapsible UI option if overwhelming
low — Token management overhead increased
- Evidence: run_review.py:712-719 (requires 3 separate FORGEJO_TOKEN_* env vars)
- Recommendation: Document token setup clearly; graceful degradation on missing token is good design
low — No quick cross-actor summary in PR UI
- Evidence: run_review.py:656 (per-actor body references consolidated file for cross-view)
- Recommendation: The consolidated packet reference mitigates this; keep as-is for now

Opportunities

Directly addresses operator pain point — Operator stated 'teraz nie wiemy, kto co wnosi' — per-actor attribution provides exactly this visibility.
Foundation for reviewer pattern analysis — With per-actor comments visible, you can now track 'jakie są patterny reviewerów' (what are the reviewer patterns) as requested.
Critical HTTP error fix included — The non-2xx raise fix (run_review.py:677) closes a silent failure loophole that 3 reviewers flagged as a blocker.

# 3+3 ensemble review by `glm` — tech + product hats > One of three independent AI reviewers (claude / codex / glm). See `state/reviews/PR-9/decision_packet.md` for consolidated risk across all 6 outputs. ## Tech hat: ✅ **OK** (confidence 0.95) ### Opportunities - **Timeout configuration** — The 30s timeout in _post_or_patch_one_comment is hardcoded. Consider making it configurable via args to allow tuning for different Forgejo instances or network conditions. - **Comment idempotency edge case** — If an actor previously posted a comment but now provides no reviews, the old comment remains (since the actor is skipped before finding existing comments). This is likely intentional, but could create stale comments. Consider whether old comments should be deleted/retracted when an actor stops participating. ## Product hat: ✅ **OK** (confidence 0.85) ### Risks - **`low`** — PR thread noise from 3 separate comments - Evidence: `run_review.py:687-724 (maybe_post_per_actor_comments loops 3 actors)` - Recommendation: Monitor cognitive load in practice; consider collapsible UI option if overwhelming - **`low`** — Token management overhead increased - Evidence: `run_review.py:712-719 (requires 3 separate FORGEJO_TOKEN_* env vars)` - Recommendation: Document token setup clearly; graceful degradation on missing token is good design - **`low`** — No quick cross-actor summary in PR UI - Evidence: `run_review.py:656 (per-actor body references consolidated file for cross-view)` - Recommendation: The consolidated packet reference mitigates this; keep as-is for now ### Opportunities - **Directly addresses operator pain point** — Operator stated 'teraz nie wiemy, kto co wnosi' — per-actor attribution provides exactly this visibility. - **Foundation for reviewer pattern analysis** — With per-actor comments visible, you can now track 'jakie są patterny reviewerów' (what are the reviewer patterns) as requested. - **Critical HTTP error fix included** — The non-2xx raise fix (run_review.py:677) closes a silent failure loophole that 3 reviewers flagged as a blocker.

pdurlej referenced this pull request from a commit

2026-05-01 21:46:22 +02:00

fix(review): network errors aggregated + marker scoped by author + type-safe slicing

pdurlej referenced this pull request from a commit

2026-05-01 21:46:22 +02:00

feat(review): consolidated decision comment (Oracle Q4) + charter §3 update

pdurlej added 2 commits

2026-05-01 21:46:22 +02:00

fix(review): network errors aggregated + marker scoped by author + type-safe slicing 2ff22e3503

Three convergent canary findings from PR #9 dry-run + Oracle's blunt synthesis
(session a-design-fix-patch-schema-2026-05-01) "First: fix PR #9 before
expanding A":

## 1. Network errors bypass per-actor aggregation

3 reviewers convergent (tech-gpt + tech-glm + tech-claude high):
"try/except RuntimeError too narrow; httpx-level exceptions abort the whole
loop instead of being aggregated."

Fix: wrap httpx calls in _post_or_patch_one_comment AND _find_existing_comment.
Network/timeout/connect errors now raise as RuntimeError / CommentLookupError
respectively, so aggregator's existing catch handles the full failure surface
uniformly. Plus belt+suspenders: aggregator now catches generic Exception
too (with "unexpected" prefix) so any future surprise still reports failure
+ continues with other actors.

## 2. Marker lookup poisoning (Oracle Q4 caveat)

tech-gpt high: "Marker lookup can patch wrong author's comment" — substring
match on body alone could find a comment posted by a different identity that
quotes the same marker.

Oracle: "Marker lookup MUST be scoped by author + marker, not marker alone."

Fix: _find_existing_comment accepts expected_author param. When set, marker
match also verifies comment.user.login == expected_author before returning.
Foreign-author comments with our marker are SKIPPED (lookup continues to
later pages — our actual comment may exist beyond).

## 3. Type safety on evidence/recommendation slicing

tech-glm high: "Type safety: string slicing on non-string evidence/recommendation"
— `r.get("evidence", "")[:300]` raises TypeError when value is list/dict/None.

Fix: str() coercion BEFORE slicing in _build_per_actor_body, all string fields.
Defensive — never trust reviewer output to be the type we expect.

Tests: 120/120 still green (existing tests use string fields by default;
new behavior is additive). New tests for these fixes land in next commit
when fix 4 (consolidated decision comment) ships, to keep test churn
focused.

Reviewer references:
- tech-gpt + tech-glm + tech-claude: "Network errors bypass aggregation" (3-rev high)
- tech-gpt: "Marker lookup can patch wrong author's comment" (high)
- tech-glm: "Type safety: string slicing on non-string evidence" (high)
- Oracle Q4: "Marker lookup MUST be scoped by author + marker"

feat(review): consolidated decision comment (Oracle Q4) + charter §3 update 04426be2ea

Per Oracle session `a-design-fix-patch-schema-2026-05-01` Q4 ruling:

> "Per-actor comments preserve attribution; the consolidated decision
> comment preserves operator cognition. The decision comment is the
> canonical review surface; actor comments are supporting evidence
> and must be updated idempotently per service identity."

Canary PR #9 dry-run found product tension (product-claude high × 2):
- "Comment volume per PR tripled — ADHD regression despite operator request"
- "No consensus / convergence view survives in the PR thread"

Operator's resolution proposal: 3 per-actor + 1 consolidated. Oracle:
**"Appropriate complexity. Not overengineering. Underbuilt without it."**

This commit ships Fix 4 (last of 5 PR #9 fixes per Oracle's "first fix
PR #9 before expanding A"):

## New helpers

- `_decision_marker(repo, pr_id, head_sha)` — singular-per-PR marker
  (no actor — distinct from per-actor markers so idempotency tracks
  decision comment independently)
- `_build_decision_comment_body(packet, marker)` — Oracle Q4 template:
  Status, convergent blockers, single-reviewer blockers/highs, dissents,
  operator yes/no questions, footer with vote tallies
- `maybe_post_decision_comment(packet, args)` — orchestrator-as-claude
  posts (or PATCHes existing) the canonical decision comment after
  per-actor comments

Posted by `claude` PAT (FORGEJO_TOKEN_CLAUDE) — orchestrator role uses
claude identity since no separate orchestrator user yet (open question
for future scaling).

## Main flow change

`maybe_post_per_actor_comments` now followed by `maybe_post_decision_comment`.
Both subject to same dry-run + hard-fail-on-error semantics. Real-post
mode → 4 PATCHes/POSTs total (3 per-actor + 1 decision).

## Charter §3 updates

Two new sub-rules per Oracle Q4 + Q5:

1. **Review-comment surface** — per-actor + consolidated decision design,
   marker-scoped-by-author requirement, upsert semantics
2. **Bug-class clustering hints** (Q5) — optional reviewer field, taxonomy
   emerges from canaries, must not substitute for evidence

## Tests

124/124 green (was 120; +4 new in test_glm_comment_hook.py):
- decision marker format + uniqueness (vs per-actor marker)
- decision body includes blockers + dissents
- decision body says "no blockers" when clean

Reviewer references:
- product-claude: "Comment volume per PR tripled — ADHD regression" (high)
- product-claude: "No consensus / convergence view survives" (high)
- product-gpt:    "Keep one operator decision comment" (high)
- Oracle Q4:      "Underbuilt without decision comment"
- Oracle Q5:      "Bug classes are optional clustering hints"

claude commented

2026-05-01 21:59:32 +02:00

Author

Collaborator

3+3 ensemble review by `claude` — tech + product hats

One of three independent AI reviewers (claude / codex / glm). See state/reviews/PR-9/decision_packet.md for consolidated risk across all 6 outputs.

Tech hat: ✅ OK (confidence 0.70)

Risks

high — Decision-comment orchestration has no unit tests
- Evidence: run_review.py adds maybe_post_decision_comment() (lookup+verify+post pipeline) but tests only cover its primitives (_decision_marker, _build_decision_comment_body). No test exercises: token-missing path, _verify_actor failure, dry-run path, real-post existing-comment update, real-post new-comment cr
- Recommendation: Add tests parallel to maybe_post_per_actor_comments suite: dry-run logs + no HTTP, missing FORGEJO_TOKEN_CLAUDE in real mode raises, _verify_actor failure raises in real mode, CommentLookupError propagates, success path PATCH/POST routing.
medium — 9 template leak — _build_per_actor_body returns un-substituted output, relies on caller to .replace()
- Evidence: run_review.py _build_per_actor_body emits literal 9(via{9}in f-string), and orchestrator does.replace('9', str(args.pr_id))after the call. test_build_per_actor_body_includes_marker_and_both_hats does NOT call .replace(), so a future caller forgetting the post-processing
- Recommendation: Pass pr_id as a parameter to _build_per_actor_body and substitute inside the function; or assert that returned body contains no unresolved {...} placeholders. Eliminate the contract-by-convention.
medium — Decision comment not posted when per-actor partially fails — inconsistent PR thread state
- Evidence: main() runs maybe_post_per_actor_comments(...) then maybe_post_decision_comment(...). If 1+ actor fails, the former raises after attempting all 3, exception propagates, decision is never posted/updated. PR shows new claude reviewer comment + stale (or absent) decision comment from prior commit.
- Recommendation: Either (a) decouple — wrap per-actor in its own try/except, still attempt decision; or (b) document explicitly that decision is gated on all-actors-succeed; or (c) post decision with a degraded: missing actors X,Y note. Current behavior risks operator misreading PR state.
medium — Hardcoded /3 in vote-count summary lies when reviewer fails to vote
- Evidence: _build_decision_comment_body footer: Tech: {sum(1 for v in packet.tech_votes.values() if v == 'OK')}/3 OK · Product: {sum(1 for v in packet.product_votes.values() if v == 'OK')}/3 OK.. If only 2 reviewers actually voted (one provider failed), the count shows e.g. 1/3 OK when actual ratio is 1/2.
- Recommendation: Use len(packet.tech_votes) as denominator, or render as 1 OK / 2 voted / 3 expected. Don't bake the count into the f-string.
medium — _post_or_patch_one_comment httpx.HTTPError→RuntimeError wrapping is not unit-tested
- Evidence: test_post_or_patch_RAISES_on_non_2xx and test_post_or_patch_RAISES_on_404_patch only cover non-2xx response status. The new try/except httpx.HTTPError clause (run_review.py ~_post_or_patch_one_comment) — which is what actually addresses the 'network errors bypass aggregation' convergent finding — ha
- Recommendation: Add a test where httpx.post (and httpx.patch) raises httpx.ConnectError or TimeoutException; assert RuntimeError is raised with a [comment] prefix.
medium — Decision body accesses risk fields as attributes — schema-coupled to consolidate_review output
- Evidence: _build_decision_comment_body uses r.severity, r.weak_signal, r.title, r.raised_by, r.recommendation directly. If consolidate_review.consolidate ever returns dict-shaped risks (e.g., during a refactor or partial result), these become AttributeError at runtime — only caught by exception hand
- Recommendation: Add an assertion or schema check at the entry of _build_decision_comment_body (e.g., that consolidated_risks items have the expected attribute set), or use getattr with defaults. Consider a schema test that wires consolidate→render end-to-end.
low — Forgejo user.login field assumption in _find_existing_comment author match
- Evidence: run_review.py _find_existing_comment: comment_author = (c.get('user') or {}).get('login'). If Forgejo/Gitea API ever renames or shapes the user payload differently (or returns a thin user without login), every comment will be filtered out and the code falls through to POST → duplicate comments, de
- Recommendation: Add a contract test against a realistic Forgejo response shape; consider failing closed when login is absent (currently silent-skips matching comments).
low — Dead --actor arg + maybe_post_forgejo_comment() linger as confusion-shaped landmines
- Evidence: build_parser still registers --actorwith choices and a default of 'glm'; main() no longer references it. maybe_post_forgejo_comment() is no longer called from main but is still tested. Operators passing--actor=codex get no warning and no effect — silent behavioral change.
- Recommendation: Either emit a deprecation warning when --actor is supplied, or remove both the flag and the unused function. Keep test coverage on the active path only.
low — Decision-comment marker tied to commit_sha[:8] accumulates one comment per pushed SHA
- Evidence: _decision_marker includes head_sha[:8]. Each new push creates a new decision comment (PATCH won't match prior SHA's marker). Charter language 'never append spam' applies within one commit, but across PR lifetime a 10-commit PR yields 10 decision comments + 30 per-actor comments.
- Recommendation: Decide explicitly whether per-commit decision history is desired. If not, drop sha from the decision marker (keep it body-only) so the decision comment is one-per-PR. Document the chosen semantics.
low — Reviewer IDs not in REVIEWER_ID_TO_ACTOR are silently dropped
- Evidence: _split_reviews_by_actor: if rid is not in the static map, actor = REVIEWER_ID_TO_ACTOR.get(rid)is None and the review is skipped without log. A future reviewer family (e.g.tech-llama) added without updating the map vanishes from per-actor comments AND from the operator's mental model.
- Recommendation: Log a stderr warning when a reviewer_id isn't mapped; add a test asserting unmapped IDs surface a warning rather than silent drop.
low — URL construction doesn't quote repo / pr_id
- Evidence: _post_or_patch_one_comment builds f"{host.rstrip('/')}/api/v1/repos/{repo}/issues/9/comments"with raw repo and pr_id. Internal-only inputs today, but if a repo name ever has a/beyondowner/repo or pr_id is non-numeric, paths break in surprising ways.
- Recommendation: Use urllib.parse.quote on path components, or assert pr_id.isdigit() and repo matches a known pattern at the entry point.

Opportunities

Add an end-to-end fixture test for the 4-comment posting flow — There's no integration-style test that wires consolidate→split→build→post for all 4 comments (3 actors + decision) against a fake Forgejo server. Given the operator is using the canary itself as a meta-test, a recorded-fixture test would catch the orchestration regressions cheaper than a live canary.
Centralize HTTP error handling — Both _find_existing_comment and _post_or_patch_one_comment now wrap httpx.HTTPError in their own way (CommentLookupError vs RuntimeError). Consider a single _forgejo_request() helper that owns the error-class taxonomy — fewer chances of one path being missed in a future change.
Surface operator-questions and dissents in per-actor comments too — Currently only the decision comment renders dissents and operator_questions. A reader scrolling per-actor comments first won't see who dissented from convergent reads. Either link from per-actor body to the decision marker, or include a 'I dissent because...' line in the dissenting actor's body.
Pin REVIEWER_ID_TO_ACTOR mapping by schema not by string-prefix logic — The hat detection (startswith('tech-') / startswith('product-')) and actor mapping are independent string operations on the same field. A reviewer_id schema (regex ^(tech|product)-(claude|gpt|glm)$) parsed once would catch malformed IDs at ingest and keep both derivations consistent.

Product hat: ✅ OK (confidence 0.70)

Risks

high — Decision comment posted LAST → appears at BOTTOM of PR thread, inverting Oracle Q4 priority
- Evidence: run_review.py main() calls maybe_post_per_actor_comments BEFORE maybe_post_decision_comment. Forgejo shows oldest-first in PR conversation. So on first review you scroll past 3 lengthy per-actor blocks (one with full risk lists) to reach the 'canonical operator surface.' Charter says: 'The decision
- Recommendation: Swap order: post decision comment FIRST so it pins at top of thread for the life of the PR. Or post it as a sticky review (gh pr review --body) instead of a free-form comment. The whole ADHD-bandwidth premise of having a separate decision comment dies if the operator has to scroll past 3 reviewer dumps to find it on mobile.
medium — Marker keys on commit_sha[:8] → every amend creates FOUR new comments, not four PATCHes
- Evidence: _decision_marker and _idempotency_marker both embed commit_sha[:8]. Test test_decision_marker_format asserts this explicitly. Each push to the PR branch changes the head SHA, so on commit N+1 the lookup finds no marker match for the NEW SHA → POST new comment. After 5 amends you have ~20 comments on
- Recommendation: Re-key markers to (actor, repo, pr_id) only — drop commit_sha. Put the SHA inside the body so it's still visible per-update, but PATCH the same comment across the PR's life. This is the actual idempotent-update intent. Will affect existing canary comments — accept the migration cost now while the system has 1 user (you).
medium — Charter section for Q5 (bug-class clustering) ships without implementation
- Evidence: PLATFORM_CHARTER.md adds 'Bug-class clustering hints (Oracle Q5 ruling)' describing reviewer.bug_class field and 'platform may surface repeated classes' — but no field appears in this diff's reviewer schema, no consolidator code reads it, no test exercises it. Charter is now ahead of code.
- Recommendation: Either (a) drop the Q5 paragraph until you implement it, or (b) add a single-line 'Status: not yet implemented as of 2026-05-01' caveat. Otherwise a future reader (or future-you) will assume bug_class is wired and design downstream features that hit a no-op.
medium — Decision comment depends on FORGEJO_TOKEN_CLAUDE → if claude is rate-limited or unavailable, operator's primary surface silently disappears
- Evidence: maybe_post_decision_comment hardcodes decision_actor='claude' and exits early on missing token (real_post raises, dry-run skips). Per-actor comments may still post for codex+glm. Operator gets evidence but no synthesis — exactly the cognitive-load scenario the decision comment was supposed to preven
- Recommendation: Add a fallback: if claude PAT missing/fails, post decision under codex or glm with an explicit '[posted under fallback identity because FORGEJO_TOKEN_CLAUDE unavailable]' header line. The decision comment matters more than the identity it's posted under.
low — Test plan ships unchecked: 'Manual re-canary on PR #9 itself' is the actual verification, but PR is being requested for merge before that runs
- Evidence: PR description Test plan: '[ ] Manual: re-canary on PR #9 itself (recursive meta-test...)'. The 120 unit tests don't cover the live Forgejo POST/PATCH path against a real instance — that's what canary B was for. Merging without rerunning canary on this PR risks the same class of bug as canary B caug
- Recommendation: Run the dry-run-then-real canary against PR #9 BEFORE merging. If dry-run is clean, real-mode adds maybe 30 seconds. The whole reason canary infrastructure exists is to catch this class of issue, so don't skip it on the PR that builds the canary.
low — 4× comment surface per PR with no off-switch
- Evidence: Every PR now gets 4 comments (3 actor + 1 decision), all PATCH-updated together. --skip-actor exists but no '--decision-only' or '--minimal' for trivial PRs (typo fix, README change, etc) where per-actor attribution is overkill.
- Recommendation: Add --comment-mode={per-actor,decision-only,both} (default both). For boring PRs, decision-only keeps the surface to one comment. Costs ~5 lines, gives you a knob you'll want once volume picks up.

Opportunities

Delete the deprecated --actor / maybe_post_forgejo_comment path now — The diff keeps --actor + the old consolidated function 'for backward compat with old hook.' If the only consumer is the canary you control, this dead code will rot. Either delete it in this PR (you're the only consumer) or open a follow-up issue with a remove-by date. Carrying two comment-posting code paths doubles the surface for the next bug.
Token absence should be a first-class operator signal, not stderr noise — If FORGEJO_TOKEN_CLAUDE is missing, the user sees a stderr line and the decision comment doesn't post. The operator (you) might never notice on a busy PR. Consider: write a single line into the rendered markdown ('⚠ decision comment skipped: token missing') so it's visible in the canonical CI output, not buried in stderr.
Consolidated decision body could include a 'what changed since last review' diff-of-verdicts — Since the marker idempotency means the decision comment updates in place, you have access to the previous body. A 1-line 'changed since last push: tech-claude OK→NOT_OK, +1 blocker' would let the operator see deltas across amends without re-reading the whole comment. Maps directly to your ADHD-bandwidth charter goal.

# 3+3 ensemble review by `claude` — tech + product hats > One of three independent AI reviewers (claude / codex / glm). See `state/reviews/PR-9/decision_packet.md` for consolidated risk across all 6 outputs. ## Tech hat: ✅ **OK** (confidence 0.70) ### Risks - **`high`** — Decision-comment orchestration has no unit tests - Evidence: `run_review.py adds maybe_post_decision_comment() (lookup+verify+post pipeline) but tests only cover its primitives (_decision_marker, _build_decision_comment_body). No test exercises: token-missing path, _verify_actor failure, dry-run path, real-post existing-comment update, real-post new-comment cr` - Recommendation: Add tests parallel to maybe_post_per_actor_comments suite: dry-run logs + no HTTP, missing FORGEJO_TOKEN_CLAUDE in real mode raises, _verify_actor failure raises in real mode, CommentLookupError propagates, success path PATCH/POST routing. - **`medium`** — 9 template leak — _build_per_actor_body returns un-substituted output, relies on caller to .replace() - Evidence: `run_review.py _build_per_actor_body emits literal `9` (via `{9}` in f-string), and orchestrator does `.replace('9', str(args.pr_id))` after the call. test_build_per_actor_body_includes_marker_and_both_hats does NOT call .replace(), so a future caller forgetting the post-processing ` - Recommendation: Pass pr_id as a parameter to _build_per_actor_body and substitute inside the function; or assert that returned body contains no unresolved `{...}` placeholders. Eliminate the contract-by-convention. - **`medium`** — Decision comment not posted when per-actor partially fails — inconsistent PR thread state - Evidence: `main() runs maybe_post_per_actor_comments(...) then maybe_post_decision_comment(...). If 1+ actor fails, the former raises after attempting all 3, exception propagates, decision is never posted/updated. PR shows new claude reviewer comment + stale (or absent) decision comment from prior commit.` - Recommendation: Either (a) decouple — wrap per-actor in its own try/except, still attempt decision; or (b) document explicitly that decision is gated on all-actors-succeed; or (c) post decision with a `degraded: missing actors X,Y` note. Current behavior risks operator misreading PR state. - **`medium`** — Hardcoded `/3` in vote-count summary lies when reviewer fails to vote - Evidence: `_build_decision_comment_body footer: `Tech: {sum(1 for v in packet.tech_votes.values() if v == 'OK')}/3 OK · Product: {sum(1 for v in packet.product_votes.values() if v == 'OK')}/3 OK.`. If only 2 reviewers actually voted (one provider failed), the count shows e.g. `1/3 OK` when actual ratio is 1/2.` - Recommendation: Use `len(packet.tech_votes)` as denominator, or render as `1 OK / 2 voted / 3 expected`. Don't bake the count into the f-string. - **`medium`** — _post_or_patch_one_comment httpx.HTTPError→RuntimeError wrapping is not unit-tested - Evidence: `test_post_or_patch_RAISES_on_non_2xx and test_post_or_patch_RAISES_on_404_patch only cover non-2xx response status. The new try/except httpx.HTTPError clause (run_review.py ~_post_or_patch_one_comment) — which is what actually addresses the 'network errors bypass aggregation' convergent finding — ha` - Recommendation: Add a test where httpx.post (and httpx.patch) raises httpx.ConnectError or TimeoutException; assert RuntimeError is raised with a [comment] prefix. - **`medium`** — Decision body accesses risk fields as attributes — schema-coupled to consolidate_review output - Evidence: `_build_decision_comment_body uses `r.severity`, `r.weak_signal`, `r.title`, `r.raised_by`, `r.recommendation` directly. If consolidate_review.consolidate ever returns dict-shaped risks (e.g., during a refactor or partial result), these become AttributeError at runtime — only caught by exception hand` - Recommendation: Add an assertion or schema check at the entry of _build_decision_comment_body (e.g., that consolidated_risks items have the expected attribute set), or use getattr with defaults. Consider a schema test that wires consolidate→render end-to-end. - **`low`** — Forgejo `user.login` field assumption in _find_existing_comment author match - Evidence: `run_review.py _find_existing_comment: `comment_author = (c.get('user') or {}).get('login')`. If Forgejo/Gitea API ever renames or shapes the user payload differently (or returns a thin user without login), every comment will be filtered out and the code falls through to POST → duplicate comments, de` - Recommendation: Add a contract test against a realistic Forgejo response shape; consider failing closed when login is absent (currently silent-skips matching comments). - **`low`** — Dead `--actor` arg + maybe_post_forgejo_comment() linger as confusion-shaped landmines - Evidence: `build_parser still registers `--actor` with choices and a default of 'glm'; main() no longer references it. maybe_post_forgejo_comment() is no longer called from main but is still tested. Operators passing `--actor=codex` get no warning and no effect — silent behavioral change.` - Recommendation: Either emit a deprecation warning when `--actor` is supplied, or remove both the flag and the unused function. Keep test coverage on the active path only. - **`low`** — Decision-comment marker tied to commit_sha[:8] accumulates one comment per pushed SHA - Evidence: `_decision_marker includes `head_sha[:8]`. Each new push creates a new decision comment (PATCH won't match prior SHA's marker). Charter language 'never append spam' applies within one commit, but across PR lifetime a 10-commit PR yields 10 decision comments + 30 per-actor comments.` - Recommendation: Decide explicitly whether per-commit decision history is desired. If not, drop sha from the decision marker (keep it body-only) so the decision comment is one-per-PR. Document the chosen semantics. - **`low`** — Reviewer IDs not in REVIEWER_ID_TO_ACTOR are silently dropped - Evidence: `_split_reviews_by_actor: if rid is not in the static map, `actor = REVIEWER_ID_TO_ACTOR.get(rid)` is None and the review is skipped without log. A future reviewer family (e.g. `tech-llama`) added without updating the map vanishes from per-actor comments AND from the operator's mental model.` - Recommendation: Log a stderr warning when a reviewer_id isn't mapped; add a test asserting unmapped IDs surface a warning rather than silent drop. - **`low`** — URL construction doesn't quote repo / pr_id - Evidence: `_post_or_patch_one_comment builds `f"{host.rstrip('/')}/api/v1/repos/{repo}/issues/9/comments"` with raw repo and pr_id. Internal-only inputs today, but if a repo name ever has a `/` beyond `owner/repo` or pr_id is non-numeric, paths break in surprising ways.` - Recommendation: Use urllib.parse.quote on path components, or assert pr_id.isdigit() and repo matches a known pattern at the entry point. ### Opportunities - **Add an end-to-end fixture test for the 4-comment posting flow** — There's no integration-style test that wires consolidate→split→build→post for all 4 comments (3 actors + decision) against a fake Forgejo server. Given the operator is using the canary itself as a meta-test, a recorded-fixture test would catch the orchestration regressions cheaper than a live canary. - **Centralize HTTP error handling** — Both _find_existing_comment and _post_or_patch_one_comment now wrap httpx.HTTPError in their own way (CommentLookupError vs RuntimeError). Consider a single `_forgejo_request()` helper that owns the error-class taxonomy — fewer chances of one path being missed in a future change. - **Surface operator-questions and dissents in per-actor comments too** — Currently only the decision comment renders dissents and operator_questions. A reader scrolling per-actor comments first won't see who dissented from convergent reads. Either link from per-actor body to the decision marker, or include a 'I dissent because...' line in the dissenting actor's body. - **Pin REVIEWER_ID_TO_ACTOR mapping by schema not by string-prefix logic** — The hat detection (`startswith('tech-')` / `startswith('product-')`) and actor mapping are independent string operations on the same field. A reviewer_id schema (regex `^(tech|product)-(claude|gpt|glm)$`) parsed once would catch malformed IDs at ingest and keep both derivations consistent. ## Product hat: ✅ **OK** (confidence 0.70) ### Risks - **`high`** — Decision comment posted LAST → appears at BOTTOM of PR thread, inverting Oracle Q4 priority - Evidence: `run_review.py main() calls maybe_post_per_actor_comments BEFORE maybe_post_decision_comment. Forgejo shows oldest-first in PR conversation. So on first review you scroll past 3 lengthy per-actor blocks (one with full risk lists) to reach the 'canonical operator surface.' Charter says: 'The decision ` - Recommendation: Swap order: post decision comment FIRST so it pins at top of thread for the life of the PR. Or post it as a sticky review (gh pr review --body) instead of a free-form comment. The whole ADHD-bandwidth premise of having a separate decision comment dies if the operator has to scroll past 3 reviewer dumps to find it on mobile. - **`medium`** — Marker keys on commit_sha[:8] → every amend creates FOUR new comments, not four PATCHes - Evidence: `_decision_marker and _idempotency_marker both embed commit_sha[:8]. Test test_decision_marker_format asserts this explicitly. Each push to the PR branch changes the head SHA, so on commit N+1 the lookup finds no marker match for the NEW SHA → POST new comment. After 5 amends you have ~20 comments on` - Recommendation: Re-key markers to (actor, repo, pr_id) only — drop commit_sha. Put the SHA inside the body so it's still visible per-update, but PATCH the same comment across the PR's life. This is the actual idempotent-update intent. Will affect existing canary comments — accept the migration cost now while the system has 1 user (you). - **`medium`** — Charter section for Q5 (bug-class clustering) ships without implementation - Evidence: `PLATFORM_CHARTER.md adds 'Bug-class clustering hints (Oracle Q5 ruling)' describing reviewer.bug_class field and 'platform may surface repeated classes' — but no field appears in this diff's reviewer schema, no consolidator code reads it, no test exercises it. Charter is now ahead of code.` - Recommendation: Either (a) drop the Q5 paragraph until you implement it, or (b) add a single-line 'Status: not yet implemented as of 2026-05-01' caveat. Otherwise a future reader (or future-you) will assume bug_class is wired and design downstream features that hit a no-op. - **`medium`** — Decision comment depends on FORGEJO_TOKEN_CLAUDE → if claude is rate-limited or unavailable, operator's primary surface silently disappears - Evidence: `maybe_post_decision_comment hardcodes decision_actor='claude' and exits early on missing token (real_post raises, dry-run skips). Per-actor comments may still post for codex+glm. Operator gets evidence but no synthesis — exactly the cognitive-load scenario the decision comment was supposed to preven` - Recommendation: Add a fallback: if claude PAT missing/fails, post decision under codex or glm with an explicit '[posted under fallback identity because FORGEJO_TOKEN_CLAUDE unavailable]' header line. The decision comment matters more than the identity it's posted under. - **`low`** — Test plan ships unchecked: 'Manual re-canary on PR #9 itself' is the actual verification, but PR is being requested for merge before that runs - Evidence: `PR description Test plan: '[ ] Manual: re-canary on PR #9 itself (recursive meta-test...)'. The 120 unit tests don't cover the live Forgejo POST/PATCH path against a real instance — that's what canary B was for. Merging without rerunning canary on this PR risks the same class of bug as canary B caug` - Recommendation: Run the dry-run-then-real canary against PR #9 BEFORE merging. If dry-run is clean, real-mode adds maybe 30 seconds. The whole reason canary infrastructure exists is to catch this class of issue, so don't skip it on the PR that builds the canary. - **`low`** — 4× comment surface per PR with no off-switch - Evidence: `Every PR now gets 4 comments (3 actor + 1 decision), all PATCH-updated together. --skip-actor exists but no '--decision-only' or '--minimal' for trivial PRs (typo fix, README change, etc) where per-actor attribution is overkill.` - Recommendation: Add --comment-mode={per-actor,decision-only,both} (default both). For boring PRs, decision-only keeps the surface to one comment. Costs ~5 lines, gives you a knob you'll want once volume picks up. ### Opportunities - **Delete the deprecated --actor / maybe_post_forgejo_comment path now** — The diff keeps --actor + the old consolidated function 'for backward compat with old hook.' If the only consumer is the canary you control, this dead code will rot. Either delete it in this PR (you're the only consumer) or open a follow-up issue with a remove-by date. Carrying two comment-posting code paths doubles the surface for the next bug. - **Token absence should be a first-class operator signal, not stderr noise** — If FORGEJO_TOKEN_CLAUDE is missing, the user sees a stderr line and the decision comment doesn't post. The operator (you) might never notice on a busy PR. Consider: write a single line into the rendered markdown ('⚠ decision comment skipped: token missing') so it's visible in the canonical CI output, not buried in stderr. - **Consolidated decision body could include a 'what changed since last review' diff-of-verdicts** — Since the marker idempotency means the decision comment updates in place, you have access to the previous body. A 1-line 'changed since last push: tech-claude OK→NOT_OK, +1 blocker' would let the operator see deltas across amends without re-reading the whole comment. Maps directly to your ADHD-bandwidth charter goal.

codex commented

2026-05-01 21:59:32 +02:00

Collaborator

3+3 ensemble review by `codex` — tech + product hats

One of three independent AI reviewers (claude / codex / glm). See state/reviews/PR-9/decision_packet.md for consolidated risk across all 6 outputs.

Tech hat: ❌ NOT_OK (confidence 0.86)

Risks

high — Accept all successful 2xx comment responses, including 204
- Evidence: control-plane/platformctl/tools/run_review.py:936-944: _post_or_patch_one_comment claims to raise on non-2xx, but only accepts 200/201 and then always parses JSON. Forgejo/Gitea PATCH comment responses may validly be 204 with no body, so an apply-time update can succeed server-side and still hard-fa
- Recommendation: Change success handling to accept 200 <= status_code < 300. For 204 or empty bodies, return ({}, action) without JSON parsing. Add PATCH 204 regression coverage.
high — Deprecated --actor is ignored, breaking the stated backward-compat path
- Evidence: control-plane/platformctl/tools/run_review.py:351-356 says --actor is kept for backward compatibility, but main() always calls maybe_post_per_actor_comments() and maybe_post_decision_comment() at lines 489-496. A pre-existing real-post hook using --actor glm with only FORGEJO_TOKEN_GLM now fails on
- Recommendation: Either preserve legacy single-actor behavior when --actor is explicitly used, or remove the backward-compat claim and update the deployed hook plus secret provisioning in the same PR. Add a main-level test for the old --actor glm invocation.
medium — --skip-actor claude cannot actually skip the claude dependency
- Evidence: control-plane/platformctl/tools/run_review.py:358-362 documents --skip-actor as usable when an actor PAT is unavailable, but maybe_post_decision_comment() hard-requires FORGEJO_TOKEN_CLAUDE at lines 849-856 even if --skip-actor claude was supplied.
- Recommendation: Make claude decision posting explicitly mandatory in CLI validation, or add a separate --skip-decision/--decision-actor option. Cover --skip-actor claude in the full main real-post path, not only maybe_post_per_actor_comments().

Opportunities

Add main-path comment-hook tests — Current new tests exercise helper functions, but the risk is in orchestration: main() now posts per-actor comments and then a decision comment. Add tests for real-mode token matrices and expected exit code 4/0 behavior.

Product hat: ❌ NOT_OK (confidence 0.88)

Risks

high — Prevent comment spam across amended commits
- Evidence: PLATFORM_CHARTER.md:301 says all 4 comments are upserts and never append spam, but run_review.py:522 and run_review.py:754 include head_sha in the markers, so each new commit can create a fresh 4-comment set instead of updating the PR-level surface.
- Recommendation: Make the visible PR comments keyed by repo + PR + actor/decision, not commit SHA; include the commit SHA inside the body instead.
high — Do not let supporting actor failures suppress the decision
- Evidence: PLATFORM_CHARTER.md:295 defines the decision comment as canonical, but run_review.py:491-496 posts per-actor comments first and only then posts the decision; run_review.py:1039-1043 raises after actor failures, preventing the canonical decision comment.
- Recommendation: Attempt the consolidated decision comment regardless of per-actor posting failures, then fail the run after all comment attempts are recorded.
high — Canonical decision hides medium product risk
- Evidence: run_review.py:811-812 tells the operator there are no blockers or high-severity risks and sends medium/low details to actor comments/local packet. Product and ADHD risks are often medium but still merge-relevant for the operator.
- Recommendation: Include at least a compact count and titles of medium risks in the decision comment, or explicitly list product-review risks separately.
medium — Decision identity is visually ambiguous
- Evidence: run_review.py:849 hardcodes the decision actor to claude, while run_review.py:660-663 also maps Claude reviewer output to claude; run_review.py:353 already has a platform-orchestrator actor choice that is not used.
- Recommendation: Use a distinct platform-orchestrator service identity for the decision comment, or do not present it as independent from Claude-authored reviewer evidence.
low — Bug-class charter change is premature scope
- Evidence: PLATFORM_CHARTER.md:303-307 adds an open taxonomy for bug_class, but this PR does not implement or validate it.
- Recommendation: Move bug_class policy to the follow-up reviewer-fix-patch-schema PR where the schema and UI behavior can be reviewed together.

Opportunities

Add a PR-thread budget test — The new workflow should prove that rerunning on a new commit updates one decision plus three actor comments rather than growing the thread indefinitely.
Make the decision comment operator-first — Use the canonical comment to show the merge decision, all high risks, medium product risks, and exact yes/no asks; leave per-actor comments as expandable evidence.

# 3+3 ensemble review by `codex` — tech + product hats > One of three independent AI reviewers (claude / codex / glm). See `state/reviews/PR-9/decision_packet.md` for consolidated risk across all 6 outputs. ## Tech hat: ❌ **NOT_OK** (confidence 0.86) ### Risks - **`high`** — Accept all successful 2xx comment responses, including 204 - Evidence: `control-plane/platformctl/tools/run_review.py:936-944: _post_or_patch_one_comment claims to raise on non-2xx, but only accepts 200/201 and then always parses JSON. Forgejo/Gitea PATCH comment responses may validly be 204 with no body, so an apply-time update can succeed server-side and still hard-fa` - Recommendation: Change success handling to accept 200 <= status_code < 300. For 204 or empty bodies, return ({}, action) without JSON parsing. Add PATCH 204 regression coverage. - **`high`** — Deprecated --actor is ignored, breaking the stated backward-compat path - Evidence: `control-plane/platformctl/tools/run_review.py:351-356 says --actor is kept for backward compatibility, but main() always calls maybe_post_per_actor_comments() and maybe_post_decision_comment() at lines 489-496. A pre-existing real-post hook using --actor glm with only FORGEJO_TOKEN_GLM now fails on ` - Recommendation: Either preserve legacy single-actor behavior when --actor is explicitly used, or remove the backward-compat claim and update the deployed hook plus secret provisioning in the same PR. Add a main-level test for the old --actor glm invocation. - **`medium`** — --skip-actor claude cannot actually skip the claude dependency - Evidence: `control-plane/platformctl/tools/run_review.py:358-362 documents --skip-actor as usable when an actor PAT is unavailable, but maybe_post_decision_comment() hard-requires FORGEJO_TOKEN_CLAUDE at lines 849-856 even if --skip-actor claude was supplied.` - Recommendation: Make claude decision posting explicitly mandatory in CLI validation, or add a separate --skip-decision/--decision-actor option. Cover --skip-actor claude in the full main real-post path, not only maybe_post_per_actor_comments(). ### Opportunities - **Add main-path comment-hook tests** — Current new tests exercise helper functions, but the risk is in orchestration: main() now posts per-actor comments and then a decision comment. Add tests for real-mode token matrices and expected exit code 4/0 behavior. ## Product hat: ❌ **NOT_OK** (confidence 0.88) ### Risks - **`high`** — Prevent comment spam across amended commits - Evidence: `PLATFORM_CHARTER.md:301 says all 4 comments are upserts and never append spam, but run_review.py:522 and run_review.py:754 include head_sha in the markers, so each new commit can create a fresh 4-comment set instead of updating the PR-level surface.` - Recommendation: Make the visible PR comments keyed by repo + PR + actor/decision, not commit SHA; include the commit SHA inside the body instead. - **`high`** — Do not let supporting actor failures suppress the decision - Evidence: `PLATFORM_CHARTER.md:295 defines the decision comment as canonical, but run_review.py:491-496 posts per-actor comments first and only then posts the decision; run_review.py:1039-1043 raises after actor failures, preventing the canonical decision comment.` - Recommendation: Attempt the consolidated decision comment regardless of per-actor posting failures, then fail the run after all comment attempts are recorded. - **`high`** — Canonical decision hides medium product risk - Evidence: `run_review.py:811-812 tells the operator there are no blockers or high-severity risks and sends medium/low details to actor comments/local packet. Product and ADHD risks are often medium but still merge-relevant for the operator.` - Recommendation: Include at least a compact count and titles of medium risks in the decision comment, or explicitly list product-review risks separately. - **`medium`** — Decision identity is visually ambiguous - Evidence: `run_review.py:849 hardcodes the decision actor to claude, while run_review.py:660-663 also maps Claude reviewer output to claude; run_review.py:353 already has a platform-orchestrator actor choice that is not used.` - Recommendation: Use a distinct platform-orchestrator service identity for the decision comment, or do not present it as independent from Claude-authored reviewer evidence. - **`low`** — Bug-class charter change is premature scope - Evidence: `PLATFORM_CHARTER.md:303-307 adds an open taxonomy for bug_class, but this PR does not implement or validate it.` - Recommendation: Move bug_class policy to the follow-up reviewer-fix-patch-schema PR where the schema and UI behavior can be reviewed together. ### Opportunities - **Add a PR-thread budget test** — The new workflow should prove that rerunning on a new commit updates one decision plus three actor comments rather than growing the thread indefinitely. - **Make the decision comment operator-first** — Use the canonical comment to show the merge decision, all high risks, medium product risks, and exact yes/no asks; leave per-actor comments as expandable evidence.

glm commented

2026-05-01 21:59:33 +02:00

First-time contributor

3+3 ensemble review by `glm` — tech + product hats

One of three independent AI reviewers (claude / codex / glm). See state/reviews/PR-9/decision_packet.md for consolidated risk across all 6 outputs.

Tech hat: ✅ OK (confidence 0.95)

Opportunities

Marker substring collision documentation — While the implementation is safe (decision marker 'platform-review-decision:' is not a substring of actor marker 'platform-review:claude:'), consider adding a comment or test explicitly verifying no prefix collision to prevent future breakage if marker formats evolve.
Partial retry state visibility — In real-post mode, if actors A and C succeed but B fails, the error message aggregates all failures. Consider logging which actors SUCCEEDED before the failure, so operators have full context on the PR comment state.

Product hat: ✅ OK (confidence 0.95)

Risks

medium — Increased cognitive load on PR thread
- Evidence: PLATFORM_CHARTER.md + run_review.py diff: 4 comments per PR (3 actors + 1 decision)
- Recommendation: Operator should monitor for thread fatigue; consider collapsible UI or auto-archive of old actor comments in future iterations.
medium — Complexity in marker idempotency logic
- Evidence: run_review.py:_find_existing_comment now requires expected_author to prevent identity poisoning
- Recommendation: Ensure docstrings reflect the 'author + marker' requirement clearly for future maintainers.

Opportunities

Pattern visibility across reviewers — Per-operator request: separate comments from claude/codex/glm allow spotting which agent brings what type of signal (e.g., Claude focusing on architectural risks, GLM on implementation details).
Consolidated decision summary — The new decision comment provides a scannable 'executive summary' (blockers, dissents, operator questions) which is highly ADHD-friendly and reduces need to parse 3 detailed comments.

# 3+3 ensemble review by `glm` — tech + product hats > One of three independent AI reviewers (claude / codex / glm). See `state/reviews/PR-9/decision_packet.md` for consolidated risk across all 6 outputs. ## Tech hat: ✅ **OK** (confidence 0.95) ### Opportunities - **Marker substring collision documentation** — While the implementation is safe (decision marker 'platform-review-decision:' is not a substring of actor marker 'platform-review:claude:'), consider adding a comment or test explicitly verifying no prefix collision to prevent future breakage if marker formats evolve. - **Partial retry state visibility** — In real-post mode, if actors A and C succeed but B fails, the error message aggregates all failures. Consider logging which actors SUCCEEDED before the failure, so operators have full context on the PR comment state. ## Product hat: ✅ **OK** (confidence 0.95) ### Risks - **`medium`** — Increased cognitive load on PR thread - Evidence: `PLATFORM_CHARTER.md + run_review.py diff: 4 comments per PR (3 actors + 1 decision)` - Recommendation: Operator should monitor for thread fatigue; consider collapsible UI or auto-archive of old actor comments in future iterations. - **`medium`** — Complexity in marker idempotency logic - Evidence: `run_review.py:_find_existing_comment now requires `expected_author` to prevent identity poisoning` - Recommendation: Ensure docstrings reflect the 'author + marker' requirement clearly for future maintainers. ### Opportunities - **Pattern visibility across reviewers** — Per-operator request: separate comments from claude/codex/glm allow spotting which agent brings what type of signal (e.g., Claude focusing on architectural risks, GLM on implementation details). - **Consolidated decision summary** — The new decision comment provides a scannable 'executive summary' (blockers, dissents, operator questions) which is highly ADHD-friendly and reduces need to parse 3 detailed comments.

claude commented

2026-05-01 21:59:34 +02:00

Author

Collaborator

Review decision

Status: ELEVATED — recommended action: defer

Single-reviewer high-risk findings

Prevent comment spam across amended commits
- Seen by: product-gpt
- Decision: Make the visible PR comments keyed by repo + PR + actor/decision, not commit SHA; include the commit SHA inside the body instead.
Do not let supporting actor failures suppress the decision
- Seen by: product-gpt
- Decision: Attempt the consolidated decision comment regardless of per-actor posting failures, then fail the run after all comment attempts are recorded.
Canonical decision hides medium product risk
- Seen by: product-gpt
- Decision: Include at least a compact count and titles of medium risks in the decision comment, or explicitly list product-review risks separately.
Accept all successful 2xx comment responses, including 204
- Seen by: tech-gpt
- Decision: Change success handling to accept 200 <= status_code < 300. For 204 or empty bodies, return ({}, action) without JSON parsing. Add PATCH 204 regression coverage.
Deprecated --actor is ignored, breaking the stated backward-compat path
- Seen by: tech-gpt
- Decision: Either preserve legacy single-actor behavior when --actor is explicitly used, or remove the backward-compat claim and update the deployed hook plus secret provisioning in the same PR. Add a main-level test for the old --actor glm invocation.
Decision comment posted LAST → appears at BOTTOM of PR thread, inverting Oracle Q4 priority
- Seen by: product-claude
- Decision: Swap order: post decision comment FIRST so it pins at top of thread for the life of the PR. Or post it as a sticky review (gh pr review --body) instead of a free-form comment. The whole ADHD-bandwidth premise of having a separate decision comment dies if the operator has to scroll past 3 reviewer du
Decision-comment orchestration has no unit tests
- Seen by: tech-claude
- Decision: Add tests parallel to maybe_post_per_actor_comments suite: dry-run logs + no HTTP, missing FORGEJO_TOKEN_CLAUDE in real mode raises, _verify_actor failure raises in real mode, CommentLookupError propagates, success path PATCH/POST routing.

Reviewer dissents

product-gpt voted NOT_OK (confidence 0.88)
tech-gpt voted NOT_OK (confidence 0.86)

Operator decisions (yes/no)

Risk 'Prevent comment spam across amended commits' raised by 1/6: do you accept this risk, or should this PR be deferred until it's mitigated?
Risk 'Do not let supporting actor failures suppress the decision' raised by 1/6: do you accept this risk, or should this PR be deferred until it's mitigated?
Risk 'Canonical decision hides medium product risk' raised by 1/6: do you accept this risk, or should this PR be deferred until it's mitigated?

Per-actor evidence: see comments by claude, codex, glm above. Tech: 2/3 OK · Product: 2/3 OK.

# Review decision **Status:** ELEVATED — recommended action: `defer` ### Single-reviewer high-risk findings - **Prevent comment spam across amended commits** - Seen by: product-gpt - Decision: Make the visible PR comments keyed by repo + PR + actor/decision, not commit SHA; include the commit SHA inside the body instead. - **Do not let supporting actor failures suppress the decision** - Seen by: product-gpt - Decision: Attempt the consolidated decision comment regardless of per-actor posting failures, then fail the run after all comment attempts are recorded. - **Canonical decision hides medium product risk** - Seen by: product-gpt - Decision: Include at least a compact count and titles of medium risks in the decision comment, or explicitly list product-review risks separately. - **Accept all successful 2xx comment responses, including 204** - Seen by: tech-gpt - Decision: Change success handling to accept 200 <= status_code < 300. For 204 or empty bodies, return ({}, action) without JSON parsing. Add PATCH 204 regression coverage. - **Deprecated --actor is ignored, breaking the stated backward-compat path** - Seen by: tech-gpt - Decision: Either preserve legacy single-actor behavior when --actor is explicitly used, or remove the backward-compat claim and update the deployed hook plus secret provisioning in the same PR. Add a main-level test for the old --actor glm invocation. - **Decision comment posted LAST → appears at BOTTOM of PR thread, inverting Oracle Q4 priority** - Seen by: product-claude - Decision: Swap order: post decision comment FIRST so it pins at top of thread for the life of the PR. Or post it as a sticky review (gh pr review --body) instead of a free-form comment. The whole ADHD-bandwidth premise of having a separate decision comment dies if the operator has to scroll past 3 reviewer du - **Decision-comment orchestration has no unit tests** - Seen by: tech-claude - Decision: Add tests parallel to maybe_post_per_actor_comments suite: dry-run logs + no HTTP, missing FORGEJO_TOKEN_CLAUDE in real mode raises, _verify_actor failure raises in real mode, CommentLookupError propagates, success path PATCH/POST routing. ### Reviewer dissents - `product-gpt` voted **NOT_OK** (confidence 0.88) - `tech-gpt` voted **NOT_OK** (confidence 0.86) ### Operator decisions (yes/no) 1. Risk 'Prevent comment spam across amended commits' raised by 1/6: do you accept this risk, or should this PR be deferred until it's mitigated? 2. Risk 'Do not let supporting actor failures suppress the decision' raised by 1/6: do you accept this risk, or should this PR be deferred until it's mitigated? 3. Risk 'Canonical decision hides medium product risk' raised by 1/6: do you accept this risk, or should this PR be deferred until it's mitigated? --- _Per-actor evidence: see comments by `claude`, `codex`, `glm` above. Tech: 2/3 OK · Product: 2/3 OK._

pdurlej merged commit 7a6496d88e into main

2026-05-01 22:27:51 +02:00

pdurlej referenced this pull request from a commit

2026-05-01 22:27:52 +02:00

feat(review): per-actor comments + non-2xx raise (PR #9)

pdurlej referenced this pull request from a commit

2026-05-01 22:27:52 +02:00

Merge pull request 'feat(review): per-actor comments + non-2xx raise (PR #9)' (#9) from codex/identity-wiring/pr9-multi-actor-comments into main

pdurlej referenced this pull request from a commit

2026-05-01 22:49:44 +02:00

feat(review): Order D — PLATFORM_CONTEXT for reviewer noise floor + charter auto-loop policy

claude referenced this pull request