feat(review): per-actor comments + non-2xx raise (PR #9) #9
No reviewers
Labels
No labels
W6d-automerge-calibration
agent/claude-code
agent/codex
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
automerge-candidate
class/security-sensitive
cutover-gate
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
iterating
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
large-impact
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
meta
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
needs-operator-decision
needs-triage
not-ready
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
operator-emotional
owner-attention
phase/02
phase/03
priority:p0
priority:p1
priority:p2
priority:p3
proposed
ready-for-agent
ready-for-operator
recovery
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
risk/exposure
risk/process
risk/product
risk/runtime
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:operator-needed
status:parked
tier/full
tier/lite
tier/stacked
tier:0-platform-substrate
tier:1-iskra-value-layer
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
No milestone
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
pdurlej/platform!9
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "codex/identity-wiring/pr9-multi-actor-comments"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Two changes from canary B (live flame-war on merged PR #8):
1. Non-2xx HTTP raise (3-reviewer convergent BLOCKER fix)
Canary B (
--no-dry-run-comment+ glm PAT) succeeded for happy path (HTTP 201), but 3 reviewers convergent flagged:Previous code:
Fix: extracted
_post_or_patch_one_comment()that RAISES RuntimeError on non-2xx. Both POST (create) and PATCH (update) paths covered.2. Per-actor comments (operator request 2026-05-01)
PR #8's hook posted ONE consolidated comment from
glmuser. Hides per-reviewer attribution.Now: 3 separate comments, one per AI family —
claude,codex,glm— each authored by its own Forgejo identity. Each shows that family's tech + product hat verdicts + risks + opportunities.Mapping:
*-claude → claude,*-gpt → codex(operator's user),*-glm → glm.CLI:
--actordeprecated, new--skip-actor(multiple). Default: post all 3.Tests: 120/120 green (was 112; +8 new)
_split_reviews_by_actormapping_build_per_actor_bodymarker + 2-hat sections + missing-hat placeholder_post_or_patch_one_commentRAISES on POST 5xx / PATCH 4xx--skip-actorfilter worksToken-arbitrage step (operator's strategy, 2026-05-01)
Per future canary: operator wants reviewers to propose
fix_patchdirectly + 2-round cross-review (open_loopreviewer-fix-patch-schema). This PR is the last single-round canary infrastructure change before A.Test plan
reviewer-fix-patch-schema2-round) per Oracle consultLinkage
🤖 Generated with Claude Code
3+3 ensemble review by
claude— tech + product hatsTech hat: ✅ OK (confidence 0.78)
Risks
high— Post-hoc9substitution can clobber reviewer-supplied contentrun_review.py:_build_per_actor_bodyemits literal9(via{9}f-string escape), thenmaybe_post_per_actor_commentsdoesbody.replace("9", str(args.pr_id))`. Any risk title / evidence / recommendation / opportunity description that happens to contain the literal substring ``pr_idinto_build_per_actor_body(actor, hats, marker, pr_id)and interpolate it inline (f"...PR-9/decision_packet.md"). Drop the global.replace("9", ...)call entirely.medium—confidence: nullwill crash body composition with TypeErrorrun_review.py_build_per_actor_body:confidence = review.get("confidence", 0.0); ... f"... (confidence {confidence:.2f})".dict.get(key, default)returnsNonewhen the key is present but explicitly null (which is valid per the per-reviewer JSON schema's0.0-1.0range allowing null on parseconfidence = float(review.get("confidence") or 0.0). Add a unit test where one hat returnsconfidence=None.medium— Unknownreviewer_idsilently dropped — observability gaprun_review.py_split_reviews_by_actor: hard-codedREVIEWER_ID_TO_ACTORwith 6 keys; if upstream renames a reviewer (e.g.tech-haiku,product-gpt5,tech-claude-opus), the entry is silently filtered out — no warning, no failure, the actor's comment just won't include that hat (or the entirereviewer_iddoesn't match the mapping. Consider adding a registry assertion at startup that all expected reviewer_ids are present.medium— Real-post mode silently toleratesno hats— contradicts docstringrun_review.pymaybe_post_per_actor_comments: docstring says "Real-post mode raises if any actor fails (token, verify, lookup, write)." But theif not hats: ... continuebranch does NOT append tofailureseven in real-post mode. If a reviewer family entirely failed upstream (provider crash, JSOno hatsin real-post as a failure (failures.append(msg)), or (b) update the docstring to reflect that missing-reviews is a soft skip. Pick one and make the test cover it.low— Unusedpacketparameter inmaybe_post_per_actor_commentsrun_review.py:def maybe_post_per_actor_comments(packet, reviews, args)—packetis never referenced in the body. Caller inmain()passes it. Dead parameter; future maintainers will assume it's used.packetfrom the signature and the call site, or document why it's reserved.low— Markdown injection in risk titles / evidence / recommendationsrun_review.py_build_per_actor_body: reviewer-supplied strings are interpolated raw into markdown (f"-{sev}— {title}", evidence wrapped in single-backticks). A title containing```or]or unmatched backticks breaks rendering for downstream readers. Evidence wrapped in single-backtev(replace`with`` `` or use HTML`). For long fields, consider fenced code blocks instead of inline backticks.low— Pre-existing find-then-post race remains for per-actor pathrun_review.py:_find_existing_commentfollowed by_post_or_patch_one_commentis non-atomic (same TOCTOU as the consolidated path). Two concurrent canary runs against the same PR will each see no existing comment and POST, producing duplicates per actor.state/reviews/PR-<id>/).Opportunities
confidence: nulland unknown reviewer_id — Two edge cases above are not covered by the 8 new tests. Cheap to add; would catch the TypeError crash and the silent-drop observability gap before they hit canary._post_or_patch_one_commentaccepts {200, 201} for both POST and PATCH. POST→201, PATCH→200 are the documented Forgejo responses; tightening (POST only 201, PATCH only 200) makes contract violations louder.Product hat: ✅ OK (confidence 0.70)
Risks
medium— PR thread noise: 3 comments per PR triples visual scan costrun_review.py loop posts one comment per actor in ('claude', 'codex', 'glm'); each is full tech+product markdown body. On every future PR the thread now opens with 3 large bot comments before any human discussion.<details><summary>block so the default scroll-cost is minimal. Operator wanted pattern visibility, not always-expanded walls of text. Cheap follow-up — not a blocker, but worth doing before this becomes the steady state.medium— Dead-code-in-waiting: old consolidated path kept 'for backward compat'_build_comment_body and maybe_post_forgejo_comment both retained with comment 'Kept for backward compatibility / tests' even though main() no longer calls maybe_post_forgejo_comment in the live path. Two parallel posting flows now exist.medium— Hidden state: 3 PATs to provision and rotate instead of 1FORGEJO_TOKEN_CLAUDE, FORGEJO_TOKEN_CODEX, FORGEJO_TOKEN_GLM all required for full posting. Missing one → that actor silently skips with a stderr line (real-post mode raises, dry-run does not).platformctl doctoror similar) that lists which actor PATs are present/missing. ADHD-friendly: operator sees the auth surface in one place instead of discovering per-PR which token expired.low— Deprecated--actorflag still accepted as choicebuild_parser() keeps--actorwith full choices list and a 'DEPRECATED' help string. New--skip-actoradded alongside.low— Scope tension with stated 'last single-round change before A'PR description: 'This PR is the **last single-round canary infrastructure change before A**' — yet this PR adds non-trivial per-actor splitting logic, 8 new tests, and a second posting flow.Opportunities
<details>and put the verdict trio (e.g.claude: ❌ NOT_OK · codex: ✅ OK · glm: ✅ OK) in the summary line. Operator gets pattern-visibility at-a-glance without scrolling 3 walls of text. Matches the actual decision question ('does anyone disagree?').--actorflag are clearly dead. Worth a calendar nudge in ~2 weeks to delete maybe_post_forgejo_comment, _build_comment_body, and the --actor argparse entry — keeps surface tight.[auth] claude=ok codex=ok glm=MISSINGline at the start of run_review would make it impossible to miss without reading post-hoc stderr.3+3 ensemble review by
codex— tech + product hatsTech hat: ❌ NOT_OK (confidence 0.82)
Risks
high— Real-post mode can still exit OK when an actor has no reviewscontrol-plane/platformctl/tools/run_review.py:754:if not hats: ... skippingdoes not append tofailureseven whenreal_postis true--skip-actor.medium— Duplicate reviewer outputs are order-dependent and silently overwrittencontrol-plane/platformctl/tools/run_review.py:647:by_actor[actor][hat] = roverwrites any previous review for the same actor/hatreviewer_idor duplicate actor/hat entries and raise before posting, or sort/select by an explicit deterministic key.medium— Comment rendering can crash on non-float confidence valuescontrol-plane/platformctl/tools/run_review.py:676:confidence = review.get("confidence", 0.0)followed by{confidence:.2f}``low— Failure aggregation hides which actor failed for HTTP write errorscontrol-plane/platformctl/tools/run_review.py:823-824 appendsstr(e)from_post_or_patch_one_comment, whose message lacks the actor name[comment][{actor}]before appending tofailures, so multi-actor real-post failures are actionable.Opportunities
not hatspath that currently skips silently in real mode.Product hat: ❌ NOT_OK (confidence 0.86)
Risks
blocker— Existing hook can hard-fail after mergecontrol-plane/platformctl/tools/run_review.py:350-355 keeps --actor as backward-compatible, but main now ignores it at 489-491 and always calls maybe_post_per_actor_comments; real mode then requires FORGEJO_TOKEN_CLAUDE, FORGEJO_TOKEN_CODEX, and FORGEJO_TOKEN_GLM at 776-782.high— Reviewer output becomes noisier for the operatorcontrol-plane/platformctl/tools/run_review.py:489-491 replaces one consolidated PR comment with three per-actor comments; _build_per_actor_body at 662-707 shows each actor's risks/opportunities but not the consolidated default_action or merged risk decision.medium— Partial comment state creates ADHD-hostile cleanupcontrol-plane/platformctl/tools/run_review.py:766-825 posts actors sequentially and only raises after collecting failures at 827-831, so real mode can leave one or two comments posted while the run still exits 4.medium— Skip flag normalizes missing reviewer identitiescontrol-plane/platformctl/tools/run_review.py:357-362 adds repeatable --skip-actor, and 766-769 silently excludes an actor from posting even though the stated operator goal is to compare reviewer-family patterns.Opportunities
3+3 ensemble review by
glm— tech + product hatsTech hat: ✅ OK (confidence 0.95)
Opportunities
Product hat: ✅ OK (confidence 0.85)
Risks
low— PR thread noise from 3 separate commentsrun_review.py:687-724 (maybe_post_per_actor_comments loops 3 actors)low— Token management overhead increasedrun_review.py:712-719 (requires 3 separate FORGEJO_TOKEN_* env vars)low— No quick cross-actor summary in PR UIrun_review.py:656 (per-actor body references consolidated file for cross-view)Opportunities
3+3 ensemble review by
claude— tech + product hatsTech hat: ✅ OK (confidence 0.70)
Risks
high— Decision-comment orchestration has no unit testsrun_review.py adds maybe_post_decision_comment() (lookup+verify+post pipeline) but tests only cover its primitives (_decision_marker, _build_decision_comment_body). No test exercises: token-missing path, _verify_actor failure, dry-run path, real-post existing-comment update, real-post new-comment crmedium— 9 template leak — _build_per_actor_body returns un-substituted output, relies on caller to .replace()run_review.py _build_per_actor_body emits literal9(via{9}in f-string), and orchestrator does.replace('9', str(args.pr_id))after the call. test_build_per_actor_body_includes_marker_and_both_hats does NOT call .replace(), so a future caller forgetting the post-processing{...}placeholders. Eliminate the contract-by-convention.medium— Decision comment not posted when per-actor partially fails — inconsistent PR thread statemain() runs maybe_post_per_actor_comments(...) then maybe_post_decision_comment(...). If 1+ actor fails, the former raises after attempting all 3, exception propagates, decision is never posted/updated. PR shows new claude reviewer comment + stale (or absent) decision comment from prior commit.degraded: missing actors X,Ynote. Current behavior risks operator misreading PR state.medium— Hardcoded/3in vote-count summary lies when reviewer fails to vote_build_decision_comment_body footer:Tech: {sum(1 for v in packet.tech_votes.values() if v == 'OK')}/3 OK · Product: {sum(1 for v in packet.product_votes.values() if v == 'OK')}/3 OK.. If only 2 reviewers actually voted (one provider failed), the count shows e.g.1/3 OKwhen actual ratio is 1/2.len(packet.tech_votes)as denominator, or render as1 OK / 2 voted / 3 expected. Don't bake the count into the f-string.medium— _post_or_patch_one_comment httpx.HTTPError→RuntimeError wrapping is not unit-testedtest_post_or_patch_RAISES_on_non_2xx and test_post_or_patch_RAISES_on_404_patch only cover non-2xx response status. The new try/except httpx.HTTPError clause (run_review.py ~_post_or_patch_one_comment) — which is what actually addresses the 'network errors bypass aggregation' convergent finding — hamedium— Decision body accesses risk fields as attributes — schema-coupled to consolidate_review output_build_decision_comment_body usesr.severity,r.weak_signal,r.title,r.raised_by,r.recommendationdirectly. If consolidate_review.consolidate ever returns dict-shaped risks (e.g., during a refactor or partial result), these become AttributeError at runtime — only caught by exception handlow— Forgejouser.loginfield assumption in _find_existing_comment author matchrun_review.py _find_existing_comment:comment_author = (c.get('user') or {}).get('login'). If Forgejo/Gitea API ever renames or shapes the user payload differently (or returns a thin user without login), every comment will be filtered out and the code falls through to POST → duplicate comments, delow— Dead--actorarg + maybe_post_forgejo_comment() linger as confusion-shaped landminesbuild_parser still registers--actorwith choices and a default of 'glm'; main() no longer references it. maybe_post_forgejo_comment() is no longer called from main but is still tested. Operators passing--actor=codexget no warning and no effect — silent behavioral change.--actoris supplied, or remove both the flag and the unused function. Keep test coverage on the active path only.low— Decision-comment marker tied to commit_sha[:8] accumulates one comment per pushed SHA_decision_marker includeshead_sha[:8]. Each new push creates a new decision comment (PATCH won't match prior SHA's marker). Charter language 'never append spam' applies within one commit, but across PR lifetime a 10-commit PR yields 10 decision comments + 30 per-actor comments.low— Reviewer IDs not in REVIEWER_ID_TO_ACTOR are silently dropped_split_reviews_by_actor: if rid is not in the static map,actor = REVIEWER_ID_TO_ACTOR.get(rid)is None and the review is skipped without log. A future reviewer family (e.g.tech-llama) added without updating the map vanishes from per-actor comments AND from the operator's mental model.low— URL construction doesn't quote repo / pr_id_post_or_patch_one_comment buildsf"{host.rstrip('/')}/api/v1/repos/{repo}/issues/9/comments"with raw repo and pr_id. Internal-only inputs today, but if a repo name ever has a/beyondowner/repoor pr_id is non-numeric, paths break in surprising ways.Opportunities
_forgejo_request()helper that owns the error-class taxonomy — fewer chances of one path being missed in a future change.startswith('tech-')/startswith('product-')) and actor mapping are independent string operations on the same field. A reviewer_id schema (regex^(tech|product)-(claude|gpt|glm)$) parsed once would catch malformed IDs at ingest and keep both derivations consistent.Product hat: ✅ OK (confidence 0.70)
Risks
high— Decision comment posted LAST → appears at BOTTOM of PR thread, inverting Oracle Q4 priorityrun_review.py main() calls maybe_post_per_actor_comments BEFORE maybe_post_decision_comment. Forgejo shows oldest-first in PR conversation. So on first review you scroll past 3 lengthy per-actor blocks (one with full risk lists) to reach the 'canonical operator surface.' Charter says: 'The decisionmedium— Marker keys on commit_sha[:8] → every amend creates FOUR new comments, not four PATCHes_decision_marker and _idempotency_marker both embed commit_sha[:8]. Test test_decision_marker_format asserts this explicitly. Each push to the PR branch changes the head SHA, so on commit N+1 the lookup finds no marker match for the NEW SHA → POST new comment. After 5 amends you have ~20 comments onmedium— Charter section for Q5 (bug-class clustering) ships without implementationPLATFORM_CHARTER.md adds 'Bug-class clustering hints (Oracle Q5 ruling)' describing reviewer.bug_class field and 'platform may surface repeated classes' — but no field appears in this diff's reviewer schema, no consolidator code reads it, no test exercises it. Charter is now ahead of code.medium— Decision comment depends on FORGEJO_TOKEN_CLAUDE → if claude is rate-limited or unavailable, operator's primary surface silently disappearsmaybe_post_decision_comment hardcodes decision_actor='claude' and exits early on missing token (real_post raises, dry-run skips). Per-actor comments may still post for codex+glm. Operator gets evidence but no synthesis — exactly the cognitive-load scenario the decision comment was supposed to prevenlow— Test plan ships unchecked: 'Manual re-canary on PR #9 itself' is the actual verification, but PR is being requested for merge before that runsPR description Test plan: '[ ] Manual: re-canary on PR #9 itself (recursive meta-test...)'. The 120 unit tests don't cover the live Forgejo POST/PATCH path against a real instance — that's what canary B was for. Merging without rerunning canary on this PR risks the same class of bug as canary B cauglow— 4× comment surface per PR with no off-switchEvery PR now gets 4 comments (3 actor + 1 decision), all PATCH-updated together. --skip-actor exists but no '--decision-only' or '--minimal' for trivial PRs (typo fix, README change, etc) where per-actor attribution is overkill.Opportunities
3+3 ensemble review by
codex— tech + product hatsTech hat: ❌ NOT_OK (confidence 0.86)
Risks
high— Accept all successful 2xx comment responses, including 204control-plane/platformctl/tools/run_review.py:936-944: _post_or_patch_one_comment claims to raise on non-2xx, but only accepts 200/201 and then always parses JSON. Forgejo/Gitea PATCH comment responses may validly be 204 with no body, so an apply-time update can succeed server-side and still hard-fahigh— Deprecated --actor is ignored, breaking the stated backward-compat pathcontrol-plane/platformctl/tools/run_review.py:351-356 says --actor is kept for backward compatibility, but main() always calls maybe_post_per_actor_comments() and maybe_post_decision_comment() at lines 489-496. A pre-existing real-post hook using --actor glm with only FORGEJO_TOKEN_GLM now fails onmedium— --skip-actor claude cannot actually skip the claude dependencycontrol-plane/platformctl/tools/run_review.py:358-362 documents --skip-actor as usable when an actor PAT is unavailable, but maybe_post_decision_comment() hard-requires FORGEJO_TOKEN_CLAUDE at lines 849-856 even if --skip-actor claude was supplied.Opportunities
Product hat: ❌ NOT_OK (confidence 0.88)
Risks
high— Prevent comment spam across amended commitsPLATFORM_CHARTER.md:301 says all 4 comments are upserts and never append spam, but run_review.py:522 and run_review.py:754 include head_sha in the markers, so each new commit can create a fresh 4-comment set instead of updating the PR-level surface.high— Do not let supporting actor failures suppress the decisionPLATFORM_CHARTER.md:295 defines the decision comment as canonical, but run_review.py:491-496 posts per-actor comments first and only then posts the decision; run_review.py:1039-1043 raises after actor failures, preventing the canonical decision comment.high— Canonical decision hides medium product riskrun_review.py:811-812 tells the operator there are no blockers or high-severity risks and sends medium/low details to actor comments/local packet. Product and ADHD risks are often medium but still merge-relevant for the operator.medium— Decision identity is visually ambiguousrun_review.py:849 hardcodes the decision actor to claude, while run_review.py:660-663 also maps Claude reviewer output to claude; run_review.py:353 already has a platform-orchestrator actor choice that is not used.low— Bug-class charter change is premature scopePLATFORM_CHARTER.md:303-307 adds an open taxonomy for bug_class, but this PR does not implement or validate it.Opportunities
3+3 ensemble review by
glm— tech + product hatsTech hat: ✅ OK (confidence 0.95)
Opportunities
Product hat: ✅ OK (confidence 0.95)
Risks
medium— Increased cognitive load on PR threadPLATFORM_CHARTER.md + run_review.py diff: 4 comments per PR (3 actors + 1 decision)medium— Complexity in marker idempotency logicrun_review.py:_find_existing_comment now requiresexpected_authorto prevent identity poisoningOpportunities
Review decision
Status: ELEVATED — recommended action:
deferSingle-reviewer high-risk findings
Reviewer dissents
product-gptvoted NOT_OK (confidence 0.88)tech-gptvoted NOT_OK (confidence 0.86)Operator decisions (yes/no)
Per-actor evidence: see comments by
claude,codex,glmabove. Tech: 2/3 OK · Product: 2/3 OK.