feat(review-pipeline): 3+3 orchestrator + 6 provider adapters #4
Reference
pdurlej/platform!4
Summary
Implements `prompts/01.5c-pm-decision-packet.md §7` per OQ1-OQ3 locks (operator answers 2026-05-01).

This is the orchestration layer for the 3+3+consolidator review pipeline. It dispatches a PR's diff to 6 reviewers in parallel (3 technical + 3 product), waits for all 6 (per OQ3 wait-mode), then hands the bundle to the deterministic consolidator from PR #3.
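The dispatch-and-wait flow described above could look roughly like the sketch below. It is not the code in `run_review.py`: `dispatch_all`, `DispatchResult`, and the `review_fn` callback are hypothetical names, and the six reviewer ids are taken from the OQ1 model table in this PR. It assumes a thread pool and a shared deadline, so every reviewer gets the same per-reviewer budget.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass, field
from typing import Callable, Dict

# Reviewer ids as named in this PR: 3 technical + 3 product.
REVIEWERS = (
    "tech-glm", "tech-claude", "tech-gpt",
    "product-glm", "product-claude", "product-gpt",
)


@dataclass
class DispatchResult:
    """Hypothetical container: successful reviews plus failure kinds."""
    reviews: Dict[str, dict] = field(default_factory=dict)
    failures: Dict[str, str] = field(default_factory=dict)  # "timeout" or "error"

    @property
    def complete(self) -> bool:
        # Wait-mode: the bundle is usable only if all reviewers succeeded.
        return not self.failures


def dispatch_all(
    diff: str,
    review_fn: Callable[[str, str], dict],
    timeout_s: float = 600.0,
) -> DispatchResult:
    """Run review_fn(reviewer, diff) for every reviewer in parallel."""
    result = DispatchResult()
    pool = ThreadPoolExecutor(max_workers=len(REVIEWERS))
    # All reviewers start together, so one shared deadline gives each
    # of them the same timeout_s budget.
    deadline = time.monotonic() + timeout_s
    futures = {name: pool.submit(review_fn, name, diff) for name in REVIEWERS}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            result.reviews[name] = future.result(timeout=remaining)
        except FutureTimeout:
            result.failures[name] = "timeout"
        except Exception:
            result.failures[name] = "error"
    # Do not block on stragglers; requires Python 3.9+ for cancel_futures.
    pool.shutdown(wait=False, cancel_futures=True)
    return result
```

A caller would pass the bundle to the consolidator only when `result.complete` is true, and escalate otherwise.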
What's in this PR
New files (8):
- `control-plane/platformctl/tools/run_review.py`: orchestrator (parallel dispatch, wait-mode, escalation, CLI)
- `control-plane/platformctl/tools/providers/_base.py`: Provider ABC + per-reviewer system prompts + robust JSON parser
- `control-plane/platformctl/tools/providers/{zai,anthropic,openai}.py`: real adapters (env-driven config)
- `control-plane/platformctl/tools/providers/_mock.py`: env-controlled mock for tests
- `control-plane/platformctl/tools/providers/__init__.py`: lazy provider registry
- `control-plane/platformctl/tests/test_run_review.py`: 29 tests

Modified files (5):
- `control-plane/pyproject.toml`: adds `httpx>=0.27` dep
- `prompts/01.5c-pm-decision-packet.md`: model spec table locked + OQ1-OQ3 answers
- `prompts/02-catalog.md`: Appendix A.5 swapped from single-reviewer to 3+3 pipeline
- `state/AUDIT_LOG.jsonl`: appended OQ lock entries + PR coded entry
- `state/STATUS_NOW.md`: updated for post-compact + PR #4 in flight + operator-mobile mode

OQ locks (2026-05-01, operator-confirmed)
OQ1 (models):
- `tech-glm` / `product-glm`: `glm-4.6`
- `tech-claude` / `product-claude`: `claude-opus-4-7`
- `tech-gpt` / `product-gpt`: `gpt-5.5`, `reasoning_effort=high` ("xhigh")

OQ2 (cost cap): none. All subscriptions; monitor via Codex CLI status bar.
OQ3 (failure mode): WAIT. The orchestrator refuses to consolidate if any reviewer fails; it writes `state/DECISION_REQUIRED.md` and exits non-zero.

Wait-mode policy (OQ3 in code)
`tools/run_review.py` waits for all 6 reviewers (per-reviewer timeout default 600s). If any reviewer fails or times out:
- `state/reviews/PR-<id>/<reviewer>.json` (audit trail)
- `state/DECISION_REQUIRED.md` written with: failed reviewer(s), kind (timeout/error), remaining successes, 4 recovery options

Tests
52/52 green:
- 29 new (this PR)
- 14 consolidator (PR #3)
- 4 negative-control (PR #1)
- 5 smoke (existing)
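As an illustration of the wait-mode escalation path, the sketch below writes a `DECISION_REQUIRED.md` listing failed reviewers, their failure kind, and the remaining successes, then reports the escalation exit code (2, per the commit message). The function name `escalate_if_needed`, its signature, and the Markdown layout are assumptions; the four recovery options are left out because this PR text does not enumerate them.

```python
from pathlib import Path
from typing import Dict, List, Optional

EXIT_ESCALATION = 2  # exit code for escalation, as listed in the commit message


def escalate_if_needed(
    state_dir: Path,
    pr_id: str,
    failures: Dict[str, str],
    successes: List[str],
) -> Optional[int]:
    """Return EXIT_ESCALATION after writing the decision file, or None.

    failures maps reviewer id -> "timeout" or "error". None means every
    reviewer succeeded and the caller may proceed to the consolidator.
    """
    if not failures:
        return None

    # Hypothetical file layout; only the listed fields come from the PR text.
    lines = [f"# Decision required: PR-{pr_id}", "", "Failed reviewers:"]
    lines += [f"- {name} ({kind})" for name, kind in sorted(failures.items())]
    lines += ["", "Remaining successes:"]
    lines += [f"- {name}" for name in sorted(successes)]
    # Recovery options are intentionally omitted in this sketch.

    state_dir.mkdir(parents=True, exist_ok=True)
    target = state_dir / "DECISION_REQUIRED.md"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return EXIT_ESCALATION
```

A CLI entry point could then call `sys.exit(code)` whenever the returned code is not `None`, which keeps the "no soft-fail" rule in one place.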
NOT in this PR (intentional)
- The real adapters (`zai.py`, `anthropic.py`, `openai.py`) ship untested against live APIs. The first canary 3+3 run on a real Phase 02 PR will validate them. Adapters are env-configurable, so model strings can be tuned without a re-deploy.
- `validate.yaml` / `plan.yaml` do not yet call `run_review.py`. Follow-up PR.
- `tech-claude` always uses the Messages API today; a plugin route is planned via the `PLATFORMCTL_USE_CLAUDE_PLUGIN` flag (open loop `claude-plugin-route-2026-05`).

Test plan
- `pytest platformctl/tests/` → all 52 green
- `python -m platformctl.tools.run_review --help` shows full arg list

🤖 Generated with Claude Code
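To make "env-configurable adapters" concrete, here is a minimal sketch of how model strings might be resolved from the environment with the OQ1 values as defaults. Only `PLATFORMCTL_USE_CLAUDE_PLUGIN` is named in this PR; the `PLATFORMCTL_<PROVIDER>_MODEL` and `PLATFORMCTL_OPENAI_REASONING_EFFORT` variables, the accepted truthy values, and the `AdapterConfig` / `load_config` names are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults taken from the OQ1 lock in this PR.
DEFAULT_MODELS = {
    "zai": "glm-4.6",
    "anthropic": "claude-opus-4-7",
    "openai": "gpt-5.5",
}


@dataclass(frozen=True)
class AdapterConfig:
    provider: str
    model: str
    reasoning_effort: Optional[str] = None
    use_claude_plugin: bool = False


def load_config(provider: str, env: Optional[Mapping[str, str]] = None) -> AdapterConfig:
    """Resolve adapter settings from env vars, falling back to OQ1 defaults."""
    env = os.environ if env is None else env
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider: {provider}")

    prefix = f"PLATFORMCTL_{provider.upper()}"  # hypothetical naming scheme
    model = env.get(f"{prefix}_MODEL", DEFAULT_MODELS[provider])

    effort = None
    if provider == "openai":
        effort = env.get(f"{prefix}_REASONING_EFFORT", "high")

    # The flag name is from the PR; the truthy values are an assumption.
    flag = env.get("PLATFORMCTL_USE_CLAUDE_PLUGIN", "").lower()
    plugin = provider == "anthropic" and flag in {"1", "true", "yes"}

    return AdapterConfig(provider, model, effort, plugin)
```

Passing `env` explicitly keeps the function testable without mutating `os.environ`.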
Implements `prompts/01.5c-pm-decision-packet.md §7` per OQ1-OQ3 locks (operator answers 2026-05-01).

What this adds:
- `tools/run_review.py`: parallel dispatch of 6 reviewers, wait-mode policy, escalation via `state/DECISION_REQUIRED.md` if any reviewer fails (OQ3 lock). Exit codes: 0=approve, 1=defer, 2=escalation, 3=usage error.
- `tools/providers/_base.py`: Provider ABC, ReviewRequest dataclass, per-reviewer system prompts (tech vs product, 6 specialized "lenses"), robust JSON-from-prose parser with ABSTAIN fallback.
- `tools/providers/{zai,anthropic,openai}.py`: real adapters; env-driven config (model strings, reasoning effort, thinking budget per OQ1). Untested against live APIs in this PR; first canary run validates.
- `tools/providers/_mock.py`: env-controlled mock for tests.
- `tests/test_run_review.py`: 29 tests covering parsing, MockProvider, parallel dispatch, wait-mode escalation, CLI exit codes.

OQ locks (per `state/AUDIT_LOG.jsonl#oq_answers`):
- OQ1: glm-4.6 / claude-opus-4-7 + max thinking / gpt-5.5 + high reasoning
- OQ2: no hard cost cap (all subscriptions; monitor via Codex CLI)
- OQ3: WAIT mode, no soft-fail; operator escalation only

Spec docs updated:
- `prompts/01.5c-pm-decision-packet.md`: model spec table + OQ answers locked + acceptance criteria checked off
- `prompts/02-catalog.md` Appendix A.5: replaced single-reviewer GLM/Claude path with 3+3+consolidator pipeline

Tests: 52/52 green
- 29 new (this PR)
- 14 consolidator (PR #3)
- 4 negative-control (PR #1)
- 5 smoke (existing)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
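The "JSON-from-prose parser with ABSTAIN fallback" mentioned in the commit message could work along these lines. This is a sketch under stated assumptions, not the parser in `_base.py`: `parse_review`, the `verdict` and `reason` field names, and the shape of the ABSTAIN record are hypothetical. It scans the reviewer's reply for the first decodable JSON object using the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
from typing import Any, Dict

# Hypothetical fallback record returned when nothing parseable is found.
ABSTAIN: Dict[str, Any] = {
    "verdict": "ABSTAIN",
    "reason": "unparseable reviewer output",
}


def parse_review(text: str) -> Dict[str, Any]:
    """Extract the first JSON object carrying a 'verdict' key from prose."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value beginning at `start` and
            # ignores whatever trailing prose follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "verdict" in obj:
            return obj
    # Return a copy so callers cannot mutate the shared fallback.
    return dict(ABSTAIN)
```

Trying every `{` position is quadratic in the worst case, which seems acceptable for review-sized replies; a stricter variant might also validate the verdict against an allowed set.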