feat(review-pipeline): 3+3 orchestrator + 6 provider adapters #4

Merged
pdurlej merged 1 commit from codex/review-pipeline/run-review-orchestrator into main 2026-05-01 11:42:32 +02:00
Owner

Summary

Implements prompts/01.5c-pm-decision-packet.md §7 per OQ1-OQ3 locks (operator answers 2026-05-01).

This is the orchestration layer for the 3+3+consolidator review pipeline. It dispatches a PR's diff to 6 reviewers in parallel (3 technical + 3 product), waits for all 6 (per OQ3 wait-mode), then hands the bundle to the deterministic consolidator from PR #3.

What's in this PR

New files (8):

  • control-plane/platformctl/tools/run_review.py — orchestrator (parallel dispatch, wait-mode, escalation, CLI)
  • control-plane/platformctl/tools/providers/_base.py — Provider ABC + per-reviewer system prompts + robust JSON parser
  • control-plane/platformctl/tools/providers/{zai,anthropic,openai}.py — real adapters (env-driven config)
  • control-plane/platformctl/tools/providers/_mock.py — env-controlled mock for tests
  • control-plane/platformctl/tools/providers/__init__.py — lazy provider registry
  • control-plane/platformctl/tests/test_run_review.py — 29 tests

Modified files (5):

  • control-plane/pyproject.toml — adds httpx>=0.27 dep
  • prompts/01.5c-pm-decision-packet.md — model spec table locked + OQ1-OQ3 answers
  • prompts/02-catalog.md — Appendix A.5 swapped from single-reviewer to 3+3 pipeline
  • state/AUDIT_LOG.jsonl — appended OQ lock entries + PR coded entry
  • state/STATUS_NOW.md — updated for post-compact + PR #4 in flight + operator-mobile mode

OQ locks (2026-05-01, operator-confirmed)

Slot Provider Model Reasoning/Thinking
tech-glm / product-glm z.ai glm-4.6 (none — temperature 0.2)
tech-claude / product-claude Anthropic claude-opus-4-7 max thinking (32k tokens)
tech-gpt / product-gpt OpenAI gpt-5.5 reasoning_effort=high ("xhigh")

OQ2 (cost cap): none — all subscriptions; monitor via Codex CLI status bar.

OQ3 (failure mode): WAIT — orchestrator refuses to consolidate if any reviewer fails; writes state/DECISION_REQUIRED.md + exits non-zero.

Wait-mode policy (OQ3 in code)

tools/run_review.py waits for all 6 reviewers (per-reviewer timeout default 600s). If any reviewer fails or times out:

  • Successful reviews still saved to state/reviews/PR-<id>/<reviewer>.json (audit trail)
  • state/DECISION_REQUIRED.md written with: failed reviewer(s), kind (timeout/error), remaining successes, 4 recovery options
  • Exit code 2 signals escalation
  • Consolidator NOT called; no decision packet emitted
  • NO silent partial decisions

Tests

52/52 green:

  • 29 new (this PR — parsing, MockProvider, parallel dispatch, wait-mode escalation, CLI exit codes)
  • 14 consolidator (PR #3)
  • 4 negative-control gate (PR #1)
  • 5 platformctl smoke
pytest platformctl/tests/ → 52 passed in 4.40s

NOT in this PR (intentional)

  • No live API exercise. Real provider adapters (zai.py, anthropic.py, openai.py) ship untested against live APIs. First canary 3+3 run on a real Phase 02 PR will validate. Adapters are env-configurable so model strings can be tuned without re-deploy.
  • No Forgejo Actions wiring. validate.yaml / plan.yaml not yet calling run_review.py. Follow-up PR.
  • No Claude Code Review plugin route. tech-claude always uses Messages API today; plugin route planned via PLATFORMCTL_USE_CLAUDE_PLUGIN flag (open loop claude-plugin-route-2026-05).

Test plan

  • pytest platformctl/tests/ → all 52 green
  • python -m platformctl.tools.run_review --help shows full arg list
  • Manual: real canary run on a small Phase 02 PR (post-merge)
  • Manual: induce one provider failure, verify DECISION_REQUIRED.md surfaces correctly

🤖 Generated with Claude Code

## Summary Implements `prompts/01.5c-pm-decision-packet.md §7` per OQ1-OQ3 locks (operator answers 2026-05-01). This is the orchestration layer for the 3+3+consolidator review pipeline. It dispatches a PR's diff to 6 reviewers in parallel (3 technical + 3 product), waits for all 6 (per OQ3 wait-mode), then hands the bundle to the deterministic consolidator from PR #3. ## What's in this PR **New files (8):** - `control-plane/platformctl/tools/run_review.py` — orchestrator (parallel dispatch, wait-mode, escalation, CLI) - `control-plane/platformctl/tools/providers/_base.py` — Provider ABC + per-reviewer system prompts + robust JSON parser - `control-plane/platformctl/tools/providers/{zai,anthropic,openai}.py` — real adapters (env-driven config) - `control-plane/platformctl/tools/providers/_mock.py` — env-controlled mock for tests - `control-plane/platformctl/tools/providers/__init__.py` — lazy provider registry - `control-plane/platformctl/tests/test_run_review.py` — 29 tests **Modified files (5):** - `control-plane/pyproject.toml` — adds `httpx>=0.27` dep - `prompts/01.5c-pm-decision-packet.md` — model spec table locked + OQ1-OQ3 answers - `prompts/02-catalog.md` — Appendix A.5 swapped from single-reviewer to 3+3 pipeline - `state/AUDIT_LOG.jsonl` — appended OQ lock entries + PR coded entry - `state/STATUS_NOW.md` — updated for post-compact + PR #4 in flight + operator-mobile mode ## OQ locks (2026-05-01, operator-confirmed) | Slot | Provider | Model | Reasoning/Thinking | |---|---|---|---| | `tech-glm` / `product-glm` | z.ai | `glm-4.6` | (none — temperature 0.2) | | `tech-claude` / `product-claude` | Anthropic | `claude-opus-4-7` | max thinking (32k tokens) | | `tech-gpt` / `product-gpt` | OpenAI | `gpt-5.5` | `reasoning_effort=high` ("xhigh") | **OQ2 (cost cap):** none — all subscriptions; monitor via Codex CLI status bar. **OQ3 (failure mode):** WAIT — orchestrator refuses to consolidate if any reviewer fails; writes `state/DECISION_REQUIRED.md` + exits non-zero. ## Wait-mode policy (OQ3 in code) `tools/run_review.py` waits for all 6 reviewers (per-reviewer timeout default 600s). If any reviewer fails or times out: - ✅ Successful reviews still saved to `state/reviews/PR-<id>/<reviewer>.json` (audit trail) - ✅ `state/DECISION_REQUIRED.md` written with: failed reviewer(s), kind (timeout/error), remaining successes, 4 recovery options - ✅ Exit code **2** signals escalation - ❌ Consolidator NOT called; no decision packet emitted - ❌ NO silent partial decisions ## Tests **52/52 green:** - 29 new (this PR — parsing, MockProvider, parallel dispatch, wait-mode escalation, CLI exit codes) - 14 consolidator (PR #3) - 4 negative-control gate (PR #1) - 5 platformctl smoke ``` pytest platformctl/tests/ → 52 passed in 4.40s ``` ## NOT in this PR (intentional) - **No live API exercise.** Real provider adapters (`zai.py`, `anthropic.py`, `openai.py`) ship untested against live APIs. First canary 3+3 run on a real Phase 02 PR will validate. Adapters are env-configurable so model strings can be tuned without re-deploy. - **No Forgejo Actions wiring.** `validate.yaml` / `plan.yaml` not yet calling `run_review.py`. Follow-up PR. - **No Claude Code Review plugin route.** `tech-claude` always uses Messages API today; plugin route planned via `PLATFORMCTL_USE_CLAUDE_PLUGIN` flag (open loop `claude-plugin-route-2026-05`). ## Test plan - [x] `pytest platformctl/tests/` → all 52 green - [x] `python -m platformctl.tools.run_review --help` shows full arg list - [ ] Manual: real canary run on a small Phase 02 PR (post-merge) - [ ] Manual: induce one provider failure, verify DECISION_REQUIRED.md surfaces correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Implements `prompts/01.5c-pm-decision-packet.md §7` per OQ1-OQ3 locks
(operator answers 2026-05-01).

What this adds:
- `tools/run_review.py` — parallel dispatch of 6 reviewers, wait-mode
  policy, escalation via `state/DECISION_REQUIRED.md` if any reviewer
  fails (OQ3 lock). Exit codes: 0=approve, 1=defer, 2=escalation,
  3=usage error.
- `tools/providers/_base.py` — Provider ABC, ReviewRequest dataclass,
  per-reviewer system prompts (tech vs product, 6 specialized "lenses"),
  robust JSON-from-prose parser with ABSTAIN fallback.
- `tools/providers/{zai,anthropic,openai}.py` — real adapters; env-driven
  config (model strings, reasoning effort, thinking budget per OQ1).
  Untested against live APIs in this PR; first canary run validates.
- `tools/providers/_mock.py` — env-controlled mock for tests.
- `tests/test_run_review.py` — 29 tests covering parsing, MockProvider,
  parallel dispatch, wait-mode escalation, CLI exit codes.

OQ locks (per `state/AUDIT_LOG.jsonl#oq_answers`):
- OQ1: glm-4.6 / claude-opus-4-7 + max thinking / gpt-5.5 + high reasoning
- OQ2: no hard cost cap (all subscriptions; monitor via Codex CLI)
- OQ3: WAIT mode, no soft-fail; operator escalation only

Spec docs updated:
- `prompts/01.5c-pm-decision-packet.md` — model spec table + OQ
  answers locked + acceptance criteria checked off
- `prompts/02-catalog.md` Appendix A.5 — replaced single-reviewer
  GLM/Claude path with 3+3+consolidator pipeline

Tests: 52/52 green
- 29 new (this PR)
- 14 consolidator (PR #3)
- 4 negative-control (PR #1)
- 5 smoke (existing)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign in to join this conversation.
No reviewers
No labels
W6d-automerge-calibration
agent/claude-code
agent/codex
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
automerge-candidate
class/security-sensitive
cutover-gate
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
iterating
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
large-impact
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
meta
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
needs-operator-decision
needs-triage
not-ready
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
operator-emotional
owner-attention
phase/02
phase/03
priority:p0
priority:p1
priority:p2
priority:p3
proposed
ready-for-agent
ready-for-operator
recovery
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
risk/exposure
risk/process
risk/product
risk/runtime
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:operator-needed
status:parked
tier/full
tier/lite
tier/stacked
tier:0-platform-substrate
tier:1-iskra-value-layer
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
pdurlej/platform!4
No description provided.