feat(review-pipeline): 3+3 orchestrator + 6 provider adapters #4

Merged

pdurlej merged 1 commit from codex/review-pipeline/run-review-orchestrator into main

2026-05-01 11:42:32 +02:00

pdurlej commented

2026-05-01 11:41:32 +02:00

Owner

Summary

Implements prompts/01.5c-pm-decision-packet.md §7 per OQ1-OQ3 locks (operator answers 2026-05-01).

This is the orchestration layer for the 3+3+consolidator review pipeline. It dispatches a PR's diff to 6 reviewers in parallel (3 technical + 3 product), waits for all 6 (per OQ3 wait-mode), then hands the bundle to the deterministic consolidator from PR #3.

What's in this PR

New files (8):

control-plane/platformctl/tools/run_review.py — orchestrator (parallel dispatch, wait-mode, escalation, CLI)
control-plane/platformctl/tools/providers/_base.py — Provider ABC + per-reviewer system prompts + robust JSON parser
control-plane/platformctl/tools/providers/{zai,anthropic,openai}.py — real adapters (env-driven config)
control-plane/platformctl/tools/providers/_mock.py — env-controlled mock for tests
control-plane/platformctl/tools/providers/__init__.py — lazy provider registry
control-plane/platformctl/tests/test_run_review.py — 29 tests

Modified files (5):

control-plane/pyproject.toml — adds httpx>=0.27 dep
prompts/01.5c-pm-decision-packet.md — model spec table locked + OQ1-OQ3 answers
prompts/02-catalog.md — Appendix A.5 swapped from single-reviewer to 3+3 pipeline
state/AUDIT_LOG.jsonl — appended OQ lock entries + PR coded entry
state/STATUS_NOW.md — updated for post-compact + PR #4 in flight + operator-mobile mode

OQ locks (2026-05-01, operator-confirmed)

Slot	Provider	Model	Reasoning/Thinking
`tech-glm` / `product-glm`	z.ai	`glm-4.6`	(none — temperature 0.2)
`tech-claude` / `product-claude`	Anthropic	`claude-opus-4-7`	max thinking (32k tokens)
`tech-gpt` / `product-gpt`	OpenAI	`gpt-5.5`	`reasoning_effort=high` ("xhigh")

OQ2 (cost cap): none — all subscriptions; monitor via Codex CLI status bar.

OQ3 (failure mode): WAIT — orchestrator refuses to consolidate if any reviewer fails; writes state/DECISION_REQUIRED.md + exits non-zero.

Wait-mode policy (OQ3 in code)

tools/run_review.py waits for all 6 reviewers (per-reviewer timeout default 600s). If any reviewer fails or times out:

✅ Successful reviews still saved to state/reviews/PR-<id>/<reviewer>.json (audit trail)
✅ state/DECISION_REQUIRED.md written with: failed reviewer(s), kind (timeout/error), remaining successes, 4 recovery options
✅ Exit code 2 signals escalation
❌ Consolidator NOT called; no decision packet emitted
❌ NO silent partial decisions

Tests

52/52 green:

29 new (this PR — parsing, MockProvider, parallel dispatch, wait-mode escalation, CLI exit codes)
14 consolidator (PR #3)
4 negative-control gate (PR #1)
5 platformctl smoke

pytest platformctl/tests/ → 52 passed in 4.40s

NOT in this PR (intentional)

No live API exercise. Real provider adapters (zai.py, anthropic.py, openai.py) ship untested against live APIs. First canary 3+3 run on a real Phase 02 PR will validate. Adapters are env-configurable so model strings can be tuned without re-deploy.
No Forgejo Actions wiring. validate.yaml / plan.yaml not yet calling run_review.py. Follow-up PR.
No Claude Code Review plugin route. tech-claude always uses Messages API today; plugin route planned via PLATFORMCTL_USE_CLAUDE_PLUGIN flag (open loop claude-plugin-route-2026-05).

Test plan

pytest platformctl/tests/ → all 52 green
python -m platformctl.tools.run_review --help shows full arg list
Manual: real canary run on a small Phase 02 PR (post-merge)
Manual: induce one provider failure, verify DECISION_REQUIRED.md surfaces correctly

🤖 Generated with Claude Code

## Summary Implements `prompts/01.5c-pm-decision-packet.md §7` per OQ1-OQ3 locks (operator answers 2026-05-01). This is the orchestration layer for the 3+3+consolidator review pipeline. It dispatches a PR's diff to 6 reviewers in parallel (3 technical + 3 product), waits for all 6 (per OQ3 wait-mode), then hands the bundle to the deterministic consolidator from PR #3. ## What's in this PR **New files (8):** - `control-plane/platformctl/tools/run_review.py` — orchestrator (parallel dispatch, wait-mode, escalation, CLI) - `control-plane/platformctl/tools/providers/_base.py` — Provider ABC + per-reviewer system prompts + robust JSON parser - `control-plane/platformctl/tools/providers/{zai,anthropic,openai}.py` — real adapters (env-driven config) - `control-plane/platformctl/tools/providers/_mock.py` — env-controlled mock for tests - `control-plane/platformctl/tools/providers/__init__.py` — lazy provider registry - `control-plane/platformctl/tests/test_run_review.py` — 29 tests **Modified files (5):** - `control-plane/pyproject.toml` — adds `httpx>=0.27` dep - `prompts/01.5c-pm-decision-packet.md` — model spec table locked + OQ1-OQ3 answers - `prompts/02-catalog.md` — Appendix A.5 swapped from single-reviewer to 3+3 pipeline - `state/AUDIT_LOG.jsonl` — appended OQ lock entries + PR coded entry - `state/STATUS_NOW.md` — updated for post-compact + PR #4 in flight + operator-mobile mode ## OQ locks (2026-05-01, operator-confirmed) | Slot | Provider | Model | Reasoning/Thinking | |---|---|---|---| | `tech-glm` / `product-glm` | z.ai | `glm-4.6` | (none — temperature 0.2) | | `tech-claude` / `product-claude` | Anthropic | `claude-opus-4-7` | max thinking (32k tokens) | | `tech-gpt` / `product-gpt` | OpenAI | `gpt-5.5` | `reasoning_effort=high` ("xhigh") | **OQ2 (cost cap):** none — all subscriptions; monitor via Codex CLI status bar. **OQ3 (failure mode):** WAIT — orchestrator refuses to consolidate if any reviewer fails; writes `state/DECISION_REQUIRED.md` + exits non-zero. ## Wait-mode policy (OQ3 in code) `tools/run_review.py` waits for all 6 reviewers (per-reviewer timeout default 600s). If any reviewer fails or times out: - ✅ Successful reviews still saved to `state/reviews/PR-<id>/<reviewer>.json` (audit trail) - ✅ `state/DECISION_REQUIRED.md` written with: failed reviewer(s), kind (timeout/error), remaining successes, 4 recovery options - ✅ Exit code **2** signals escalation - ❌ Consolidator NOT called; no decision packet emitted - ❌ NO silent partial decisions ## Tests **52/52 green:** - 29 new (this PR — parsing, MockProvider, parallel dispatch, wait-mode escalation, CLI exit codes) - 14 consolidator (PR #3) - 4 negative-control gate (PR #1) - 5 platformctl smoke ``` pytest platformctl/tests/ → 52 passed in 4.40s ``` ## NOT in this PR (intentional) - **No live API exercise.** Real provider adapters (`zai.py`, `anthropic.py`, `openai.py`) ship untested against live APIs. First canary 3+3 run on a real Phase 02 PR will validate. Adapters are env-configurable so model strings can be tuned without re-deploy. - **No Forgejo Actions wiring.** `validate.yaml` / `plan.yaml` not yet calling `run_review.py`. Follow-up PR. - **No Claude Code Review plugin route.** `tech-claude` always uses Messages API today; plugin route planned via `PLATFORMCTL_USE_CLAUDE_PLUGIN` flag (open loop `claude-plugin-route-2026-05`). ## Test plan - [x] `pytest platformctl/tests/` → all 52 green - [x] `python -m platformctl.tools.run_review --help` shows full arg list - [ ] Manual: real canary run on a small Phase 02 PR (post-merge) - [ ] Manual: induce one provider failure, verify DECISION_REQUIRED.md surfaces correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code)

pdurlej added 1 commit

2026-05-01 11:41:32 +02:00

feat(review-pipeline): 3+3 orchestrator + 6 provider adapters (PR #4 ) 95fe2918aa

Implements `prompts/01.5c-pm-decision-packet.md §7` per OQ1-OQ3 locks
(operator answers 2026-05-01).

What this adds:
- `tools/run_review.py` — parallel dispatch of 6 reviewers, wait-mode
  policy, escalation via `state/DECISION_REQUIRED.md` if any reviewer
  fails (OQ3 lock). Exit codes: 0=approve, 1=defer, 2=escalation,
  3=usage error.
- `tools/providers/_base.py` — Provider ABC, ReviewRequest dataclass,
  per-reviewer system prompts (tech vs product, 6 specialized "lenses"),
  robust JSON-from-prose parser with ABSTAIN fallback.
- `tools/providers/{zai,anthropic,openai}.py` — real adapters; env-driven
  config (model strings, reasoning effort, thinking budget per OQ1).
  Untested against live APIs in this PR; first canary run validates.
- `tools/providers/_mock.py` — env-controlled mock for tests.
- `tests/test_run_review.py` — 29 tests covering parsing, MockProvider,
  parallel dispatch, wait-mode escalation, CLI exit codes.

OQ locks (per `state/AUDIT_LOG.jsonl#oq_answers`):
- OQ1: glm-4.6 / claude-opus-4-7 + max thinking / gpt-5.5 + high reasoning
- OQ2: no hard cost cap (all subscriptions; monitor via Codex CLI)
- OQ3: WAIT mode, no soft-fail; operator escalation only

Spec docs updated:
- `prompts/01.5c-pm-decision-packet.md` — model spec table + OQ
  answers locked + acceptance criteria checked off
- `prompts/02-catalog.md` Appendix A.5 — replaced single-reviewer
  GLM/Claude path with 3+3+consolidator pipeline

Tests: 52/52 green
- 29 new (this PR)
- 14 consolidator (PR #3)
- 4 negative-control (PR #1)
- 5 smoke (existing)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pdurlej merged commit 4290dc3260 into main

2026-05-01 11:42:32 +02:00

pdurlej referenced this pull request from a commit

2026-05-01 11:42:33 +02:00

feat(review-pipeline): 3+3 orchestrator + 6 provider adapters (PR #4)

pdurlej referenced this pull request from a commit

2026-05-01 11:42:33 +02:00

Merge pull request 'feat(review-pipeline): 3+3 orchestrator + 6 provider adapters' (#4) from codex/review-pipeline/run-review-orchestrator into main

pdurlej referenced this pull request

2026-05-01 11:54:20 +02:00

fix(providers): replace direct-API with local-CLI invocations #5

pdurlej referenced this pull request from a commit