fix(providers): replace direct-API with local-CLI invocations #5
No reviewers
Labels
No labels
W6d-automerge-calibration
agent/claude-code
agent/codex
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
automerge-candidate
class/security-sensitive
cutover-gate
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
iterating
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
large-impact
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
meta
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
needs-operator-decision
needs-triage
not-ready
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
operator-emotional
owner-attention
phase/02
phase/03
priority:p0
priority:p1
priority:p2
priority:p3
proposed
ready-for-agent
ready-for-operator
recovery
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
risk/exposure
risk/process
risk/product
risk/runtime
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:operator-needed
status:parked
tier/full
tier/lite
tier/stacked
tier:0-platform-substrate
tier:1-iskra-value-layer
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
pdurlej/platform!5
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "codex/review-pipeline/local-cli-providers"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Architectural correction post PR #4. Operator clarified (2026-05-01, after PR #4 merged): claude + codex reviewers use local CLI subscriptions, not direct API. Only z.ai goes via HTTPS (key from Infisical).
PR #4 shipped
anthropic.pyandopenai.pyas direct-API HTTP adapters. Wrong design — would have required managing 3 API keys, paying per-call instead of using existing Max/Pro subscriptions, and missing that Codex CLI hascodex exec --sandbox read-onlypurpose-built for review.This PR fixes that.
What changes
New providers:
providers/claude_cli.py—claude -p --model opus --effort max --system-prompt … --no-session-persistence. Uses subscription auth via Keychain. Fresh Claude session per reviewer call → independent reasoning.providers/codex_cli.py—codex exec --sandbox read-only --ignore-user-config -m gpt-5.5 -c model_reasoning_effort=high. Uses subscription auth via codex's own login. Fresh Codex session per reviewer call.Registry update (
run_review.pyREVIEWER_REGISTRY):tech-claudeanthropic(HTTPS)claude_cli(local)product-claudeanthropicclaude_clitech-gptopenai(HTTPS)codex_cli(local)product-gptopenaicodex_clitech-glm/product-glmzaizai(unchanged)Fallbacks preserved:
anthropic.py+openai.pystay inproviders/for edge cases (CI runners without CLIs, or operator preference for billed API on a specific reviewer). Override via--provider-override tech-claude=anthropic.z.ai docs: Infisical path documented inline:
With
infisical secrets get …snippet for export.Spec rewrite:
prompts/01.5c §2— auth model section + invocation tables. Cost stance updated: claude/gpt have $0 per-PR cost (subscriptions); only z.ai trickles cents.Operator inbox convention:
state/OPERATOR_INBOX.md— new file for operator → claude inbound messages while operator is mobile. ScheduleWakeup loop set up (10-min interval) to check.Tests
57/57 green:
test_run_review.py(was 29; +5 wire-up tests):test_get_provider_claude_cli/test_get_provider_codex_clitest_default_registry_uses_local_cli_for_claude_and_gpttest_get_provider_anthropic_fallback_still_availabletest_get_provider_openai_fallback_still_availableNOT in this PR (intentional)
claude_cli.py+codex_cli.pyship without exercising live CLI. Subprocess machinery + flag combos are based onclaude --help/codex exec --help. First canary 3+3 run on a small Phase 02 PR will validate; flags can be tuned via env vars without code change.Test plan
pytest platformctl/tests/→ 57 greenclaudeCLI failure (e.g., wrong model alias), verify ProviderError surfaces correctly + DECISION_REQUIRED.md fires🤖 Generated with Claude Code
Real-world output from `codex exec` (verified 2026-05-01 smoke test) emits: - Preamble (version, workdir, model, sandbox info) - "user" header + prompt echo (which itself contains an example {...} JSON if the prompt asks for JSON output) - "codex" header + actual model JSON answer - Mid-response error log (e.g., `codex_core::session` rollout warnings) - Token usage line - Duplicate JSON answer The previous parser used "first { to last }" slicing which would capture EVERYTHING between the prompt-echo's example brace and the duplicate-emit's closing brace — unparseable, fell back to ABSTAIN. This commit: - Adds `_extract_balanced_json_blocks` — depth-counted, string-aware scan for top-level balanced `{...}` blocks - Two-pass parse: prefer candidates with `verdict` key (the real answer); fall back to first parseable dict (lenient) - New test fixtures: - `test_parse_handles_codex_cli_real_output` — verbatim captured output - `test_parse_skips_prompt_echo_without_verdict_key` — synthetic case Tests: 59/59 (was 57; +2 here, +5 in PR #5 commit, all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>lint --cross-refsto catch INDEX/ADR/runbook integrity drift automatically #767