pdurlej/platform

Fork 0

fix(providers): replace direct-API with local-CLI invocations #5

Merged

pdurlej merged 2 commits from codex/review-pipeline/local-cli-providers into main

2026-05-01 12:14:05 +02:00

pdurlej commented

2026-05-01 11:54:20 +02:00

Owner

Summary

Architectural correction post PR #4. Operator clarified (2026-05-01, after PR #4 merged): claude + codex reviewers use local CLI subscriptions, not direct API. Only z.ai goes via HTTPS (key from Infisical).

PR #4 shipped anthropic.py and openai.py as direct-API HTTP adapters. Wrong design — would have required managing 3 API keys, paying per-call instead of using existing Max/Pro subscriptions, and missing that Codex CLI has codex exec --sandbox read-only purpose-built for review.

This PR fixes that.

What changes

New providers:

providers/claude_cli.py — claude -p --model opus --effort max --system-prompt … --no-session-persistence. Uses subscription auth via Keychain. Fresh Claude session per reviewer call → independent reasoning.
providers/codex_cli.py — codex exec --sandbox read-only --ignore-user-config -m gpt-5.5 -c model_reasoning_effort=high. Uses subscription auth via codex's own login. Fresh Codex session per reviewer call.

Registry update (run_review.py REVIEWER_REGISTRY):

Slot	Was (PR #4)	Now (PR #5)
`tech-claude`	`anthropic` (HTTPS)	`claude_cli` (local)
`product-claude`	`anthropic`	`claude_cli`
`tech-gpt`	`openai` (HTTPS)	`codex_cli` (local)
`product-gpt`	`openai`	`codex_cli`
`tech-glm` / `product-glm`	`zai`	`zai` (unchanged)

Fallbacks preserved: anthropic.py + openai.py stay in providers/ for edge cases (CI runners without CLIs, or operator preference for billed API on a specific reviewer). Override via --provider-override tech-claude=anthropic.

z.ai docs: Infisical path documented inline:

/home-platform/providers/zai/ZAI_API_KEY (prod env)

With infisical secrets get … snippet for export.

Spec rewrite: prompts/01.5c §2 — auth model section + invocation tables. Cost stance updated: claude/gpt have $0 per-PR cost (subscriptions); only z.ai trickles cents.

Operator inbox convention: state/OPERATOR_INBOX.md — new file for operator → claude inbound messages while operator is mobile. ScheduleWakeup loop set up (10-min interval) to check.

Tests

57/57 green:

34 in test_run_review.py (was 29; +5 wire-up tests):
- test_get_provider_claude_cli / test_get_provider_codex_cli
- test_default_registry_uses_local_cli_for_claude_and_gpt
- test_get_provider_anthropic_fallback_still_available
- test_get_provider_openai_fallback_still_available
14 consolidator (PR #3)
4 negative-control (PR #1)
5 platformctl smoke

pytest platformctl/tests/ → 57 passed in 3.84s

NOT in this PR (intentional)

Real CLI invocation untested. claude_cli.py + codex_cli.py ship without exercising live CLI. Subprocess machinery + flag combos are based on claude --help / codex exec --help. First canary 3+3 run on a small Phase 02 PR will validate; flags can be tuned via env vars without code change.
No Forgejo Actions wiring. Still follow-up.
Anthropic + OpenAI HTTP providers not deleted. Kept as fallbacks; could be removed in a future PR once operator confirms CLI path is reliable for canary + 5+ subsequent runs.

Test plan

pytest platformctl/tests/ → 57 green
Manual: real canary 3+3 run on a small Phase 02 PR (post-merge); verify all 6 reviewers respond
Manual: induce claude CLI failure (e.g., wrong model alias), verify ProviderError surfaces correctly + DECISION_REQUIRED.md fires

🤖 Generated with Claude Code

## Summary Architectural correction post PR #4. Operator clarified (2026-05-01, after PR #4 merged): claude + codex reviewers use **local CLI subscriptions**, not direct API. Only z.ai goes via HTTPS (key from Infisical). PR #4 shipped `anthropic.py` and `openai.py` as direct-API HTTP adapters. Wrong design — would have required managing 3 API keys, paying per-call instead of using existing Max/Pro subscriptions, and missing that Codex CLI has `codex exec --sandbox read-only` purpose-built for review. This PR fixes that. ## What changes **New providers:** - `providers/claude_cli.py` — `claude -p --model opus --effort max --system-prompt … --no-session-persistence`. Uses subscription auth via Keychain. Fresh Claude session per reviewer call → independent reasoning. - `providers/codex_cli.py` — `codex exec --sandbox read-only --ignore-user-config -m gpt-5.5 -c model_reasoning_effort=high`. Uses subscription auth via codex's own login. Fresh Codex session per reviewer call. **Registry update (`run_review.py` REVIEWER_REGISTRY):** | Slot | Was (PR #4) | Now (PR #5) | |---|---|---| | `tech-claude` | `anthropic` (HTTPS) | `claude_cli` (local) | | `product-claude` | `anthropic` | `claude_cli` | | `tech-gpt` | `openai` (HTTPS) | `codex_cli` (local) | | `product-gpt` | `openai` | `codex_cli` | | `tech-glm` / `product-glm` | `zai` | `zai` (unchanged) | **Fallbacks preserved:** `anthropic.py` + `openai.py` stay in `providers/` for edge cases (CI runners without CLIs, or operator preference for billed API on a specific reviewer). Override via `--provider-override tech-claude=anthropic`. **z.ai docs:** Infisical path documented inline: ``` /home-platform/providers/zai/ZAI_API_KEY (prod env) ``` With `infisical secrets get …` snippet for export. **Spec rewrite:** `prompts/01.5c §2` — auth model section + invocation tables. Cost stance updated: claude/gpt have $0 per-PR cost (subscriptions); only z.ai trickles cents. **Operator inbox convention:** `state/OPERATOR_INBOX.md` — new file for operator → claude inbound messages while operator is mobile. ScheduleWakeup loop set up (10-min interval) to check. ## Tests **57/57 green:** - 34 in `test_run_review.py` (was 29; +5 wire-up tests): - `test_get_provider_claude_cli` / `test_get_provider_codex_cli` - `test_default_registry_uses_local_cli_for_claude_and_gpt` - `test_get_provider_anthropic_fallback_still_available` - `test_get_provider_openai_fallback_still_available` - 14 consolidator (PR #3) - 4 negative-control (PR #1) - 5 platformctl smoke ``` pytest platformctl/tests/ → 57 passed in 3.84s ``` ## NOT in this PR (intentional) - **Real CLI invocation untested.** `claude_cli.py` + `codex_cli.py` ship without exercising live CLI. Subprocess machinery + flag combos are based on `claude --help` / `codex exec --help`. First canary 3+3 run on a small Phase 02 PR will validate; flags can be tuned via env vars without code change. - **No Forgejo Actions wiring.** Still follow-up. - **Anthropic + OpenAI HTTP providers not deleted.** Kept as fallbacks; could be removed in a future PR once operator confirms CLI path is reliable for canary + 5+ subsequent runs. ## Test plan - [x] `pytest platformctl/tests/` → 57 green - [ ] Manual: real canary 3+3 run on a small Phase 02 PR (post-merge); verify all 6 reviewers respond - [ ] Manual: induce `claude` CLI failure (e.g., wrong model alias), verify ProviderError surfaces correctly + DECISION_REQUIRED.md fires 🤖 Generated with [Claude Code](https://claude.com/claude-code)

pdurlej added 1 commit

2026-05-01 11:54:20 +02:00

fix(providers): replace direct-API with local-CLI invocations (PR #5 ) 09df668316

Architectural correction per operator clarification 2026-05-01 (post PR #4
merge): claude + codex reviewers use LOCAL CLI subscriptions, NOT direct
Anthropic/OpenAI APIs. Only z.ai goes via HTTPS (key from Infisical).

Why this matters:
- All 3 providers under operator's existing subscriptions (Anthropic Max,
  OpenAI Pro, z.ai paid). Local CLI = no per-PR API costs for claude/gpt.
- One key to manage (z.ai), not three. Removes API-key-management surface
  the previous design implied.
- Codex CLI has `codex exec --sandbox read-only` purpose-built for review;
  Claude CLI has `claude -p --effort max` for max thinking. Both ship.

Changes:
- NEW `providers/claude_cli.py` — `claude -p --model opus --effort max
  --system-prompt … --no-session-persistence --output-format text`
- NEW `providers/codex_cli.py` — `codex exec --sandbox read-only
  --ignore-user-config -m gpt-5.5 -c model_reasoning_effort=high`
- `providers/__init__.py` — registry adds claude_cli + codex_cli;
  anthropic + openai remain as fallbacks for CI/edge cases
- `providers/zai.py` — docs updated with Infisical path:
  `/home-platform/providers/zai/ZAI_API_KEY` (prod env)
- `run_review.py` REVIEWER_REGISTRY swapped:
  - tech-claude / product-claude → claude_cli
  - tech-gpt / product-gpt → codex_cli
  - tech-glm / product-glm → zai (unchanged)
- `prompts/01.5c §2` — auth model section + invocation tables rewritten
- `state/OPERATOR_INBOX.md` — NEW convention file for operator → claude
  inbound messages (operator pushed: "sam sobie dodaj recuring loop")
- `state/AUDIT_LOG.jsonl` — appended OQ correction + PR5 coded entries
- `state/STATUS_NOW.md` — refreshed live status

Tests: 57/57 green
- 34 in test_run_review.py (was 29; +5 wire-up tests for new providers)
- 14 consolidator (PR #3)
- 4 negative-control (PR #1)
- 5 platformctl smoke

Recurring inbox-check loop scheduled via ScheduleWakeup (600s interval)
so claude session can react to operator messages while operator is mobile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pdurlej referenced this pull request from a commit

2026-05-01 11:54:20 +02:00

fix(providers): replace direct-API with local-CLI invocations (PR #5)

pdurlej referenced this pull request from a commit

2026-05-01 11:59:50 +02:00

fix(parser): harden against codex CLI prompt-echo + duplicate JSON

pdurlej added 1 commit

2026-05-01 11:59:50 +02:00

fix(parser): harden against codex CLI prompt-echo + duplicate JSON de7526d508

Real-world output from `codex exec` (verified 2026-05-01 smoke test) emits:
- Preamble (version, workdir, model, sandbox info)
- "user" header + prompt echo (which itself contains an example {...} JSON
  if the prompt asks for JSON output)
- "codex" header + actual model JSON answer
- Mid-response error log (e.g., `codex_core::session` rollout warnings)
- Token usage line
- Duplicate JSON answer

The previous parser used "first { to last }" slicing which would capture
EVERYTHING between the prompt-echo's example brace and the duplicate-emit's
closing brace — unparseable, fell back to ABSTAIN.

This commit:
- Adds `_extract_balanced_json_blocks` — depth-counted, string-aware scan
  for top-level balanced `{...}` blocks
- Two-pass parse: prefer candidates with `verdict` key (the real answer);
  fall back to first parseable dict (lenient)
- New test fixtures:
  - `test_parse_handles_codex_cli_real_output` — verbatim captured output
  - `test_parse_skips_prompt_echo_without_verdict_key` — synthetic case

Tests: 59/59 (was 57; +2 here, +5 in PR #5 commit, all green).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pdurlej merged commit ea39b28f0e into main

2026-05-01 12:14:05 +02:00

pdurlej referenced this pull request from a commit

2026-05-01 12:14:07 +02:00

Merge pull request 'fix(providers): replace direct-API with local-CLI invocations' (#5) from codex/review-pipeline/local-cli-providers into main

pdurlej referenced this pull request from a commit

2026-05-01 16:21:13 +02:00

fix(canary-findings): parser bug + claude_cli argv leak + zai default URL

pdurlej referenced this pull request

2026-05-01 16:21:44 +02:00

fix(canary-findings): parser bug + claude_cli argv leak + zai default URL #6