fix(canary-findings): parser bug + claude_cli argv leak + zai default URL #6
No reviewers
Labels
No labels
W6d-automerge-calibration
agent/claude-code
agent/codex
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
automerge-candidate
class/security-sensitive
cutover-gate
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
iterating
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
large-impact
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
meta
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
needs-operator-decision
needs-triage
not-ready
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
operator-emotional
owner-attention
phase/02
phase/03
priority:p0
priority:p1
priority:p2
priority:p3
proposed
ready-for-agent
ready-for-operator
recovery
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
risk/exposure
risk/process
risk/product
risk/runtime
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:operator-needed
status:parked
tier/full
tier/lite
tier/stacked
tier:0-platform-substrate
tier:1-iskra-value-layer
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
pdurlej/platform!6
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "codex/review-pipeline/canary-findings-pr6"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
First canary 3+3 review fired (2026-05-01, retrospective on PR #5) surfaced BLOCKER + 3 HIGH real bugs in the very PR #5 code. 3/3 tech reviewers convergent on the parser bug — exactly what the cross-provider redundancy was designed to catch.
This PR fixes all 4 critical findings. NO auto-merge — operator green light was scoped to a small "default URL fix"; canary surfaced bigger issues. Operator reviews when back.
Canary results (full artifacts in
state/reviews/PR-5/)Tech consensus: 3/3 dissent. Product: 1/3 dissent. Risk = BLOCKER, action = defer.
What this PR fixes
1. BLOCKER — Parser returns prompt-echo's JSON instead of model answer
_base.py:parse_reviewer_responseiterates candidates in source order and picks the FIRST verdict-containing block. But codex_cli echoes the user_message (which embeds the diff verbatim) BEFORE the model's answer. If the diff contains{"verdict":"OK"}literals (e.g.,test_run_review.pyhas 24+ such occurrences), parser silently returns the diff-embedded verdict instead of the model's actual answer.Fix: iterate in REVERSE order. Model answer always comes last in CLI output.
Regression test added with prompt-echo
OK+ model answerNOT_OK— would FAIL before fix, PASSES after.2. HIGH — Claude provider puts full PR payload in argv
claude_cli.pypasseduser_message(full diff) as-pargv. Issues:ps auxshows the diff to anyone with shell accessFix: pass via stdin. Verified
claude -preads stdin when no positional. Also added--bareflag for faster, isolated reviewer sessions (skips hooks/LSP/plugin-sync/auto-memory/keychain/CLAUDE.md). Override viaPLATFORMCTL_CLAUDE_NO_BARE=1.3. HIGH — z.ai key export docs leak secret to /tmp
CANARY_PLAN.md and zai.py docstring had
infisical secrets get ... | tee /tmp/zai-key.txt && export ... && rm /tmp/zai-key.txtpattern. The /tmp file briefly exposes the key on disk.Fix: removed temp-file pattern. Documented direct
export $(infisical secrets get ... --plain)and scopedinfisical run -- python ...alternatives.4. HIGH — Tests added in PR #5 don't exercise bug class
test_parse_handles_codex_cli_real_outputused IDENTICALOKJSON in both prompt-echo and model-answer positions, masking the bug.Fix: new test
test_parse_picks_LAST_verdict_block_when_diff_contains_verdict_literaluses different verdicts in each position. Asserts NOT_OK is returned.Bonus changes
codex_cli.py--ephemeral(tech-gpt opportunity): no session persistence, matches reviewer intent, removes the rollout-recording noise.zai.pydefault URL: PaaS (/api/paas/v4) → Coding (/api/coding/paas/v4). Operator's subscription is Coding Plan; PaaS requires separate balance/recharge — caused the original "find the working endpoint" investigation. Now the default works without env override.Tests
60/60 green:
test_run_review.py(was 36; +1 regression test for parser bug class)NOT in this PR (deferred reviewer findings, low-medium severity)
Architectural / opportunity-grade items worth a separate PR:
claude --json-schemafor structured output (eliminates parser drift entirely; tech-claude × 2)claude --output-format jsonenvelope (cleaner than text parsing)smoke-test-providers.shtoplatformctl provider-doctorsubcommand (product-gpt)local-subscriptionsvsapi-fallback(product-gpt)These are queued; see
state/AUDIT_LOG.jsonlfor the canary findings index.Test plan
pytest platformctl/tests/→ 60 green🤖 Generated with Claude Code
Rejected per operator architectural decision (2026-05-01)
Successor: PR #7 (
codex/review-pipeline/improved-from-pr6) — narrow code-only scope, links back here.Why rejected (convergent reviewer findings, canary #3):
state/DISPATCH_BRIDGE.json— session_id, absolute paths) (4 reviewers convergent)state/canary/CANARY_PLAN.mdPath C) (4 reviewers convergent)Pattern enshrined (per Oracle consult 2026-05-01, will land in PLATFORM_CHARTER via PR #7):
What stays in record (canary findings):
state/reviews/PR-5/(canary #1 — caught real parser BLOCKER) — committed in PR #6, will stay on this branch's historyThis rejection is the FIRST exercise of the "PRs as learning log" pattern.
Pull request closed