feat(platformctl): implement approved apply path #160
No reviewers
Labels
No labels
W6d-automerge-calibration
agent/claude-code
agent/codex
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
automerge-candidate
class/security-sensitive
cutover-gate
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
iterating
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
large-impact
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
meta
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
needs-operator-decision
needs-triage
not-ready
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
operator-emotional
owner-attention
phase/02
phase/03
priority:p0
priority:p1
priority:p2
priority:p3
proposed
ready-for-agent
ready-for-operator
recovery
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
risk/exposure
risk/process
risk/product
risk/runtime
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:operator-needed
status:parked
tier/full
tier/lite
tier/stacked
tier:0-platform-substrate
tier:1-iskra-value-layer
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
No milestone
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
pdurlej/platform!160
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "codex/issues/142-phase3-apply"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Canary status: iterating — canary rerun requested after Codex fixes for PR #160 blocker findings
Canary Context Pack
Product story
Packet 3.3 turns
platformctl applyfrom a skeleton into the guarded apply path needed before RS2000 cutover. The owner needs runtime mutation to require a merged PR SHA, sacred-path enforcement, and durable observed-state recording outsidemodule.yaml.What changed
apply_targetfor module IDs and existing plan-file artifacts.--approved, accepting both PR head SHA and merge commit SHA for current Actions compatibility.module.yamland runs it throughTailscaleTransportafter sacred path/command checks..platform/state/modules/<id>.status.json, not source manifests.platformctl apply <plan-file> --approved <sha>compatible, while allowingplatformctl apply --approved <sha> <module>.FORGEJO_TOKENinto the existing apply workflow's Apply step and set checkoutpersist-credentials: false.Why it changed
Phase 4 and Phase 5 depend on
platformctl applybeing real enough to deploy through the control-plane path instead of manual RS2000 edits.Files touched
control-plane/platformctl/apply.pycontrol-plane/platformctl/cli.pycontrol-plane/platformctl/tests/test_apply_phase3.pycontrol-plane/platformctl/tests/test_forgejo_ci_scripts_contract.pycontrol-plane/forgejo-actions/apply.yamlRelevant context
prompts/codex-cutover-flight/dispatch.mdprompts/codex-cutover-flight/phase-3-operational.mdPacket 3.3PLATFORM_CHARTER.mdobserved state layout and apply flowschema/module.schema.mdnote thatlast_apply_shabelongs in.platform/state/modules/<id>.status.jsondocs/forgejo-agent-operations.mdRuntime evidence
No production mutation performed. Tests use fake transport only. Manual refusal smoke:
PYTHONPATH=control-plane python3 -m platformctl.cli apply --approved fake-sha honcho-redis --jsonreturns exit 6 before runtime access.Known constraints
control-plane/forgejo-actions/still reports pre-existing findings inplan.yaml/validate.yaml; this PR only reducedapply.yamlto medium findings.apply.yamlstill has medium fully-qualified-action lint findings; not expanded here to avoid changing action source semantics inside Packet 3.3.Explicit out-of-scope
Requested decision
Review whether Packet 3.3 is safe to merge after official canary and external review: approved SHA gate, sacred-path enforcement, status update location, and workflow compatibility.
Merge blockers
Spec sources read
prompts/codex-cutover-flight/dispatch.md— mission and sequencingprompts/codex-cutover-flight/phase-3-operational.md— Packet 3.3 contractdocs/forgejo-agent-operations.md— Forgejo identity and PR protocolPLATFORM_CHARTER.md— observed-state and apply flow constraintsschema/module.schema.md—last_apply_shaobserved-state locationcontrol-plane/platformctl/apply.py— implementation targetcontrol-plane/platformctl/safety.py— sacred path contractcontrol-plane/platformctl/cli.py— command wiringcontrol-plane/platformctl/transport/tailscale.py— transport contractcontrol-plane/forgejo-actions/apply.yaml— existing apply workflow compatibilityTests
PYTHONPATH=control-plane python3 -m pytest control-plane/platformctl/tests/test_apply_phase3.py -q— 13 passedPYTHONPATH=control-plane python3 -m pytest control-plane/platformctl/tests/test_apply_phase3.py control-plane/platformctl/tests/test_forgejo_ci_scripts_contract.py -q— 23 passedPYTHONPATH=control-plane python3 -m pytest control-plane/platformctl/tests control-plane/platformctl/transport/tests -q— 339 passedPYTHONPATH=control-plane python3 -m pytest tests -q— 316 passed, 15 skippedPYTHONPATH=control-plane python3 -m platformctl.cli validate modules/honcho-redis --strict-v2 --json— exitCode 0PYTHONPATH=control-plane python3 -m platformctl.cli apply --approved fake-sha honcho-redis --json— exits 6 before runtime accesspython3 control-plane/platformctl/ci/lint_workflows.pyscoped to copiedapply.yaml— exit 0, 5 medium findingsgit diff --check— cleanRefs #142
Packet 3.3 external review checkpoint
Ollama Cloud expensive-review gate: 3/3 approve, no blockers.
Reviewers:
deepseek-v4-pro:cloud: APPROVE; blockers none.kimi-k2.6:cloud: APPROVE; blockers none.minimax-m2.7:cloud: APPROVE; blockers none.Reviewer non-blockers captured:
FORGEJO_REPO, while code readsPLATFORMCTL_FORGEJO_REPO; current hardcoded default ispdurlej/platform, so this is not blocking today but should be hardened.break_glass_reasonis required but not persisted in status/result; audit hardening follow-up.exitCodeis recorded but not gated on; manifest-driven execution from the approved checkout mitigates this, but an explicit plan-health gate would harden the contract.Local verification already recorded in the PR body:
testssuite.Official platform canary remains missing and should still be fired before merge per repo policy.
3+3 ensemble review by
claude— tech + product hatsTech hat: ✅ OK (confidence 0.70)
Risks
medium— Forgejo SHA verification scans only ~500 most-recent closed PRscontrol-plane/platformctl/apply.py:170-194 verify_approved_sha paginates state=closed&limit=50 with default pages=10. Once a repo accumulates >500 closed PRs (state=closed includes merged AND rejected), legitimate older merged SHAs silently fail with the misleading message 'SHA did not match any recOpportunities
Product hat: ✅ OK (confidence 0.70)
Risks
high— Self-declared canary gap — PR description says 'missing — fire canary 3+3 manually before merge'PR description: 'Canary status: missing — fire canary 3+3 manually before merge'. This is the first PR whereplatformctl applywill execute realdocker compose upon rs2000 if the merged-SHA gate passes, so canary evidence is the only signal you have that the gate behaves as designed end-to-end.medium— Forgejo token env var precedence is a 4-name maze (PLATFORMCTL_FORGEJO_TOKEN→FORGEJO_TOKEN→FORGEJO_TOKEN_CODEX→CANARY_FORGEJO_TOKEN)control-plane/platformctl/apply.py_forgejo_token()checks four env names in order and returns the first hit. If you ever set a token under one name during debugging and forget it, a fresher token elsewhere is silently shadowed — and the failure surfaces as the generic 'approved SHA invalid or notapproval.detailso a tired operator sees 'missing Forgejo token' instead of 'SHA invalid or not merged'. Option (b) is the smaller change and higher-value for ADHD debugging.Opportunities
applyreturns the idempotent noop becauselast_apply_shamatches, the result currently saysapproved SHA already applied. Includinglast_apply_atandlast_apply_prfrom the loaded status in the human output would let the operator confirm 'yes I already did this last Tuesday via PR #142' without opening the status file. Cheap win — data is already loaded..platform/state/modules/*.status.jsonis gitignored — Status files are written under repo root (_repo_root_from_modulesresolves to the parent ofmodules/). If.platform/state/is not in.gitignore, every successful apply will leave a tracked-but-uncommitted file that pollutesgit statusand tempts the operator into committing runtime state into the source tree. Worth a one-line verification before merge.3+3 ensemble review by
codex— tech + product hatsTech hat: ❌ NOT_OK (confidence 0.88)
Risks
blocker— Approved SHA is not bound to the checkout or plan being appliedcontrol-plane/platformctl/apply.py:123-171 verifies that the supplied SHA belongs to any recent merged PR, but apply_target later loads module.yaml from the current working tree and mutates runtime without checking that current HEAD or the plan artifact was produced by that SHA.high— Plan artifact exitCode is ignored and can still applycontrol-plane/platformctl/apply.py:364-376 records plan_exitCode but does not reject non-zero plan artifacts; a plan with exitCode 1 and no sacred/destructive changes proceeds to compose_up_command and transport.run.medium— Break-glass allows destructive plan changes but then ignores the plan contentcontrol-plane/platformctl/apply.py:233-259 validates destructive changes only for approval, but apply_target never executes the listed plan changes; it always runs docker compose up for the module at lines 376-388.Product hat: ❌ NOT_OK (confidence 0.88)
Risks
blocker— Approved SHA is not bound to the module or plan being appliedcontrol-plane/platformctl/apply.py:103 verifies that the SHA belongs to any recent merged PR; control-plane/platformctl/apply.py:300 then allows that approved_sha to apply any target module.high— Observed state may be written only to ephemeral runner storagecontrol-plane/platformctl/apply.py:56 writes .platform/state/modules/<id>.status.json locally; control-plane/forgejo-actions/apply.yaml:113 only runs Apply and does not persist, upload, commit, or otherwise durably store that status file.medium— PR exceeds the operator's review-size norm for a runtime mutation pathDiff stats: +809/-116 across 5 files; platform context sets a <=300 LOC norm and says larger PRs should be flagged for split.Opportunities
3+3 ensemble review by
glm— tech + product hatsTech hat: ✅ OK (confidence 0.95)
Risks
low— Forgejo token environment variable precedenceapply.py:115-120 (_forgejo_token)medium— Potential token leakage in remote command resultapply.py:330 (result["remote"]["stdout"])Opportunities
Product hat: ❌ NOT_OK (confidence 0.95)
Risks
high— Secret exposure risk in apply.py helpercontrol-plane/platformctl/apply.py:127 - _forgejo_token() checks 'CANARY_FORGEJO_TOKEN' environment variablemedium— Hardcoded runtime path reduces platform portabilitycontrol-plane/platformctl/apply.py:52 - DEFAULT_RUNTIME_ROOT = '/opt/vps-home-platform-infra'medium— Forgejo API pagination limit mismatch riskcontrol-plane/platformctl/apply.py:106 - verify_approved_sha() defaults pages=10Opportunities
Review decision
Status: BLOCKER — recommended action:
deferSingle-reviewer blockers
Single-reviewer high-risk findings
Reviewer dissents
product-glmvoted NOT_OK (confidence 0.95)tech-gptvoted NOT_OK (confidence 0.88)product-gptvoted NOT_OK (confidence 0.88)Operator decisions (yes/no)
Per-actor evidence: see comments by
claude,codex,glmabove. Tech: 2/3 OK · Product: 1/3 OK.3+3 ensemble review by
claude— tech + product hatsTech hat: ✅ OK (confidence 0.72)
Risks
medium— Broken regex in _safe_excerpt makes password redaction unreliablecontrol-plane/platformctl/apply.py: _TOKENISH_RE pattern uses(password=)[^\s]+— the raw stringr"\s"produces regex\swhich in a character class means literal backslash and literal letters, not whitespace. Intended pattern is[^\s]+(non-whitespace), written asr"\S"orr"[^\s]+"`r"(?i)(bearer\s+)[a-z0-9._=-]{12,}|(token=)[a-z0-9._=-]{12,}|(password=)\S+"and add a unit test forpassword=secret123→password=<redacted>.medium— Workflow sets FORGEJO_REPO but apply.py reads PLATFORMCTL_FORGEJO_REPOcontrol-plane/forgejo-actions/apply.yaml setsFORGEJO_REPO: ${{ github.repository }}on the Apply step. control-plane/platformctl/apply.py:verify_approved_sha readsos.environ.get("PLATFORMCTL_FORGEJO_REPO")only —FORGEJO_REPOis dead. Today this hides becauseDEFAULT_FORGEJO_REPO = "pdurlej`PLATFORMCTL_FORGEJO_REPO(matching the host knob convention), or also acceptFORGEJO_REPOin the env-var fallback chain in apply.py. Same goes for documentingPLATFORMCTL_FORGEJO_HOSTin the workflow if you ever move off the default host.low— PR-files pagination cap of 500 can wrongly reject scope-binding for large PRscontrol-plane/platformctl/apply.py:_forgejo_pr_files iteratespages=10×limit=50= 500 files maximum. _validate_approval_binding uses this list to enforcemodules//prefix. If a legitimate merged PR touches 500+ files (rare but possible — schema/CI sweeps, mass renames), the module's manilimit(no upper cap) for the files endpoint, or surface a distinct error when the file list hits the cap so the operator can re-run with a higher pages value rather than seeing a confusing scope-mismatch error.low— Silent skip of approval-binding check when git is unavailablecontrol-plane/platformctl/apply.py:_validate_approval_binding gates the checkout-vs-approved-SHA check oncurrent_sha is not None. _current_git_sha returns None on any subprocess failure or when not inside a git checkout. Result: in environments without git on PATH (or with a non-repo cwd), the bicurrent_shais None, log a warning into the result (e.g.result["approval"]["checkout_check"] = "skipped: git unavailable") so operators can see when the binding was not enforced. Optionally fail-closed unless an explicit--no-checkout-bindingflag is set.Opportunities
_current_git_shahelpers exist in both control-plane/platformctl/apply.py and control-plane/platformctl/plan.py. Consolidate into a smallgitutilmodule so the hex-validation rules and timeout stay in one place and don't drift._write_statuscomputes the tmp path viapath.with_suffix(path.suffix + ".tmp")for<id>.status.json. This works on current CPython but relies onwith_suffixaccepting.json.tmpas a suffix replacement — semantics for multi-dot suffixes have shifted between Python versions.path.with_name(path.name + ".tmp")is unambiguous and portable.Product hat: ✅ OK (confidence 0.70)
Risks
medium— Observed-state status file is gitignored and ephemeral in CI runners.gitignore now ignores .platform/state/; apply.py writes status to repo-root .platform/state/modules/<id>.status.json (apply.py:_status_path / _write_status). In Forgejo Actions this lives in the runner's checkout, which is destroyed when the job ends. PR description claims 'durable observed-state rOpportunities
3+3 ensemble review by
codex— tech + product hatsTech hat: ❌ NOT_OK (confidence 0.90)
Risks
blocker— Apply workflow can fail before reaching platformctl applycontrol-plane/forgejo-actions/apply.yaml:107-111 appends to .audit/AUDIT_LOG.jsonl, but the repo tracks state/AUDIT_LOG.jsonl and no .audit directory; shell redirection to a missing directory fails the step before the Apply step runs.blocker— Remote stderr is redacted in one field but leaked raw in errorcontrol-plane/platformctl/apply.py:636-643 stores _safe_excerpt(stderr) under remote.stderr, then assigns result["error"] = remote.get("stderr") without redaction; cli.py:40-45 prints the full result JSON.high— Observed status is not durable in the Actions apply pathcontrol-plane/platformctl/apply.py:92-97 writes .platform/state/modules/<id>.status.json only in the local checkout; .gitignore ignores .platform/state/, and control-plane/forgejo-actions/apply.yaml:139-169 does not upload, commit, or otherwise persist that file after the job.Product hat: ❌ NOT_OK (confidence 0.82)
Risks
high— Observed apply state is not actually durable.gitignore:2 ignores .platform/state/ while control-plane/platformctl/apply.py writes last_apply_sha only to .platform/state/modules/<id>.status.json; apply.yaml does not persist or upload that state.medium— PR is too large for a runtime-mutation safety changeDiff stats: +1187/-118 across 7 files, touching approval verification, CLI shape, plan source SHA, workflow secrets, sacred-path enforcement, SSH execution, and state recording in one PR.Opportunities
3+3 ensemble review by
glm— tech + product hatsTech hat: ✅ OK (confidence 0.95)
Risks
low— Subprocess environment inherits secretscontrol-plane/platformctl/apply.py:180, control-plane/platformctl/plan.py:60subprocess.runcalls in_current_git_shaand potentially other places inherit the parent environment by default. Whilegitis trusted, best practice for secret hygiene is to pass an explicit environment dict (e.g.,env={}) to prevent accidental secret leakage into child processes.Product hat: ✅ OK (confidence 0.95)
Risks
high— Missing canary execution recordPR Description: 'Canary status: missing — fire canary 3+3 manually before merge'medium— Secret token exposure risk in apply workflowcontrol-plane/forgejo-actions/apply.yaml:116-117 (FORGEJO_TOKEN passed as env var)FORGEJO_TOKENinto logs. Thepersist-credentials: falseon checkout is good, but ensure noset -xor debug logging in the apply script leaks the token.Opportunities
EXIT_APPROVAL_MISMATCHare detailed (e.g., 'did not change modules/honcho-redis/'). This is excellent for the operator to diagnose PR scope issues quickly without digging into Forgejo UI manually.last_apply_sha == approved_shareturningnoop(exit 0) means re-running a successful apply doesn't require anxiety or attention. This fits the 'robustness' goal for a single-operator platform.Review decision
Status: BLOCKER — recommended action:
deferSingle-reviewer blockers
Single-reviewer high-risk findings
Reviewer dissents
product-gptvoted NOT_OK (confidence 0.82)tech-gptvoted NOT_OK (confidence 0.90)Operator decisions (yes/no)
Per-actor evidence: see comments by
claude,codex,glmabove. Tech: 2/3 OK · Product: 2/3 OK.3+3 ensemble review by
claude— tech + product hatsTech hat: ✅ OK (confidence 0.70)
Risks
medium— PR file enumeration capped at 500 entries can falsely reject valid appliescontrol-plane/platformctl/apply.py _forgejo_pr_files (default pages=10, limit=50). _module_changed_by_approved_pr scans onlyapproval.changed_files; a large PR (>500 changed files) whosemodules//entries fall past page 10 will return False and trip thedid not change modules//rejectipagescap), raise the cap to a generous number, or only fetch the file list when the changed-files check is needed and short-circuit positive on first hit. Test with a synthetic >500-file PR fixture.medium— _safe_excerpt redaction misses common secret formats in remote stdout/stderrcontrol-plane/platformctl/apply.py:_TOKENISH_RE only catchesbearer,token=,password=. Output like"api_key":"...", AWS-styleAKIA..., raw 40-char hex creds,Authorization: Basic ..., or environment dumpsMY_SECRET=...pass through unredacted into result.remote.stdout/stderr(api[_-]?key|secret|credential)\s*[:=]\s*\S+, base64-blob length thresholds, common cloud key prefixes) or — safer — drop remote stdout/stderr from artifacts entirely and only persist returncode + a structured result. compose-up shouldn't need stdout in the persisted status.low— Idempotency noop check is TOCTOU against concurrent appliescontrol-plane/platformctl/apply.py:_load_statusthen later_write_statuswith no advisory lock. Two concurrentplatformctl apply --approvedruns (e.g. CI re-trigger or workflow retry) can both pass thelast_apply_sha == approved_shanoop check, both invoketransport.run, bo<repo>/.platform/state/modules/<id>.lockaround the read→ssh→write block, or treat the workflowconcurrency:group as the source of truth and document that local apply is single-runner. Either way, cite the strategy in apply.py so reviewers know which side enforces.Opportunities
compose up -dsuccess without verifying container health — control-plane/platformctl/apply.py writeslast_apply_shaandlast_apply_atwhenevertransport.runreturncode==0.docker compose up -d <service>returns 0 as soon as container creation succeeds; a service that immediately crash-loops still leaves the status file claiming the SHA was applied. Acknowledged Phase 4 scope (smoke_test), but a 1-line `docker compose ps --status running --format jgit status --porcelain. A plan generated from a dirty checkout records source_sha=HEAD but reflects uncommitted content; on apply the plan_source_sha→approved_sha equality check would still pass even though the plan artifact does not match what was reviewed in the PR. Low-impact in CI (clean checkout) but worth eitherProduct hat: ✅ OK (confidence 0.75)
Risks
medium— Merging flips apply from skeleton to live; operator should explicitly acknowledge first real-mutation pathcontrol-plane/platformctl/apply.py: removes Phase-05 gate, replaces 'not-implemented-yet' return with TailscaleTransport.run() that executes docker compose up -d on rs2000. control-plane/forgejo-actions/apply.yaml lines 109-138 now wire the token + PR number through to a real SSH-executing apply stelow— Token resolution leaks an agent identity (FORGEJO_TOKEN_CODEX) into the apply toolcontrol-plane/platformctl/apply.py _forgejo_token() iterates ('PLATFORMCTL_FORGEJO_TOKEN', 'FORGEJO_TOKEN_CODEX', 'FORGEJO_TOKEN'). The middle name encodes 'this token belongs to the codex agent' inside platformctl, which should be agent-agnostic.Opportunities
platformctl status <module>view — The new .platform/state/modules/.status.json carries last_apply_sha, last_apply_pr, last_apply_at, last_apply_host, last_apply_returncode. That is exactly what an operator-visible 'when was this module last touched and by which PR' lookup would need. Worth noting for the Phase 4 observability packet so the file format is treated as a published surface, not an internal artifact, before consumer3+3 ensemble review by
codex— tech + product hatsTech hat: ❌ NOT_OK (confidence 0.86)
Risks
blocker— Apply rejects approved runtime-only PRs unless module.yaml changedcontrol-plane/platformctl/apply.py:_validate_approval_binding rejects any approved PR whose file list does not include modules/<module_id>/high— Plan-file apply can execute a compose up even when the plan has zero changescontrol-plane/platformctl/apply.py: apply_target sets actions_planned from plan changes but always calls compose_up_command(manifest) after _validate_plan_binding accepts exitCode 0 or 1medium— Idempotency check uses approved_sha only, not the normalized applied PR referencecontrol-plane/platformctl/apply.py: status['last_apply_sha'] is compared only to approved_sha, while approval accepts both PR head SHA and merge_commit_shaProduct hat: ❌ NOT_OK (confidence 0.86)
Risks
high— Observed apply state is not durable.gitignore adds.platform/state/;apply.pywriteslast_apply_shaonly to local.platform/state/modules/.status.json;apply.yamluploads it as an artifact but never restores or commits itplatformctl applyread it on later runs, or explicitly narrows Packet 3.3 to non-durable runner-local state.medium— PR exceeds safe review surface for runtime mutationDiff stats are +1217/-120 across 7 files for the first real runtime apply pathOpportunities
3+3 ensemble review by
glm— tech + product hatsTech hat: ✅ OK (confidence 0.95)
Risks
low— Minor exit code inconsistency in error messagecontrol-plane/platformctl/apply.py:619low— Forgejo PR files pagination limit (500 files)control-plane/platformctl/apply.py:168 - default pages=10 with limit=50Opportunities
Product hat: ⚪ ABSTAIN (confidence 0.00)
Risks
medium— reviewer output unparseableno valid JSON object found in responseReview decision
Status: BLOCKER — recommended action:
deferSingle-reviewer blockers
Single-reviewer high-risk findings
platformctl applyread it on later runs, or explicitly narrows Packet 3.3 to non-durable runner-local state.Reviewer dissents
product-gptvoted NOT_OK (confidence 0.86)tech-gptvoted NOT_OK (confidence 0.86)product-glmvoted ABSTAIN (confidence 0.00)Operator decisions (yes/no)
Per-actor evidence: see comments by
claude,codex,glmabove. Tech: 2/3 OK · Product: 1/3 OK.3+3 ensemble review by
claude— tech + product hatsTech hat: ✅ OK (confidence 0.70)
Risks
high— Checkout-vs-approved-SHA binding has zero test coveragecontrol-plane/platformctl/apply.py:591-592 —enforce_checkout_bindingdefaults to True only whenapproval_checker is None. Every test in test_apply_phase3.py injectsapproval_checker=approve, so_current_git_sha(repo_root)is never compared against approved-SHA refs in tests. test_apply_succgit init, write file, commit, capture HEAD), then callsapply_target(..., approval_checker=None, ...)(or explicitenforce_checkout_binding=True) with (a) approved_sha == HEAD → success, (b) approved_sha != HEAD → EXIT_APPROVAL_MISMATCH withapproved SHA does not match current checkout. Without this, the gate is implementamedium— Module-only apply requires module.yaml change, but compose-file-only PRs deploy outside that pathcontrol-plane/platformctl/apply.py:425-431 (_module_changed_by_approved_prcheckspath.startswith(f"modules/{module_id}/")). Honcho-redis manifest declarescompose_file: compose/apps/compose.yaml(outsidemodules/honcho-redis/). A PR that only editscompose/apps/compose.yaml(e.g., image bcompose_filepath, or (b) document explicitly that compose-only changes requireplatformctl plan ... -o plan.json && platformctl apply plan.jsonrather than module-id apply. The current behaviour will surprise operators on the first real RS2000 deploy.medium— Audit log path silently moved from .audit/ to state/control-plane/forgejo-actions/apply.yaml:108-112 and 144-148 —>> .audit/AUDIT_LOG.jsonlchanged to>> state/AUDIT_LOG.jsonlin both audit_start and audit_finish steps; the closing PR comment was already updated to referencestate/AUDIT_LOG.jsonl. The PR description and packet contract describ.gitignoreentry forstate/AUDIT_LOG.jsonlso the runner's local file doesn't surface in working-tree diffs.medium— PR exceeds 300-LOC charter norm by ~4×Diff stats +1294/-120; apply.py alone goes from 137 to 715 lines. Per platform charter, larger PRs should be flagged for split rather than merged with concerns.Opportunities
<id>.status.json.tmpfilename. Two concurrent applies for the same module (or even an interrupted-then-resumed apply) would race the .replace(). Not a current scenario (Forgejo workflow serializes), but a one-line fix (tmp = path.with_name(f"{path.name}.{os.getpid()}.tmp")) makes it safe for any future parallel runner.git.pymodule would prevent drift — e.g., the apply.py copy lowercases the SHA, the plan.py copy preserves case before validating, which is a subtle inconsistency that could matter if Forgejo ever returns mixed-case SHAs in API responses (current Gitea/Forgejo always lowercases, so harmProduct hat: ✅ OK (confidence 0.72)
Risks
medium— PR size (~1.3k LOC) exceeds the ≤300 LOC normDiff stats +1294/-120; apply.py alone grows by ~595 LOC plus 528 LOC of new tests. Charter calls for splitting at this size..gitignore/workflow-token wiring could have been a precursor PR.low— Audit log path silently moved from.audit/tostate/AUDIT_LOG.jsonlcontrol-plane/forgejo-actions/apply.yaml lines 109, 145 —>> .audit/AUDIT_LOG.jsonlbecomes>> state/AUDIT_LOG.jsonl. The change isn't called out in the PR description except indirectly in the PR comment template, and there's no migration note for any prior.audit/content..audit/AUDIT_LOG.jsonlbefore merge. If.audit/was never written to in practice this is a no-op — but worth a one-line confirmation in the merge comment so future-you doesn't go hunting for missing audit history.Opportunities
.platform/state/modules/<id>.status.jsonis now the durable record of what was applied, but the apply workflow's PR comment only mentionsstate/AUDIT_LOG.jsonland the artifact name. For a tired operator approving on mobile, includinglast_apply_sha/last_apply_atdirectly in the ntfy or PR comment (read out of the status file) would remove the click-through into th3+3 ensemble review by
codex— tech + product hatsTech hat: ❌ NOT_OK (confidence 0.90)
Risks
blocker— Plan-file apply is not bound to the approved PR’s changed filescontrol-plane/platformctl/apply.py:_validate_approval_binding skips module changed-file enforcement when plan_bound=True, and _validate_plan_binding trusts user-controlled plan.source_shamedium— Approved SHA search can miss valid older merged PRs and block applycontrol-plane/platformctl/apply.py:verify_approved_sha scans only the first 10 pages of closed PRs unless --approved-pr is suppliedProduct hat: ❌ NOT_OK (confidence 0.90)
Risks
blocker— Split oversized runtime-mutation PR before approvalDiff stats: +1294/-120 across 7 files; PR implements approval verification, plan binding, SSH apply, status persistence, CLI changes, workflow credentials, and tests in one change.high— Observed state may be hidden in expiring artifacts.gitignore adds.platform/state/;control-plane/forgejo-actions/apply.yamluploads.platform/state/modules/*.status.jsononly asplatform-apply-state-${{ github.event.inputs.pr_number }}artifact.3+3 ensemble review by
glm— tech + product hatsTech hat: ✅ OK (confidence 0.95)
Risks
medium— apply.py exceeds 300 LOC normcontrol-plane/platformctl/apply.py: +663/-43 (706 lines total)low— PR file list pagination may miss module changes in very large PRscontrol-plane/platformctl/apply.py:192 (_forgejo_pr_files pages default=10, limit=50 → 500 files max)Opportunities
Product hat: ✅ OK (confidence 0.90)
Risks
low— Apply state location not yet ignored.gitignore shows.platform/state/added, but operator needs to know this dir is for runtime state only, not source manifests.low— Error exit codes mismatch for 'noop' vs 'applied'apply.py returns EXIT_NO_CHANGES (0) for both noop plan (in-sync) and successful apply.Opportunities
.platform/state/modules/<id>.status.json, keeping observed state separate from sourcemodule.yamlas per charter.Review decision
Status: BLOCKER — recommended action:
deferSingle-reviewer blockers
Single-reviewer high-risk findings
git init, write file, commit, capture HEAD), then callsapply_target(..., approval_checker=None, ...)(or explicitenforce_checkout_binding=True) with (a) approved_sha == HEAD → success, (b) approved_sha != HEAD → EXIT_APPROVAReviewer dissents
tech-gptvoted NOT_OK (confidence 0.90)product-gptvoted NOT_OK (confidence 0.90)Operator decisions (yes/no)
Per-actor evidence: see comments by
claude,codex,glmabove. Tech: 2/3 OK · Product: 2/3 OK.Codex status after canary rerun on commit
6d1f570: terminal recommendation issplit_pr, not another in-place iteration.Why:
BLOCKER/defer.Split oversized runtime-mutation PR before approval.Current useful fixes already pushed on this branch:
source_shastate/AUDIT_LOG.jsonlplatform-apply-state-<pr>artifact345 passed, root tests316 passed / 15 skippedRecommended next action: split #160 into smaller PRs: approval binding, plan artifact metadata, compose apply execution, workflow evidence/artifacts. Do not merge #160 as-is.
Superseded by logical split PRs:
Reason: canary on this oversized PR correctly identified that approval binding, plan provenance, runtime mutation, evidence durability, and workflow token wiring should be reviewed as separate security-sensitive invariants. Closing this PR; do not merge #160.
Pull request closed