fix(cutover): support minio-init one-shot apply #379

Merged
pdurlej merged 1 commit from codex/domain/cutover-ops-minio-init into main 2026-05-18 23:19:01 +02:00

Canary status: missing — required Forgejo checks/canary before merge

Canary Context Pack

Product story

#310 blocks the remaining cutover-ops path because minio-init is a sidecar-bound one-shot that currently exits non-zero. We need a safe way to rerun it after the MinIO backup without weakening the general F3 rule that stateful smokes are no-op by default.

What changed

  • Fixed minio-init compose command shape so /bin/sh -c receives the full idempotent bucket bootstrap script.
  • Marked minio-init strict-v2 complete and declared spec.runtime.expected_state: exited-success.
  • Added schema/docs for one-shot expected state and sidecar backup_ref_modules.
  • Updated platformctl plan/health/smoke handling so exited 0 is healthy for declared one-shot services.
  • Added a narrow workflow guard for stateful one-shot real apply: only container.state.exited_success drift is allowed after BACKUP_DONE_F3 + matching backup ref.
  • Updated apply command for one-shots to start the service, wait for the container exit code, and fail unless it exits 0.
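The one-shot apply flow described above (start the service, wait for the container exit code, fail unless it is 0) can be sketched roughly as follows. This is a minimal illustration, not the code in `control-plane/platformctl/apply.py`: the function name `apply_one_shot`, the injectable `run` parameter, and the exact `docker compose` flags are assumptions made for this example.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run(argv: Sequence[str]) -> subprocess.CompletedProcess:
    # Capture output so the caller can read the exit code printed by `docker wait`.
    return subprocess.run(list(argv), capture_output=True, text=True, check=False)


def apply_one_shot(service: str, container: str, run: Runner = _run) -> int:
    """Hypothetical helper: start a one-shot service and require exit code 0."""
    # Start only the one-shot service, without touching its dependencies.
    up = run(["docker", "compose", "up", "-d", "--no-deps", service])
    if up.returncode != 0:
        raise RuntimeError(f"failed to start {service}: {up.stderr.strip()}")

    # `docker wait` blocks until the container stops, then prints its exit code.
    waited = run(["docker", "wait", container])
    if waited.returncode != 0:
        raise RuntimeError(f"docker wait failed for {container}: {waited.stderr.strip()}")

    exit_code = int(waited.stdout.strip())
    if exit_code != 0:
        raise RuntimeError(f"{container} exited with code {exit_code}, expected 0")
    return exit_code
```

Passing the runner in as a parameter keeps the sketch testable without a Docker daemon; a real implementation may also need the compose project and env-file arguments, which are left out here.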

Why it changed

The existing F3 workflow correctly blocks drift for stateful modules. That would also block the needed minio-init repair because the current live container is exited/1. This PR preserves the no-op rule for normal F3 stateful services while adding an explicit, test-covered exception for restart:no one-shot services.

Files touched

  • .forgejo/workflows/platformctl-auto-apply.yml
  • compose/core/compose.yaml
  • control-plane/platformctl/apply.py
  • control-plane/platformctl/ci/allow_one_shot_stateful_apply.py
  • control-plane/platformctl/ci/auto_apply_scope.py
  • control-plane/platformctl/{plan.py,health.py}
  • tests/smoke.sh
  • modules/minio-init/{module.yaml,runbook.md}
  • schema/module.schema.{json,md}
  • focused tests under control-plane/platformctl/tests/

Relevant context

  • #310: minio-init data gap / one-shot failure.
  • F3 rule: stateful/sidecar dispatch requires backup ref and normally no-op plan.
  • MinIO backup ref used for this sidecar lane: /opt/pdurlej-platform/backups/minio-20260517T010847Z.tar.gz.

Runtime evidence

  • Read-only RS2000 evidence before this PR: minio healthy/running; minio-init exited with code 1; command shape rendered as bare mc rather than the full script.
  • No production mutation was performed in this PR.
  • Compose render with --no-interpolate now shows minio-init command as -c plus the full script, with literal runtime env references.
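For reviewers unfamiliar with the command-shape failure: `sh -c` treats only the first argument after `-c` as the script, and any further arguments become `$0`, `$1`, and so on. The sketch below reproduces that behaviour with a harmless `echo` stand-in. It assumes this is the mechanism behind the bare `mc` rendering; the script text is a placeholder, not the real bucket bootstrap script.

```python
import subprocess


def run_argv(argv: list[str]) -> str:
    # Run the argv exactly as given (no extra shell wrapping) and return stdout.
    return subprocess.run(argv, capture_output=True, text=True, check=False).stdout


# Broken shape: the script is split across argv entries, so `sh -c` runs only
# "echo"; "bootstrap" becomes $0 and "step" becomes $1, and nothing is printed
# except the newline from the bare `echo`.
broken = run_argv(["/bin/sh", "-c", "echo", "bootstrap", "step"])

# Fixed shape: the whole script is a single argument after `-c`.
SCRIPT = 'echo "bootstrap step 1" && echo "bootstrap step 2"'
fixed = run_argv(["/bin/sh", "-c", SCRIPT])

print(repr(broken))
print(repr(fixed))
```

The first call prints only an empty line, while the second runs both steps. This mirrors the difference between a command rendered as a bare binary name and one rendered as `-c` plus the full script.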

Known constraints

  • This PR does not run minio-init on production.
  • The post-merge run is a real one-shot apply and still needs explicit operator dispatch/approval.
  • The one-shot guard does not allow image/config/service drift; it allows only container.state.exited_success drift.
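The guard constraint above can be pictured as a small allowlist check. The following is a hedged sketch and not the contents of `allow_one_shot_stateful_apply.py`: the `GuardInput` fields, the function name, and the assumption that the backup ref is compared against an expected value are all illustrative. Only the drift key `container.state.exited_success`, the token `BACKUP_DONE_F3`, the `restart: no` policy, and the `exited-success` expected state come from this PR.

```python
from dataclasses import dataclass, replace

ALLOWED_DRIFT = frozenset({"container.state.exited_success"})
CONFIRM_TOKEN = "BACKUP_DONE_F3"


@dataclass(frozen=True)
class GuardInput:
    # Hypothetical shape of what the guard inspects.
    drift_keys: tuple[str, ...]
    restart_policy: str
    expected_state: str
    stateful_confirm: str
    backup_ref: str
    expected_backup_ref: str


def one_shot_apply_allowed(g: GuardInput) -> tuple[bool, str]:
    """Return (allowed, reason); every check must pass for a real apply."""
    if g.restart_policy != "no" or g.expected_state != "exited-success":
        return False, "not a declared one-shot service"
    if g.stateful_confirm != CONFIRM_TOKEN:
        return False, "missing stateful confirmation"
    if not g.backup_ref or g.backup_ref != g.expected_backup_ref:
        return False, "backup ref does not match"
    drift = set(g.drift_keys)
    if not drift:
        return False, "no drift: the normal no-op path applies"
    extra = sorted(drift - ALLOWED_DRIFT)
    if extra:
        return False, "disallowed drift: " + ", ".join(extra)
    return True, "allowed"


REF = "/opt/pdurlej-platform/backups/minio-20260517T010847Z.tar.gz"
base = GuardInput(("container.state.exited_success",), "no", "exited-success",
                  CONFIRM_TOKEN, REF, REF)
print(one_shot_apply_allowed(base))
print(one_shot_apply_allowed(replace(base, drift_keys=("container.image",))))
```

Checking service identity before drift means a normal stateful service is rejected regardless of what its plan contains, which is the property the first merge blocker asks reviewers to confirm.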

Explicit out-of-scope

  • No MinIO data deletion, prune, bucket policy changes, public exposure changes, or credential value changes.
  • No broader relaxation of F3 stateful apply policy.

Requested decision

Review as Full/security-sensitive cutover workflow work. Merge only if the guarded one-shot exception is acceptable for #310.

Merge blockers

  • Any finding that the exception can apply to normal stateful services.
  • Any finding that the command could print or persist secret values beyond normal container env handling.
  • Any finding that minio-init could mutate buckets beyond create-or-verify.

Validation

  • control-plane/.venv/bin/python -m pytest control-plane/platformctl/tests/test_apply_phase3.py control-plane/platformctl/tests/test_apply_env_file.py control-plane/platformctl/tests/test_plan_phase3.py control-plane/platformctl/tests/test_health_phase3.py control-plane/platformctl/tests/test_forgejo_ci_scripts_contract.py -q → 125 passed
  • PYTHONPATH=control-plane control-plane/.venv/bin/python control-plane/platformctl/ci/lint_workflows.py → passed
  • control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/minio-init/module.yaml → passed
  • PLATFORMCTL_SMOKE_REMOTE_MODE=skip tests/smoke.sh --json minio-init → passed
  • git diff --check on touched files → passed
  • auto_apply_scope.py --module minio-init --allow-stateful --backup-ref /opt/pdurlej-platform/backups/minio-20260517T010847Z.tar.gz --fail-on-blocked → eligible
  • allow_one_shot_stateful_apply.py synthetic guard check → allowed only for container.state.exited_success

Post-merge operator dispatch

Trigger platformctl-auto-apply.yml manually with:

module=minio-init
allow_stateful=true
backup_ref=/opt/pdurlej-platform/backups/minio-20260517T010847Z.tar.gz
stateful_confirm=BACKUP_DONE_F3

Expected first run: plan drift limited to container.state.exited_success; guarded one-shot apply runs; health passes when home-platform-minio-init-1 exits 0 and minio remains healthy.
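The health expectation for that first run can be read as a two-branch rule: a declared one-shot is healthy once it has exited with code 0, and any other service is healthy only while running. The snippet below is an illustrative sketch under that reading. The function name and argument shapes are assumptions, and the real `health.py` may also consult container healthchecks.

```python
from typing import Optional


def is_healthy(expected_state: Optional[str], status: str, exit_code: Optional[int]) -> bool:
    """Hypothetical health rule keyed on spec.runtime.expected_state."""
    if expected_state == "exited-success":
        # One-shot service: finished and returned 0.
        return status == "exited" and exit_code == 0
    # Long-running service: must still be up.
    return status == "running"


# minio-init before the fix (exited/1), after a good run (exited/0), and minio itself.
print(is_healthy("exited-success", "exited", 1))
print(is_healthy("exited-success", "exited", 0))
print(is_healthy(None, "running", None))
```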

Spec sources read

  • #310 issue body and evidence summary
  • modules/minio-init/module.yaml
  • modules/minio-init/runbook.md
  • compose/core/compose.yaml
  • .forgejo/workflows/platformctl-auto-apply.yml
  • control-plane/platformctl/{plan.py,health.py,apply.py}
  • control-plane/platformctl/ci/auto_apply_scope.py
  • relevant focused tests listed above

Refs #310

fix(honcho): scrub private runtime log payloads
All checks were successful
base-is-main / guard (pull_request) Successful in 1s
canary-required / collect-diff (pull_request) Successful in 4s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 21s
python-ci / Python 3.11 (pull_request) Successful in 38s
python-ci / Python 3.12 (pull_request) Successful in 37s
python-ci / Python 3.13 (pull_request) Successful in 39s
canary-required / canary (pull_request) Successful in 16s
patchwarden-pr-sanity / sanity (pull_request) Successful in 21s
355438c717
fix(cutover): support minio-init one-shot apply
All checks were successful
canary-required / collect-diff (pull_request) Successful in 4s
infra-docs-drift / docs-drift (pull_request) Successful in 5s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 22s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 18s
python-ci / Python 3.11 (pull_request) Successful in 39s
python-ci / Python 3.12 (pull_request) Successful in 40s
python-ci / Python 3.13 (pull_request) Successful in 41s
workflow-lint / lint (pull_request) Successful in 4s
base-is-main / guard (pull_request) Successful in 1s
canary-required / canary (pull_request) Successful in 13s
patchwarden-pr-sanity / sanity (pull_request) Successful in 22s
8cb01651f3
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Reference
pdurlej/platform!379