fix(platformctl): preflight compose apply with env files #252

Merged
pdurlej merged 1 commit from codex/cutover/apply-env-file-support into main 2026-05-13 07:00:04 +02:00

Canary status: missing — fire canary 3+3 manually before merge

Canary Context Pack

Product story

RS2000 auto-apply must fail closed before any production mutation when the canonical compose cannot be parsed. Piotr's safe PM flow should be: merge the PR, the runner validates compose with the same env-file and checkout root, then applies only if that preflight passes.

What changed

  • Added platformctl apply --env-file <path> with repeatable env-file support.
  • Added PLATFORMCTL_COMPOSE_ENV_FILE support for runner-local configuration, comma-separated for multiple files.
  • Added docker compose config --quiet preflight before docker compose up -d.
  • Persisted preflight evidence in apply status artifacts.
  • Added focused tests for env-file command construction, CLI parsing, env-var resolution, and fail-closed preflight behavior.
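To make the env-file plumbing above concrete, here is a minimal sketch of how the file list could be resolved and turned into a `docker compose` argument vector. It is not the code in `apply.py`: the function names `resolve_env_files` and `compose_command` are hypothetical, and the precedence rule (explicit `--env-file` flags win over `PLATFORMCTL_COMPOSE_ENV_FILE`) is an assumption, not something this PR body states.

```python
import os
from typing import Mapping, Sequence

ENV_VAR = "PLATFORMCTL_COMPOSE_ENV_FILE"


def resolve_env_files(
    cli_env_files: Sequence[str],
    environ: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return the env files to pass to compose.

    Assumption: repeatable CLI flags take precedence; otherwise the
    comma-separated environment variable is used.
    """
    if cli_env_files:
        return list(cli_env_files)
    raw = environ.get(ENV_VAR, "")
    # Drop empty entries so a trailing comma does not yield "".
    return [part.strip() for part in raw.split(",") if part.strip()]


def compose_command(
    compose_file: str,
    env_files: Sequence[str],
    *action: str,
) -> list[str]:
    """Build an argv list; only paths appear here, never env values."""
    cmd = ["docker", "compose"]
    for path in env_files:
        cmd += ["--env-file", path]
    cmd += ["-f", compose_file, *action]
    return cmd
```

Building the command as a list rather than a shell string keeps quoting out of the picture and makes command construction easy to assert on in tests.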

Why it changed

Phase 2 evidence on #142 found the canonical legacy env source at /opt/vps-home-platform-infra/env/stack.env and confirmed the current canonical apply path needs explicit runner-local env-file support. The preflight prevents a repeat of the morning failure mode where compose interpolation fails only after the apply path starts.
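The fail-closed ordering described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `apply.py` implementation: `apply_with_preflight`, `ApplyResult`, and the injectable `run` callable are hypothetical names chosen so the ordering can be tested without Docker.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], int]


def run_subprocess(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising."""
    return subprocess.run(list(cmd), check=False).returncode


@dataclass
class ApplyResult:
    preflight_exit_code: int
    applied: bool
    apply_exit_code: Optional[int] = None


def apply_with_preflight(
    base_cmd: Sequence[str],
    run: Runner = run_subprocess,
) -> ApplyResult:
    """Validate with `config --quiet`; run `up -d` only if that passes."""
    preflight = run([*base_cmd, "config", "--quiet"])
    if preflight != 0:
        # Fail closed: return before any mutating command is issued.
        return ApplyResult(preflight_exit_code=preflight, applied=False)
    up = run([*base_cmd, "up", "-d"])
    return ApplyResult(
        preflight_exit_code=preflight,
        applied=(up == 0),
        apply_exit_code=up,
    )
```

Because both steps share `base_cmd`, the preflight sees the same compose file and env files as the apply, which is the property the product story asks for.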

Files touched

  • control-plane/platformctl/apply.py
  • control-plane/platformctl/cli.py
  • control-plane/platformctl/tests/test_apply_phase3.py
  • control-plane/platformctl/tests/test_apply_env_file.py

Relevant context

  • PR #250 recovery plan
  • PR #251 compose include fix
  • Issue #142 comment 4913 Phase 2 evidence
  • docs/ci/runner-contract.md deploy runner section

Runtime evidence

No runtime mutation was performed.

Phase 2 read-only evidence:

canonical legacy env candidate: /opt/vps-home-platform-infra/env/stack.env
merged legacy env exists: /opt/vps-home-platform-infra/state/stack.merged.env
legacy wrapper: /opt/vps-home-platform-infra/scripts/compose.sh

Tests:

uv run --project control-plane --with pytest python -m pytest \
  control-plane/platformctl/tests/test_plan_phase3.py \
  control-plane/platformctl/tests/test_apply_phase3.py \
  control-plane/platformctl/tests/test_apply_env_file.py \
  control-plane/platformctl/tests/test_forgejo_ci_scripts_contract.py \
  control-plane/platformctl/tests/test_health_phase3.py \
  control-plane/platformctl/tests/test_smoke.py \
  -q
109 passed in 3.26s

Known constraints

This PR only adds env-file/preflight mechanics. The runner-local /opt/pdurlej-platform/runtime/compose.env file and PLATFORMCTL_COMPOSE_ENV_FILE runtime configuration will be verified at the next checkpoint, after merge.

Phase 2 also found that, when Infisical export is enabled, legacy compose.sh exports state/stack.merged.env into the process environment, because Docker Compose can mishandle $ in secret values read directly from --env-file. This PR keeps to the recovery-plan minimum and fails closed if the configured env-file is insufficient.
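For context on the legacy behavior, the idea of exporting an env file into the process environment can be sketched as below. This PR does not implement it, and the sketch is not a port of `compose.sh`: `parse_env_lines` is a hypothetical helper, and the parser is deliberately simplified (no quote handling, no `export` prefix, no multi-line values).

```python
from typing import Iterable


def parse_env_lines(lines: Iterable[str]) -> dict[str, str]:
    """Parse KEY=VALUE lines literally, with no `$` interpolation.

    Simplifying assumptions: values are not quoted, and blank lines
    and `#` comment lines are skipped.
    """
    env: dict[str, str] = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=".
        key, sep, value = stripped.partition("=")
        if not sep:
            continue
        env[key.strip()] = value
    return env
```

A caller could then pass `env={**os.environ, **parse_env_lines(handle)}` to `subprocess.run`, so values containing `$` reach compose as process environment rather than being read from `--env-file`.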

Explicit out-of-scope

  • No Infisical Token Auth retry.
  • No direct PAT removal.
  • No workflow_dispatch.
  • No production apply.
  • No stateful service changes.
  • No generated dotenv from docker inspect.

Requested decision

Review and operator-merge if the env-file and preflight boundary are acceptable.

Merge blockers

  • Any path where up -d can run after config --quiet fails.
  • Any secret value exposure in logs, PR body, status artifact, or command construction.
  • Any requirement to put deploy secrets in Forgejo repo secrets.
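One way to reason about the secret-exposure blocker is to look at what a preflight evidence record would need to contain. The sketch below is hypothetical: the field names and the `preflight_evidence` and `render_status` helpers are assumptions, not the artifact schema this PR persists. It records only the argument vector, env-file paths, and exit code, and deliberately captures neither env-file contents nor command output.

```python
import json
from typing import Sequence


def preflight_evidence(
    command: Sequence[str],
    env_files: Sequence[str],
    exit_code: int,
) -> dict:
    """Build a status record from paths and exit codes only."""
    return {
        "preflight": {
            "command": list(command),
            "env_files": list(env_files),
            "exit_code": exit_code,
            "passed": exit_code == 0,
        }
    }


def render_status(evidence: dict) -> str:
    """Serialize deterministically so artifacts diff cleanly."""
    return json.dumps(evidence, indent=2, sort_keys=True)
```

Since env values enter compose through files rather than command-line arguments, a record like this has no field through which a secret value could leak.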

Spec sources read

  • state/codex-prep/RECOVERY-PLAN-CUTOVER-2026-05-12-evening.md — execution contract
  • state/codex-prep/rs2000-closeout-handover-2026-05-12.md — runtime baseline and failed approaches
  • compose/apps/compose.yaml — canonical compose target after PR #251
  • compose/base/compose.yaml — included shared primitives
  • compose/core/compose.yaml — included core service dependencies
  • control-plane/platformctl/apply.py — target apply implementation
  • control-plane/platformctl/cli.py — CLI flag wiring
  • docs/ci/runner-contract.md — deploy runner constraints
  • docs/forgejo-agent-operations.md — Forgejo API identity rules

Refs #142

fix(platformctl): preflight compose apply with env files
All checks were successful
canary-required / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 19s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 16s
python-ci / Python 3.11 (pull_request) Successful in 33s
python-ci / Python 3.12 (pull_request) Successful in 35s
python-ci / Python 3.13 (pull_request) Successful in 34s
canary-required / canary (pull_request) Successful in 12s
base-is-main / guard (pull_request) Successful in 1s
76f89c9255
Reference
pdurlej/platform!252