meta(roadmap): execute post-soak legacy cleanup flight #387
Scope
Execute ADR-0020 and state/cutover/rs2000-post-soak-legacy-cleanup.md after the operator opens the cleanup flight.

Spec sources
- decisions/0020-post-soak-legacy-cleanup-and-platform-modularization.md
- state/cutover/rs2000-post-soak-legacy-cleanup.md
- state/cutover/rs2000-control-plane-cutoff.md
- state/roadmap/current-platform-roadmap.md

Acceptance criteria
- state/STATUS_NOW.md after the cleanup flight closes.

Out of scope

Codex Wave 1 M01 checkpoint — stale blockers reconciled
Role: executor
Status: M01 cleanup carrier remains open
Wave 1 started after Wave 0 PR backlog cleanup. I reconciled the stale/resolved M01 blockers:
- minio-init: resolved; read-only runtime evidence shows home-platform-minio-init-1 exited 0, MinIO is healthy, and expected buckets were bootstrapped.

Next action for this issue is still Phase 0 evidence refresh from state/cutover/rs2000-post-soak-legacy-cleanup.md: inventory legacy path sizes, bind mounts, tool cruft age, smoke/health state, and rollback references. No destructive cleanup should run without a fresh operator gate.

Codex Wave 1 / #387 Phase 0 inventory refresh - 2026-05-24 08:26 CEST
Role: executor
Mode: read-only inventory; no delete, move, prune, restart, or compose change
Platform health snapshot
- /opt/pdurlej-platform/current -> releases/c6928decfb42e12756ca4cd638fa66c704ba9498
- 65 running, 0 unhealthy
- forgejo-deploy-runner.service: active
- forgejo-deploy-runner-watchdog.timer: active
- 2026-05-19 00:00:0
- 2026-05-19 00:00:0
- platformctl-auto-apply.yml since 2026-05-19: 19 runs, pickup min 0s, max 2s, avg 0.84s, stuck waiting 0
- platform-smoke.yml: UI #1650, DB id 2116, status 1, created 2026-05-24 07:17:02+02, stopped 07:17:08+02

Legacy tree size inventory
/opt/vps-home-platform-infra total: 195G.
Largest classes:
- backups 194G
- state 1.4G
- data 63M
- config 532K
- env 356K
- products/openclaw-mail-infra/config; products total 824K
- .git, compose, scripts, docs, runbooks, contracts, tests, systemd, .github, templates
- .venv-zeroclaw-tools, _bmad-output, output, patches, policy
- .venv is 71M; others small

Current legacy bind mounts
- 2833
- data=22, config=16, env=3, products=2
- home-platform-deploy-control-1 mounts the whole legacy root: /opt/vps-home-platform-infra -> /repo

That root mount means Phase 1 cannot just move data/config; the first cleanup PR should remove or replace the deploy-control root dependency, otherwise zero-legacy-bind-mount verification will never pass.

Recommended next action
Start #387 Phase 1 as a non-destructive compose/runtime-root migration design:
- deploy-control root mount with the minimal needed mount(s), ideally current control-plane root plus explicit runtime state path.
- data/config/env/products as-is only after operator gate and smoke plan.

No raw secrets or private content were inspected or recorded; this is path/size/status metadata only.
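
The per-class counts above (data, config, env, products) and the separate whole-root mount can be derived by bucketing bind-mount source paths by their first path component under the legacy root. The following is a minimal sketch under that assumption; `classify_mount` and `summarize` are hypothetical helper names, not part of platformctl, and the input is assumed to be a plain list of bind source paths collected read-only.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, Optional

LEGACY_ROOT = PurePosixPath("/opt/vps-home-platform-infra")


def classify_mount(source: str) -> Optional[str]:
    """Return the legacy class of a bind source.

    'root' means the whole legacy tree is mounted; None means the
    source lies outside the legacy tree.
    """
    path = PurePosixPath(source)
    if path == LEGACY_ROOT:
        return "root"
    try:
        relative = path.relative_to(LEGACY_ROOT)
    except ValueError:
        # Not under the legacy root (component-wise, so a sibling such
        # as /opt/vps-home-platform-infra-old does not match).
        return None
    return relative.parts[0]


def summarize(sources: Iterable[str]) -> Counter:
    """Count legacy bind mounts per class, ignoring non-legacy sources."""
    classes = (classify_mount(source) for source in sources)
    return Counter(c for c in classes if c is not None)
```

A non-zero `root` count is the condition that blocks zero-legacy-bind-mount verification, because every class is reachable through that single mount.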
Role: executor
Intent: checkpoint
Needs owner: no
Phase 1 prep PRs opened:
- docs(roadmap): record wave0 triage and wave map - records W0 triage output, ADR status normalization, and W↔M wave map.
- fix(cutover): remove deploy-control legacy root mount - removes the whole legacy-root /repo bind from deploy-control and removes the missing repo-local build context. Merge is not production mutation; runtime apply remains F3/operator-gated with fresh backup ref.

Runtime note from read-only audit: deploy-control source exists only in legacy /opt/vps-home-platform-infra/scripts/deploy-control; current release-root lacks it, while live image home-platform-deploy-control:1.0.0 exists. The code path is safe for this narrow change because SIGNAL_DEPLOY_ENABLED=false returns before touching /repo.

Next: operator reviews/merges #407 then #408; after #408 merge, run backup-before for deploy-control, then dispatch the stateful F3 apply/smoke with BACKUP_DONE_F3 and the backup path.

Role: executor
Intent: checkpoint
Needs owner: yes
#408 merge was not enough to change live deploy-control: manual F3 run #2156 succeeded but produced apply.status=noop, because platformctl plan currently sees image/status in-sync and does not detect compose bind-mount drift. Runtime evidence still shows /opt/vps-home-platform-infra => /repo.

Opened #409 to keep the fix inside the trusted workflow instead of doing ad-hoc SSH mutation. It adds explicit force_compose_up for backup-gated manual F3 dispatches only.

Next: merge #409, then rerun platformctl-auto-apply.yml for deploy-control with the existing backup ref /opt/pdurlej-platform/backups/deploy-control-20260524T070134Z.tar.gz, allow_stateful=true, stateful_confirm=BACKUP_DONE_F3, and force_compose_up=true.

Role: executor
Intent: checkpoint
Needs owner: yes
Codex checkpoint — deploy-control legacy root mount apply blocked at host-agent allowlist
Status: stopped before deploy-control mutation; PR #411 is ready for operator merge.
Evidence:
- /opt/pdurlej-platform/backups/deploy-control-20260524T070134Z.tar.gz, size 932080592, sha256 2e3bb0f9f2ca8e971a7f07b976ce9a9b31d7e660e0d1f694e1065bc93dfd2dc1.
- force_compose_up=true, failed in preflight because canonical compose required THINGS_USERNAME; no mutation.
- /opt/pdurlej-platform/runtime/things-compose.env, mode 0640 root:platform-host-agent; values not printed; deploy-runner env list updated and runner restarted while runner_id=5 had 0 active tasks.
- platform-host-agent-wrapper denied /opt/pdurlej-platform/runtime/things-compose.env; no mutation.
- home-platform-deploy-control-1 remains healthy and still has legacy /repo mount; target apply still pending.

PR ready:
- fix(host-agent): allow Things compose env file
- python3 -m pytest tests/test_platform_host_agent_wrapper.py → 13 passed

Next after #411 merge:
- /usr/local/sbin/platform-host-agent-wrapper from trusted main/release-root;
- platform-host-agent preflight accepts the full env-file list;
- platformctl-auto-apply.yml for deploy-control with the same backup ref and force_compose_up=true;
- /opt/vps-home-platform-infra => /repo is gone from home-platform-deploy-control-1.

Next: operator merge #411, then Codex continues F3 deploy-control apply.
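
Because the same backup ref is reused across several dispatches, it may be worth re-checking the recorded size and sha256 before each stateful apply. The sketch below shows one way to do that with the Python standard library; `verify_backup` is a hypothetical helper and is not something the workflow is known to run.

```python
import hashlib
from pathlib import Path


def verify_backup(path: str, expected_size: int, expected_sha256: str,
                  chunk_size: int = 1024 * 1024) -> bool:
    """Return True only if the file matches both the recorded size and sha256."""
    backup = Path(path)
    # Cheap check first: a size mismatch avoids hashing a large tarball.
    if backup.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    with backup.open("rb") as handle:
        # Stream in chunks so a ~1 GB archive is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

For the backup recorded above, the call would take the tarball path, `932080592`, and the sha256 from the evidence list.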
Role: executor
Intent: checkpoint
Needs owner: no
Codex checkpoint - deploy-control legacy /repo mount removed
Status: GREEN.
Evidence:
- /opt/pdurlej-platform/current -> releases/5982e2eca04f3dfd3de99072ce4aebb5035dbf8f.
- /usr/local/sbin/platform-host-agent-wrapper now allows /opt/pdurlej-platform/runtime/things-compose.env.
- platform-host-agent: full canonical env-file list + docker compose config --quiet returned 0.
- /opt/pdurlej-platform/backups/deploy-control-20260524T070134Z.tar.gz, sha256 2e3bb0f9f2ca8e971a7f07b976ce9a9b31d7e660e0d1f694e1065bc93dfd2dc1.
- deploy-control.apply.stdout.json: force_compose_up=true, preflight.returncode=0, remote.returncode=0, status=applied.
- deploy-control.health.json: status=OK, exitCode=0.
- home-platform-deploy-control-1 running + healthy, started 2026-05-24T07:49:23Z.
- DEPLOY_REPO_ROOT=/repo-disabled, SIGNAL_DEPLOY_ENABLED=false.
- /opt/vps-home-platform-infra => /repo is gone; remaining mounts are /state, /env, and Docker socket.
- forgejo-deploy-runner.service active; watchdog timer active; recent watchdog logs show no stuck trusted-main auto-apply jobs.

Notes:
Next: continue Wave 1 closeout / legacy cleanup planning; deploy-control no longer keeps the whole legacy tree mounted as active control-plane state.
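
The mount check in this checkpoint can be made repeatable by parsing `docker inspect` output, which also covers the bind-mount drift that an image/status comparison alone does not see. This is a minimal sketch: `has_bind` is a hypothetical helper, and it assumes the caller has already captured the JSON array that `docker inspect <container>` prints, whose entries carry a `Mounts` list with `Type`, `Source`, and `Destination` fields.

```python
import json


def has_bind(inspect_json: str, source: str, destination: str) -> bool:
    """Return True if any inspected container has the exact bind mount."""
    for container in json.loads(inspect_json):
        for mount in container.get("Mounts", []):
            if (mount.get("Type") == "bind"
                    and mount.get("Source") == source
                    and mount.get("Destination") == destination):
                return True
    return False


def legacy_root_removed(inspect_json: str) -> bool:
    """True once the whole-legacy-root bind to /repo is no longer present."""
    return not has_bind(inspect_json, "/opt/vps-home-platform-infra", "/repo")
```

The match is deliberately exact: it only asserts that the whole-root mount is gone, and does not flag mounts whose sources are subdirectories of the legacy tree.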
Codex LD0 legacy bind-mount inventory — 2026-05-24 11:15 CEST
Status: read-only complete
PR: #415
Evidence summary
- 654332
- env=3, product-runtime=2, service-config=16, service-data=22
- fdc36093a953ecb446d8ff97fcc61b7d37b6ad47, unhealthy containers 0, deploy runner/watchdog active

Key finding
deploy-control is not an isolated first remount. It shares legacy env and data/integrations roots with other live services, so LD1 should start with shared bundle work rather than a single-service move.

Recommended next order
1. env normalization.
2. config/integrations + data/integrations.
3. products/openclaw-mail-infra/config.

No production mutation, no secrets/private content, no delete/copy/remount/restart was performed.
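
The key finding, that some legacy roots are shared by several live services, is the kind of result a small grouping pass over the inventory can produce. The sketch below assumes a hypothetical input shape (service name mapped to a list of legacy root paths); `shared_roots` is an illustrative name and does not describe how the LD0 report was actually generated.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def shared_roots(mounts_by_service: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each legacy root used by more than one service to its users."""
    users = defaultdict(set)
    for service, roots in mounts_by_service.items():
        for root in roots:
            users[root].add(service)
    # Roots with a single user can move with that service alone;
    # roots with several users need a coordinated bundle move.
    return {root: sorted(services)
            for root, services in users.items()
            if len(services) > 1}
```

Any root that appears in the result is a candidate for shared bundle work rather than a single-service remount.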
Codex LD1a — env-root normalization prep complete — 2026-05-24 12:11 CEST
Status: complete / merged to main

Deliverable
- feat(cutover): make env bind mounts runtime-rooted
- 0f33636664d6f1d3cff5aee320cae2acd3625510
- /opt/pdurlej-platform/current -> releases/0f33636664d6f1d3cff5aee320cae2acd3625510

What changed
- ${PROJECT_ROOT}/env bind mounts now use ${PLATFORM_RUNTIME_ENV_DIR:-/opt/vps-home-platform-infra/env}.
- state/reports/rs2000-ld1a-env-inventory-2026-05-24/.

Verification
- /tmp: docker compose config --quiet exit 0.
- forgejo-deploy-runner.service: active.
- forgejo-deploy-runner-watchdog.timer: active.
- no stuck trusted-main auto-apply jobs.

Next
LD1b can perform the operator-gated runtime env copy/remount plan: copy legacy env to /opt/pdurlej-platform/runtime/legacy-import/env, set PLATFORM_RUNTIME_ENV_DIR, run compose config, and smoke affected services one by one.

Codex LD1 - env + integrations runtime roots complete - 2026-05-24 12:51 CEST
Status: green / complete
Repo changes
- PLATFORM_RUNTIME_ENV_DIR.
- STATUS_NOW.md updated with LD1 completion evidence.

Runtime changes
- /opt/vps-home-platform-infra/env to /opt/pdurlej-platform/runtime/legacy-import/env.
- /opt/vps-home-platform-infra/config/integrations to /opt/pdurlej-platform/runtime/legacy-import/config/integrations.
- /opt/vps-home-platform-infra/data/integrations to /opt/pdurlej-platform/runtime/legacy-import/data/integrations.
- /opt/pdurlej-platform/runtime/compose.env with PLATFORM_RUNTIME_ENV_DIR, INFISICAL_BOOTSTRAP_ENV_FILE, PLATFORM_RUNTIME_INTEGRATIONS_CONFIG_DIR, and PLATFORM_RUNTIME_INTEGRATIONS_DATA_DIR.

Evidence
- /opt/pdurlej-platform/current -> releases/e458511253c9a047cd7c5226fe84f42aba673de2.
- gmail-triage-mcp, gmail-private-mcp, storage-ro-mcp, git-mirror, deploy-control, safe-session-api.
- tests/smoke.sh passed for all six modules.
- 0.
- forgejo-deploy-runner.service: active.
- forgejo-deploy-runner-watchdog.timer: active.
- no stuck trusted-main auto-apply jobs.

Notes
storage-ro-mcp unhealthy state exposed the missing config/data root coupling; it was fixed by #418 and the final remount is healthy.

Next
Continue Milestone 01 with the next LD0 bind-mount batch: product runtime or lower-risk config, not destructive cleanup.
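
The remount works because the compose bind paths use the `${VAR:-default}` form introduced in LD1a: with the variable unset or empty the legacy path is used, and once compose.env sets it the runtime root takes over. The sketch below imitates only that one interpolation form in Python to illustrate the behaviour; `interpolate` is a hypothetical helper, not Compose's own implementation, and it ignores the other forms Compose supports.

```python
import re
from typing import Mapping

# Matches ${NAME:-default} only; other Compose interpolation forms are
# intentionally out of scope for this illustration.
_DEFAULT_FORM = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):-(?P<default>[^}]*)\}")


def interpolate(value: str, env: Mapping[str, str]) -> str:
    """Expand ${NAME:-default}, using the default when NAME is unset or empty."""
    def replace(match: "re.Match[str]") -> str:
        current = env.get(match.group("name"), "")
        return current if current else match.group("default")
    return _DEFAULT_FORM.sub(replace, value)


BIND_SOURCE = "${PLATFORM_RUNTIME_ENV_DIR:-/opt/vps-home-platform-infra/env}"

# Before LD1b: variable unset, the bind source resolves to the legacy env dir.
legacy = interpolate(BIND_SOURCE, {})

# After LD1: compose.env sets the variable, the bind source is runtime-rooted.
runtime = interpolate(
    BIND_SOURCE,
    {"PLATFORM_RUNTIME_ENV_DIR": "/opt/pdurlej-platform/runtime/legacy-import/env"},
)
```

This is why merging the compose change alone was not a production mutation: the resolved path only changes when the variable is set at runtime.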
LD1 addendum - 2026-05-24 12:56 CEST: status-only #421 was also promoted so STATUS_NOW.md in the release root matches main. Current release root is now /opt/pdurlej-platform/current -> releases/eccd17cbd430a3be2d6f27009a658dd1e163417c; runtime service evidence from the LD1 checkpoint remains unchanged and green.