ops(cutover): add F3 backup-before-apply draft #270

Merged
pdurlej merged 1 commit from codex/f3-prep/backup-before-apply-script into main 2026-05-13 23:31:49 +02:00

Canary status: missing - Lite/script PR; operator spot-check before merge, no runtime smoke in this PR.

Canary Context Pack

Product story

F3 stateful smoke needs a boring backup ritual that runs before any stateful apply. This PR adds the draft operator-run backup helper and the audit table that should shape the F3 design discussion.

What changed

  • Added scripts/cutover/backup-before-apply.sh, an operator-run script that writes timestamped backups under /opt/pdurlej-platform/backups.
  • Added scripts/cutover/README.md with backup class mappings, the 2026-05-13 stateful module audit, and first-F3 candidate guidance.
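As a rough illustration of the timestamped, private-by-default output described above, the sketch below builds a backup path and archives a directory under a restrictive umask. It is not the content of backup-before-apply.sh: the function names `backup_path` and `archive_dir`, the UTC timestamp format, and the `tar.gz` extension are assumptions made for illustration only.

```bash
# Hypothetical sketch; names and formats are illustrative assumptions.

# Print "<dir>/<module>-<UTC timestamp>.<ext>".
backup_path() {
  local dir="$1" module="$2" ext="$3"
  printf '%s/%s-%s.%s\n' "$dir" "$module" "$(date -u +%Y%m%dT%H%M%SZ)" "$ext"
}

# Archive the contents of a data directory into a gzipped tarball.
# The subshell keeps the umask change local, so the archive is created
# with mode 0600 and is never briefly readable by other users.
archive_dir() {
  local src="$1" dest="$2"
  (
    umask 077
    tar -czf "$dest" -C "$src" .
  )
}
```

Setting the umask before the file is created avoids the short window that a later `chmod 0600` would leave open.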

Why it changed

F1/F1.5/F2 no-op smokes validated the apply pipeline on stateless services. F3 is different: stateful modules require backup-before-apply discipline before any smoke or real apply.

Class mapping table

| Class | Modules / pattern | Procedure |
|---|---|---|
| A | `postgres`, `honcho-postgres`, `agent-plane-shadow-postgres`, `*postgres` | `pg_dumpall` logical dump, gzipped |
| B | `redis`, `honcho-redis`, `infisical-redis`, `*redis` | Redis `BGSAVE` when possible, then mounted data archive |
| C | `vault` | Vault raft snapshot with `VAULT_TOKEN`; fallback mounted data archive |
| D | `minio` | `mc mirror` with `MINIO_MC_ALIAS`; fallback mounted data archive |
| E | Default app/filesystem state | Mounted data archive |
| F | `karakeep-meilisearch`, `searxng` | Engine-specific placeholder; mounted data archive until dump is proven |
| G | `agaria-postgres`, `agaria-redis` | Separate compose/root archive; Agaria-specific follow-up required |
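The class mapping above could be expressed as a single `case` statement, sketched below. The function name `classify_module` is hypothetical, and the ordering rests on an assumption the table does not state outright: that the Agaria modules belong to Class G even though their names also match the `*postgres` and `*redis` patterns.

```bash
# Hypothetical sketch of the class mapping; not taken from the script.
classify_module() {
  case "$1" in
    # Assumption: Class G wins over the wildcard arms below, so it is listed first.
    agaria-postgres|agaria-redis) echo G ;;
    *postgres)                    echo A ;;
    *redis)                       echo B ;;
    vault)                        echo C ;;
    minio)                        echo D ;;
    karakeep-meilisearch|searxng) echo F ;;
    # Everything else is treated as default app/filesystem state.
    *)                            echo E ;;
  esac
}
```

Because `case` takes the first matching arm, moving the Class G arm below `*postgres` would silently reclassify `agaria-postgres` as Class A.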

Runtime evidence

Read-only repo + Docker evidence only. No backup was run.

  • Repo audit found 45 modules with spec.runtime.statefulness: stateful.
  • docker system df reported 63 local volumes, 11.34GB total, 5.304GB reclaimable.
  • The README includes the grouped stateful audit and readiness notes.
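A read-only count like the one in the repo audit could be approximated with `grep`, as in the sketch below. This is not necessarily how the audit was produced: the function name `count_stateful` is hypothetical, and the pattern matches any `statefulness: stateful` line rather than strictly resolving the `spec.runtime.statefulness` path in the YAML.

```bash
# Hypothetical sketch: count module.yaml files that declare stateful runtime.
count_stateful() {
  local root="$1"
  # -l lists each matching file once; wc -l then counts the files.
  # Approximation: this does not verify the key sits under spec.runtime.
  grep -lE '^[[:space:]]*statefulness:[[:space:]]*stateful[[:space:]]*$' \
    "$root"/modules/*/module.yaml 2>/dev/null | wc -l | tr -d ' '
}
```

Run from the repository root as `count_stateful .`. The Docker figures come from `docker system df`, which only reads state as well.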

Test plan

Local checks already run:

bash -n scripts/cutover/backup-before-apply.sh
wc -l scripts/cutover/backup-before-apply.sh scripts/cutover/README.md
git diff --cached --check

Operator validation before any F3 smoke:

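  1. On RS2000, pick one low-blast module that is backed up by a plain mounted data archive, preferably uptime-kuma (Class E) or searxng (listed under Class F, which currently uses the same archive procedure), after confirming the live container/mounts.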
  2. Run scripts/cutover/backup-before-apply.sh <module-id> as root.
  3. Confirm a new /opt/pdurlej-platform/backups/<module>-<ts>.* file exists with mode 0600.
  4. Confirm the service remains healthy and no apply/smoke has been triggered.
  5. Repeat separately for Class A/B/C/D only after explicit operator decision and class-specific restore notes exist.
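Step 3 of the validation above can be checked mechanically. The sketch below assumes GNU coreutils `stat` (for `-c '%a'`) and the `<module>-<ts>.*` naming from the test plan. The function name `check_backup` is hypothetical.

```bash
# Hypothetical sketch: confirm at least one backup exists for a module
# and that every matching file has mode 0600.
check_backup() {
  local dir="$1" module="$2" found=0 f mode
  for f in "$dir/$module"-*; do
    [ -f "$f" ] || continue          # skips the unexpanded glob when nothing matches
    found=1
    mode="$(stat -c '%a' "$f")"      # GNU stat; prints the octal mode, e.g. 600
    if [ "$mode" != "600" ]; then
      echo "FAIL $f has mode $mode" >&2
      return 1
    fi
    echo "ok $f"
  done
  [ "$found" -eq 1 ] || { echo "FAIL no backup found for $module" >&2; return 1; }
}
```

For example, `check_backup /opt/pdurlej-platform/backups uptime-kuma`. Note that the prefix glob also matches longer module names sharing the same prefix, so a stricter pattern may be needed in practice.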

Known constraints

  • This script is a draft for operator review. It is not called by Forgejo Actions.
  • Some stateful modules are non-canonical or sunset/parked and should be excluded from first F3.
  • Per-module backup size is not asserted yet; the README records total Docker volume footprint only.

Explicit out-of-scope

  • No F3 smoke.
  • No real apply.
  • No production restart.
  • No restore execution.
  • No Infisical/secret migration.

Requested decision

Operator spot-check + merge if the draft backup-before-apply contract is acceptable for next-session F3 planning.

Merge blockers

  • Any path that would make the script run automatically.
  • Any missing privacy guard for backup files.
  • Any claim that F3 is ready without a separate operator-approved design issue.
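One way to guard against the first blocker is for the helper to refuse to start when it detects a CI environment. The sketch below is an assumption, not something this PR is known to contain: the function name `refuse_if_automated` is hypothetical, and it relies on the runner exporting `CI` or `GITHUB_ACTIONS`, which should be confirmed against the actual Forgejo Actions runner before relying on it.

```bash
# Hypothetical sketch: bail out early when run by a CI runner.
refuse_if_automated() {
  # Assumption: the runner exports CI and/or GITHUB_ACTIONS for every job.
  if [ -n "${CI:-}" ] || [ -n "${GITHUB_ACTIONS:-}" ]; then
    echo "refusing to run under CI; this helper is operator-run only" >&2
    return 1
  fi
  return 0
}
```

A script would call `refuse_if_automated || exit 1` before touching any data. This only covers CI-triggered runs; cron jobs or other wrappers would need their own review.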

Spec sources read

  • prompts/codex-f1.5-hardening-f2-2026-05-13.md - inherited F2/F3 context.
  • modules/*/module.yaml - stateful candidate audit.
  • modules/*/runbook.md - container-name fallback pattern for the helper script.
  • compose/apps/compose.yaml, compose/core/compose.yaml, compose/edge/compose.yaml, compose/base/compose.yaml - canonical compose context for backup planning.
  • docs/ci/runner-contract.md - deploy-runner boundary awareness; this script intentionally stays operator-run.

F3 design issue

Design gate: #271

ops(cutover): add F3 backup-before-apply draft
Some checks failed
canary-required / collect-diff (pull_request) Failing after 4s
canary-required / canary (pull_request) Has been skipped
base-is-main / guard (pull_request) Successful in 2s
ab83f142b2