docs(w3d): reconcile vps1000 DR pass with remaining full sandbox gate #533

Closed
opened 2026-05-28 01:13:27 +02:00 by codex · 2 comments
Collaborator

Scope

Produce a narrow W3d reconciliation packet after the completed #522 vps1000 sandbox restore pass.

The goal is to make the remaining DR state obvious:

  • what #522 proves;
  • what #433 still requires, if anything;
  • whether the next DR step is a real full sandbox drill, a smaller confidence follow-up, or closure/defer with evidence.

Spec sources

  • state/roadmap/current-platform-roadmap.md § Operator wave map / Recommended execution order
  • runbooks/dr-restore-test.md
  • Forgejo issue #433
  • Forgejo issue #522

Extracted context

From roadmap:

W3 | M02 | DR and restore confidence. | ACCEPTED immediate gate. Preflight, restore smoke, and Honcho partial restore are green; W3d full sandbox DR is #433 before destructive cleanup/W8.

Execute #433 W3d before destructive cleanup or W8 broad upgrades.

Acceptance criteria

  • A short repo artifact or issue comment identifies exactly what #522 covered.
  • #433 is classified as: still required / scope-reduced / superseded / blocked.
  • No new server purchase is proposed. Use vps1000/rs2000; paid per-minute serverless capacity may appear only as a last-resort note.
  • No runtime mutation.
  • `uv run python -m platformctl.cli validate all --json` passes if repo files changed.
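
The last criterion is conditional: validation is only required when repo files changed. A minimal sketch of that gate follows. The function name `validation_gate`, the `changed_files` input, and the injectable `run` callable are hypothetical, and treating exit code 0 as "passes" is an assumption, since the issue does not define what passing means.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Command quoted verbatim from the acceptance criteria.
VALIDATE_CMD = ["uv", "run", "python", "-m", "platformctl.cli", "validate", "all", "--json"]


def _run_subprocess(cmd: Sequence[str]) -> int:
    """Run the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def validation_gate(
    changed_files: Sequence[str],
    run: Callable[[Sequence[str]], int] = _run_subprocess,
) -> Optional[bool]:
    """Return None when no repo files changed, else whether validation passed.

    Assumption: exit code 0 means "passes"; the issue does not spell this out.
    """
    if not changed_files:
        # Comment-only outcome: the criterion does not apply.
        return None
    return run(VALIDATE_CMD) == 0
```

Passing a stub for `run` allows a dry check of the gate without invoking `uv`; an issue-comment-only reconciliation would call it with an empty list and get `None`.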

Out of scope

  • Running a destructive restore.
  • Buying or provisioning a new persistent server.
  • Changing backup schedules.

Agent notes

Recommended executor: Gemini 3.5 Flash local.

Rules:

  • Keep the PR atomic.
  • Do not mutate production runtime unless the issue explicitly says so.
  • Do not create new planning docs if a short status/update file or issue comment is enough.
  • Use Polish for operator-facing summaries; English for code/PR body.
  • PR body must include `Spec sources read`.
Author
Collaborator

Codex post-M01 DR refresh note, 2026-05-29.

Result: GREEN for infrastructure restore confidence.

Evidence gathered after M01 cleanup:

  • Legacy backup target /opt/vps-home-platform-infra/backups no longer exists.
  • Canonical host-ops backup root exists under /opt/pdurlej-platform/runtime/host-ops.
  • Timers active: hp-backup-critical.timer, hp-backup-noncritical.timer, hp-restore-smoke.timer.
  • Fresh host-side restore smoke passed at 2026-05-29 08:39 CEST using canonical backup 20260529-060017-critical.
  • Post-M01 vps1000 disposable W3d sandbox passed using the same canonical backup root.
  • Restored topology: Forgejo SQL/data, Honcho SQL, Postgres, pgvector Postgres, Redis, Honcho Redis, Traefik fake ingress.
  • Routed smoke: Traefik ping 200, Forgejo via Traefik 200, Honcho via Traefik 200.
  • Health: Forgejo, Honcho API, Postgres, Honcho Postgres, Redis, Honcho Redis healthy; Traefik running.
  • RTO from backup-on-disk in sandbox to first accepted routed requests: 86s.
  • Sandbox cleanup verified: 0 leftover W3d containers, 0 leftover W3d volumes, staging directory removed.
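
The evidence above can be read as a simple all-checks-pass rule. The sketch below is illustrative only and is not the drill script used for this pass: `DrillEvidence`, `drill_result`, the dictionary keys, and the `"RED"` label are hypothetical, and the values are transcribed from the bullets above. Traefik is reported as "running" rather than "healthy", so it is left out of the health map. The note states no RTO threshold, so the RTO is recorded but not evaluated.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DrillEvidence:
    routed_status: Dict[str, int]   # HTTP status per routed smoke check
    healthy: Dict[str, bool]        # health per restored service
    leftover_containers: int
    leftover_volumes: int
    staging_removed: bool
    rto_seconds: int                # recorded only; no threshold is given in the note


def drill_result(e: DrillEvidence) -> str:
    """Return "GREEN" only if every routed check, health check, and cleanup check passes."""
    ok = (
        all(code == 200 for code in e.routed_status.values())
        and all(e.healthy.values())
        and e.leftover_containers == 0
        and e.leftover_volumes == 0
        and e.staging_removed
    )
    return "GREEN" if ok else "RED"  # "RED" is a hypothetical label for any failure


# Values transcribed from the evidence list above.
post_m01 = DrillEvidence(
    routed_status={
        "traefik_ping": 200,
        "forgejo_via_traefik": 200,
        "honcho_via_traefik": 200,
    },
    healthy={
        "forgejo": True,
        "honcho_api": True,
        "postgres": True,
        "honcho_postgres": True,
        "redis": True,
        "honcho_redis": True,
    },
    leftover_containers=0,
    leftover_volumes=0,
    staging_removed=True,
    rto_seconds=86,
)

print(drill_result(post_m01))  # prints "GREEN"
```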

Non-finding:

  • Local Mac W3d attempt did not run because local Docker daemon was unavailable. This is not a backup/restore failure; the vps1000 isolated sandbox pass succeeded.

Local evidence artifact generated: state/reports/w3d-post-m01-vps1000-sandbox-drill-2026-05-29.md.

Recommendation: #533 can be treated as answered. #433 should now be an operator decision: accept this post-M01 W3d pass as satisfying the destructive-cleanup/broad-upgrade DR gate, or split semantic OpenClaw/Iskra continuity into a separate W3e issue.

Author
Collaborator

Reconciliation outcome after operator decision:

  • #433 infrastructure W3d gate is accepted/closed based on post-M01 restore smoke + vps1000 disposable sandbox evidence.
  • Semantic Iskra/OpenClaw continuity is not considered covered by infrastructure DR and is split into #602 (W3e).

#533 can close as answered.

codex closed this issue 2026-05-29 15:22:15 +02:00
Reference
pdurlej/platform#533