Phase 8 monthly restore-test stale 26+ days #45

Closed
opened 2026-05-03 23:30:22 +02:00 by claude · 2 comments
Collaborator

Migrated from state/L3/OPEN_LOOPS.md per ADR 0001 + AGENTS.md (Forgejo Issues as memory layer for follow-ups).

Source

state/L3/OPEN_LOOPS.md Backup/DR validation gate section + STATE_OF_PLATFORM v2 §5 risk/runtime HIGH.

What

Phase 8 monthly restore-test (originally af8855ae Feb 18 plan) — required before 'production declare'. hp-restore-smoke.timer last ran 2026-04-07, 26+ days stale.
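The staleness figure can be sanity-checked with plain date arithmetic. A minimal sketch, assuming GNU `date` (for the `-d` option); `days_between` is a hypothetical helper name, and the two dates are the last timer run and the day this issue was opened:

```shell
# days_between is a hypothetical helper, not part of the platform repo.
# Assumes GNU date (coreutils) for -d; -u keeps DST changes out of the math.
days_between() {
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  echo $(( (end - start) / 86400 ))
}

# Last timer run vs. the day the issue was opened.
days_between 2026-04-07 2026-05-03
```

For these two dates the helper prints 26, which matches the "26+ days" in the title.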

Reclassification (per operator amendment 14, 2026-05-03)

NOT immediate Phase 02 blocker. Critical pre-cutover gate. Blocks production declare / final RS2000 replacement. Local restore rehearsal may happen earlier to reduce cutover risk.

Default action (from STATE_OF_PLATFORM v2 Owner Action Board)

Resume monthly cadence; first run scheduled within 7 days. Honcho-postgres-specific quarterly drill on top.

Owner

piotr — schedule the first run; verify timer re-armed.

Acceptance criteria

  • First restore-test run completes successfully within 7 days
  • hp-restore-smoke.timer shows recent run (systemctl --user status hp-restore-smoke.timer)
  • Honcho-postgres-specific quarterly drill scheduled in calendar
  • Issue closed when both monthly cadence + quarterly Honcho drill verified active
Collaborator

Codex W3b restore smoke — 2026-05-24 16:28 CEST

Role: executor
Status: green, metadata-only evidence PR opened: #431

Evidence:

  • Operator gate received: w3-restore-smoke-approved.
  • hp-restore-smoke.service started manually and exited 0/SUCCESS.
  • Duration: 6s.
  • Backup used: /opt/vps-home-platform-infra/backups/20260524-120007-critical.
  • Restore image: postgres:16.12-alpine.
  • Unhealthy containers after run: 0.
  • Disposable restore-test-postgres container count after run: 0.
  • Timer remains active; next scheduled run: 2026-06-02 03:45 CEST.
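The evidence items above could be collected along these lines. This is a sketch, not the commands that were actually run: it assumes the containers are managed by Docker and that the unit lives under the user manager (as the `systemctl --user` command in the acceptance criteria suggests). `count_ids` and `restore_smoke_evidence` are hypothetical names.

```shell
# Hypothetical helpers; Docker as the container runtime is an assumption.

# Count non-empty lines on stdin (one container ID per line).
count_ids() {
  grep -c . || true   # grep exits 1 on zero matches but still prints 0
}

restore_smoke_evidence() {
  # Result and exit status of the last service run.
  systemctl --user show hp-restore-smoke.service \
    --property=Result --property=ExecMainStatus

  # Containers currently reporting an unhealthy health check.
  echo "unhealthy: $(docker ps --filter health=unhealthy --quiet | count_ids)"

  # Leftover disposable restore containers, running or stopped.
  echo "leftover: $(docker ps --all --filter name=restore-test-postgres --quiet | count_ids)"

  # Next scheduled run of the timer.
  systemctl --user list-timers hp-restore-smoke.timer
}
```

Calling `restore_smoke_evidence` on the host after a run would print the service result, the two container counts, and the timer schedule in one pass; both counts should be 0 for a clean run.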

Interpretation: the monthly Forgejo restore-smoke path is no longer stale. I am not closing this issue from the API because the original acceptance criteria also mention Honcho quarterly scheduling; #238 remains the broader W3 carrier.

Collaborator

W3a/b/c restore-confidence evidence is now current and accepted as the immediate gate per PR #432 comment #9292.

Evidence package:

  • #430: W3 preflight reconciled current backup/restore timer state.
  • #431: W3b reran hp-restore-smoke.service; exit 0/SUCCESS, disposable Forgejo PostgreSQL restore cleaned up, zero unhealthy containers.
  • #432: W3c restored Honcho pgvector backup into an isolated disposable target; metadata-only schema/count/dimension checks passed, zero unhealthy containers.

The stale monthly restore-test risk described here is no longer accurate: the restore smoke was rerun successfully on 2026-05-24, and the timer remains active for the next scheduled run.

The deeper full sandbox DR drill is still tracked separately in #433 and is required before irreversible Class A/B/D legacy cleanup or broad W8 module upgrade waves. Closing this stale cutover-gate issue now.

codex closed this issue 2026-05-24 18:18:47 +02:00
Reference
pdurlej/platform#45