docs(codex-prep): Pan Herbatka's epic recovery plan for RS2000 cutover (sign-off needed) #250

Merged
pdurlej merged 1 commit from claude/orders/recovery-plan-cutover-2026-05-12-evening into main 2026-05-12 22:17:35 +02:00

Tier per ADR-0007: Lite — single new doc file, 477 LoC additive, no sacred paths, no schema/runtime, no code changes. Operator-merge.

Why now

Operator's mandate 2026-05-12 evening (after Codex's morning session stopped at cutover blocker):

"Codex showed that this is where it buckled; delegating everything to it again would just be doing the same thing. Pan Herbatka is pouring something stronger into the cup."

This PR ships that something stronger. Pan Herbatka audited Codex's morning failure, web-researched the multi-compose and dotenv-quoting bug classes, and synthesized 5 architectural calls so that the next session executes rather than explores.

What ships

Single file: state/codex-prep/RECOVERY-PLAN-CUTOVER-2026-05-12-evening.md (477 lines)

Contents:

  • § 1 Why the morning failed (1-paragraph diagnosis)
  • § 2 Three root causes with repo-grep evidence
  • § 3 Pan Herbatka's 5 architectural calls (operator sign-off needed)
  • § 4 Six-phase execution plan
  • § 5 First 3 commands to run
  • § 6 Risk matrix
  • § 7 Operator decision points (checklist)
  • § 8 What next Codex session must read (in order, ~15 min)
  • § 9 References (web research + repo evidence)

The 5 calls (need operator sign-off in § 7)

  1. Use the docker compose include: directive (not multiple -f flags, not the COMPOSE_FILE env variable)
  2. Read the legacy /opt/vps-home-platform-infra/env/ files via --env-file (rather than reconstructing them from docker inspect)
  3. Runner-local direct PAT tonight; Universal Auth Infisical migration as separate post-cutover epic
  4. First smoke = dashboard (stateless, low blast, already running)
  5. Cutover is transparent — legacy stack remains live 7+ days for rollback symmetry

Plus: Pan Herbatka recommends execution tomorrow morning, not tonight. A 12-hour overnight session is the wrong context for a cutover.
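
As a rough illustration of calls 1, 2 and 4 taken together, the sketch below renders a top-level Compose file that pulls in the four stack files via `include:` and builds a `docker compose` command line that passes a legacy env file with `--env-file` and brings up only `dashboard`. It is not taken from the recovery plan. The top-level file name `compose.yaml`, the env file name `dashboard.env`, and the ordering of the stack files are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Stack files named under "Spec sources read"; their order here is an assumption.
STACK_FILES = [
    "compose/base/compose.yaml",
    "compose/core/compose.yaml",
    "compose/edge/compose.yaml",
    "compose/apps/compose.yaml",
]

# Legacy env directory from call 2.
LEGACY_ENV_DIR = PurePosixPath("/opt/vps-home-platform-infra/env")


def render_include_file(stack_files):
    """Return the text of a top-level Compose file that only includes other files."""
    lines = ["include:"]
    lines += [f"  - path: {path}" for path in stack_files]
    return "\n".join(lines) + "\n"


def compose_up_argv(top_level_file, env_file_names, service):
    """Build the argv for `docker compose up -d` limited to a single service."""
    argv = ["docker", "compose"]
    for name in env_file_names:
        # One --env-file flag per legacy env file, read in place.
        argv += ["--env-file", str(LEGACY_ENV_DIR / name)]
    argv += ["-f", top_level_file, "up", "-d", service]
    return argv


if __name__ == "__main__":
    print(render_include_file(STACK_FILES))
    # "compose.yaml" and "dashboard.env" are hypothetical names.
    print(" ".join(compose_up_argv("compose.yaml", ["dashboard.env"], "dashboard")))
```

Building the command as a list rather than a shell string keeps values out of shell quoting, which is one of the bug classes the plan calls out for dotenv handling.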

What this PR does NOT do

  • Does NOT execute any cutover action
  • Does NOT touch RS2000
  • Does NOT modify compose/*, apply.py, or any production-touching path
  • Does NOT auto-trigger next Codex session (operator dispatches when ready)
  • Does NOT close any cutover-gate issue (#238 stays open until Phase 4 smoke proves green)

Test plan

  • Operator reads § 3 (5 calls) — sign-off or override each
  • Operator decides § 7 'when to execute' (Pan Herbatka rec: tomorrow morning)
  • Operator merge → plan lives in repo for next Codex session
  • Codex's next session reads § 8 list → executes Phase 1+2+3 per plan
  • Operator approves Phase 4 smoke trigger explicitly

Spec sources read

  • state/codex-prep/rs2000-closeout-handover-2026-05-12.md (Codex's morning handover; cited extensively)
  • compose/apps/compose.yaml, compose/base/compose.yaml, compose/core/compose.yaml, compose/edge/compose.yaml (full structural read)
  • control-plane/platformctl/apply.py (lines 316-665; especially compose_apply_command + apply_plan)
  • docs/ci/runner-contract.md (deploy runner contract)
  • scripts/forgejo/deploy-runner-install-infisical-token-auth (to understand Codex's Infisical attempt)
  • compose/README.md + state/reports/rs2000-compose-inventory-2026-05-12/README.md
  • All 64 module.yaml runtime mappings (grep verified)
  • Web research (4 targeted searches): docker compose include directive, multi-file production, dotenv escape bugs, undefined network errors — see § 9 References

Operator's North Star check

Reduces attention cost:

  • Codex's next session has 5 decisions pre-made — execution path obvious
  • Operator's role narrowed to sign-off + Phase 4 trigger approval + final review
  • Rollback path is symmetric (legacy stack remains live → reverting one service is one SSH command)
  • Pan Herbatka explicit rec: tomorrow morning, not tonight — protects family-time North Star

Verdict: ship.

🍵 Pan Herbatka, the last line of counsel before ⚛️, 2026-05-12 evening

docs(codex-prep): Pan Herbatka's epic recovery plan for RS2000 cutover
Some checks failed
base-is-main / guard (pull_request) Successful in 1s
canary-required / collect-diff (pull_request) Failing after 3s
canary-required / canary (pull_request) Has been skipped
3e6d745646