docs(vault): add safe-session local CA cutover runbook #614

Merged
pdurlej merged 1 commit from codex/m04-safe-session-cutover-runbook into main 2026-05-29 22:08:35 +02:00
Collaborator

Canary status: pending — Forgejo Actions must run before merge.

Canary Context Pack

Product story

M04 Vault sunset needs a reversible, evidence-first cutover from Vault SSH signing to the local safe-session OpenSSH CA signer, while keeping Infisical as the only canonical secret source.

What changed

  • Added an explicit compose overlay for safe-session-api local signer mode.
  • Added a read-only preflight script for the rendered CA key and safe-session container prerequisites.
  • Added a cutover runbook with preconditions, smoke evidence, rollback, and destructive-cleanup boundary.

Why it changed

Vault is still running on RS2000, but observed inventory shows no KV secrets and only the safe-session SSH signing path remains. This PR prepares the runtime cutover without performing it.

Files touched

  • compose/overlays/safe-session-local-ca.yaml
  • scripts/cutover/safe-session-local-ca-preflight.sh
  • runbooks/safe-session-local-ca-cutover.md

Relevant context

  • decisions/0024-infisical-primary-secrets-pipeline.md
  • migrations/vault-to-infisical.md
  • docs/specs/vault-to-infisical-migration-v0/02-plan-and-tasks.md
  • PR #612
  • PR #613

Runtime evidence

No runtime mutation was performed. Validation run locally:

bash -n scripts/cutover/safe-session-local-ca-preflight.sh: pass
platformctl validate all --json: pass, exitCode=0, 88 modules

Known constraints

  • The overlay is opt-in only and requires PLATFORM_RUNTIME_SAFE_SESSION_CA_KEY_FILE.
  • The CA key remains outside git and must be rendered from Infisical on RS2000.
  • Runtime cutover still requires the operator phrase m04-vault-runtime-cutover-approved.
  • Destructive Vault cleanup still requires m04-vault-destructive-cleanup-approved.
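
A minimal sketch of how the two approval phrases above could be enforced by a wrapper script. The function name `require_operator_phrase` and passing the phrase as an argument are assumptions for illustration only; the runbook may gate approval differently.

```shell
#!/usr/bin/env bash
# Hypothetical gate: succeed only when the supplied phrase matches exactly.
# $1 = phrase the step requires, $2 = phrase the operator supplied.
require_operator_phrase() {
  gate_expected="$1"
  gate_supplied="${2:-}"
  if [ "$gate_supplied" != "$gate_expected" ]; then
    echo "refusing: operator phrase '$gate_expected' not supplied" >&2
    return 1
  fi
  return 0
}
```

Keeping the two phrases distinct means that approving the runtime cutover never implicitly approves destructive Vault cleanup.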

Explicit out-of-scope

  • No Infisical writes.
  • No safe-session restart.
  • No Vault stop/delete.
  • No issue/comment mutation.
  • No change to allowed principals beyond the observed llmops parity.

Requested decision

Approve this as the nondestructive cutover runbook/guardrail PR.

Merge blockers

  • Red Forgejo checks.
  • Evidence that the overlay could create an accidental writable secret mount.
  • Evidence that the preflight leaks key material.
  • Evidence that the runbook authorizes Vault shutdown/deletion without the explicit gate.

DeepSeek V4 Pro redteam

DeepSeek V4 Pro reviewed the overlay, preflight, and runbook and returned APPROVE. The review called out no blockers; the main caveat was that modes 0500/0700 are owner-only but executable, which is acceptable for the host-key preflight and can be tightened later if desired.

Validation

  • bash -n scripts/cutover/safe-session-local-ca-preflight.sh — pass
  • UV_CACHE_DIR=/private/tmp/codex-uv-cache PYTHONPATH=control-plane uv run --project control-plane python -m platformctl.cli validate all --json — pass (exitCode=0, 88 modules)

Spec sources read

  • AGENTS.md — Forgejo identity and PR contract
  • docs/forgejo-agent-operations.md — Forgejo write path and identity rules
  • decisions/0024-infisical-primary-secrets-pipeline.md — M04 boundary
  • migrations/vault-to-infisical.md — migration plan
  • docs/specs/vault-to-infisical-migration-v0/02-plan-and-tasks.md — task sequencing
  • scripts/safe-session-api/local_signer.py — signer contract from PR #613
  • scripts/safe-session-api/server.py — runtime signer mode contract from PR #613

Refs #64
Refs #609

docs(vault): add safe-session local CA cutover runbook
Some checks failed
canary-required / collect-diff (pull_request) Successful in 4s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 5s
platformctl plan / auto-apply scope (pull_request) Successful in 23s
canary-required / canary (pull_request) Has been skipped
patchwarden-client-dry-run / dry-run (pull_request) Successful in 22s
base-is-main / guard (pull_request) Successful in 1s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 5s
patchwarden-pr-sanity / sanity (pull_request) Failing after 3m58s
db32650348
codex force-pushed codex/m04-safe-session-cutover-runbook from db32650348
to 86eaa7b5bc
Some checks failed
base-is-main / guard (pull_request) Successful in 1s
canary-required / collect-diff (pull_request) Successful in 4s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 4s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 5s
platformctl plan / auto-apply scope (pull_request) Successful in 23s
canary-required / canary (pull_request) Has been skipped
patchwarden-client-dry-run / dry-run (pull_request) Successful in 22s
patchwarden-pr-sanity / sanity (pull_request) Failing after 2m21s
2026-05-29 21:19:19 +02:00
codex force-pushed codex/m04-safe-session-cutover-runbook from 86eaa7b5bc
to 11aef9acf4
All checks were successful
base-is-main / guard (pull_request) Successful in 1s
canary-required / collect-diff (pull_request) Successful in 4s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 4s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 22s
canary-required / canary (pull_request) Has been skipped
patchwarden-client-dry-run / dry-run (pull_request) Successful in 21s
patchwarden-pr-sanity / sanity (pull_request) Successful in 1m55s
2026-05-29 21:44:52 +02:00
Author
Collaborator

Patchwarden PR sanity

  • Status: advisory_findings
  • PR: 614
  • Commit: 2c2848e3f70dd56eceb1681a72cbdeba9e54ae73
  • Security-sensitive label: present
  • Authority: advisory model review plus deterministic blockers only
  • 3+3 canary: still alive; this does not replace it

Deterministic findings

No deterministic findings.

Model reviewers

global-glm / glm-5.1:cloud

  • Status: ok

  • Verdict: OK

  • low Preflight mode check accepts 500/700 despite runbook stating '0600 or stricter'

    • Evidence: scripts/cutover/safe-session-local-ca-preflight.sh line 27: case "$mode" in 400|500|600|700) ;; accepts mode 700 (rwx for owner), but runbooks/safe-session-local-ca-cutover.md line 21 states 'mode 0600 or stricter'. Mode 700 adds execut
    • Next: Consider tightening the case pattern to 400|600 only, or update the runbook to accurately reflect that modes 500/700 are accepted. This is a hardening opportunity, not a blocker, since 700 is still owner-only.
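
The tightening suggested above could look roughly like the sketch below. The helper name `mode_is_strict` is hypothetical, and the octal mode is assumed to arrive as a three-digit string (for example, as printed by GNU `stat -c '%a'`); the actual preflight script may obtain it differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept only owner-only modes without the execute bit.
# $1 = octal permission string such as 600.
mode_is_strict() {
  case "$1" in
    400|600) return 0 ;;  # owner read-only or owner read/write
    *)       return 1 ;;  # anything else, including 500 and 700, is rejected
  esac
}
```

With this pattern the check agrees with the runbook wording of 0600 or stricter, since 500 and 700 no longer pass.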

global-deepseek / deepseek-v4-pro:cloud

  • Status: ok
  • Verdict: OK
  • Findings: none

redteam / kimi-k2.6:cloud

  • Status: ok

  • Verdict: NOT_OK

  • high Preflight omits owner validation despite runbook requiring root:root

    • Evidence: scripts/cutover/safe-session-local-ca-preflight.sh lines 27-28 compute owner="$(stat ... )" and line 44 emits it in ok(), but never enforce the value; runbooks/safe-session-local-ca-cutover.md explicitly lists the precondition 'The rend
    • Next: Add an explicit ownership guard to the preflight script before the ok() call: if [[ "$owner" != "root:root" ]]; then fail "ca_key_owner_must_be_root:root"; fi
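
A self-contained sketch of the ownership guard proposed above. The `fail` function here is only a stand-in, since the script's real helper is not shown in this thread, and `check_ca_key_owner` is a hypothetical name. The owner is assumed to be a `user:group` string.

```shell
#!/usr/bin/env bash
# Stand-in for the preflight script's fail helper (real definition not shown here).
fail() {
  echo "preflight_failed: $1" >&2
  return 1
}

# Hypothetical guard: reject any owner other than root:root.
# $1 = owner as a user:group string.
check_ca_key_owner() {
  if [ "$1" != "root:root" ]; then
    fail "ca_key_owner_must_be_root:root"
    return 1
  fi
  return 0
}
```

Calling the guard before the final ok() report would turn the owner from an informational field into an enforced precondition.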

Policy notes

  • GLM 5.1 + DeepSeek V4 Pro are the operator-required model mix for this bot.
  • Optional red-team model is enabled only when PLATFORMCTL_PR_SANITY_REDTEAM_MODEL is configured.
  • Auto-merge is not enabled here.
codex force-pushed codex/m04-safe-session-cutover-runbook from 11aef9acf4
to 1c7f99a360
Some checks failed
base-is-main / guard (pull_request) Successful in 1s
canary-required / collect-diff (pull_request) Successful in 4s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 5s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 5s
platformctl plan / auto-apply scope (pull_request) Successful in 23s
canary-required / canary (pull_request) Has been skipped
patchwarden-client-dry-run / dry-run (pull_request) Successful in 22s
patchwarden-pr-sanity / sanity (pull_request) Has been cancelled
2026-05-29 21:55:47 +02:00
codex force-pushed codex/m04-safe-session-cutover-runbook from 1c7f99a360
to 2c2848e3f7
All checks were successful
base-is-main / guard (pull_request) Successful in 2s
canary-required / collect-diff (pull_request) Successful in 5s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 5s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 24s
canary-required / canary (pull_request) Has been skipped
patchwarden-client-dry-run / dry-run (pull_request) Successful in 20s
patchwarden-pr-sanity / sanity (pull_request) Successful in 2m50s
2026-05-29 22:03:04 +02:00
Reference
pdurlej/platform!614