chore(vault): record quarantine and disable Vault recreation #626

Merged
pdurlej merged 2 commits from codex/m04-vault-quarantine-evidence into main 2026-05-30 08:22:34 +02:00
Collaborator

Summary

Records M04 Vault pre-quarantine/quarantine evidence and removes Vault from active Compose so destructive cleanup cannot be undone by a future docker compose up.

What changed

  • Added pre-quarantine safe-session smoke evidence.
  • Added Vault quarantine evidence after m04-vault-quarantine-approved.
  • Removed vault and vault-bootstrap services from active core Compose.
  • Removed vault_data and vault_logs named volume definitions from base Compose.
  • Removed the old safe-session-vault-signer overlay.
  • Updated runbooks to make Vault signer rollback a restore-from-backup/git-history path after destructive cleanup.

Runtime evidence

  • docker stop home-platform-vault-1 succeeded.
  • safe-session-api remained healthy and returned /safe-session/health 200.
  • Real /safe-session/api/v1/session/sign path issued internal-ops certs for llmops after Vault stop and again during the 10-minute observation pass.
  • SSH login with issued cert returned llmops.
  • Negative path cloud-note was rejected.
  • vault-sunset-readiness.sh passed after Vault stop.
  • Uptime Kuma and safe-session-web remained healthy.
  • No secret-like log matches and no Vault reconnect-like log matches were found.

Red-team

DeepSeek V4 Pro first required a scope change: active Compose must not be able to recreate Vault after deletion. This PR now removes that Compose path.

DeepSeek V4 Pro second pass: GO_FOR_DESTRUCTIVE_CLEANUP, with preconditions:

  • PR #626 must be merged/deployed first.
  • post-deployment observation must stay green.
  • exact Vault env-file references must be grepped before deletion.

Safety boundary

  • No destructive cleanup is performed by this PR.
  • Runtime destructive cleanup remains a separate operator-approved action using m04-vault-destructive-cleanup-approved.
  • Retained rollback material exists in the fresh post-M04 DR backup.

Validation

  • UV_CACHE_DIR=/private/tmp/uv-cache PYTHONPATH=control-plane uv run --project control-plane python -m platformctl.cli validate all --json — pass.
  • Candidate RS2000 docker compose --env-file /opt/pdurlej-platform/runtime/compose.env -f compose/base/compose.yaml -f compose/core/compose.yaml config --quiet — pass.
  • Patchwarden sanity — pass.

Spec sources read

  • docs/forgejo-agent-operations.md — Forgejo write identity rules.
  • runbooks/vault-quarantine-and-sunset.md — quarantine gates and rollback boundary.
  • runbooks/safe-session-local-ca-cutover.md — safe-session smoke and rollback requirements.
  • compose/base/compose.yaml, compose/core/compose.yaml — active Compose definitions.

Updates #64.

## Summary Records M04 Vault pre-quarantine/quarantine evidence and removes Vault from active Compose so destructive cleanup cannot be undone by a future `docker compose up`. ## What changed - Added pre-quarantine safe-session smoke evidence. - Added Vault quarantine evidence after `m04-vault-quarantine-approved`. - Removed `vault` and `vault-bootstrap` services from active core Compose. - Removed `vault_data` and `vault_logs` named volume definitions from base Compose. - Removed the old `safe-session-vault-signer` overlay. - Updated runbooks to make Vault signer rollback a restore-from-backup/git-history path after destructive cleanup. ## Runtime evidence - `docker stop home-platform-vault-1` succeeded. - `safe-session-api` remained healthy and returned `/safe-session/health` 200. - Real `/safe-session/api/v1/session/sign` path issued `internal-ops` certs for `llmops` after Vault stop and again during the 10-minute observation pass. - SSH login with issued cert returned `llmops`. - Negative path `cloud-note` was rejected. - `vault-sunset-readiness.sh` passed after Vault stop. - Uptime Kuma and safe-session-web remained healthy. - No secret-like log matches and no Vault reconnect-like log matches were found. ## Red-team DeepSeek V4 Pro first required a scope change: active Compose must not be able to recreate Vault after deletion. This PR now removes that Compose path. DeepSeek V4 Pro second pass: `GO_FOR_DESTRUCTIVE_CLEANUP`, with preconditions: - PR #626 must be merged/deployed first. - post-deployment observation must stay green. - exact Vault env-file references must be grepped before deletion. ## Safety boundary - No destructive cleanup is performed by this PR. - Runtime destructive cleanup remains a separate operator-approved action using `m04-vault-destructive-cleanup-approved`. - Retained rollback material exists in the fresh post-M04 DR backup. ## Validation - `UV_CACHE_DIR=/private/tmp/uv-cache PYTHONPATH=control-plane uv run --project control-plane python -m platformctl.cli validate all --json` — pass. - Candidate RS2000 `docker compose --env-file /opt/pdurlej-platform/runtime/compose.env -f compose/base/compose.yaml -f compose/core/compose.yaml config --quiet` — pass. - Patchwarden sanity — pass. ## Spec sources read - `docs/forgejo-agent-operations.md` — Forgejo write identity rules. - `runbooks/vault-quarantine-and-sunset.md` — quarantine gates and rollback boundary. - `runbooks/safe-session-local-ca-cutover.md` — safe-session smoke and rollback requirements. - `compose/base/compose.yaml`, `compose/core/compose.yaml` — active Compose definitions. Updates #64.
docs(vault): record M04 quarantine evidence
All checks were successful
base-is-main / guard (pull_request) Successful in 2s
canary-required / collect-diff (pull_request) Successful in 5s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 5s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 5s
canary-required / canary (pull_request) Has been skipped
patchwarden-client-dry-run / dry-run (pull_request) Successful in 54s
patchwarden-pr-sanity / sanity (pull_request) Successful in 2m42s
9cbd6749ae
Author
Collaborator

Patchwarden PR sanity

  • Status: advisory_findings
  • PR: 626
  • Commit: a9dfe3b88061696af094975e2c46c28fcb24d5d0
  • Security-sensitive label: missing
  • Authority: advisory model review plus deterministic blockers only
  • 3+3 canary: still alive; this does not replace it

Deterministic findings

No deterministic findings.

Model reviewers

global-glm / glm-5.1:cloud

  • Status: ok

  • Verdict: NOT_OK

  • high PR title says 'record evidence' but changes remove infrastructure definitions

    • Evidence: compose/core/compose.yaml removes vault and vault-bootstrap service definitions (62 lines removed). compose/overlays/safe-session-vault-signer.yaml is deleted entirely. compose/base/compose.yaml removes vault_data and vault_logs volume defi
    • Next: Either split into two PRs (one for evidence reports, one for compose definition removal) or update the title to reflect infrastructure changes, e.g., 'feat(vault): remove compose definitions and record M04 quarantine evidence'
  • high Rollback path 'docker start' is fragile and undocumented as time-limited

    • Evidence: PR description states 'Rollback remains docker start home-platform-vault-1' and runbooks/vault-quarantine-and-sunset.md line 127 shows this command. However, compose/core/compose.yaml no longer defines the vault service, so if the container
    • Next: Add explicit warning in runbooks that 'docker start' rollback only works while the stopped container exists. If docker compose down is run, the container is removed and must be recreated from git history. Consider keeping the vault service definition commented out rather than deleted.
  • medium Volume definitions removed may cause orphaned volume management issues

    • Evidence: compose/base/compose.yaml lines 16-19 removed vault_data and vault_logs named volume definitions. While named volumes with explicit names persist independently, removing them from compose.yaml means docker compose won't track them as part o
    • Next: Either retain the volume definitions with a comment indicating they're preserved for quarantine rollback, or document in the runbook that these volumes must be manually protected from cleanup until destructive cleanup is approved

global-deepseek / deepseek-v4-pro:cloud

  • Status: ok

  • Verdict: NOT_OK

  • high Premature removal of Vault from compose definitions during quarantine

    • Evidence: compose/base/compose.yaml removes vault_data and vault_logs volumes; compose/core/compose.yaml removes the vault and vault-bootstrap services; compose/overlays/safe-session-vault-signer.yaml is deleted. The PR description states Vault was o
    • Next: Either revert the compose changes and keep them in a separate PR explicitly for the destructive-cleanup phase, or update the PR description and runbooks to clarify that removing Vault from compose is an approved part of the quarantine step (not just documentation). Ensure the rollback path remains u

redteam / kimi-k2.6:cloud

  • Status: ok

  • Verdict: NOT_OK

  • high Vault container orphaned by compose service removal

    • Evidence: compose/core/compose.yaml removes the vaultandvault-bootstrapservice definitions entirely, and compose/base/compose.yaml removes thevault_dataandvault_logsvolume declarations. The PR explicitly states the stopped containerh`
    • Next: Do not remove the vault and vault-bootstrap service definitions from compose/core/compose.yaml or the volume declarations from compose/base/compose.yaml until destructive cleanup is actually executed. Keep the existing profile-gated definitions so the stopped container remains a managed Comp

Policy notes

  • GLM 5.1 + DeepSeek V4 Pro are the operator-required model mix for this bot.
  • Optional red-team model is enabled only when PLATFORMCTL_PR_SANITY_REDTEAM_MODEL is configured.
  • Auto-merge is not enabled here.
<!-- patchwarden-pr-sanity:pdurlej/platform:PR-626 --> # Patchwarden PR sanity - Status: `advisory_findings` - PR: `626` - Commit: `a9dfe3b88061696af094975e2c46c28fcb24d5d0` - Security-sensitive label: `missing` - Authority: advisory model review plus deterministic blockers only - 3+3 canary: still alive; this does not replace it ## Deterministic findings No deterministic findings. ## Model reviewers ### `global-glm` / `glm-5.1:cloud` - Status: `ok` - Verdict: `NOT_OK` - **`high`** PR title says 'record evidence' but changes remove infrastructure definitions - Evidence: `compose/core/compose.yaml removes vault and vault-bootstrap service definitions (62 lines removed). compose/overlays/safe-session-vault-signer.yaml is deleted entirely. compose/base/compose.yaml removes vault_data and vault_logs volume defi` - Next: Either split into two PRs (one for evidence reports, one for compose definition removal) or update the title to reflect infrastructure changes, e.g., 'feat(vault): remove compose definitions and record M04 quarantine evidence' - **`high`** Rollback path 'docker start' is fragile and undocumented as time-limited - Evidence: `PR description states 'Rollback remains docker start home-platform-vault-1' and runbooks/vault-quarantine-and-sunset.md line 127 shows this command. However, compose/core/compose.yaml no longer defines the vault service, so if the container` - Next: Add explicit warning in runbooks that 'docker start' rollback only works while the stopped container exists. If docker compose down is run, the container is removed and must be recreated from git history. Consider keeping the vault service definition commented out rather than deleted. - **`medium`** Volume definitions removed may cause orphaned volume management issues - Evidence: `compose/base/compose.yaml lines 16-19 removed vault_data and vault_logs named volume definitions. While named volumes with explicit names persist independently, removing them from compose.yaml means docker compose won't track them as part o` - Next: Either retain the volume definitions with a comment indicating they're preserved for quarantine rollback, or document in the runbook that these volumes must be manually protected from cleanup until destructive cleanup is approved ### `global-deepseek` / `deepseek-v4-pro:cloud` - Status: `ok` - Verdict: `NOT_OK` - **`high`** Premature removal of Vault from compose definitions during quarantine - Evidence: `compose/base/compose.yaml removes vault_data and vault_logs volumes; compose/core/compose.yaml removes the vault and vault-bootstrap services; compose/overlays/safe-session-vault-signer.yaml is deleted. The PR description states Vault was o` - Next: Either revert the compose changes and keep them in a separate PR explicitly for the destructive-cleanup phase, or update the PR description and runbooks to clarify that removing Vault from compose is an approved part of the quarantine step (not just documentation). Ensure the rollback path remains u ### `redteam` / `kimi-k2.6:cloud` - Status: `ok` - Verdict: `NOT_OK` - **`high`** Vault container orphaned by compose service removal - Evidence: `compose/core/compose.yaml removes the `vault` and `vault-bootstrap` service definitions entirely, and compose/base/compose.yaml removes the `vault_data` and `vault_logs` volume declarations. The PR explicitly states the stopped container `h` - Next: Do not remove the `vault` and `vault-bootstrap` service definitions from `compose/core/compose.yaml` or the volume declarations from `compose/base/compose.yaml` until destructive cleanup is actually executed. Keep the existing profile-gated definitions so the stopped container remains a managed Comp ## Policy notes - GLM 5.1 + DeepSeek V4 Pro are the operator-required model mix for this bot. - Optional red-team model is enabled only when `PLATFORMCTL_PR_SANITY_REDTEAM_MODEL` is configured. - Auto-merge is not enabled here.
chore(vault): prevent compose from recreating Vault
All checks were successful
base-is-main / guard (pull_request) Successful in 1s
canary-required / collect-diff (pull_request) Successful in 4s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 4s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 23s
canary-required / canary (pull_request) Has been skipped
patchwarden-client-dry-run / dry-run (pull_request) Successful in 22s
patchwarden-pr-sanity / sanity (pull_request) Successful in 2m50s
a9dfe3b880
codex changed title from docs(vault): record M04 quarantine evidence to chore(vault): record quarantine and disable Vault recreation 2026-05-30 07:48:29 +02:00
pdurlej approved these changes 2026-05-30 07:50:16 +02:00
pdurlej left a comment

Operator-authorized approval via temporary admin PAT. PR #626 records M04 quarantine evidence and applies DeepSeek-required scope fix so active Compose cannot recreate Vault. Checks are green; destructive runtime cleanup remains separately gated.

Operator-authorized approval via temporary admin PAT. PR #626 records M04 quarantine evidence and applies DeepSeek-required scope fix so active Compose cannot recreate Vault. Checks are green; destructive runtime cleanup remains separately gated.
Sign in to join this conversation.
No reviewers
No labels
W6d-automerge-calibration
agent/claude-code
agent/codex
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
automerge-candidate
class/security-sensitive
cutover-gate
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
iterating
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
large-impact
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
meta
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
needs-operator-decision
needs-triage
not-ready
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
operator-emotional
owner-attention
phase/02
phase/03
priority:p0
priority:p1
priority:p2
priority:p3
proposed
ready-for-agent
ready-for-operator
recovery
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
risk/exposure
risk/process
risk/product
risk/runtime
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:operator-needed
status:parked
tier/full
tier/lite
tier/stacked
tier:0-platform-substrate
tier:1-iskra-value-layer
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
pdurlej/platform!626
No description provided.