ops(apply): pending runtime applies — operator-approved batch (postgres / memory-migration / runner) #688

Closed
opened 2026-06-02 12:09:48 +02:00 by claude · 2 comments
Collaborator

Context

Three runtime changes were authored/prepared but their applies did not land — the linked issues (#668 / #661 / #675) were closed on the author/prepare/audit half, while the runtime (operator-gated) half stayed pending. Verified live (rs2000, 2026-06-02):

  • home-platform-postgres-1 + home-platform-kan-postgres-1 still postgres:16.12-alpine → the 11 CVEs fixed in 16.14 are unpatched.
  • forgejo-runner still mounts /var/run/docker.sock:rw → host-compromise surface open.
  • Memory tables (task_run/task_checkpoint) not confirmed in the live main Postgres.

This is the canonical tracker so these don't fall through — no green code with red PM.

Operator approval (explicit, in-session 2026-06-02)

The operator (Piotr) approves carrying these through to APPLY — a single human gate covering prepare → apply. Do NOT split into a second human approval between plan and apply. The automated CI/plan verification still runs (the safety check); the human approval is granted here, once.

Scoped approval handle: pending-runtime-applies-2026-06-02-approved.

The 3 applies

1. PostgreSQL 16.12 → 16.14 (apply-only)

  • Targets: home-platform-postgres-1 + home-platform-kan-postgres-1.
  • Repo change already prepared (#668). Low-risk: stop → swap binaries → restart, no dump/reload.
  • Approved → apply.
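
The "no dump/reload" assessment holds because 16.12 → 16.14 is a minor release within the same major version. A minimal sketch of that precondition, assuming image references shaped like `postgres:16.12-alpine`; the helper names `parse_pg_tag` and `is_minor_upgrade` are hypothetical and not part of the platform tooling:

```python
import re


def parse_pg_tag(image: str) -> tuple[int, int]:
    """Extract (major, minor) from a reference such as 'postgres:16.12-alpine'."""
    # Drop any pinned digest ('@sha256:...') before reading the tag.
    ref = image.split("@", 1)[0]
    match = re.search(r":(\d+)\.(\d+)", ref)
    if match is None:
        raise ValueError(f"no major.minor version in image reference: {image!r}")
    return int(match.group(1)), int(match.group(2))


def is_minor_upgrade(current: str, target: str) -> bool:
    """True when target is a newer minor release of the same major version."""
    cur_major, cur_minor = parse_pg_tag(current)
    tgt_major, tgt_minor = parse_pg_tag(target)
    return cur_major == tgt_major and tgt_minor > cur_minor
```

A major-version jump (for example 16.x to 17.x) would fail this check and would need its own plan and approval rather than an in-place image swap.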

2. Memory migration (apply-only)

  • Create task_run / task_checkpoint (+ related) in the live main Postgres, per the authored migration (#661). Additive.
  • Approved → apply. Re-verify the tables exist afterward.

3. Runner hardening (author-THEN-apply — the involved one)

  • Only the audit-doc landed (#675); the fix does not exist yet. Codex authors the hardening (socket-proxy / removal / build-vs-deploy split), then applies.
  • 🛡️ Bootstrapping safety-net REQUIRED: verify the NEW runner config can successfully run a job before removing the current socket-mounted runner. A broken runner can't run the CI that would fix it. Stage it: stand the hardened runner up alongside, prove it green, then cut over.
  • Approved to author + apply, with the safety-net.
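
The staging order in the safety-net can be sketched as follows. This is an illustration only, assuming the four operations are supplied as callables; `staged_cutover` and its parameter names are hypothetical, and the real cutover runs through compose and CI rather than this function:

```python
from typing import Callable


def staged_cutover(
    start_hardened: Callable[[], None],
    run_smoke_job: Callable[[], bool],
    stop_legacy: Callable[[], None],
    stop_hardened: Callable[[], None],
) -> bool:
    """Stand the hardened runner up alongside the legacy one; cut over only on a green job."""
    start_hardened()
    if not run_smoke_job():
        # Smoke job failed: remove the new runner and keep the legacy runner serving CI.
        stop_hardened()
        return False
    # The hardened runner proved it can run a job; only now retire the socket-mounted one.
    stop_legacy()
    return True
```

The point of the ordering is that the legacy runner is never stopped on the failure path, so CI stays available to land a fix.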

Scope boundary (sticky scoped approval discipline)

This approval covers ONLY these 3 named applies. It does NOT authorize: other migrations, destructive cleanup, data deletion, public exposure, DNS/network/Tailscale/auth changes, credential rotation, or any runtime mutation outside this list. New risk → new approval.
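
One way to make the boundary mechanical is an explicit allowlist keyed on the approval handle. A sketch under stated assumptions: the action identifiers and the `is_authorized` helper are invented for illustration, and only the handle string comes from this issue:

```python
APPROVAL_HANDLE = "pending-runtime-applies-2026-06-02-approved"

# The three applies named in this issue; these identifiers are illustrative only.
APPROVED_ACTIONS = frozenset({
    "postgres-16.14-image-apply",
    "memory-migration-apply",
    "runner-hardening-author-and-apply",
})


def is_authorized(handle: str, action: str) -> bool:
    """An action passes only with the matching handle and an explicitly listed name."""
    return handle == APPROVAL_HANDLE and action in APPROVED_ACTIONS
```

Anything not listed is denied by default, which matches "new risk → new approval".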

Acceptance

  • Both postgres instances on 16.14-alpine, live-verified (not "prepared").
  • Memory tables present in live main Postgres, live-verified.
  • Runner hardened (socket removed/proxied) with CI still functional, live-verified.
  • Each item closed on live confirmation, never on "prepared/authored" (this is the bug that created this tracker).

Ties

#668 (postgres), #661 (memory migration), #675 (runner). Live-verified by claude (Phase-0 audit, 2026-06-02).

Collaborator

Codex #688 live-read/apply checkpoint (sanitized):

Done under operator approval handle pending-runtime-applies-2026-06-02-approved:

  1. Postgres image apply
  • Updated runtime image overrides for POSTGRES_IMAGE and KAN_POSTGRES_IMAGE to pinned postgres:16.14-alpine@sha256:16bc17....
  • Backups created on RS2000 for the touched env files with suffix .bak-postgres-16-14-20260602T102943Z.
  • Recreated only postgres and kan-postgres through release-root compose, --no-deps.
  • Verified:
    • home-platform-postgres-1: 16.14 pinned image, healthy, pg_isready accepting connections.
    • home-platform-kan-postgres-1: 16.14 pinned image, healthy, pg_isready accepting connections.
  2. Memory Control Plane live migration
  • Applied /opt/pdurlej-platform/current/migrations/20260601_memory_control_plane.sql to live main Postgres DB postgres with ON_ERROR_STOP=1.
  • Verified tables: task_run, task_run_event, task_checkpoint, memory_procedure, memory_procedure_version, memory_procedure_event, memory_procedure_embedding.
  • Verified append-only triggers for checkpoint/procedure/event/embedding tables.
  3. Runner hardening
  • Repo-side safety-net PR opened: #691.
  • PR #691 removes the direct runner /var/run/docker.sock mount and routes runner Docker API through pinned docker-socket-proxy.
  • RS2000 preflight for the PR compose file passed: docker compose config --quiet.
  • Live runner cutover is intentionally waiting for PR #691 merge/green checks.
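
The table re-verification in step 2 amounts to comparing the expected names against what the live database reports. A minimal sketch, assuming the tables live in the `public` schema (not stated in this issue); `missing_tables` and `LIST_TABLES_SQL` are hypothetical names, and running the query against the database is left to whatever client is in use:

```python
from collections.abc import Iterable

# Table names as listed in the checkpoint above.
EXPECTED_TABLES = (
    "task_run",
    "task_run_event",
    "task_checkpoint",
    "memory_procedure",
    "memory_procedure_version",
    "memory_procedure_event",
    "memory_procedure_embedding",
)

# Assumption: the migration creates its tables in the 'public' schema.
LIST_TABLES_SQL = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = 'public'"
)


def missing_tables(found: Iterable[str]) -> list[str]:
    """Return the expected tables that are absent from the names found live."""
    present = set(found)
    return [name for name in EXPECTED_TABLES if name not in present]
```

An empty result from `missing_tables` is what "live-verified" means for this item; a non-empty list names exactly what did not land.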

No tokens, env values, private issue bodies, or raw secret material included here.

Collaborator

#688 final runtime evidence

Operator approval handle honored: pending-runtime-applies-2026-06-02-approved.

Completed scope:

  1. Postgres images

    • home-platform-postgres-1 is on postgres:16.14-alpine@sha256:16bc17... and healthy.
    • home-platform-kan-postgres-1 is on postgres:16.14-alpine@sha256:16bc17... and healthy.
    • Env overrides were updated in runtime compose env files; backups were left on host.
  2. Memory Control Plane migration

    • Applied migrations/20260601_memory_control_plane.sql to live main Postgres.
    • Verified tables: task_run, task_checkpoint, task_run_event, memory_procedure, memory_procedure_version, memory_procedure_event, memory_procedure_embedding.
  3. Forgejo runner socket hardening

    • #691 moved runner jobs off the direct raw Docker socket and through docker-socket-proxy.
    • Runtime smoke caught missing proxy EXEC=1; live proxy was hotfixed.
    • #693 made EXEC=1 source-of-truth and added regression coverage.
    • Live verification: runner has no raw Docker socket mount, uses DOCKER_HOST=tcp://docker-socket-proxy:2375; proxy has EXEC=1 with AUTH=0, SECRETS=0, SYSTEM=0.
    • Smoke PR #692 rerun after #693: 7/7 checks green, including Patchwarden dry-run and sanity; closed without merge.
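
The live-verification criteria for the runner can be expressed as a small check over the effective container settings. This is a sketch, not the audit referenced below: `audit_runner` is a hypothetical helper, its inputs are assumed to be already extracted from the running containers, and it deliberately treats an unset proxy flag as a finding, which is stricter than relying on proxy defaults:

```python
RAW_SOCKET = "/var/run/docker.sock"
EXPECTED_DOCKER_HOST = "tcp://docker-socket-proxy:2375"
# Proxy flags as reported in the evidence above.
EXPECTED_PROXY_FLAGS = {"EXEC": "1", "AUTH": "0", "SECRETS": "0", "SYSTEM": "0"}


def audit_runner(
    runner_volumes: list[str],
    runner_env: dict[str, str],
    proxy_env: dict[str, str],
) -> list[str]:
    """Return human-readable findings; an empty list means the checks passed."""
    findings = []
    # A volume spec looks like 'source:target[:mode]'; flag the socket on either side.
    if any(RAW_SOCKET in spec.split(":") for spec in runner_volumes):
        findings.append("runner mounts the raw Docker socket")
    if runner_env.get("DOCKER_HOST") != EXPECTED_DOCKER_HOST:
        findings.append("runner DOCKER_HOST does not point at the socket proxy")
    for flag, want in EXPECTED_PROXY_FLAGS.items():
        got = proxy_env.get(flag)
        if got != want:
            findings.append(f"proxy {flag} is {got!r}, expected {want!r}")
    return findings
```

With `EXEC` missing from `proxy_env`, the function reports it as a finding, which is the class of gap the runtime smoke caught before #693.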

Validation run locally for #693:

  • runner/contract pytest: 54 passed
  • runner hardening audit: passed with no critical findings
  • platformctl validate all --json: exitCode=0

Known remaining runner hardening follow-ups from audit, not part of this approved batch:

  • docker_host_label_available
  • runner_has_infisical_token_file
  • persistent_runner_data_mount

No secrets, raw tokens, DB values, issue bodies, or private env contents were included in this evidence.

Reference
pdurlej/platform#688