ops(deploy): non-module ops scripts do not promote to RS2000 release root after merge #279

Closed
opened 2026-05-14 10:51:33 +02:00 by codex · 2 comments
Collaborator

Context

During #260 deploy-runner pickup RCA, PR #278 landed watchdog instrumentation on main, but RS2000 was still executing the older release root:

  • /opt/pdurlej-platform/current -> releases/79955cfa377256c83c74d969745a4931621c3ac0
  • origin/main after #278: f592fc56725ff9affabe1db3473f4e6102a36b4d
  • .forgejo/workflows/platformctl-auto-apply.yml only triggers on push paths modules/**, so #278 (scripts/forgejo/deploy-runner-watchdog, tests, prompt) did not promote the runtime release root.
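
To illustrate why #278 never triggered promotion, here is a minimal sketch of the path-filter decision. It assumes the only push filter is modules/** (as stated above) and uses Python's fnmatch as an approximation of the workflow glob semantics; the function name triggers_auto_apply is hypothetical and not part of the repository.

```python
from fnmatch import fnmatchcase

# Assumption: the workflow's only push path filter is "modules/**".
TRIGGER_PATTERNS = ["modules/**"]


def triggers_auto_apply(changed_files, patterns=TRIGGER_PATTERNS):
    """Return True if any changed path matches any trigger pattern.

    fnmatch is only an approximation of the real workflow glob matching.
    """
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in patterns
    )


# The three files changed between the active release and #278.
pr_278_files = [
    "prompts/codex-260-runner-pickup-rca-2026-05-14.md",
    "scripts/forgejo/deploy-runner-watchdog",
    "tests/test_deploy_runner_watchdog.py",
]

print(triggers_auto_apply(pr_278_files))  # False: no path is under modules/
```

None of the changed paths sit under modules/, so the auto-apply workflow is never scheduled and the release root stays at the old SHA.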

Codex manually created/promoted trusted release f592fc56725ff9affabe1db3473f4e6102a36b4d as part of #260 because the diff from the active release was limited to:

A prompts/codex-260-runner-pickup-rca-2026-05-14.md
M scripts/forgejo/deploy-runner-watchdog
M tests/test_deploy_runner_watchdog.py

No compose or module files changed.

Problem

Ops scripts used by systemd from /opt/pdurlej-platform/current can drift behind merged main when the merge does not touch modules/**.

That means a PR can be correctly merged and still not affect the runtime script that systemd executes.

Evidence

  • systemctl cat forgejo-deploy-runner-watchdog.service uses:
    ExecStart=/opt/pdurlej-platform/current/scripts/forgejo/deploy-runner-watchdog
  • Before manual promotion, grep "candidate deploy-host runner rows" /opt/pdurlej-platform/current/scripts/forgejo/deploy-runner-watchdog failed.
  • After manual trusted release promotion:
    /opt/pdurlej-platform/current -> releases/f592fc56725ff9affabe1db3473f4e6102a36b4d
  • bash -n /opt/pdurlej-platform/current/scripts/forgejo/deploy-runner-watchdog passes.
  • Watchdog timer remains active and reports no current stuck jobs.

Candidate resolutions

  1. Add a trusted-main release-root promotion workflow for scripts/forgejo/**, runbooks/forgejo-actions-runner.md, and related ops files.
  2. Extend platformctl-auto-apply.yml or create a sibling workflow that updates /opt/pdurlej-platform/current when only non-module ops files change, without running compose apply.
  3. Add a drift check that reports when systemd-referenced scripts under /opt/pdurlej-platform/current differ from trusted main.
  4. If automatic promotion is intentionally deferred, document the manual promotion command and require it in PR bodies for ops-script changes.
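
Resolution 3 could start from something as small as the sketch below. It is a simplification under stated assumptions: it compares only the release SHA that the current symlink points at against a trusted main SHA supplied by the caller, rather than diffing individual systemd-referenced scripts. The function names are hypothetical.

```python
import os


def active_release_sha(current_link):
    """Return the release SHA that the `current` symlink points at."""
    target = os.readlink(current_link)
    return os.path.basename(target.rstrip("/"))


def release_root_drift(current_link, trusted_main_sha):
    """Return None when in sync, otherwise a human-readable drift message."""
    active = active_release_sha(current_link)
    if active == trusted_main_sha:
        return None
    return f"drift: current={active} trusted_main={trusted_main_sha}"
```

Called as release_root_drift("/opt/pdurlej-platform/current", sha_of_trusted_main), this would have reported drift between 79955cfa... and f592fc56... before the manual promotion. How the trusted main SHA is obtained is left open here.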

Acceptance criteria

  • After merging a PR that changes scripts/forgejo/deploy-runner-watchdog, RS2000 executes the merged version without ad-hoc copying into an old release directory.
  • The mechanism does not run untrusted PR head code.
  • The mechanism does not apply compose changes or restart production containers.
  • The mechanism leaves an auditable release SHA.
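
One way to satisfy the last two criteria is an atomic symlink swap that touches nothing but the current link. The sketch below is illustrative only and is not the implementation of any helper in this repository; it assumes the release directory for the given SHA has already been staged under releases/ from trusted main, and the function name promote_release is hypothetical.

```python
import os


def promote_release(root, sha):
    """Atomically repoint <root>/current at releases/<sha>.

    Assumes releases/<sha> was already staged from trusted main.
    Does not run compose or restart anything.
    """
    release_dir = os.path.join(root, "releases", sha)
    if not os.path.isdir(release_dir):
        raise FileNotFoundError(f"release not staged: {release_dir}")

    current = os.path.join(root, "current")
    tmp_link = os.path.join(root, f".current.tmp.{os.getpid()}")

    # Clear a stale temporary link left behind by an interrupted run.
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)

    # Build the new link beside the old one, then rename over it.
    # rename(2) replaces the old symlink in a single atomic step.
    os.symlink(os.path.join("releases", sha), tmp_link)
    os.replace(tmp_link, current)

    # The link target doubles as the auditable release SHA.
    return os.readlink(current)
```

Because systemd resolves /opt/pdurlej-platform/current/... at each ExecStart, the next timer run picks up the promoted script without any copy into an old release directory.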

Explicit out of scope

  • Changing module auto-apply semantics.
  • Changing compose files.
  • Cleaning stale Forgejo runner rows.
  • Infisical or PAT changes.

Refs #260, #278

Author
Collaborator

Codex #279 release-root bootstrap checkpoint — 2026-05-16 08:38 CEST

Role: executor
Status: PR ready; merge is owner-gated by Forgejo permissions

What changed

  • PR #293 is open, mergeable, and all checks are green.
  • codex attempted an API merge after explicit operator approval, but Forgejo denied it with "User not allowed to merge PR".
  • I installed the root-owned release-root promotion helper on RS2000 and configured the narrow sudoers entry for forgejo-deploy.

Evidence

  • /etc/sudoers.d/forgejo-deploy-release-root-promote: parsed OK
  • forgejo-deploy can invoke /usr/local/sbin/pdurlej-platform-promote-release-root --help via sudo -n.

Next action

  • Operator merges #293.
  • Codex can then run and verify release-root promotion to trusted main and fire one no-op matrix-well-known auto-apply smoke test.
Author
Collaborator

Codex #279 final evidence — 2026-05-16 09:24 CEST

Role: executor
Status: fixed and verified

  • PR #293 added the trusted-main release-root promotion lane.
  • RS2000 helper/sudoers installed.
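  • Runtime blocker found and fixed: forgejo-deploy-runner.service must run with NoNewPrivileges=false, because with NoNewPrivileges=true the runner's child processes cannot gain privileges through sudo and the narrow sudo helper cannot elevate.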
  • Manual dispatch run #1131/API #878 promoted /opt/pdurlej-platform/current to e5e8fe02a3fc97dc147b9104670128bbc459a39a.
  • PR #294 records the missing NoNewPrivileges=false runbook/contract detail and current status evidence.
Reference
pdurlej/platform#279