ops(rs2000): codify disk health follow-ups #802
Reference
pdurlej/platform!802
Canary status: missing - fire canary 3+3 manually before merge
Canary Context Pack
Product story
RS2000 already recovered from the Docker/runtime disk pressure incident in #800. This follow-up makes the remaining policy edges explicit so future agents can tell the difference between safe cache cleanup, staged-but-not-live host policy, and data-bearing runtime state.
What changed
- [...] `docker_disk_policy.py report`.
- [...] `returncode=127` command payloads instead of crashing local/non-systemd runs.
- [...] `SystemMaxUse=1G`, `MaxRetentionSec=14day`.
- [...] `journald.policy` report metadata: `configured`, `disk_usage_status`, `live_verified`, `pending_restart`, `status`, expected/observed config, and warnings.
Why it changed
The previous emergency cleanup solved the immediate outage pattern, but the follow-up needed to prevent two agent-visible failure modes: broad unsafe Docker pruning, and false confidence that a staged journald file is already active.
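The `returncode=127` behaviour listed under "What changed" can be illustrated with a minimal sketch. This is not the code from `docker_disk_policy.py`: the helper name `run_command` and the payload keys are hypothetical, and the sketch only assumes that a missing binary should yield a payload with the shell's "command not found" code rather than an exception.

```python
import subprocess
import sys


def run_command(argv):
    """Run argv and always return a dict payload (hypothetical shape).

    A missing binary is reported as returncode 127 instead of raising,
    so local or non-systemd runs can still produce a report.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    except FileNotFoundError as exc:
        # 127 is the conventional shell exit code for "command not found".
        return {"argv": list(argv), "returncode": 127, "stdout": "", "stderr": str(exc)}
    return {
        "argv": list(argv),
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


# A binary that should not exist anywhere; the call returns a payload, not a traceback.
missing = run_command(["definitely-not-a-real-binary-xyz", "--disk-usage"])
# The current interpreter always exists, so this one runs normally.
present = run_command([sys.executable, "-c", "print('ok')"])
print(missing["returncode"], present["returncode"])
```

A caller can then treat 127 as "tool unavailable on this host" and record a warning rather than aborting the whole report.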
Files touched
- `scripts/host-ops/docker_disk_policy.py`
- `scripts/host-ops/install_docker_disk_policy.py`
- `tests/test_host_ops_docker_disk_policy.py`
- `runbooks/rs2000-disk-hygiene.md`
- `state/reports/rs2000-docker-reclaimables-classification-2026-06-17.md`
- `state/reports/rs2000-logopts-ownership-plan-2026-06-17.md`
Relevant context
Runtime evidence
No fresh RS2000 mutation was performed in this PR. Fresh read-only SSH was attempted, but local `ssh-agent` refused key signing. This PR uses #800 closeout evidence and local deterministic tests only.
Known constraints
- [...] `systemd-journald`.
- `journald.policy.status=active` means the configured drop-in is older than the observed journald process entry.
- `pending_restart`, `configured_unverified`, and `unknown` are not green states.
Explicit out-of-scope
- [...] `docker system prune --volumes`, backup deletion, or `docker compose --remove-orphans`.
Requested decision
Approve the repo-side policy/logging/docs follow-up. Runtime rollout or any destructive cleanup remains a separate approved maintenance action.
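The status semantics listed under "Known constraints" can be sketched as a small classifier. This is an illustration under stated assumptions, not the logic in `docker_disk_policy.py`: the function names are hypothetical, and the mapping of missing inputs to `unknown` and `configured_unverified` is a guess. Only the `active` rule (drop-in older than the observed journald process entry) and the list of non-green states come from the description above.

```python
from typing import Optional

# Per the description, only "active" counts as green.
GREEN_STATES = {"active"}


def classify_journald_policy(
    dropin_mtime: Optional[float],
    journald_started_at: Optional[float],
) -> str:
    """Classify the journald policy from two timestamps (epoch seconds)."""
    if dropin_mtime is None:
        # Assumption: no readable drop-in means nothing can be claimed.
        return "unknown"
    if journald_started_at is None:
        # Assumption: drop-in exists but the running process was not observed.
        return "configured_unverified"
    if dropin_mtime < journald_started_at:
        # journald started after the drop-in was written, so it read the file.
        return "active"
    # Drop-in is newer than (or as old as) the running process: staged, not live.
    return "pending_restart"


def is_green(status: str) -> bool:
    """Return True only for states an agent may treat as verified."""
    return status in GREEN_STATES


for args in [(None, 100.0), (50.0, None), (50.0, 100.0), (150.0, 100.0)]:
    status = classify_journald_policy(*args)
    print(args, status, is_green(status))
```

Treating equal timestamps as `pending_restart` is a deliberately conservative choice in this sketch; a staged file should never be reported as live on a tie.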
Merge blockers
Spec sources read
- `scripts/host-ops/docker_disk_policy.py` - implementation target.
- `scripts/host-ops/install_docker_disk_policy.py` - installer target.
- `tests/test_host_ops_docker_disk_policy.py` - focused tests.
- `runbooks/rs2000-disk-hygiene.md` - operator-facing policy.
- `state/reports/rs2000-runtime-health-closeout-2026-06-17.md` - #800 closeout evidence.
Verification
- `python3 -m pytest tests/test_host_ops_docker_disk_policy.py` -> 10 passed.
- `python3 scripts/host-ops/docker_disk_policy.py report --json >/tmp/rs2000-disk-policy-local-report.json && python3 -m json.tool /tmp/rs2000-disk-policy-local-report.json` -> JSON valid.
- [...] `journalctl` is unavailable there.
- `git diff --check` -> clean.
- [...] `rg` over touched files -> no matches.
Model review
- [...] `journald.policy` status and tests.
- [...] `approve`, no remaining blocker.
- [...] `approve`, required change `null`; residual risk is operator ignoring `pending_restart`, documented in the runbook.
Closes #796
Closes #798
Closes #799
Refs #801
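The "expected/observed config" part of the `journald.policy` metadata can be sketched as a comparison between the expected drop-in values and whatever a drop-in file actually contains. This is a sketch only: the helper names and the shape of the diff are hypothetical, and it assumes the drop-in uses the usual `[Journal]` section of a journald configuration file. The two expected values are the ones quoted in the description.

```python
import configparser

# Values quoted in the PR description.
EXPECTED = {"SystemMaxUse": "1G", "MaxRetentionSec": "14day"}


def observed_journal_settings(dropin_text: str) -> dict:
    """Parse the [Journal] section of a drop-in into a plain dict."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep journald's CamelCase option names
    parser.read_string(dropin_text)
    if not parser.has_section("Journal"):
        return {}
    return dict(parser.items("Journal"))


def config_mismatches(expected: dict, observed: dict) -> dict:
    """Return only the keys whose observed value differs from the expected one."""
    return {
        key: {"expected": value, "observed": observed.get(key)}
        for key, value in expected.items()
        if observed.get(key) != value
    }


good = "[Journal]\nSystemMaxUse=1G\nMaxRetentionSec=14day\n"
drifted = "[Journal]\nSystemMaxUse=2G\n"
print(config_mismatches(EXPECTED, observed_journal_settings(good)))
print(config_mismatches(EXPECTED, observed_journal_settings(drifted)))
```

An empty result means the file matches the expected policy; it says nothing about whether the running journald has loaded it, which is why the description keeps `configured` and `live_verified` as separate fields.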
Role: executor
Terminal action: `operator_override`.
Reason: Forgejo commit status for `140997a96e8b371705751cc844e90ce5a29129b8` remained pending with all PR jobs at `Waiting to run` / blocked-by-required-conditions after repeated polling. The PR itself is mergeable and contains no live runtime mutation.
Evidence before override:
- `python3 -m pytest tests/test_host_ops_docker_disk_policy.py` -> 10 passed.
- `python3 scripts/host-ops/docker_disk_policy.py report --json` -> valid JSON via `python3 -m json.tool`.
- `git diff --check` -> clean.
- [...] `rg` over touched files -> no matches.
Active approval: present; scope: live merge-fest for this RS2000 runtime/disk-health follow-up. No exact approval phrase is recorded.
Out of scope remains unchanged: no live RS2000 deploy, no Docker/journald restart, no image/volume/system prune, no backup deletion, and no runtime secret reads.
Patchwarden PR sanity
advisory_findings | 802 | 140997a96e8b371705751cc844e90ce5a29129b8 | missing
Deterministic findings
No deterministic findings.
Model reviewers
global-glm/glm-5.1:cloud
Status: ok
Verdict: OK

global-deepseek/deepseek-v4-pro:cloud
Status: ok
Verdict: ABSTAIN
high: Missing diff prevents review
The PR description lists changed files but the diff block is empty. Without the actual code changes, it is impossible to verify the claimed additions (root filesystem thresholds, journald drop-in, returncode=127 handling, etc.) or check for [...]

redteam/kimi-k2.6:cloud
Status: ok
Verdict: NOT_OK
high: journald active-status relies on easily backdated mtime
PR description explicitly defines `journald.policy.status=active` in `scripts/host-ops/docker_disk_policy.py` as: 'configured drop-in is older than the observed journald process entry'. This mtime-vs-process-start heuristic is trivially byp[...]
[...] `active` when the drop-in checksum matches the sentinel and the journald process start time is newer than the sentinel write time. Alte[...]

Policy notes
`PLATFORMCTL_PR_SANITY_REDTEAM_MODEL` is configured.
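The red-team finding above suggests gating `active` on a checksum sentinel rather than on file mtime alone. A minimal sketch of that idea follows; it is not part of this PR, and the sentinel fields, function names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest used as the (hypothetical) sentinel checksum."""
    return hashlib.sha256(data).hexdigest()


def is_active(
    dropin_bytes: bytes,
    sentinel_checksum: str,
    sentinel_written_at: float,
    journald_started_at: float,
) -> bool:
    """Report active only if content matches and journald started afterwards.

    Comparing content by checksum means a backdated mtime on the drop-in
    cannot, by itself, turn the status green.
    """
    content_matches = sha256_hex(dropin_bytes) == sentinel_checksum
    restarted_after = journald_started_at > sentinel_written_at
    return content_matches and restarted_after


dropin = b"[Journal]\nSystemMaxUse=1G\nMaxRetentionSec=14day\n"
checksum = sha256_hex(dropin)
print(is_active(dropin, checksum, 100.0, 200.0))           # content matches, restarted later
print(is_active(dropin + b"#x\n", checksum, 100.0, 200.0))  # content changed after sentinel
print(is_active(dropin, checksum, 200.0, 100.0))           # journald older than sentinel
```

The sentinel write time would still need a trustworthy source for this to hold; the sketch only removes the dependence on the drop-in's own mtime.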