fix(platformctl): observe image repo digests during plan #297

Merged
pdurlej merged 1 commit from codex/plan/image-repodigests-observation into main 2026-05-16 12:37:03 +02:00

Canary status: missing - Forgejo checks plus operator merge gate required before merge

Canary Context Pack

Product story

F2 no-op smokes should fail only on real drift. np-meerkat-frontend failed after #296 because platformctl plan compared the cataloged repo digest with only the container tag and image ID from docker inspect, even though docker image inspect proved the running image has the cataloged RepoDigest.

What changed

  • platformctl plan now performs a second read-only observation: docker image inspect <container-image-id> --format '{{json .RepoDigests}}'.
  • The discovered repo digests are added to observed image candidates before image drift comparison.
  • Tests cover the Meerkat failure shape: image ID differs from repo digest, but repo digest matches the manifest.
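A minimal sketch of the candidate widening described above, not the actual plan.py code. The function names (parse_repo_digests, observed_candidates, image_in_sync) and the example.invalid image reference are hypothetical, and the digests are placeholders rather than the real Meerkat values.

```python
import json


def parse_repo_digests(inspect_output: str) -> list[str]:
    """Parse the output of `docker image inspect <id> --format '{{json .RepoDigests}}'`.

    Anything that is not a JSON list of strings yields an empty list.
    """
    try:
        parsed = json.loads(inspect_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [entry for entry in parsed if isinstance(entry, str)]


def observed_candidates(tag: str, image_id: str, inspect_output: str) -> set[str]:
    """Combine the container tag, the image ID, and any repo digests (hypothetical helper)."""
    return {tag, image_id, *parse_repo_digests(inspect_output)}


def image_in_sync(desired: str, candidates: set[str]) -> bool:
    """The image counts as in sync when the manifest's claim matches any candidate."""
    return desired in candidates


# Placeholder values shaped like the Meerkat failure: the image ID differs
# from the repo digest, but the repo digest matches the manifest.
desired = "example.invalid/app@sha256:2222"
before = observed_candidates("latest", "sha256:1111", "[]")
after = observed_candidates("latest", "sha256:1111", '["example.invalid/app@sha256:2222"]')
print(image_in_sync(desired, before), image_in_sync(desired, after))
```

With only the tag and image ID as candidates the comparison reports drift. Once the repo digests are included, the same manifest claim matches.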

Why it changed

This is a false drift signal in an agent-visible deploy lane. Per ADR-0018, carrying false red signals is not acceptable as a normal operating state.

Files touched

  • control-plane/platformctl/plan.py
  • control-plane/platformctl/tests/test_plan_phase3.py

Runtime evidence

  • Failed run: API 1189, UI #933, module np-meerkat-frontend
  • Plan artifact: status=drift, exitCode=1; the only reported change is container.image
  • Observed candidates in failed plan: tag latest + image ID sha256:7c6b...
  • RS2000 read-only evidence: docker image inspect sha256:7c6b... returns ghcr.io/fbuchner/meerkat-crm-frontend@sha256:32f73297..., matching module.yaml
  • Local make_plan simulation with live inspect + image RepoDigests returns status=in-sync, exitCode=0

Known constraints

  • This PR does not retry the smoke. Retry happens only after merge.
  • This does not change apply behavior or mutate production.
  • If docker image inspect fails, the plan keeps existing behavior and simply lacks repo digest candidates.
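The fallback in the last constraint could look roughly like the sketch below. It assumes a local subprocess call for simplicity, whereas the real observation may run remotely. The function name observe_repo_digests, the injectable runner parameter, and the 30-second timeout are all hypothetical.

```python
import json
import subprocess


def observe_repo_digests(image_id: str, runner=subprocess.run) -> list[str]:
    """Read-only lookup of an image's repo digests.

    Returns an empty list on any failure, so the caller keeps its existing
    candidates (tag and image ID) and simply lacks repo digest candidates.
    """
    cmd = ["docker", "image", "inspect", image_id, "--format", "{{json .RepoDigests}}"]
    try:
        result = runner(cmd, capture_output=True, text=True, timeout=30, check=False)
    except (OSError, subprocess.SubprocessError):
        # docker is missing, the call timed out, or the process could not start.
        return []
    if result.returncode != 0:
        # Unknown image or daemon error: fall back to the existing behavior.
        return []
    try:
        parsed = json.loads(result.stdout)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [entry for entry in parsed if isinstance(entry, str)]
```

Because every failure path returns an empty list, a failed inspect can never add a candidate, so it cannot turn a real drift into a false in-sync result.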

Explicit out-of-scope

  • No Meerkat runtime change.
  • No manifest digest rewrite to image ID.
  • No backend/F3 smoke.

Requested decision

Approve merge if checks are green, then rerun np-meerkat-frontend smoke once.

Merge blockers

  • Plan tests fail.
  • The new observation command is not read-only.
  • The Meerkat live-observation simulation does not return in-sync.

Spec sources read

  • control-plane/platformctl/plan.py - image drift comparison and remote observation
  • control-plane/platformctl/tests/test_plan_phase3.py - plan contract tests
  • modules/np-meerkat-frontend/module.yaml - desired image claim from failed smoke
  • compose/apps/compose.yaml - canonical service imported in #296

Verification

  • pytest -q control-plane/platformctl/tests/test_plan_phase3.py control-plane/platformctl/tests/test_forgejo_ci_scripts_contract.py tests/test_platform_host_agent_wrapper.py -> 47 passed
  • git diff --check -> passed
  • Read-only RS2000 inspect + local make_plan() simulation for np-meerkat-frontend -> status=in-sync, exitCode=0

Refs: #142, #269, PR #296, run #933/API 1189

fix(platformctl): observe image repo digests during plan
All checks were successful
canary-required / collect-diff (pull_request) Successful in 3s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 3s
platformctl plan / auto-apply scope (pull_request) Successful in 19s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 17s
python-ci / Python 3.11 (pull_request) Successful in 35s
python-ci / Python 3.12 (pull_request) Successful in 36s
python-ci / Python 3.13 (pull_request) Successful in 37s
base-is-main / guard (pull_request) Successful in 1s
canary-required / canary (pull_request) Successful in 12s
patchwarden-pr-sanity / sanity (pull_request) Successful in 22s
e5314bba9a
Reference
pdurlej/platform!297