ops(compose): np-meerkat-frontend present in module.yaml but missing from canonical compose tree #269

Closed
opened 2026-05-13 23:20:48 +02:00 by codex · 2 comments

Context

During F2 smoke on 2026-05-13, np-meerkat-frontend was the only failed/skipped candidate after the successful Excalidraw and safe-session smokes.

The failure is a data/source-of-truth gap, not a production runtime failure:

  • F2 run API #834 / UI #697 was RED.
  • Runner-context platformctl health --module np-meerkat-frontend --json reported manifest/container/smoke OK after the run.
  • Production container remained running.
  • The module is present in module.yaml, but the service is absent from the canonical compose tree used by trusted auto-apply.

Reference: issue #142 comment #5138.

Evidence

Command evidence from Codex W2 verification:

$ grep -l "compose_service:.*np-meerkat-frontend" modules/np-meerkat-frontend/module.yaml
modules/np-meerkat-frontend/module.yaml

$ grep -l "np-meerkat-frontend" compose/apps/compose.yaml compose/core/compose.yaml compose/edge/compose.yaml compose/base/compose.yaml 2>/dev/null
# no output

$ ssh rs2000 'docker ps --filter "name=np-meerkat-frontend" --format "{{.Names}} {{.Status}}"'
home-platform-np-meerkat-frontend-1 Up 13 days

Resolution options

  1. Skip from F2 until canonical compose import
    Current state. Safe for F2/F3 progress, but leaves Meerkat outside the trusted auto-apply path.

  2. Import canonical compose for Meerkat into compose/apps/compose.yaml
    Requires investigation of the legacy compose service, env contract, routing labels, and dependencies. This is likely the real fix if Meerkat should stay managed by pdurlej/platform.

  3. Add auto-apply blocker for missing-from-canonical modules
    Defense-in-depth: auto_apply_scope.py should reject modules whose spec.runtime.compose_service is not present in the referenced canonical compose file. Pan Herbatka leans toward this as a hardening follow-up because it prevents similar false eligibility.
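
A minimal sketch of the check described in option 3, assuming the module manifest and the canonical compose file have already been parsed into plain dictionaries. The function name `missing_service_reason` and the sample image name are hypothetical; the issue does not show the internals of auto_apply_scope.py.

```python
from typing import Any, Mapping, Optional


def missing_service_reason(
    manifest: Mapping[str, Any],
    compose: Mapping[str, Any],
    compose_path: str,
) -> Optional[str]:
    """Return a block reason, or None when the service is declared.

    Hypothetical helper: it only illustrates the membership check,
    not the real auto_apply_scope.py code.
    """
    runtime = manifest.get("spec", {}).get("runtime", {})
    service = runtime.get("compose_service")
    if not service:
        # Fail closed when the manifest does not name a service at all.
        return "spec.runtime.compose_service is not set"
    if service not in (compose.get("services") or {}):
        return f"compose_service {service!r} is not defined in {compose_path}"
    return None


# Illustrative data only: the manifest names a service the compose file lacks.
manifest = {"spec": {"runtime": {"compose_service": "np-meerkat-frontend"}}}
compose = {"services": {"mirotalk-sfu": {"image": "example/mirotalk-sfu"}}}

print(missing_service_reason(manifest, compose, "compose/apps/compose.yaml"))
```

Returning a reason string rather than a boolean lets the caller report why a module was rejected, which matters when the health check for the same module is green.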

Operator decision required

Choose the resolution path before retrying np-meerkat-frontend through F2.

This is not blocking the other F2/F3 work. It blocks only Meerkat frontend auto-apply readiness.

References

  • Cutover coordination: #142
  • Handover evidence: #142 comment #5138
  • Recovery plan: PR #250

Codex #269 follow-up — guard PR opened — 2026-05-16 09:36 CEST

Role: executor
Status: defense-in-depth PR ready

What I found

  • Current legacy compose no longer contains np-meerkat-frontend / np-meerkat-backend; the live containers are compose orphans from an older compose state.
  • A 2026-04-06 backup compose contains Meerkat services, but importing from that backup now would be a runtime guess.
  • The same class of gap exists for matrix-well-known: its recent no-op smoke was health evidence, not proof of full canonical-compose readiness.

Action taken

Opened PR #295: fix(platformctl): block missing compose services in auto-apply.

The guard blocks auto-apply eligibility when spec.runtime.compose_service is absent from the referenced canonical compose file/include tree.
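
As a rough illustration of what checking the "file/include tree" can involve, the sketch below collects service names from a root compose document and everything it pulls in through the Compose top-level `include` element. It assumes the files are already parsed into dictionaries keyed by repository-relative path; `services_in_tree` and the nested `meerkat/compose.yaml` path are hypothetical and not taken from PR #295.

```python
import posixpath
from typing import Any, Mapping, Optional, Set


def services_in_tree(
    files: Mapping[str, Mapping[str, Any]],
    root: str,
    seen: Optional[Set[str]] = None,
) -> Set[str]:
    """Collect service names from `root` and every file it includes."""
    seen = set() if seen is None else seen
    if root in seen or root not in files:
        # Skip include cycles and files that were not loaded.
        return set()
    seen.add(root)

    doc = files[root]
    names = set(doc.get("services") or {})
    for entry in doc.get("include") or []:
        # Short syntax is a string; long syntax is a mapping with `path`,
        # which may itself be a string or a list of strings.
        paths = entry.get("path") if isinstance(entry, dict) else entry
        if isinstance(paths, str):
            paths = [paths]
        for rel in paths or []:
            # Include paths are resolved relative to the including file.
            child = posixpath.normpath(
                posixpath.join(posixpath.dirname(root), rel)
            )
            names |= services_in_tree(files, child, seen)
    return names


# Illustrative tree: the nested file and its path are made up.
files = {
    "compose/apps/compose.yaml": {
        "include": ["meerkat/compose.yaml"],
        "services": {"mirotalk-sfu": {}},
    },
    "compose/apps/meerkat/compose.yaml": {
        "services": {"np-meerkat-frontend": {}},
    },
}

print(sorted(services_in_tree(files, "compose/apps/compose.yaml")))
```

A guard built this way would treat a service as present if any file in the tree defines it, so a module is not blocked merely because its service lives in an included file rather than the root.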

Evidence

  • mirotalk-sfu remains eligible.
  • np-meerkat-frontend is blocked: compose_service is not defined in compose/apps/compose.yaml.
  • matrix-well-known is also blocked by the same guard, which is intentional until that service is imported or the manifest is corrected.

Next

After #295 merges, the safe next Medium batch is canonical compose import for missing services, not another smoke attempt.


Codex #269 batch PR ready — 2026-05-16 09:50 CEST

Role: executor
Status: PR #296 ready; checks green

What I did

  • Imported canonical compose definitions for matrix-well-known, np-meerkat-backend, and np-meerkat-frontend.
  • Added an auto-apply v1 statefulness guard: only statefulness: stateless modules can pass. np-meerkat-backend is now present in compose but still blocked from F2 because it is stateful.
  • Added compose/** to release-root promotion so canonical compose changes reach /opt/pdurlej-platform/current before future apply commands rely on them.
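
The statefulness guard can be pictured roughly as below. This is a sketch under assumptions: the issue confirms only the `statefulness: stateless` requirement and the `rc=4` result for a blocked module. The `spec.statefulness` field location, the `scope_check` name, and the exit code 0 for eligible modules are hypothetical.

```python
from typing import Any, Mapping, Tuple

RC_ELIGIBLE = 0  # assumption: conventional success exit code
RC_BLOCKED = 4   # matches the rc=4 reported for the blocked backend


def scope_check(manifest: Mapping[str, Any]) -> Tuple[int, str]:
    """Return (exit code, message) for the auto-apply v1 statefulness rule."""
    # Assumed field location; the issue does not say where the key lives.
    statefulness = manifest.get("spec", {}).get("statefulness")
    if statefulness != "stateless":
        # Fail closed: a missing or unknown value is treated as blocked.
        return RC_BLOCKED, (
            f"blocked by statefulness guard (statefulness={statefulness!r})"
        )
    return RC_ELIGIBLE, "eligible"


for name, value in [
    ("np-meerkat-frontend", "stateless"),
    ("np-meerkat-backend", "stateful"),
]:
    rc, message = scope_check({"spec": {"statefulness": value}})
    print(f"{name} -> {message}, rc={rc}")
```

Comparing against the single allowed value, rather than listing blocked values, means a manifest with no statefulness declared cannot slip through as eligible.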

Evidence

  • PR: #296 (https://git.pdurlej.com/pdurlej/platform/pulls/296)
  • Forgejo checks: green, 12/12 successful
  • Local tests: 40 passed
  • Workflow lint: 0 findings
  • Read-only RS2000 compose config: passed with /opt/pdurlej-platform/runtime/compose.env
  • Scope checks:
    • matrix-well-known -> eligible
    • np-meerkat-frontend -> eligible
    • np-meerkat-backend -> blocked by statefulness guard, rc=4

Next after merge

  • Release-root promotion should fire because this PR changes .forgejo/workflows/release-root-promote.yml and includes compose/** going forward.
  • Then I will verify release-root SHA and smoke only stateless targets: matrix-well-known and np-meerkat-frontend one at a time.
  • No backend/F3 smoke in this batch.