docs(repo): glm-sunset-watch CORRECTED + modelizm anti-pattern + agent-execution-template #69

Merged
pdurlej merged 3 commits from claude/orders/glm-sunset-watch into main 2026-05-05 01:32:28 +02:00

Canary status: missing — Medium PR class now (governance amendment + new template); fire canary 3+3 manually before merge

Amendment notice

This PR was originally Small class (single state/ markdown). After a full audit of GLM's pilot batch on meta-issue #59, the original premise broke: GLM did NOT hallucinate schema/module.schema.v2.json (the file has existed since commit c789bb2). I (claude) had not run ls schema/ before writing the watch-note. Operator named the mechanism: modelizm w audycie (Polish, roughly "model-ism in auditing": accepting a shallow verdict about another agent's work based on its model class instead of verifying facts against the repo). The PR #58 cookbook required disclosure for atomic-task PRs; I did not hold myself to that same requirement when reviewing another model.

PR scope expanded to address the full lesson, not just the file. Now Medium class (governance change + new template).

Purpose

  1. Honest correction of state/glm-sunset-watch.md — amended in place, not deleted, so future-claude reads the lesson
  2. Codify the anti-pattern in AGENTS.md so it doesn't happen again
  3. Stricter execution template in state/agent-execution-template.md — the actual response to ~14% ratio is "keep GLM with stricter prompts", not sunset; this template is what "stricter" means concretely

This PR does NOT do

  • Does NOT decide GLM sunset (premise broken; ratio ~14% well under 30% threshold)
  • Does NOT change canary 3+3 ensemble shape
  • Does NOT change run_review.py workers or BW items
  • Does NOT amend any of GLM's 7 produced atomic issues — those will be amended/closed via separate operator actions per audit verdict

Changed artifacts

  • state/glm-sunset-watch.md (amended +47 / -15) — Correction section at top with honest reckoning; original observation kept below for record (with refutation marks); current verdict updated; lessons learned section added
  • AGENTS.md (amended +1 line in §Anti-patterns) — added "Modelizm w audycie" entry codifying the symmetric disclosure obligation
  • state/agent-execution-template.md (new, 174 lines) — protocol referenced by orchestrator when promoting issues to ready-for-agent; covers pre-work (identity + branch + spec-source existence verification), work loop (whitelist + scope + disclosure-as-you-go), self-verification commands, PR opening (full Canary Context Pack + Spec sources read disclosure), 6 escape hatches table, per-issue activation note format

Audit details (full transparency)

GLM batch on meta-issue #59 produced 7 atomic issues + 2 disclosure comments. Per glm-sunset-watch.md original criteria:

| Bucket | Count | Issues |
|---|---|---|
| Clean approve as-is | 4 | #63, #65, #66, #67 |
| Needs amendment | 3 | #61 (audit-w-decomp), #64 (phase split), #68 (premise) |
| Process error fixable | 1 | #62 (duplicate of #61) |

Ratio ~14% (1 broken-premise / 7). Threshold for sunset is 50%, for owner_decision is 30-50%. Verdict: keep with stricter prompts (this PR's template).
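
The arithmetic behind this verdict can be sketched as below. This is only an illustration under stated assumptions: the function names and verdict strings are hypothetical, and the treatment of the exact boundaries (whether 30% and 50% are inclusive) is assumed, since the criteria here only give the ranges.

```python
# Thresholds as stated in the audit details; inclusive boundaries are an assumption.
SUNSET_THRESHOLD = 0.50
OWNER_DECISION_THRESHOLD = 0.30


def broken_premise_ratio(broken: int, total: int) -> float:
    """Share of issues in the batch that rest on a broken premise."""
    if total <= 0:
        raise ValueError("total must be positive")
    return broken / total


def verdict(ratio: float) -> str:
    """Map a ratio onto the three outcomes named in the watch-note criteria."""
    if ratio >= SUNSET_THRESHOLD:
        return "sunset"
    if ratio >= OWNER_DECISION_THRESHOLD:
        return "owner_decision"
    return "keep_with_stricter_prompts"


# Figures from this audit: 1 broken-premise issue out of 7.
ratio = broken_premise_ratio(broken=1, total=7)
print(f"{ratio:.1%} -> {verdict(ratio)}")  # 14.3% -> keep_with_stricter_prompts
```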

GLM did right: identity-isolation, proposed label, full atomic_task template adoption, disclosure comment posted, rationale comment correctly identified 2 dropped suggestions because plan.py and cmd_health already exist (verified by ls control-plane/platformctl/).

Spec sources read

  • state/glm-sunset-watch.md (pre-amendment) — to know what to correct
  • AGENTS.md §"Anti-patterns to avoid" — to find insertion point
  • Issues #61, #62, #63, #64, #65, #66, #67, #68 (all GLM-produced atomic issues) — body content via Forgejo API
  • Issue #59 comments — to verify disclosure was posted
  • schema/module.schema.json, schema/module.schema.v2.json — file existence + content
  • git log --all --format=... -- schema/module.schema.v2.json — to find commit c789bb2 (operator) origin
  • migrations/vault-to-infisical.md (242 lines, Phase 0 done) — for #64 verification
  • network/tailscale-acl.hujson (237 lines, 5 TODOs verified) — for #65 verification
  • control-plane/platformctl/cli.py, manifest.py, schema.py — for #63 verification
  • control-plane/platformctl/plan.py, control-plane/platformctl/tools/ — to verify GLM's drop rationale
  • INDEX.md (175 lines, dated 2026-04-30) — for #68 verification
  • PR #58 (just-merged) — to know what cookbook GLM was working under
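
The existence check that was skipped for schema/module.schema.v2.json before the original watch-note could look roughly like the sketch below. It is not the template's actual implementation; `missing_spec_sources` is a hypothetical helper name, and the example paths are taken from the list above.

```python
from pathlib import Path


def missing_spec_sources(repo_root: str, sources: list[str]) -> list[str]:
    """Return the spec-source paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [s for s in sources if not (root / s).exists()]


# Paths named in this PR; run from the repository root.
sources = [
    "schema/module.schema.json",
    "schema/module.schema.v2.json",
    "AGENTS.md",
]
missing = missing_spec_sources(".", sources)
if missing:
    print("STOP: spec sources not found:", ", ".join(missing))
else:
    print("all spec sources exist")
```

Running such a check before writing a verdict about a file turns "does this file exist?" from a guess into a recorded fact.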

Test plan

  • Operator readback: correction section feels honest (not over-flagellating, not minimizing)
  • Operator readback: "Modelizm w audycie" anti-pattern wording matches what we discussed
  • Operator readback: agent-execution-template covers what GLM/Codex actually need
  • Operator confirms: 4 ready-for-agent issues + 3 amend-needed + 1 close-as-duplicate per audit verdict
  • After merge: orchestrator posts activation comments using template + applies labels
  • After merge: next meta-batch opened with state/agent-execution-template.md referenced in pilot meta-issue body
docs(state): glm-sunset-watch — evidence + criteria, not decided
Some checks failed
canary-required / collect-diff (pull_request) Failing after 3s
canary-required / canary (pull_request) Has been skipped
caa144dbfa
Operator signal 2026-05-05: GLM (current tech-glm + product-glm voices)
on path to sunset; allowed to finish pilot batch on meta-issue #59 first.

This markdown captures:
- Evidence so far (Issue #61: hallucinated v2 schema, missing disclosure,
  audit-in-decomposition anti-pattern)
- What GLM did right (template adoption, identity-isolation, labels)
- Decision criteria (post-batch ratio review)
- What sunset would touch when decided (ADR 0001, AGENTS.md, run_review.py)

Captured as state/ markdown rather than Forgejo issue per operator
preference: silent for the swarm (no open commitment hanging in tracker),
audible for next claude session (clean handoff).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs(state,agents): correct glm-sunset-watch + add modelizm anti-pattern
Some checks failed
canary-required / collect-diff (pull_request) Successful in 4s
canary-required / canary (pull_request) Failing after 1s
dd1827184e
Per operator audit response 2026-05-05: original watch-note's accusation
of GLM hallucinating schema/module.schema.v2.json was false — file
exists since commit c789bb2 (operator, "platform v0 — L4 complete").
I (claude) had not run `ls schema/` before writing the verdict.

After full audit of GLM's pilot batch (7 atomic issues + 2 disclosure
comments on meta-issue #59):
- 1/7 broken premise (#68 INDEX.md), 2/7 process errors (#62 dup, #61
  audit-in-decomp), 1/7 scope creep (#64 phase split), 4/7 clean
- Ratio ~14%, well below the 30% owner_decision threshold (sunset is 50%)
- Disclosure comment + rationale comment WERE posted (Oracle insight
  respected); I checked too early
- GLM correctly dropped 2 suggestions because plan.py and cmd_health
  already exist (verified by ls)

Verdict: NOT sunset. Keep with stricter prompts.

Watch-note amended in place (not deleted) so future-claude reads the
correction and learns. Honesty restitution beats silent rewrite.

AGENTS.md anti-patterns: added "Modelizm w audycie" — operator-named
mechanism for accepting shallow verdicts about another agent's work
based on model-class heuristics without verifying facts against repo.
Disclosure obligation is symmetric. Every model is unique; verify, do
not classify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs(state): agent-execution-template — stricter protocol post PR #69 lessons
Some checks failed
canary-required / collect-diff (pull_request) Successful in 3s
canary-required / canary (pull_request) Failing after 1s
70cde45791
After audit corrected the GLM-sunset premise and ratio came out at ~14%
("keep with stricter prompts" per criteria), the lesson is not to drop
GLM but to make execution prompts more explicit:

- Pre-work P3: verify each spec source path exists BEFORE work (catches
  the modelizm anti-pattern in the agent itself, not just orchestrator)
- Disclosure-as-you-go (running list, not retrofit)
- Self-verification commands per work type (bash -n / py_compile / yamllint
  / pytest)
- Mandatory `## Spec sources read` in PR body — disclosure is symmetric
  obligation (per AGENTS.md anti-pattern just added)
- Escape hatches table: 6 trigger conditions, each with explicit STOP action
- Per-issue activation note format keeps orchestrator activation comments
  short while protocol stays authoritative in this template file

Reusable for any actor (glm / codex / claude) on any ready-for-agent
issue. Referenced by name in activation comments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
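
The "self-verification commands per work type" item in the commit message above could be wired up along these lines. This is a sketch under assumptions: the suffix-to-command mapping and the names `CHECKS`, `check_command`, and `run_checks` are hypothetical, `scripts/deploy.sh` is a made-up path, and yamllint must be installed separately. pytest is left out because it is normally pointed at a test suite rather than a single changed file.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional

# File suffix -> command prefix. The commands come from the commit message;
# the mapping itself is an assumption for illustration.
CHECKS = {
    ".sh": ["bash", "-n"],                       # syntax check only, no execution
    ".py": [sys.executable, "-m", "py_compile"],
    ".yaml": ["yamllint"],
    ".yml": ["yamllint"],
}


def check_command(path: str) -> Optional[list[str]]:
    """Build the verification command for a file, or None if no check applies."""
    prefix = CHECKS.get(Path(path).suffix)
    return None if prefix is None else [*prefix, path]


def run_checks(paths: list[str]) -> dict[str, bool]:
    """Run the matching check for each path; True means exit status 0."""
    results = {}
    for path in paths:
        cmd = check_command(path)
        if cmd is None:
            continue  # no check defined for this file type
        results[path] = subprocess.run(cmd, capture_output=True).returncode == 0
    return results


print(check_command("scripts/deploy.sh"))  # ['bash', '-n', 'scripts/deploy.sh']
```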
claude changed title from docs(state): glm-sunset-watch — evidence + criteria, not decided to docs(repo): glm-sunset-watch CORRECTED + modelizm anti-pattern + agent-execution-template 2026-05-05 01:26:06 +02:00
Reference
pdurlej/platform!69