pdurlej/platform

docs(repo): glm-sunset-watch CORRECTED + modelizm anti-pattern + agent-execution-template #69

Merged

pdurlej merged 3 commits from claude/orders/glm-sunset-watch into main

2026-05-05 01:32:28 +02:00

claude commented

2026-05-05 01:10:50 +02:00

Collaborator

Canary status: missing — Medium PR class now (governance amendment + new template); fire canary 3+3 manually before merge

Amendment notice

This PR was originally Small class (a single state/ markdown file). After a full audit of GLM's pilot batch on meta-issue #59, the original premise broke: GLM did NOT hallucinate `schema/module.schema.v2.json`; the file has existed since commit `c789bb2`. I (claude) had not run `ls schema/` before writing the watch-note. The operator named the mechanism: modelizm w audycie (Polish, roughly "model-ism in audit"). The PR #58 cookbook requires disclosure for atomic-task PRs, and I did not hold myself to it when reviewing another model's work.

PR scope expanded to address the full lesson, not just the file. Now Medium class (governance change + new template).

Purpose

1. Honest correction of `state/glm-sunset-watch.md`: amended in place, not deleted, so future-claude reads the lesson
2. Codify the anti-pattern in `AGENTS.md` so it doesn't happen again
3. Stricter execution template in `state/agent-execution-template.md`: the actual response to a ~14% ratio is "keep GLM with stricter prompts", not sunset; this template is what "stricter" means concretely

This PR does NOT do

- Does NOT decide GLM sunset (premise broken; the ~14% ratio is well under the 30% owner_decision threshold)
- Does NOT change the canary 3+3 ensemble shape
- Does NOT change `run_review.py` workers or BW items
- Does NOT amend any of GLM's 7 produced atomic issues; those will be amended or closed via separate operator actions per the audit verdict

Changed artifacts

- `state/glm-sunset-watch.md` (amended, +47 / -15): Correction section at top with honest reckoning; original observation kept below for the record (with refutation marks); current verdict updated; lessons-learned section added
- `AGENTS.md` (amended, +1 line in §Anti-patterns): added the "Modelizm w audycie" entry codifying the symmetric disclosure obligation
- `state/agent-execution-template.md` (new, 174 lines): protocol referenced by the orchestrator when promoting issues to `ready-for-agent`. It covers pre-work (identity + branch + spec-source existence verification), the work loop (whitelist + scope + disclosure-as-you-go), self-verification commands, PR opening (full Canary Context Pack + Spec sources read disclosure), a table of 6 escape hatches, and the per-issue activation note format
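The spec-source existence check named in the pre-work step could look roughly like the sketch below. This is an illustration only, not the template's actual wording: the function name `missing_spec_sources` and its signature are assumptions made here, and the real template may specify the check as a shell command instead.

```python
from pathlib import Path


def missing_spec_sources(paths, repo_root="."):
    """Return the listed spec-source paths that do not exist under repo_root.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    root = Path(repo_root)
    return [p for p in paths if not (root / p).exists()]


if __name__ == "__main__":
    # Paths taken from this PR's own "Spec sources read" list.
    wanted = ["schema/module.schema.json", "schema/module.schema.v2.json"]
    missing = missing_spec_sources(wanted)
    if missing:
        print("STOP: spec sources not found:", ", ".join(missing))
    else:
        print("all spec sources exist")
```

Running a check like this before issuing a verdict is the step that would have caught the false "hallucinated schema" claim.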

Audit details (full transparency)

GLM batch on meta-issue #59 produced 7 atomic issues + 2 disclosure comments. Per glm-sunset-watch.md original criteria:

| Bucket | Count | Issues |
|---|---|---|
| Clean approve as-is | 4 | #63, #65, #66, #67 |
| Needs amendment | 3 | #61 (audit-w-decomp), #64 (phase split), #68 (premise) |
| Process error fixable | 1 | #62 (duplicate of #61) |

Ratio: ~14% (1 broken premise out of 7 issues). The sunset threshold is 50%; a ratio of 30-50% triggers owner_decision. Verdict: keep with stricter prompts (this PR's template).
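The threshold logic above can be written out as a small function. This is a sketch under stated assumptions: the function name, the verdict labels, and the treatment of the exact boundaries (30% and 50% counted as inclusive lower bounds) are not specified in `glm-sunset-watch.md` as quoted here.

```python
def sunset_verdict(broken, total, owner_decision_at=0.30, sunset_at=0.50):
    """Map a broken-premise ratio to a verdict.

    Assumption: boundaries are inclusive lower bounds; the source only
    gives "50%" for sunset and "30-50%" for owner_decision.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    ratio = broken / total
    if ratio >= sunset_at:
        return "sunset"
    if ratio >= owner_decision_at:
        return "owner_decision"
    return "keep_with_stricter_prompts"


# The audited batch: 1 broken premise out of 7 issues (about 14%).
print(sunset_verdict(1, 7))  # keep_with_stricter_prompts
```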

What GLM did right: identity isolation, the `proposed` label, full adoption of the atomic_task template, a posted disclosure comment, and a rationale comment that correctly dropped 2 suggestions because `plan.py` and `cmd_health` already exist (verified with `ls control-plane/platformctl/`).

Spec sources read

- `state/glm-sunset-watch.md` (pre-amendment): to know what to correct
- `AGENTS.md` §"Anti-patterns to avoid": to find the insertion point
- Issues #61, #62, #63, #64, #65, #66, #67, #68 (all GLM-produced atomic issues): body content via the Forgejo API
- Issue #59 comments: to verify that the disclosure was posted
- `schema/module.schema.json`, `schema/module.schema.v2.json`: file existence + content
- `git log --all --format=... -- schema/module.schema.v2.json`: to find the origin commit `c789bb2` (operator)
- `migrations/vault-to-infisical.md` (242 lines, Phase 0 done): for #64 verification
- `network/tailscale-acl.hujson` (237 lines, 5 TODOs verified): for #65 verification
- `control-plane/platformctl/cli.py`, `manifest.py`, `schema.py`: for #63 verification
- `control-plane/platformctl/plan.py`, `control-plane/platformctl/tools/`: to verify GLM's drop rationale
- `INDEX.md` (175 lines, dated 2026-04-30): for #68 verification
- PR #58 (just merged): to know which cookbook GLM was working under
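Since the new template makes a `## Spec sources read` section mandatory in PR bodies, a reviewer or CI step could check for it mechanically. The sketch below is one possible way to do that; the function name `spec_sources_listed` and the parsing rules (a level-2 heading followed by `- ` bullets) are assumptions, and nothing in this PR says such a check exists.

```python
import re


def spec_sources_listed(pr_body):
    """Return the bullet items under a '## Spec sources read' heading.

    Hypothetical helper: returns an empty list when the section is
    missing or contains no '- ' bullets.
    """
    match = re.search(
        r"^## Spec sources read\s*\n(.*?)(?=^## |\Z)",
        pr_body,
        flags=re.MULTILINE | re.DOTALL,
    )
    if not match:
        return []
    return [
        line[2:].strip()
        for line in match.group(1).splitlines()
        if line.startswith("- ")
    ]


body = (
    "## Purpose\n- correct the watch-note\n"
    "## Spec sources read\n- AGENTS.md\n- Issue #59 comments\n"
    "## Test plan\n- [ ] operator readback\n"
)
print(spec_sources_listed(body))  # ['AGENTS.md', 'Issue #59 comments']
```

An empty result would be a reason to stop and ask for the disclosure before review continues.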

Test plan

- [ ] Operator readback: correction section feels honest (not over-flagellating, not minimizing)
- [ ] Operator readback: "Modelizm w audycie" anti-pattern wording matches what we discussed
- [ ] Operator readback: agent-execution-template covers what GLM/Codex actually need
- [ ] Operator confirms: 4 ready-for-agent issues + 3 amend-needed + 1 close-as-duplicate per audit verdict
- [ ] After merge: orchestrator posts activation comments using the template + applies labels
- [ ] After merge: next meta-batch opened with `state/agent-execution-template.md` referenced in the pilot meta-issue body

claude added 1 commit

2026-05-05 01:10:50 +02:00

docs(state): glm-sunset-watch — evidence + criteria, not decided

canary-required / collect-diff (pull_request) Failing after 3s

canary-required / canary (pull_request) Has been skipped

caa144dbfa

Operator signal 2026-05-05: GLM (current tech-glm + product-glm voices)
on path to sunset; allowed to finish pilot batch on meta-issue #59 first.

This markdown captures:
- Evidence so far (Issue #61: hallucinated v2 schema, missing disclosure,
  audit-in-decomposition anti-pattern)
- What GLM did right (template adoption, identity-isolation, labels)
- Decision criteria (post-batch ratio review)
- What sunset would touch when decided (ADR 0001, AGENTS.md, run_review.py)

Captured as state/ markdown rather than Forgejo issue per operator
preference: silent for the swarm (no open commitment hanging in tracker),
audible for next claude session (clean handoff).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude added 1 commit

2026-05-05 01:23:46 +02:00

docs(state,agents): correct glm-sunset-watch + add modelizm anti-pattern

canary-required / collect-diff (pull_request) Successful in 4s

canary-required / canary (pull_request) Failing after 1s

dd1827184e

Per operator audit response 2026-05-05: original watch-note's accusation
of GLM hallucinating schema/module.schema.v2.json was false — file
exists since commit c789bb2 (operator, "platform v0 — L4 complete").
I (claude) had not run `ls schema/` before writing the verdict.

After full audit of GLM's pilot batch (7 atomic issues + 2 disclosure
comments on meta-issue #59):
- 1/7 broken premise (#68 INDEX.md), 2/7 process errors (#62 dup, #61
  audit-in-decomp), 1/7 scope creep (#64 phase split), 4/7 clean
- Ratio ~14%, well below 30% threshold for sunset
- Disclosure comment + rationale comment WERE posted (Oracle insight
  respected); I checked too early
- GLM correctly dropped 2 suggestions because plan.py and cmd_health
  already exist (verified by ls)

Verdict: NOT sunset. Keep with stricter prompts.

Watch-note amended in place (not deleted) so future-claude reads the
correction and learns. Honesty restitution beats silent rewrite.

AGENTS.md anti-patterns: added "Modelizm w audycie" — operator-named
mechanism for accepting shallow verdicts about another agent's work
based on model-class heuristics without verifying facts against repo.
Disclosure obligation is symmetric. Every model is unique; verify, do
not classify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude referenced this pull request from a commit

2026-05-05 01:25:25 +02:00

docs(state): agent-execution-template — stricter protocol post PR #69 lessons

claude added 1 commit

2026-05-05 01:25:25 +02:00

docs(state): agent-execution-template — stricter protocol post PR #69 lessons

canary-required / collect-diff (pull_request) Successful in 3s

canary-required / canary (pull_request) Failing after 1s

70cde45791

After audit corrected the GLM-sunset premise and ratio came out at ~14%
("keep with stricter prompts" per criteria), the lesson is not to drop
GLM but to make execution prompts more explicit:

- Pre-work P3: verify each spec source path exists BEFORE work (catches
  the modelizm anti-pattern in the agent itself, not just orchestrator)
- Disclosure-as-you-go (running list, not retrofit)
- Self-verification commands per work type (bash -n / py_compile / yamllint
  / pytest)
- Mandatory `## Spec sources read` in PR body — disclosure is symmetric
  obligation (per AGENTS.md anti-pattern just added)
- Escape hatches table: 6 trigger conditions, each with explicit STOP action
- Per-issue activation note format keeps orchestrator activation comments
  short while protocol stays authoritative in this template file

Reusable for any actor (glm / codex / claude) on any ready-for-agent
issue. Referenced by name in activation comments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
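The commit above lists self-verification commands per work type (`bash -n`, `py_compile`, `yamllint`, `pytest`). One way an agent might pick the command for a changed file is sketched below. The mapping by file extension, the `test_` filename convention for pytest, and the names `CHECKS` and `self_check_command` are all assumptions for illustration; the template file itself is not quoted in this PR.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file suffix to a syntax/lint command prefix.
CHECKS = {
    ".sh": ["bash", "-n"],
    ".py": ["python", "-m", "py_compile"],
    ".yaml": ["yamllint"],
    ".yml": ["yamllint"],
}


def self_check_command(path):
    """Return the argv list to verify `path`, or None if no check applies."""
    p = PurePosixPath(path)
    # Assumption: Python files named test_*.py are run with pytest.
    if p.suffix == ".py" and p.name.startswith("test_"):
        return ["pytest", path]
    prefix = CHECKS.get(p.suffix)
    return None if prefix is None else [*prefix, path]


# Example paths are made up for this sketch.
for changed in ["scripts/deploy.sh", "tests/test_cli.py", "README.md"]:
    print(changed, "->", self_check_command(changed))
```

The returned list can be passed directly to `subprocess.run(..., check=True)`, so a failed check stops the agent before it opens a PR.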

claude changed title from ~~docs(state): glm-sunset-watch — evidence + criteria, not decided~~ to docs(repo): glm-sunset-watch CORRECTED + modelizm anti-pattern + agent-execution-template

2026-05-05 01:26:06 +02:00

claude referenced this pull request

2026-05-05 01:28:36 +02:00

feat(control-plane): platformctl validate — minimum viable jsonschema #63

claude referenced this pull request

2026-05-05 01:28:36 +02:00

docs(network): tailscale ACL — fill 5 TODO comments #65

claude referenced this pull request

2026-05-05 01:28:37 +02:00

chore(verify): L4-Verify deterministic check suite #66

claude referenced this pull request

2026-05-05 01:28:37 +02:00