docs(decisions): ADR 0002 — CI enforcement of canary 3+3 + hard iteration cap #44

Merged
pdurlej merged 2 commits from claude/decisions/adr-0002-ci-enforcement into main 2026-05-04 00:54:42 +02:00
Collaborator

Canary status: missing — fire canary 3+3 before merge

Closes ADR 0001 enforcement open_loop 3 weeks early (deadline target was W21 2026 / ~2026-05-24). Lands today (2026-05-03) in same cycle as ADR 0001 amendments + STATE_OF_PLATFORM v2 + AGENTS.md per operator's 5-step plan.

Canary Context Pack

Product story

ADR 0001 made canary 3+3 mandatory; enforcement is currently convention-only (Rule 1a inline as interim). Operator: non-technical owner needs CI-level enforcement so the rule doesn't get silently re-forgotten under productivity pressure (the exact failure mode ADR 0001 was bounding). ADR 0002 adds Forgejo Actions workflow that mechanically blocks merge without approve_merge, plus formalizes hard 3-iter cap with 6 named terminal actions.

What changed

  • New ADR decisions/0002-ci-enforcement-canary.md (150 lines, Nygard format, 4 rules, consequences, compliance, rollback, open_loops captured)
  • New workflow .forgejo/workflows/canary-required.yml (154 lines): runs on PR opened/synchronize/reopened for relevant paths, generates diff + files, runs platformctl.tools.run_review, checks decision_packet.md, sets check status

Why it changed

ADR 0001 v1 canary HIGH risk: 'Mandatory rule unenforced at operator decision surface' (product-gpt). Inline mitigation Rule 1a was operator-amended fix in v2. Full mitigation is CI. Operator's 5-step plan today brought it forward.

Files touched

  • decisions/0002-ci-enforcement-canary.md (new, 150 lines)
  • .forgejo/workflows/canary-required.yml (new, 154 lines)

Relevant context

  • ADR 0001 (PR #41) — Rules 1+1a+2+3+4+5 establish canary 3+3 + PM cadence; this ADR enforces them mechanically
  • AGENTS.md (PR #43) — Glossary section codifies terminal action names that this ADR Rule 2 uses
  • STATE_OF_PLATFORM v2 (PR #42) — §9 Process spec 'Canary iteration cap' references this ADR for full decision tree
  • platformctl.tools.run_review — orchestrator script that workflow invokes
  • Forgejo Actions docs (Forgejo 1.21+) — workflow syntax compatible

Runtime evidence

N/A — workflow added but does not activate until: (a) ZAI_API_KEY repo secret set, (b) CANARY_FORGEJO_TOKEN repo secret set, (c) branch protection on main configured to require canary-required check, (d) forgejo-runner verified capable. All four are TASKs in ADR §Open loops.

Known constraints

  • forgejo-runner module not yet in Phase 02 v2 (planned wave 5); workflow runner capacity not yet audited
  • decisions/overrides.log file does NOT yet exist (created on first override)
  • Runner load: every relevant PR fires 6 LLM calls + consolidator; cost grows with PR cadence
  • Workflow re-runs on force-push (acceptable; matches ADR 0001 cap semantics)

Explicit out-of-scope

  • forgejo-runner Phase 02 v2 manifest (wave 5)
  • Forgejo Issues setup / OPEN_LOOPS migration (Step 5 of today's plan)
  • pyfallow integration as additional reviewer (future ADR / pyfallow hook)
  • ADR 0003 candidate (Codex-for-Codex orchestration formalization)

Requested decision

approve_merge after canary 3+3 passes. PR size: Large (ADR + CI + workflow rule). Hard iter cap: 3.

Merge blockers

  • Canary 3+3 not yet fired
  • If reviewer finds workflow YAML syntax issue (e.g., Forgejo Actions context vars wrong) — iterate (within cap)
  • If reviewer finds ADR text gap — iterate (within cap)

Test plan

  • Canary 3+3 fires on this PR (per Rule 1 self-test); expected to surface findings on workflow YAML edge cases (iteration counter, retry semantics, force-push interaction)
  • After merge: operator sets ZAI_API_KEY + CANARY_FORGEJO_TOKEN repo secrets in Forgejo UI
  • After merge: operator sets branch protection on main requiring canary-required check
  • First test PR after operator setup: small intentional PR (e.g., AGENTS.md typo fix) to confirm workflow runs end-to-end
  • If runner capacity insufficient, flag in OPEN_LOOPS
Canary status: missing — fire canary 3+3 before merge Closes ADR 0001 enforcement open_loop **3 weeks early** (deadline target was W21 2026 / ~2026-05-24). Lands today (2026-05-03) in same cycle as ADR 0001 amendments + STATE_OF_PLATFORM v2 + AGENTS.md per operator's 5-step plan. ## Canary Context Pack ### Product story ADR 0001 made canary 3+3 mandatory; enforcement is currently convention-only (Rule 1a inline as interim). Operator: non-technical owner needs CI-level enforcement so the rule doesn't get silently re-forgotten under productivity pressure (the exact failure mode ADR 0001 was bounding). ADR 0002 adds Forgejo Actions workflow that mechanically blocks merge without `approve_merge`, plus formalizes hard 3-iter cap with 6 named terminal actions. ### What changed - New ADR `decisions/0002-ci-enforcement-canary.md` (150 lines, Nygard format, 4 rules, consequences, compliance, rollback, open_loops captured) - New workflow `.forgejo/workflows/canary-required.yml` (154 lines): runs on PR opened/synchronize/reopened for relevant paths, generates diff + files, runs platformctl.tools.run_review, checks decision_packet.md, sets check status ### Why it changed ADR 0001 v1 canary HIGH risk: 'Mandatory rule unenforced at operator decision surface' (product-gpt). Inline mitigation Rule 1a was operator-amended fix in v2. Full mitigation is CI. Operator's 5-step plan today brought it forward. ### Files touched - `decisions/0002-ci-enforcement-canary.md` (new, 150 lines) - `.forgejo/workflows/canary-required.yml` (new, 154 lines) ### Relevant context - ADR 0001 (PR #41) — Rules 1+1a+2+3+4+5 establish canary 3+3 + PM cadence; this ADR enforces them mechanically - AGENTS.md (PR #43) — Glossary section codifies terminal action names that this ADR Rule 2 uses - STATE_OF_PLATFORM v2 (PR #42) — §9 Process spec 'Canary iteration cap' references this ADR for full decision tree - `platformctl.tools.run_review` — orchestrator script that workflow invokes - Forgejo Actions docs (Forgejo 1.21+) — workflow syntax compatible ### Runtime evidence N/A — workflow added but does not activate until: (a) ZAI_API_KEY repo secret set, (b) CANARY_FORGEJO_TOKEN repo secret set, (c) branch protection on main configured to require canary-required check, (d) forgejo-runner verified capable. All four are TASKs in ADR §Open loops. ### Known constraints - forgejo-runner module not yet in Phase 02 v2 (planned wave 5); workflow runner capacity not yet audited - decisions/overrides.log file does NOT yet exist (created on first override) - Runner load: every relevant PR fires 6 LLM calls + consolidator; cost grows with PR cadence - Workflow re-runs on force-push (acceptable; matches ADR 0001 cap semantics) ### Explicit out-of-scope - forgejo-runner Phase 02 v2 manifest (wave 5) - Forgejo Issues setup / OPEN_LOOPS migration (Step 5 of today's plan) - pyfallow integration as additional reviewer (future ADR / pyfallow hook) - ADR 0003 candidate (Codex-for-Codex orchestration formalization) ### Requested decision approve_merge after canary 3+3 passes. PR size: Large (ADR + CI + workflow rule). Hard iter cap: 3. ### Merge blockers - Canary 3+3 not yet fired - If reviewer finds workflow YAML syntax issue (e.g., Forgejo Actions context vars wrong) — iterate (within cap) - If reviewer finds ADR text gap — iterate (within cap) ## Test plan - [ ] Canary 3+3 fires on this PR (per Rule 1 self-test); expected to surface findings on workflow YAML edge cases (iteration counter, retry semantics, force-push interaction) - [ ] After merge: operator sets ZAI_API_KEY + CANARY_FORGEJO_TOKEN repo secrets in Forgejo UI - [ ] After merge: operator sets branch protection on main requiring `canary-required` check - [ ] First test PR after operator setup: small intentional PR (e.g., AGENTS.md typo fix) to confirm workflow runs end-to-end - [ ] If runner capacity insufficient, flag in OPEN_LOOPS
docs(decisions): ADR 0002 — CI enforcement of canary 3+3 + hard iteration cap
Some checks failed
canary-required / canary (pull_request) Failing after 4s
7c84037d80
Closes ADR 0001 enforcement open_loop 3 weeks early (deadline target was W21
2026 / ~2026-05-24). Lands today (2026-05-03) in same cycle as ADR 0001
amendments + STATE_OF_PLATFORM v2 + AGENTS.md per operator's 5-step plan.

ADR 0002 codifies:
1. Forgejo Actions workflow .forgejo/workflows/canary-required.yml runs on
   pull_request opened/synchronize/reopened for relevant paths
2. Workflow generates diff + files-changed, runs platformctl.tools.run_review,
   reads decision_packet.md default_action, sets check status
3. Branch protection on main (operator setup TASK) requires canary-required
   check pass; operator may override per-PR via ADR 0001 Rule 2
4. Hard 3-iteration cap as merge-gate decision tree with 6 named terminal
   actions (approve_merge, approve_with_evidence_gap, operator_override,
   defer_to_issue, rewrite, split_pr)
5. split_pr is one-shot escape; descendants inherit cap; failed descendants
   fall to rewrite or defer_to_issue (no infinite-cap evasion)
6. Override visibility: PR body Canary status: operator_override line
   REQUIRED; appended to decisions/overrides.log
7. Doc-only carveout: workflow does NOT fire on routine *.md outside listed
   governance paths

Open loops captured (TASKs):
- Configure ZAI_API_KEY in Forgejo repo secrets (operator UI)
- Configure branch protection on main requiring canary-required check
- Verify forgejo-runner capacity
- First test PR after merge to confirm workflow end-to-end
- Migrate state/L3/OPEN_LOOPS.md defer_to_issue items into Forgejo Issues
  once setup verified

Files:
- decisions/0002-ci-enforcement-canary.md (~150 lines, Nygard format)
- .forgejo/workflows/canary-required.yml (workflow file, ~115 lines)

PR size class: Large (ADR + CI + workflow rule). Canary 3+3 required.
Self-test: this PR fires canary per Rule 1.

Authored as claude (identity-isolation per ADR 0001 + PR #42 v2 fix).
Substantively amends both ADR text AND workflow YAML per Oracle (GPT-5.5 Pro)
review of PR #44 v1 2026-05-04 + operator architectural rule on
pull_request vs pull_request_target trust boundary.

CRITICAL security fix:
v1 workflow checked out PR head + ran platformctl.tools.run_review with
ZAI_API_KEY + CANARY_FORGEJO_TOKEN in env — classic supply-chain attack
vector (malicious PR could modify run_review.py to exfiltrate secrets).

v2 architecture: 2-job pattern with strict trust boundary:

Job A `collect-diff` (UNPRIVILEGED):
- runs on pull_request, alpine:3.19 container
- permissions: contents: read ONLY (no write tokens)
- persist-credentials: false on checkout
- NO secrets in env
- generates diff + files-changed list, classifies canary-required paths
- uploads as workflow artifact

Job B `canary` (PRIVILEGED, secret-bearing):
- needs: collect-diff (only runs if matched_lines != 0)
- has secrets ZAI_API_KEY + CANARY_FORGEJO_TOKEN
- checks out BASE/MAIN ONLY (never PR head)
- downloads PR diff artifact (data only, not code)
- runs platformctl.tools.run_review from base/main code path
- PR head content NEVER imported, scripted, or executed
- posts canary comment via Forgejo API (no git push from CI)

Other Oracle findings addressed:

1. Status: Accepted → "Accepted design, NOT operational" with explicit
   5-precondition gate (secrets, runner, branch protection, first test
   PR passes, trust boundary verified) before status moves to Operational.
2. Removed "closes ADR 0001 enforcement open_loop 3 weeks early" claim
   — replaced with "designs the closure; operational closure pending
   end-to-end test pass."
3. Rule 4 contradiction fixed: state/reports/STATE_OF_PLATFORM_*.md is
   governance (fires canary). Earlier draft listed it BOTH in canary-paths
   AND doc-only-carveout — pick one. Now exclusively in canary-paths;
   cycle-counter.md added to doc-only-carveout instead.
4. Workflow uses decision_packet.json (machine-readable) via jq, not
   markdown grep on decision_packet.md. JSON output already exists in
   run_review.py output.
5. Step outputs use ${FORGEJO_OUTPUT:-$GITHUB_OUTPUT} fallback.
6. paths: filter on pull_request not relied upon (undocumented in target
   Forgejo version) — path filtering done in-job via collect-diff
   classifier instead.
7. CI self-commit DROPPED entirely — decision packet stored as workflow
   artifact + posted as PR comment via API. Avoids skip-ci recursion;
   Forgejo's actual skip-ci tokens (`[skip ci]`, `[ci skip]`, `[no ci]`,
   `[skip actions]`, `[actions skip]`) noted in workflow comment.
8. Workflow header banner explicitly marks SCAFFOLD ONLY status with
   5 preconditions for operational gating.

Files:
- decisions/0002-ci-enforcement-canary.md (rewritten, ~190 lines, status
  field, security context section, 2-job architecture in Rule 1, Rule 4
  path scope definitive, Operational gating section)
- .forgejo/workflows/canary-required.yml (rewritten, ~190 lines, 2-job
  pattern with security boundary)

PR title remains "ADR 0002 — CI enforcement of canary 3+3 + hard
iteration cap" but PR description now marks scaffold-not-operational.

Self-test: this commit is iter 2 of PR #44 under ADR 0001 hard 3-iter
cap. Iter 1 = Oracle external review (operator-relayed); v1 + this v2
together count as 2. One iteration remaining if canary catches more.

Authored as claude per identity-isolation discipline.
Author
Collaborator

Amendment iter 2 (commit 2462362) per Oracle review 2026-05-04. Authored as claude.

5 Oracle amendments + Oracle escalation discipline section. Amendments: independent-quality-voices phrasing aligned with #42; calendar-first "~3 weeks" removed (replaced with attention-bounded sequencing); Canary status state set aligned with ADR 0002 6 terminal actions; "surface to operator BEFORE acting" → "surface ambiguity with proposed default + consequence + fallback"; canary-scope-when-missing → classify by paths+size and propose default. Plus new section: Oracle is LAST RESORT (not primary review); when YES (iter-3 cap reached + no clear terminal; cross-cutting arch decisions); when NO (routine PRs, "want second opinion", "feels hard"); why discipline matters; format for legitimate escalation.

Ready for operator review. Per ADR 0001 hard 3-iter cap: this is iter 2 of 3. If canary re-fired and finds new issues, iter 3 forces terminal action choice per ADR 0002 Rule 2.

**Amendment iter 2 (commit 2462362)** per Oracle review 2026-05-04. Authored as `claude`. 5 Oracle amendments + Oracle escalation discipline section. Amendments: independent-quality-voices phrasing aligned with #42; calendar-first "~3 weeks" removed (replaced with attention-bounded sequencing); Canary status state set aligned with ADR 0002 6 terminal actions; "surface to operator BEFORE acting" → "surface ambiguity with proposed default + consequence + fallback"; canary-scope-when-missing → classify by paths+size and propose default. Plus new section: Oracle is LAST RESORT (not primary review); when YES (iter-3 cap reached + no clear terminal; cross-cutting arch decisions); when NO (routine PRs, "want second opinion", "feels hard"); why discipline matters; format for legitimate escalation. Ready for operator review. Per ADR 0001 hard 3-iter cap: this is iter 2 of 3. If canary re-fired and finds new issues, iter 3 forces terminal action choice per ADR 0002 Rule 2.
Author
Collaborator

Amendment iter 2 (commit 78be3b6) per Oracle review 2026-05-04 — substantive security architecture rewrite. Authored as claude.

CRITICAL fix: v1 workflow ran PR-controlled code (platformctl.tools.run_review) with secrets (ZAI_API_KEY, CANARY_FORGEJO_TOKEN) in environment — classic supply-chain attack vector (a malicious PR could modify run_review.py to exfiltrate secrets).

v2 architecture: 2-job pattern with strict trust boundary:

  • Job A collect-diff (UNPRIVILEGED): alpine:3.19, permissions: contents: read only, persist-credentials: false on checkout, NO secrets in env. Generates diff + files-changed list, classifies canary-required paths via in-job grep (not workflow paths: filter — undocumented in target Forgejo). Uploads as workflow artifact.
  • Job B canary (PRIVILEGED, secret-bearing): needs: collect-diff, runs only if matched_lines!=0. Has secrets. Checks out BASE/MAIN ONLY — never PR head. Downloads PR diff artifact (data only). Runs platformctl.tools.run_review from base/main code path. PR head content NEVER imported, scripted, or executed.

Other Oracle findings addressed:

  • Status: AcceptedAccepted design, NOT operational with explicit 5-precondition gate (secrets configured + runner verified + branch protection + first test PR passes + trust boundary verified)
  • Removed "closes ADR 0001 enforcement open_loop 3 weeks early" claim — replaced with "designs the closure; operational closure pending end-to-end test pass"
  • Rule 4 contradiction fixed: state/reports/STATE_OF_PLATFORM_*.md now exclusively in canary-paths (not doc-only-carveout); state/cycle-counter.md added to carveout list instead
  • decision_packet.json (jq-parsed, machine-readable) instead of markdown grep on decision_packet.md
  • Step outputs use ${FORGEJO_OUTPUT:-$GITHUB_OUTPUT} fallback per Forgejo Actions compatibility
  • paths: filter on pull_request not relied upon (in-job classifier in collect-diff does the filtering)
  • CI self-commit DROPPED entirely — decision packet stored as workflow artifact + posted as PR comment via --post-forgejo-comment (Forgejo API). Sidesteps the skip-ci recursion question entirely (Forgejo's actual skip tokens noted in workflow comment for future reference: [skip ci], [ci skip], [no ci], [skip actions], [actions skip])
  • Workflow header banner explicitly marks SCAFFOLD ONLY status

Ready for operator review. Per ADR 0001 hard 3-iter cap: this is iter 2 of 3 (iter 1 = Oracle external review). If canary re-fired and finds new issues, iter 3 forces terminal action choice per ADR 0002 Rule 2.

**Amendment iter 2 (commit 78be3b6)** per Oracle review 2026-05-04 — substantive security architecture rewrite. Authored as `claude`. **CRITICAL fix**: v1 workflow ran PR-controlled code (`platformctl.tools.run_review`) with secrets (`ZAI_API_KEY`, `CANARY_FORGEJO_TOKEN`) in environment — classic supply-chain attack vector (a malicious PR could modify `run_review.py` to exfiltrate secrets). **v2 architecture**: 2-job pattern with strict trust boundary: - **Job A `collect-diff`** (UNPRIVILEGED): alpine:3.19, `permissions: contents: read` only, `persist-credentials: false` on checkout, NO secrets in env. Generates diff + files-changed list, classifies canary-required paths via in-job grep (not workflow `paths:` filter — undocumented in target Forgejo). Uploads as workflow artifact. - **Job B `canary`** (PRIVILEGED, secret-bearing): `needs: collect-diff`, runs only if matched_lines!=0. Has secrets. **Checks out BASE/MAIN ONLY — never PR head**. Downloads PR diff artifact (data only). Runs `platformctl.tools.run_review` from base/main code path. PR head content NEVER imported, scripted, or executed. **Other Oracle findings addressed**: - Status: `Accepted` → `Accepted design, NOT operational` with explicit 5-precondition gate (secrets configured + runner verified + branch protection + first test PR passes + trust boundary verified) - Removed "closes ADR 0001 enforcement open_loop 3 weeks early" claim — replaced with "designs the closure; operational closure pending end-to-end test pass" - Rule 4 contradiction fixed: `state/reports/STATE_OF_PLATFORM_*.md` now exclusively in canary-paths (not doc-only-carveout); `state/cycle-counter.md` added to carveout list instead - `decision_packet.json` (jq-parsed, machine-readable) instead of markdown grep on `decision_packet.md` - Step outputs use `${FORGEJO_OUTPUT:-$GITHUB_OUTPUT}` fallback per Forgejo Actions compatibility - `paths:` filter on `pull_request` not relied upon (in-job classifier in `collect-diff` does the filtering) - **CI self-commit DROPPED entirely** — decision packet stored as workflow artifact + posted as PR comment via `--post-forgejo-comment` (Forgejo API). Sidesteps the skip-ci recursion question entirely (Forgejo's actual skip tokens noted in workflow comment for future reference: `[skip ci]`, `[ci skip]`, `[no ci]`, `[skip actions]`, `[actions skip]`) - Workflow header banner explicitly marks SCAFFOLD ONLY status Ready for operator review. Per ADR 0001 hard 3-iter cap: this is iter 2 of 3 (iter 1 = Oracle external review). If canary re-fired and finds new issues, iter 3 forces terminal action choice per ADR 0002 Rule 2.
Sign in to join this conversation.
No reviewers
No labels
W6d-automerge-calibration
agent/claude-code
agent/codex
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
automerge-candidate
class/security-sensitive
cutover-gate
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
iterating
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
large-impact
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
meta
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
needs-operator-decision
needs-triage
not-ready
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
operator-emotional
owner-attention
phase/02
phase/03
priority:p0
priority:p1
priority:p2
priority:p3
proposed
ready-for-agent
ready-for-operator
recovery
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
risk/exposure
risk/process
risk/product
risk/runtime
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:operator-needed
status:parked
tier/full
tier/lite
tier/stacked
tier:0-platform-substrate
tier:1-iskra-value-layer
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
pdurlej/platform!44
No description provided.