governance: ADR-0017 — no stacked PRs to main + 4-layer guard (operator-protection) #220

Merged
pdurlej merged 1 commit from claude/orders/adr-0017-stack-guards into main 2026-05-12 07:24:21 +02:00
Collaborator

Tier classification per ADR-0007: Lite (4-layer guard, ~565 LoC additive, no sacred paths, no schema, no runtime; two new workflows are read-only / comment-only-during-soak / no secrets consumed). Operator may upgrade to Full if preferred — both workflows parseable, both small enough to spot-review.

Why now

Responds to operator's exact ask 2026-05-12 01:45 CEST:

"Plan this in such a way that I could not make such a stupid mistake as a non-tech-PM."

Incident context: on the night of 2026-05-11/12, the Phase 3 apply chain was authored as 7 stacked PRs (#161–#167). Operator clicked Merge in the Forgejo UI 7 times. Only #162 + #161 (independent) actually landed on origin/main. PRs #163–#167 merged into the stack branches, not main, yet the Forgejo UI reported merged: True for all 5 of them, silently and incorrectly. Codex diagnosed this during the wake-up cross-check and opened rescue PR #215.

ADR-0017 closes the trap class.

Four layers (defense in depth)

| # | Layer | File | Effort |
|---|---|---|---|
| 1 | Mechanical pre-merge guard | .forgejo/workflows/base-is-main.yml | green/red status check on every PR; with branch protection blocks UI Merge button |
| 2 | Post-merge in-main audit | .forgejo/workflows/merged-in-main-audit.yml | catches what slips Layer 1; 7-day soak then auto-opens owner-attention issues |
| 3 | Cousin discipline | decisions/0017-...md + REVIEW.md § Stack guard rule | every agent default base=main; sequential not stacked; escape hatch via tier/stacked label + consolidator-PR plan |
| 4 | Operator pre-merge mental check | state/pan-herbatka-day1-checklist.md § 13 | 1-second base-field glance; belt to Layer 1's mechanical suspenders |
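
The Layer 1 decision can be sketched as a pure function. This is a minimal illustration, assuming the check depends only on the PR's base branch and the tier/stacked escape-hatch label as described in this PR; the names `PullRequest` and `guard_passes` and the branch name `feature/part-1` are hypothetical and not taken from the actual workflow file.

```python
from dataclasses import dataclass, field

MAIN_BRANCH = "main"
STACKED_LABEL = "tier/stacked"  # escape-hatch label named in this PR


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical minimal view of a PR; not the Forgejo API shape."""
    number: int
    base_ref: str
    labels: frozenset = field(default_factory=frozenset)


def guard_passes(pr: PullRequest) -> bool:
    """Green when the PR targets main or is explicitly labelled as a stack."""
    return pr.base_ref == MAIN_BRANCH or STACKED_LABEL in pr.labels


if __name__ == "__main__":
    examples = [
        PullRequest(1, "main"),
        PullRequest(2, "feature/part-1"),  # stacked, no label: red
        PullRequest(3, "feature/part-1", frozenset({STACKED_LABEL})),
    ]
    for pr in examples:
        status = "green" if guard_passes(pr) else "red"
        print(f"PR {pr.number}: base={pr.base_ref} -> {status}")
```

Keeping the decision free of I/O means the same rule could be exercised in a unit test without a running Forgejo instance.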

Files (5 changed, +415 lines)

  • decisions/0017-no-stacked-prs-to-main.md — full ADR with git-log evidence from the incident
  • .forgejo/workflows/base-is-main.yml — Layer 1 (~90 LoC; yaml-parseable; no secrets)
  • .forgejo/workflows/merged-in-main-audit.yml — Layer 2 (~130 LoC; soak-period mode initially)
  • REVIEW.md — Stack guard rule (~30 LoC, inserted between Risk-proportional tier and Re-review convergence sections)
  • state/pan-herbatka-day1-checklist.md — § 13 amendment (~35 LoC, operator-facing affordance)

What this PR does NOT do

  • Does NOT enable Forgejo branch protection on main (operator-action, ~30s in Forgejo settings UI; tracked as auto-opened follow-up issue if/when this PR merges)
  • Does NOT retroactively rescue the 2026-05-11/12 Phase 3 incident (Codex's PR #215 handles that)
  • Does NOT touch sacred paths, schemas, runtime, control-plane code
  • Does NOT modify existing workflows (additive only)
  • Does NOT auto-open issues during 7-day soak (Layer 2 reports warnings + artifact only; flips to issue-opening after operator validates no false positives)
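
The soak behaviour in the last bullet can be pictured as a small mode switch. The sketch below assumes that issue-opening requires both the 7-day period to have elapsed and the operator's validation; how the real workflow stores the soak start date or the validation flag is not stated in this PR, so `audit_action` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SOAK_PERIOD = timedelta(days=7)  # "7-day soak" from this PR


def audit_action(soak_started: datetime, now: datetime,
                 operator_validated: bool) -> str:
    """Return 'warn' (warnings + artifact only) or 'open_issue'.

    Assumption: both conditions must hold before issues are opened.
    """
    soak_over = now - soak_started >= SOAK_PERIOD
    if soak_over and operator_validated:
        return "open_issue"
    return "warn"


if __name__ == "__main__":
    start = datetime(2026, 5, 12, tzinfo=timezone.utc)
    print(audit_action(start, start + timedelta(days=3), False))  # warn
    print(audit_action(start, start + timedelta(days=8), True))   # open_issue
```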

Operator's North Star check

Does this PR reduce operator-attention-cost?

  • Reduces: every future Merge click becomes safe. Layer 1 makes stacked-merge mechanically impossible (with branch protection enabled). Operator's mental model simplifies: "every Merge button = lands on main."
  • One-time +30s: enabling branch protection after merge (follow-up issue tracks this).

Verdict: ship.

Test plan

  • Operator readback: 4-layer defense makes sense for non-tech-PM threat model
  • Operator readback: escape hatch (tier/stacked label + consolidator-PR) is sufficient for the rare legitimate-stack case
  • Optional: open a deliberately-stacked test PR to verify Layer 1 mechanical fail (red check appears)
  • Operator merge
  • Post-merge: operator enables Forgejo branch protection on main with base-is-main / guard as required status check (the auto-opened follow-up issue describes the click path)
  • 7-day soak: review Layer 2 audit artifacts for false positives before flipping to issue-opening mode

Spec sources read

  • decisions/0001-canary-mandatory-pm-cadence.md (canary baseline)
  • decisions/0007-risk-proportional-canary.md (tier discipline; classifying this PR as Lite)
  • REVIEW.md existing structure (insertion point + style)
  • state/pan-herbatka-day1-checklist.md § 12 (insertion point for new § 13)
  • .forgejo/workflows/canary-required.yml (workflow style + trust-boundary header reference)
  • Git log on pdurlej/platform main 2026-05-12 02:00 CEST (incident evidence; cited in ADR § References)
  • Codex's PR #215 (rescue PR that prompted this ADR; cited but not modified)

🍵 Drafted by claude (Pan Herbatka) 2026-05-12 02:00 CEST.

Refs: PR #215 (Codex rescue, in flight), PR #187 (ADR renumbering merged 01:51, separate fire), ADR-0007 (Lite tier framework this PR uses)

governance: ADR-0017 — no stacked PRs to main + 4-layer guard (pre/post-merge + cousin + operator)
Some checks failed
base-is-main / guard (pull_request) Successful in 1s
canary-required / collect-diff (pull_request) Successful in 3s
infra-docs-drift / docs-drift (pull_request) Failing after 3s
workflow-lint / lint (pull_request) Failing after 3s
canary-required / canary (pull_request) Successful in 11s
571e5833de
Closes the operator-attention-cost trap that fired 2026-05-11/12 night
on the Phase 3 apply chain: 7 stacked PRs (#161-#167), operator clicked
Merge in Forgejo UI 7 times, only 2 commits actually landed on main
(#162 + #161). #163-#167 merged into the stack, not main. Codex
diagnosed + opened rescue PR #215.

Operator's exact ask 2026-05-12 01:45 CEST:
  "Plan this in such a way that I could not make such a stupid
  mistake as a non-tech-PM."

## Four layers (independent activation)

1. **Mechanical pre-merge guard** — .forgejo/workflows/base-is-main.yml
   - Posts red check if base != main AND no tier/stacked label
   - With Forgejo branch protection enabled, blocks Merge button
   - Operator-action TODO: enable required-status-check on main (separate
     follow-up issue auto-opened on merge)

2. **Post-merge in-main audit** — .forgejo/workflows/merged-in-main-audit.yml
   - On push to main, identifies merge commits, asserts every merged PR
     actually landed on main (base != main = silent-stack-merge candidate)
   - 7-day soak period: comment-only warnings + audit artifact
   - After soak: opens Forgejo Issue with owner-attention label

3. **Cousin discipline** — ADR-0017 + REVIEW.md § Stack guard rule
   - Every agent (Codex/claude/glm/antigravity) defaults base: main
   - Sequential, not stacked; if dependency, wait + rebase + open next
   - Escape hatch: tier/stacked label + consolidator-PR with base: main

4. **Operator wakeup pre-merge mental check** — checklist § 13
   - 1-second glance at base field before clicking Merge
   - Belt-to-Layer-1 mechanical-suspenders pattern
   - Catches CLI/mobile/API merges that bypass UI status checks
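
The Layer 2 heuristic (base != main means silent-stack-merge candidate) can be illustrated with the incident itself. This is a sketch only: the PR numbers come from the incident description above, while `MergedPR`, `silent_stack_candidates`, and the branch name `stack-branch` are hypothetical placeholders, since the real stack branch names are not given here.

```python
from typing import Iterable, List, NamedTuple


class MergedPR(NamedTuple):
    """Hypothetical record of a PR as seen by the audit."""
    number: int
    base_ref: str
    merged: bool


def silent_stack_candidates(prs: Iterable[MergedPR],
                            main: str = "main") -> List[int]:
    """PR numbers reported as merged whose base branch was not main."""
    return sorted(pr.number for pr in prs if pr.merged and pr.base_ref != main)


if __name__ == "__main__":
    # #161 and #162 landed on main; #163-#167 merged into stack branches.
    incident = [MergedPR(161, "main", True), MergedPR(162, "main", True)]
    incident += [MergedPR(n, "stack-branch", True) for n in range(163, 168)]
    print(silent_stack_candidates(incident))
    # prints [163, 164, 165, 166, 167]
```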

## Files

- decisions/0017-no-stacked-prs-to-main.md (new, ~280 LoC)
- .forgejo/workflows/base-is-main.yml (new, ~90 LoC)
- .forgejo/workflows/merged-in-main-audit.yml (new, ~130 LoC)
- REVIEW.md amendment (Stack guard rule section, ~30 LoC)
- state/pan-herbatka-day1-checklist.md § 13 amendment (~35 LoC)

Total ~565 LoC additive. No sacred paths, no schema, no runtime.

## Tier (per ADR-0007)

**Lite** (could be argued Full because two workflows touch CI surface,
but workflows are read-only, post-comment-only-during-soak, no secrets
consumed). Operator-merge per ADR-0001 Rule 2 if operator chooses Full;
otherwise Lite single-canary suffices.

Both workflows parseable (python yaml.safe_load passes).

## Test plan

- [ ] Operator readback: layered defense makes sense
- [ ] Operator readback: ADR-0017 escape hatch (tier/stacked label) is
      sufficient for the rare legitimate-stack case
- [ ] Optional: open deliberately-stacked test PR to verify Layer 1
      mechanical fail (red check)
- [ ] Operator merge
- [ ] Post-merge: operator enables Forgejo branch protection on main
      with base-is-main / guard as required status check (~30s
      operator-action; tracked as auto-opened follow-up issue)
- [ ] 7-day soak: review Layer 2 audit artifacts for false positives
      before flipping to issue-opening mode

## Refs

- Codex's Phase 3 rescue PR #215 (the incident this ADR responds to)
- PR #168 ADR collision (Prof Kong renumbering via PR #187, separate
  governance fire from same night; different bug, same operator-attention
  trap class)
- ADR-0001 (canary cadence), ADR-0007 (tier discipline)
- Git log evidence in ADR § References
Author
Collaborator

Operator-action note — branch protection toggle QUEUED, not immediate

Layer 1 status check base-is-main / guard is inert until operator enables "Require status checks to pass before merging" in Forgejo branch-protection settings for main. That's by design — workflow ships green-on-day-1, activation is operator-controlled.

Per operator decision 2026-05-12 ~02:15 CEST: DO NOT enable now. Codex is approaching first RS2000 cutover (PR #215 + apply pipeline first real use); zero-changes window applies.

Follow-up issue auto-opened: #243 with label owner-attention — will surface in state/STATUS_NOW.md Owner Action Board on next wakeup. Issue body lists exact preconditions (cutover done + 24h smoke pass + no in-flight stacks) before operator clicks the toggle.

Until then:

  • Layer 1 workflow runs on every PR but does not block merging (operator's mental check § 13 is the operative belt)
  • Layer 2 post-merge audit runs and uploads artifact even without protection (catches anything that slips by)
  • Layer 3 cousin discipline (this ADR) applies to all agents from merge moment
  • Layer 4 checklist § 13 applies to operator from merge moment

Full defense activates the moment branch protection toggle flips — operator-controlled timing.
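
The three preconditions listed in the follow-up issue can be written down as an explicit checklist. The sketch below is illustrative and assumes they are independent yes/no gates; `ProtectionReadiness` and `unmet_preconditions` are hypothetical names, and nothing like this is known to exist in the repository.

```python
from dataclasses import dataclass
from typing import List

REQUIRED_SMOKE_HOURS = 24  # "24h smoke pass" from the note above


@dataclass(frozen=True)
class ProtectionReadiness:
    """Hypothetical snapshot of the state the operator checks by hand."""
    cutover_done: bool
    smoke_pass_hours: float
    in_flight_stacks: int


def unmet_preconditions(state: ProtectionReadiness) -> List[str]:
    """List the preconditions still blocking the branch-protection toggle."""
    unmet = []
    if not state.cutover_done:
        unmet.append("cutover done")
    if state.smoke_pass_hours < REQUIRED_SMOKE_HOURS:
        unmet.append("24h smoke pass")
    if state.in_flight_stacks > 0:
        unmet.append("no in-flight stacks")
    return unmet


if __name__ == "__main__":
    now = ProtectionReadiness(cutover_done=False, smoke_pass_hours=0,
                              in_flight_stacks=1)
    print(unmet_preconditions(now))
    # prints ['cutover done', '24h smoke pass', 'no in-flight stacks']
```

An empty list would mean the toggle can be flipped; the decision itself stays with the operator.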

🍵
