test(v0): D20 architectural boundary lint (Swiss Cheese Layer 6) #55

Merged
pdurlej merged 1 commit from claude/patchwarden-d20-architectural-lint into main 2026-05-27 14:41:06 +02:00
Collaborator

What

Closes Swiss Cheese Layer 6 gap (Scenariusz C z docs/operations/code-vs-vision-snapshot-2026-05-27.md).

Before this PR, D20 boundary was enforced by absence + conventionforgejo_client.py simply didn't define merge_pull_request / submit_approval / create_pr_review, relying on every future cousin (or sub-agent) to remember the rule. That's fragile: a Sonnet sub-agent could plausibly add forgejo_client.merge_pull_request "to be helpful" and no automated check would catch it until a human noticed in the PR diff.

This PR converts that into a build-time test. If the rule drifts, the lint fails loudly.

This is PR B of 3 defense-hardening PRs spun off the 2026-05-27 crosscheck.

What the lint checks

tests/test_d20_architectural_boundary.py (+216 lines):

D20BoundaryLintTests — 4 tests against real source

Test Catches
test_src_dir_was_found Sanity — lint sees ≥6 modules (won't pass vacuously on empty dir)
test_no_module_calls_forbidden_function Any *.py calling merge_pull_request(, submit_approval(, create_pr_review(, or passing event="APPROVED"
test_no_module_uses_forbidden_url Raw /merge URL substring — blocks urllib bypass
test_forgejo_client_defines_no_forbidden_helpers forgejo_client.py specifically must not contain def merge_pull_request / def submit_approval / def create_pr_review

D20BoundarySelfTest — 4 meta-tests (regex regression)

These prevent vacuous green: a regex typo could make all lint tests pass silently. Self-tests prove the patterns actually match real violations.

Test Proves
test_lint_would_catch_a_synthetic_merge_call merge_pull_request(123) matches
test_lint_would_catch_a_synthetic_approval event="APPROVED" matches
test_lint_would_catch_a_synthetic_merge_url .../pulls/1/merge matches
test_lint_ignores_safe_word_merge_base Regressionmerge_base (legit field) does NOT trigger

Comment-stripping crudity

_strip_line_comments removes everything after # per line so bare mentions in # NEVER call merge_pull_request don't false-positive. Limitation: a # inside a string literal would split the line incorrectly — but call patterns require ( after the symbol anyway, so this is belt-and-suspenders, not the primary defense. False-negatives are acceptable; false-positives would be annoying.

Why this matters

D20 boundary is a fundamental architectural choice — "Patchwarden never merges, never APPROVES." Before this PR:

  • Lived only in docs/architecture.md prose + the absence of helpers
  • No automated check
  • Drift was possible: a sub-agent could add a forbidden helper "for the next PR"; would only be caught by human PR review

After this PR:

  • Test fails immediately if any module calls a forbidden function
  • Test fails immediately if forgejo_client.py defines a forbidden helper
  • Self-tests prove the lint itself works
  • D20 cannot silently drift

This is precisely the kind of architectural enforcement that Bet 3 (Swiss Cheese stack defensibility) depends on.

Tests

144 → 152 green (+8 new, all pass on first run including the 4 self-tests).

Atomic per ADR-0017

  • 1 file, +216 lines, 0 src changes, 0 deletions
  • No new module, no new dependency, stdlib-only (re + pathlib)
  • base=main, no stacking

Swiss Cheese context

PR Layer Status
#54 PR A Layer 8 resolver hand-off merged
THIS PR B Layer 6 architectural lint awaiting merge
PR C (next) Layer 1 secret-by-filename doc queued

NOT breaking M2 gate

Test-only PR over existing v0. No new feature. Per docs/decisions.md D16 + M2 milestone notes, hardening existing capability is in-scope; this is exactly that.

Token-accounting

~3-4% weekly Opus. Wrote regex + 4 self-tests directly without sub-agent — small enough that brief overhead would exceed delegation savings.

## What Closes Swiss Cheese **Layer 6** gap (Scenariusz C z `docs/operations/code-vs-vision-snapshot-2026-05-27.md`). Before this PR, D20 boundary was enforced **by absence + convention** — `forgejo_client.py` simply didn't define `merge_pull_request` / `submit_approval` / `create_pr_review`, relying on every future cousin (or sub-agent) to remember the rule. That's fragile: a Sonnet sub-agent could plausibly add `forgejo_client.merge_pull_request` "to be helpful" and no automated check would catch it until a human noticed in the PR diff. **This PR converts that into a build-time test.** If the rule drifts, the lint fails loudly. This is **PR B of 3 defense-hardening PRs** spun off the 2026-05-27 crosscheck. ## What the lint checks `tests/test_d20_architectural_boundary.py` (+216 lines): ### `D20BoundaryLintTests` — 4 tests against real source | Test | Catches | |---|---| | `test_src_dir_was_found` | Sanity — lint sees ≥6 modules (won't pass vacuously on empty dir) | | `test_no_module_calls_forbidden_function` | Any `*.py` calling `merge_pull_request(`, `submit_approval(`, `create_pr_review(`, or passing `event="APPROVED"` | | `test_no_module_uses_forbidden_url` | Raw `/merge` URL substring — blocks urllib bypass | | `test_forgejo_client_defines_no_forbidden_helpers` | `forgejo_client.py` specifically must not contain `def merge_pull_request` / `def submit_approval` / `def create_pr_review` | ### `D20BoundarySelfTest` — 4 meta-tests (regex regression) These prevent vacuous green: a regex typo could make all lint tests pass silently. Self-tests prove the patterns actually match real violations. | Test | Proves | |---|---| | `test_lint_would_catch_a_synthetic_merge_call` | `merge_pull_request(123)` matches | | `test_lint_would_catch_a_synthetic_approval` | `event="APPROVED"` matches | | `test_lint_would_catch_a_synthetic_merge_url` | `.../pulls/1/merge` matches | | `test_lint_ignores_safe_word_merge_base` | **Regression** — `merge_base` (legit field) does NOT trigger | ## Comment-stripping crudity `_strip_line_comments` removes everything after `#` per line so bare mentions in `# NEVER call merge_pull_request` don't false-positive. Limitation: a `#` inside a string literal would split the line incorrectly — but call patterns require `(` after the symbol anyway, so this is belt-and-suspenders, not the primary defense. False-negatives are acceptable; false-positives would be annoying. ## Why this matters D20 boundary is a **fundamental architectural choice** — "Patchwarden never merges, never APPROVES." Before this PR: - Lived only in `docs/architecture.md` prose + the **absence** of helpers - No automated check - Drift was possible: a sub-agent could add a forbidden helper "for the next PR"; would only be caught by human PR review After this PR: - Test fails immediately if any module calls a forbidden function - Test fails immediately if `forgejo_client.py` defines a forbidden helper - Self-tests prove the lint itself works - D20 cannot silently drift This is precisely the kind of architectural enforcement that Bet 3 (Swiss Cheese stack defensibility) depends on. ## Tests **144 → 152 green** (+8 new, all pass on first run including the 4 self-tests). ## Atomic per ADR-0017 - 1 file, +216 lines, 0 src changes, 0 deletions - No new module, no new dependency, stdlib-only (`re` + `pathlib`) - `base=main`, no stacking ## Swiss Cheese context | PR | Layer | Status | |---|---|---| | #54 PR A | Layer 8 resolver hand-off | ✅ merged | | **THIS PR B** | Layer 6 architectural lint | ⏳ awaiting merge | | PR C (next) | Layer 1 secret-by-filename doc | queued | ## NOT breaking M2 gate Test-only PR over existing v0. No new feature. Per [`docs/decisions.md`](../docs/decisions.md) D16 + M2 milestone notes, hardening existing capability is in-scope; this is exactly that. ## Token-accounting ~3-4% weekly Opus. Wrote regex + 4 self-tests directly without sub-agent — small enough that brief overhead would exceed delegation savings.
Converts D20 enforcement from "by absence + convention" to a build-time
check. Before this PR, the boundary was held only by `forgejo_client.py`
not defining `merge_pull_request` / `submit_approval` / `create_pr_review`
— relying on every future cousin (or sub-agent) to remember the rule.

This is **PR B of 3 defense-hardening PRs** spun off the 2026-05-27
crosscheck. Closes Layer 6 Swiss Cheese gap (Scenariusz C in
`docs/operations/code-vs-vision-snapshot-2026-05-27.md`).

## What the lint checks

`tests/test_d20_architectural_boundary.py`:

**D20BoundaryLintTests** (4 tests, real-source scans):
- `test_src_dir_was_found` — sanity check that lint sees ≥6 modules
- `test_no_module_calls_forbidden_function` — regex scan for
  `merge_pull_request(`, `submit_approval(`, `create_pr_review(`, and
  `event="APPROVED"` across all `src/patchwarden/*.py`
- `test_no_module_uses_forbidden_url` — regex scan for raw `/merge`
  URL substring (prevents bypass via urllib directly)
- `test_forgejo_client_defines_no_forbidden_helpers` — `forgejo_client.py`
  specifically must not contain `def merge_pull_request` / `def submit_approval` /
  `def create_pr_review` definitions, even if no caller uses them yet

**D20BoundarySelfTest** (4 tests, meta-lint regression checks):
- `test_lint_would_catch_a_synthetic_merge_call` — proves the regex
  actually matches a real call
- `test_lint_would_catch_a_synthetic_approval` — proves the regex
  catches `event="APPROVED"` reviews
- `test_lint_would_catch_a_synthetic_merge_url` — proves the URL regex
  catches `.../pulls/1/merge`
- `test_lint_ignores_safe_word_merge_base` — regression: `merge_base`
  is legit (we use it as a field name), lint must skip it

Self-tests protect against regex typos: without them, a broken pattern
could let the entire lint pass vacuously and nobody notices until D20
breaks for real.

## Comment-stripping crudity

`_strip_line_comments` removes everything after `#` per line so that
bare mentions inside `# NEVER call merge_pull_request` don't false-
positive. Limitation: a `#` inside a string literal would split the
line incorrectly — but call patterns require `(` after the symbol
anyway, so this is belt-and-suspenders, not the primary defense.
False-negatives are acceptable; false-positives would be annoying.

## Why this matters

D20 boundary is a fundamental architectural choice — "Patchwarden never
merges, never APPROVES." Before this PR, that choice lived only in
`docs/architecture.md` "Architectural enforcement" prose + the absence
of the helpers themselves. A Sonnet sub-agent doing PR work could
plausibly add `forgejo_client.merge_pull_request` "to be helpful" and
no automated check would catch it until the operator (or a human
reviewer) noticed in the PR diff.

Now there's a test that fails loudly. The rule cannot quietly drift.

## Tests

144 → 152 green (+8 new, all pass on first run including the 4 self-tests
that exercise both positive and negative paths of the lint).

## Atomic per ADR-0017

- 1 file, +199 lines, 0 src changes, 0 deletions.
- No new module, no new dependency, stdlib-only (`re` + `pathlib`).
- `base=main`, no stacking on prior PRs.

## Swiss Cheese context

| PR | Layer | Status |
|---|---|---|
| #54 (PR A merged) | Layer 8 resolver hand-off |  |
| **THIS PR B** | Layer 6 architectural lint |  awaiting merge |
| PR C (next) | Layer 1 secret-by-filename doc | queued |

## NOT breaking M2 gate

Test-only PR over existing v0. No new feature, no new capability. Per
`docs/decisions.md` D16 + M2 milestone notes, hardening existing
capability is in-scope; this is exactly that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign in to join this conversation.
No reviewers
No labels
agent/claude-code
agent/codex
agent/gemini
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
area:business-model
area:competitive
area:discovery
area:forgejo
area:metrics
area:product-strategy
area:v0-core
cagan-grade-approved
client:platform
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
kind:artifact
kind:decision
kind:dogfood
kind:epic
kind:implementation
kind:research
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
priority:p0
priority:p1
priority:p2
priority:p3
ready-for-agent
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:blocked-on-discovery
status:cagan-grade-review-pending
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:needs-operator-decision
status:operator-needed
status:parked
tier:0-anchor
tier:0-platform-substrate
tier:1-core
tier:1-iskra-value-layer
tier:2-supporting
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
wave:1-foundation
wave:2-positioning
wave:3-validation
wave:4-economics
wave:5-operating
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
pdurlej/patchwarden!55
No description provided.