test(v0): D20 architectural boundary lint (Swiss Cheese Layer 6) #55

Merged

pdurlej merged 1 commit from claude/patchwarden-d20-architectural-lint into main

2026-05-27 14:41:06 +02:00

claude commented

2026-05-27 14:36:27 +02:00

Collaborator

What

Closes Swiss Cheese Layer 6 gap (Scenariusz C z docs/operations/code-vs-vision-snapshot-2026-05-27.md).

Before this PR, D20 boundary was enforced by absence + convention — forgejo_client.py simply didn't define merge_pull_request / submit_approval / create_pr_review, relying on every future cousin (or sub-agent) to remember the rule. That's fragile: a Sonnet sub-agent could plausibly add forgejo_client.merge_pull_request "to be helpful" and no automated check would catch it until a human noticed in the PR diff.

This PR converts that into a build-time test. If the rule drifts, the lint fails loudly.

This is PR B of 3 defense-hardening PRs spun off the 2026-05-27 crosscheck.

What the lint checks

tests/test_d20_architectural_boundary.py (+216 lines):

`D20BoundaryLintTests` — 4 tests against real source

Test	Catches
`test_src_dir_was_found`	Sanity — lint sees ≥6 modules (won't pass vacuously on empty dir)
`test_no_module_calls_forbidden_function`	Any `*.py` calling `merge_pull_request(`, `submit_approval(`, `create_pr_review(`, or passing `event="APPROVED"`
`test_no_module_uses_forbidden_url`	Raw `/merge` URL substring — blocks urllib bypass
`test_forgejo_client_defines_no_forbidden_helpers`	`forgejo_client.py` specifically must not contain `def merge_pull_request` / `def submit_approval` / `def create_pr_review`

`D20BoundarySelfTest` — 4 meta-tests (regex regression)

These prevent vacuous green: a regex typo could make all lint tests pass silently. Self-tests prove the patterns actually match real violations.

Test	Proves
`test_lint_would_catch_a_synthetic_merge_call`	`merge_pull_request(123)` matches
`test_lint_would_catch_a_synthetic_approval`	`event="APPROVED"` matches
`test_lint_would_catch_a_synthetic_merge_url`	`.../pulls/1/merge` matches
`test_lint_ignores_safe_word_merge_base`	Regression — `merge_base` (legit field) does NOT trigger

Comment-stripping crudity

_strip_line_comments removes everything after # per line so bare mentions in # NEVER call merge_pull_request don't false-positive. Limitation: a # inside a string literal would split the line incorrectly — but call patterns require ( after the symbol anyway, so this is belt-and-suspenders, not the primary defense. False-negatives are acceptable; false-positives would be annoying.

Why this matters

D20 boundary is a fundamental architectural choice — "Patchwarden never merges, never APPROVES." Before this PR:

Lived only in docs/architecture.md prose + the absence of helpers
No automated check
Drift was possible: a sub-agent could add a forbidden helper "for the next PR"; would only be caught by human PR review

After this PR:

Test fails immediately if any module calls a forbidden function
Test fails immediately if forgejo_client.py defines a forbidden helper
Self-tests prove the lint itself works
D20 cannot silently drift

This is precisely the kind of architectural enforcement that Bet 3 (Swiss Cheese stack defensibility) depends on.

Tests

144 → 152 green (+8 new, all pass on first run including the 4 self-tests).

Atomic per ADR-0017

1 file, +216 lines, 0 src changes, 0 deletions
No new module, no new dependency, stdlib-only (re + pathlib)
base=main, no stacking

Swiss Cheese context

PR	Layer	Status
#54 PR A	Layer 8 resolver hand-off	✅ merged
THIS PR B	Layer 6 architectural lint	⏳ awaiting merge
PR C (next)	Layer 1 secret-by-filename doc	queued

NOT breaking M2 gate

Test-only PR over existing v0. No new feature. Per docs/decisions.md D16 + M2 milestone notes, hardening existing capability is in-scope; this is exactly that.

Token-accounting

~3-4% weekly Opus. Wrote regex + 4 self-tests directly without sub-agent — small enough that brief overhead would exceed delegation savings.

## What Closes Swiss Cheese **Layer 6** gap (Scenariusz C z `docs/operations/code-vs-vision-snapshot-2026-05-27.md`). Before this PR, D20 boundary was enforced **by absence + convention** — `forgejo_client.py` simply didn't define `merge_pull_request` / `submit_approval` / `create_pr_review`, relying on every future cousin (or sub-agent) to remember the rule. That's fragile: a Sonnet sub-agent could plausibly add `forgejo_client.merge_pull_request` "to be helpful" and no automated check would catch it until a human noticed in the PR diff. **This PR converts that into a build-time test.** If the rule drifts, the lint fails loudly. This is **PR B of 3 defense-hardening PRs** spun off the 2026-05-27 crosscheck. ## What the lint checks `tests/test_d20_architectural_boundary.py` (+216 lines): ### `D20BoundaryLintTests` — 4 tests against real source | Test | Catches | |---|---| | `test_src_dir_was_found` | Sanity — lint sees ≥6 modules (won't pass vacuously on empty dir) | | `test_no_module_calls_forbidden_function` | Any `*.py` calling `merge_pull_request(`, `submit_approval(`, `create_pr_review(`, or passing `event="APPROVED"` | | `test_no_module_uses_forbidden_url` | Raw `/merge` URL substring — blocks urllib bypass | | `test_forgejo_client_defines_no_forbidden_helpers` | `forgejo_client.py` specifically must not contain `def merge_pull_request` / `def submit_approval` / `def create_pr_review` | ### `D20BoundarySelfTest` — 4 meta-tests (regex regression) These prevent vacuous green: a regex typo could make all lint tests pass silently. Self-tests prove the patterns actually match real violations. | Test | Proves | |---|---| | `test_lint_would_catch_a_synthetic_merge_call` | `merge_pull_request(123)` matches | | `test_lint_would_catch_a_synthetic_approval` | `event="APPROVED"` matches | | `test_lint_would_catch_a_synthetic_merge_url` | `.../pulls/1/merge` matches | | `test_lint_ignores_safe_word_merge_base` | **Regression** — `merge_base` (legit field) does NOT trigger | ## Comment-stripping crudity `_strip_line_comments` removes everything after `#` per line so bare mentions in `# NEVER call merge_pull_request` don't false-positive. Limitation: a `#` inside a string literal would split the line incorrectly — but call patterns require `(` after the symbol anyway, so this is belt-and-suspenders, not the primary defense. False-negatives are acceptable; false-positives would be annoying. ## Why this matters D20 boundary is a **fundamental architectural choice** — "Patchwarden never merges, never APPROVES." Before this PR: - Lived only in `docs/architecture.md` prose + the **absence** of helpers - No automated check - Drift was possible: a sub-agent could add a forbidden helper "for the next PR"; would only be caught by human PR review After this PR: - Test fails immediately if any module calls a forbidden function - Test fails immediately if `forgejo_client.py` defines a forbidden helper - Self-tests prove the lint itself works - D20 cannot silently drift This is precisely the kind of architectural enforcement that Bet 3 (Swiss Cheese stack defensibility) depends on. ## Tests **144 → 152 green** (+8 new, all pass on first run including the 4 self-tests). ## Atomic per ADR-0017 - 1 file, +216 lines, 0 src changes, 0 deletions - No new module, no new dependency, stdlib-only (`re` + `pathlib`) - `base=main`, no stacking ## Swiss Cheese context | PR | Layer | Status | |---|---|---| | #54 PR A | Layer 8 resolver hand-off | ✅ merged | | **THIS PR B** | Layer 6 architectural lint | ⏳ awaiting merge | | PR C (next) | Layer 1 secret-by-filename doc | queued | ## NOT breaking M2 gate Test-only PR over existing v0. No new feature. Per [`docs/decisions.md`](../docs/decisions.md) D16 + M2 milestone notes, hardening existing capability is in-scope; this is exactly that. ## Token-accounting ~3-4% weekly Opus. Wrote regex + 4 self-tests directly without sub-agent — small enough that brief overhead would exceed delegation savings.

claude added 1 commit

2026-05-27 14:36:27 +02:00

test(v0): D20 architectural boundary lint (Swiss Cheese Layer 6) 8df1670183

Converts D20 enforcement from "by absence + convention" to a build-time
check. Before this PR, the boundary was held only by `forgejo_client.py`
not defining `merge_pull_request` / `submit_approval` / `create_pr_review`
— relying on every future cousin (or sub-agent) to remember the rule.

This is **PR B of 3 defense-hardening PRs** spun off the 2026-05-27
crosscheck. Closes Layer 6 Swiss Cheese gap (Scenariusz C in
`docs/operations/code-vs-vision-snapshot-2026-05-27.md`).

## What the lint checks

`tests/test_d20_architectural_boundary.py`:

**D20BoundaryLintTests** (4 tests, real-source scans):
- `test_src_dir_was_found` — sanity check that lint sees ≥6 modules
- `test_no_module_calls_forbidden_function` — regex scan for
  `merge_pull_request(`, `submit_approval(`, `create_pr_review(`, and
  `event="APPROVED"` across all `src/patchwarden/*.py`
- `test_no_module_uses_forbidden_url` — regex scan for raw `/merge`
  URL substring (prevents bypass via urllib directly)
- `test_forgejo_client_defines_no_forbidden_helpers` — `forgejo_client.py`
  specifically must not contain `def merge_pull_request` / `def submit_approval` /
  `def create_pr_review` definitions, even if no caller uses them yet

**D20BoundarySelfTest** (4 tests, meta-lint regression checks):
- `test_lint_would_catch_a_synthetic_merge_call` — proves the regex
  actually matches a real call
- `test_lint_would_catch_a_synthetic_approval` — proves the regex
  catches `event="APPROVED"` reviews
- `test_lint_would_catch_a_synthetic_merge_url` — proves the URL regex
  catches `.../pulls/1/merge`
- `test_lint_ignores_safe_word_merge_base` — regression: `merge_base`
  is legit (we use it as a field name), lint must skip it

Self-tests protect against regex typos: without them, a broken pattern
could let the entire lint pass vacuously and nobody notices until D20
breaks for real.

## Comment-stripping crudity

`_strip_line_comments` removes everything after `#` per line so that
bare mentions inside `# NEVER call merge_pull_request` don't false-
positive. Limitation: a `#` inside a string literal would split the
line incorrectly — but call patterns require `(` after the symbol
anyway, so this is belt-and-suspenders, not the primary defense.
False-negatives are acceptable; false-positives would be annoying.

## Why this matters

D20 boundary is a fundamental architectural choice — "Patchwarden never
merges, never APPROVES." Before this PR, that choice lived only in
`docs/architecture.md` "Architectural enforcement" prose + the absence
of the helpers themselves. A Sonnet sub-agent doing PR work could
plausibly add `forgejo_client.merge_pull_request` "to be helpful" and
no automated check would catch it until the operator (or a human
reviewer) noticed in the PR diff.

Now there's a test that fails loudly. The rule cannot quietly drift.

## Tests

144 → 152 green (+8 new, all pass on first run including the 4 self-tests
that exercise both positive and negative paths of the lint).

## Atomic per ADR-0017

- 1 file, +199 lines, 0 src changes, 0 deletions.
- No new module, no new dependency, stdlib-only (`re` + `pathlib`).
- `base=main`, no stacking on prior PRs.

## Swiss Cheese context

| PR | Layer | Status |
|---|---|---|
| #54 (PR A merged) | Layer 8 resolver hand-off | ✅ |
| **THIS PR B** | Layer 6 architectural lint | ⏳ awaiting merge |
| PR C (next) | Layer 1 secret-by-filename doc | queued |

## NOT breaking M2 gate

Test-only PR over existing v0. No new feature, no new capability. Per
`docs/decisions.md` D16 + M2 milestone notes, hardening existing
capability is in-scope; this is exactly that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>