policy(design): "all tests/ → policy_governance" conflates guard-tests with feature-tests — refine before auto-approval is enabled on product repos #138

Closed

opened 2026-06-23 20:49:40 +02:00 by claude · 1 comment

claude commented

2026-06-23 20:49:40 +02:00

Collaborator

Context

#136 added tests/ to policy_governance (policy_bundle.py DEFAULT_BUNDLE + policies/platform.v0.toml), so any PR touching tests/ now classifies policy_governance → needs_human (hard-manual). This is correct and important — it closes a real hole: without it, the deterministic approval adapter (#129) could auto-approve a PR that guts a safety guard (every drift-guard — test_d20_architectural_boundary, test_controller_approval, test_status_html, the schema guards — lives in tests/). It pairs perfectly with #127. No objection to the change.

The forward design-tension (nothing is broken today; auto-approval isn't enabled yet)

The rule is coarse: it treats all of tests/ identically. That conflates two very different things:

- Guard / safety tests (the drift-guards, the D20 lint, the controller/approval/executor contracts): modifying these genuinely should be hard-manual. ✅
- Ordinary feature tests (the unit tests that ship with a feature): these are good practice and, in the steady-state vision, should be auto-approvable on the same footing as the code they cover.

For Patchwarden's own repo this is moot: its tests are its safety guards, so "tests = governance" is exactly right. The tension is for governed product repos (the general case the platform.v0.toml dogfood foreshadows): once auto-approval is enabled there, "any test change → hard-manual" means a well-tested product PR is never auto-approvable, while a test-light one is. That is a perverse incentive: in a safety tool, the policy would nudge agents toward writing fewer tests to stay auto-approvable. That is in direct tension with the D24 vision of auto-merging routine product work.

Suggested refinement (for the auto-approval phase, not now)

Before auto-approval is turned on for any product repo, distinguish guard-tests from feature-tests — e.g.:

- a configurable safety_test_prefixes / governance_test_globs (e.g. tests/test_*_guard.py, tests/test_d20_*, the drift-guard set) that stays hard-manual, while
- other tests/** follows the class of the non-test code in the same PR (or a dedicated lower tier).

Keep Patchwarden's own bundle strict (all tests = governance). This is really about the product-repo policy template.
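
A minimal sketch of the split suggested above, assuming glob matching via Python's `fnmatch`. The function names, the constant, and the `src/roadmap.py` path are hypothetical illustrations, not the Patchwarden API; the two patterns are the examples from this comment.

```python
from fnmatch import fnmatchcase

# Hypothetical glob set taken from the examples in this comment; the real
# option name and patterns are not settled here.
GOVERNANCE_TEST_GLOBS = ("tests/test_*_guard.py", "tests/test_d20_*")


def is_guard_test(path, globs=GOVERNANCE_TEST_GLOBS):
    """True when a path matches any governance glob (stays hard-manual)."""
    return any(fnmatchcase(path, glob) for glob in globs)


def split_paths(paths, globs=GOVERNANCE_TEST_GLOBS):
    """Partition changed paths into guard tests, feature tests, and other code."""
    guard, feature, other = [], [], []
    for path in paths:
        if not path.startswith("tests/"):
            other.append(path)      # non-test code drives the PR's class
        elif is_guard_test(path, globs):
            guard.append(path)      # always policy_governance
        else:
            feature.append(path)    # would follow the class of `other`
    return guard, feature, other


guard, feature, other = split_paths(
    ["tests/test_policy_guard.py", "tests/test_roadmap.py", "src/roadmap.py"]
)
```

Under this sketch, any non-empty `guard` list would force hard-manual review, while `feature` paths would inherit whatever class `other` resolves to.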

Conservative-first is the right default per D24, so this is explicitly low-priority / forward — flagging it now so the coarse rule isn't silently inherited as the product-repo default when auto-approval lands. Refs #136, #129; D24. — claude (architect loop)


claude added the area:v0-core and judge/codex-candidate labels 2026-06-23 20:49:40 +02:00

codex referenced this issue from a pull request that will close it,

2026-06-23 20:55:05 +02:00

fix(policy): split guard and feature tests #139

pdurlej closed this issue

2026-06-23 20:57:54 +02:00

codex referenced this issue

2026-06-23 21:04:17 +02:00

feat(policy): add product repo template #140

claude commented

2026-06-23 21:06:11 +02:00

Author

Collaborator

✅ Resolved by #139 — verified fail-closed; one forward caution for the product-config phase

#139 implements this exactly and fail-closed. Verified on origin/main (633 green):

- governance_test_globs is checked before the feature-test exclusion, so governance takes precedence (a test matching both → policy_governance). ✅
- The verdict map (pr_check.py) maps only safe_docs_status → eligible_clean; everything else, including "unknown", → needs_human. A PR containing only a feature test leaves classification_paths empty, so it classifies "unknown" → needs_human. A feature test can therefore never upgrade a PR to eligible; it can only avoid blocking an otherwise all-safe-docs PR. ✅
- DEFAULT_BUNDLE keeps governance_test_globs=("tests/**",) → Patchwarden's own repo stays strict (all tests governance). ✅
- Tests lock both ends: test_policy_guard.py → policy_governance, test_roadmap.py → unknown.
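
The precedence and fail-closed behavior verified above can be sketched roughly as follows. This is an illustration under assumptions, not the code in policy_bundle.py or pr_check.py: the function signatures, the `safe_docs_prefixes` parameter, the `PRODUCT` bundle, and the `docs/` paths are hypothetical, and glob matching is assumed to behave like `fnmatch`.

```python
from fnmatch import fnmatchcase

# Hypothetical product bundle used only for this illustration.
PRODUCT = dict(
    governance_test_globs=("tests/test_*_guard.py",),
    feature_prefixes=("tests/",),
    safe_docs_prefixes=("docs/",),
)


def classify(paths, governance_test_globs, feature_prefixes, safe_docs_prefixes):
    """Classify a set of changed paths (illustrative only)."""
    # 1. Governance globs are checked first, so a test matching both a
    #    governance glob and a feature prefix is still policy_governance.
    if any(fnmatchcase(p, g) for p in paths for g in governance_test_globs):
        return "policy_governance"
    # 2. Feature tests are then excluded from the paths that drive classification.
    classification_paths = [
        p for p in paths if not p.startswith(tuple(feature_prefixes))
    ]
    # 3. Only a non-empty, all-safe-docs remainder earns safe_docs_status.
    if classification_paths and all(
        p.startswith(tuple(safe_docs_prefixes)) for p in classification_paths
    ):
        return "safe_docs_status"
    return "unknown"


def verdict(classification):
    """Fail-closed: one classification is eligible, all others need a human."""
    return "eligible_clean" if classification == "safe_docs_status" else "needs_human"
```

With this sketch, `classify(["tests/test_roadmap.py"], **PRODUCT)` yields `"unknown"` and therefore `needs_human`, while the same feature test next to a docs-only change does not block eligibility.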

One concrete forward caution (defense-in-depth, for whoever wires a product-repo bundle — not a regression in #139): when a product bundle opts into a broad feature_prefixes=("tests/",), safety then depends entirely on governance_test_globs enumerating every guard test. The example patterns in #139 (tests/test_*_guard.py, tests/test_d20_*.py) would miss most of this repo's actual guards — test_controller_approval, test_cloud_review_enforcement, test_docs_module_inventory, test_status_html, test_artifact_schema_contract, test_handoff_contract_registry. A guard test that doesn't match the globs, bundled with only safe-docs, classifies safe_docs_status → eligible. Mitigation when that phase arrives: ship a recommended default governance_test_globs that covers the drift-guard set (or adopt a test_*_guard.py naming convention + a lint that guard tests match it), so a broad feature opt-in can't silently expose a guard. Recording here rather than reopening — it's a product-config concern, moot until auto-approval is enabled. — claude (architect loop)
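
The coverage gap described above can be checked mechanically. The sketch below is one possible shape for the suggested lint, assuming `fnmatch`-style globs and assuming the guard tests named in this comment live at `tests/<name>.py`. The helper name `uncovered_guards` and both constants are hypothetical.

```python
from fnmatch import fnmatchcase

# Example patterns quoted from #139 in the comment above.
EXAMPLE_GLOBS = ("tests/test_*_guard.py", "tests/test_d20_*.py")

# Guard tests named in the comment; the "tests/" prefix and ".py" suffix
# are assumptions about the on-disk layout.
GUARD_TESTS = (
    "tests/test_controller_approval.py",
    "tests/test_cloud_review_enforcement.py",
    "tests/test_docs_module_inventory.py",
    "tests/test_status_html.py",
    "tests/test_artifact_schema_contract.py",
    "tests/test_handoff_contract_registry.py",
)


def uncovered_guards(guard_paths, globs):
    """Return the guard tests that no governance glob matches."""
    return [
        path
        for path in guard_paths
        if not any(fnmatchcase(path, glob) for glob in globs)
    ]


missed = uncovered_guards(GUARD_TESTS, EXAMPLE_GLOBS)
```

A lint along these lines could fail the build whenever `missed` is non-empty, so that a product bundle cannot combine a broad `feature_prefixes` with globs that leave a known guard exposed.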
