policy(design): "all tests/ → policy_governance" conflates guard-tests with feature-tests — refine before auto-approval is enabled on product repos #138

Closed
opened 2026-06-23 20:49:40 +02:00 by claude · 1 comment
Collaborator

Context

#136 added tests/ to policy_governance (policy_bundle.py DEFAULT_BUNDLE + policies/platform.v0.toml), so any PR touching tests/ now classifies policy_governanceneeds_human (hard-manual). This is correct and important — it closes a real hole: without it, the deterministic approval adapter (#129) could auto-approve a PR that guts a safety guard (every drift-guard — test_d20_architectural_boundary, test_controller_approval, test_status_html, the schema guards — lives in tests/). It pairs perfectly with #127. No objection to the change.

The forward design-tension (nothing is broken today; auto-approval isn't enabled yet)

The rule is coarse: it treats all of tests/ identically. That conflates two very different things:

  • Guard / safety tests (the drift-guards, the D20 lint, the controller/approval/executor contracts) — modifying these genuinely should be hard-manual.
  • Ordinary feature tests (the unit tests that ship with a feature) — these are good practice and, in the steady-state vision, should be auto-approvable on the same footing as the code they cover.

For Patchwarden's own repo this is moot — its tests are its safety guards, so "tests = governance" is exactly right. The tension is for governed product repos (the general case the platform.v0.toml dogfood foreshadows): once auto-approval is enabled there, "any test change → hard-manual" means a well-tested product PR is never auto-approvable, while a test-light one is. That's a perverse incentive — in a safety tool, the policy would nudge agents toward writing fewer tests to stay auto-approvable. That directly tensions with the D24 vision of auto-merging routine product work.

Suggested refinement (for the auto-approval phase, not now)

Before auto-approval is turned on for any product repo, distinguish guard-tests from feature-tests — e.g.:

  • a configurable safety_test_prefixes / governance_test_globs (e.g. tests/test_*_guard.py, tests/test_d20_*, the drift-guard set) that stays hard-manual, while
  • other tests/** follows the class of the non-test code in the same PR (or a dedicated lower tier).

Keep Patchwarden's own bundle strict (all tests = governance). This is really about the product-repo policy template.

Conservative-first is the right default per D24, so this is explicitly low-priority / forward — flagging it now so the coarse rule isn't silently inherited as the product-repo default when auto-approval lands. Refs #136, #129; D24. — claude (architect loop)

## Context #136 added `tests/` to `policy_governance` (`policy_bundle.py` `DEFAULT_BUNDLE` + `policies/platform.v0.toml`), so any PR touching `tests/` now classifies `policy_governance` → `needs_human` (hard-manual). **This is correct and important** — it closes a real hole: without it, the deterministic approval adapter (#129) could auto-approve a PR that guts a safety guard (every drift-guard — `test_d20_architectural_boundary`, `test_controller_approval`, `test_status_html`, the schema guards — lives in `tests/`). It pairs perfectly with #127. No objection to the change. ## The forward design-tension (nothing is broken today; auto-approval isn't enabled yet) The rule is **coarse**: it treats *all* of `tests/` identically. That conflates two very different things: - **Guard / safety tests** (the drift-guards, the D20 lint, the controller/approval/executor contracts) — modifying these genuinely should be hard-manual. ✅ - **Ordinary feature tests** (the unit tests that ship *with* a feature) — these are good practice and, in the steady-state vision, should be auto-approvable on the same footing as the code they cover. For **Patchwarden's own repo** this is moot — its tests *are* its safety guards, so "tests = governance" is exactly right. The tension is for **governed product repos** (the general case the `platform.v0.toml` dogfood foreshadows): once auto-approval is enabled there, "any test change → hard-manual" means a *well-tested* product PR is never auto-approvable, while a test-light one is. That's a **perverse incentive — in a safety tool, the policy would nudge agents toward writing fewer tests to stay auto-approvable.** That directly tensions with the D24 vision of auto-merging routine product work. ## Suggested refinement (for the auto-approval phase, not now) Before auto-approval is turned on for any product repo, distinguish guard-tests from feature-tests — e.g.: - a configurable `safety_test_prefixes` / `governance_test_globs` (e.g. `tests/test_*_guard.py`, `tests/test_d20_*`, the drift-guard set) that stays hard-manual, while - other `tests/**` follows the class of the non-test code in the same PR (or a dedicated lower tier). Keep Patchwarden's own bundle strict (all tests = governance). This is really about the product-repo policy template. Conservative-first is the right default per D24, so this is explicitly **low-priority / forward** — flagging it now so the coarse rule isn't silently inherited as the product-repo default when auto-approval lands. Refs #136, #129; D24. — claude (architect loop)
Author
Collaborator

Resolved by #139 — verified fail-closed; one forward caution for the product-config phase

#139 implements this exactly and fail-closed. Verified on origin/main (633 green):

  • governance_test_globs is checked before the feature-test exclusion, so governance wins precedence (a test matching both → policy_governance).
  • The verdict map makes only safe_docs_statuseligible_clean; everything else, including "unknown", → needs_human (pr_check.py). A feature-test alone → classification_paths empty → "unknown"needs_human. So a feature test can never upgrade a PR to eligible — it can only avoid blocking an otherwise all-safe-docs PR.
  • DEFAULT_BUNDLE keeps governance_test_globs=("tests/**",) → Patchwarden's own repo stays strict (all tests governance).
  • Tests lock both ends: test_policy_guard.pypolicy_governance, test_roadmap.pyunknown.

One concrete forward caution (defense-in-depth, for whoever wires a product-repo bundle — not a regression in #139): when a product bundle opts into a broad feature_prefixes=("tests/",), safety then depends entirely on governance_test_globs enumerating every guard test. The example patterns in #139 (tests/test_*_guard.py, tests/test_d20_*.py) would miss most of this repo's actual guardstest_controller_approval, test_cloud_review_enforcement, test_docs_module_inventory, test_status_html, test_artifact_schema_contract, test_handoff_contract_registry. A guard test that doesn't match the globs, bundled with only safe-docs, classifies safe_docs_status → eligible. Mitigation when that phase arrives: ship a recommended default governance_test_globs that covers the drift-guard set (or adopt a test_*_guard.py naming convention + a lint that guard tests match it), so a broad feature opt-in can't silently expose a guard. Recording here rather than reopening — it's a product-config concern, moot until auto-approval is enabled. — claude (architect loop)

## ✅ Resolved by #139 — verified fail-closed; one forward caution for the product-config phase #139 implements this exactly and **fail-closed**. Verified on `origin/main` (633 green): - `governance_test_globs` is checked **before** the feature-test exclusion, so governance wins precedence (a test matching both → `policy_governance`). ✅ - The verdict map makes only `safe_docs_status` → `eligible_clean`; **everything else, including `"unknown"`, → `needs_human`** (`pr_check.py`). A feature-test alone → `classification_paths` empty → `"unknown"` → `needs_human`. So a feature test can never *upgrade* a PR to eligible — it can only avoid blocking an otherwise all-safe-docs PR. ✅ - `DEFAULT_BUNDLE` keeps `governance_test_globs=("tests/**",)` → Patchwarden's own repo stays strict (all tests governance). ✅ - Tests lock both ends: `test_policy_guard.py` → `policy_governance`, `test_roadmap.py` → `unknown`. **One concrete forward caution (defense-in-depth, for whoever wires a product-repo bundle — not a regression in #139):** when a product bundle opts into a broad `feature_prefixes=("tests/",)`, safety then depends **entirely** on `governance_test_globs` enumerating *every* guard test. The example patterns in #139 (`tests/test_*_guard.py`, `tests/test_d20_*.py`) would **miss most of this repo's actual guards** — `test_controller_approval`, `test_cloud_review_enforcement`, `test_docs_module_inventory`, `test_status_html`, `test_artifact_schema_contract`, `test_handoff_contract_registry`. A guard test that doesn't match the globs, bundled with only safe-docs, classifies `safe_docs_status` → eligible. Mitigation when that phase arrives: ship a **recommended default `governance_test_globs`** that covers the drift-guard set (or adopt a `test_*_guard.py` naming convention + a lint that guard tests match it), so a broad feature opt-in can't silently expose a guard. Recording here rather than reopening — it's a product-config concern, moot until auto-approval is enabled. — claude (architect loop)
Sign in to join this conversation.
No labels
agent/claude-code
agent/codex
agent/gemini
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
area:business-model
area:competitive
area:discovery
area:forgejo
area:metrics
area:product-strategy
area:v0-core
cagan-grade-approved
client:platform
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
kind:artifact
kind:decision
kind:dogfood
kind:epic
kind:implementation
kind:research
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
priority:p0
priority:p1
priority:p2
priority:p3
ready-for-agent
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:blocked-on-discovery
status:cagan-grade-review-pending
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:needs-operator-decision
status:operator-needed
status:parked
tier:0-anchor
tier:0-platform-substrate
tier:1-core
tier:1-iskra-value-layer
tier:2-supporting
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
wave:1-foundation
wave:2-positioning
wave:3-validation
wave:4-economics
wave:5-operating
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
pdurlej/patchwarden#138
No description provided.