feat(autonomy): add fail-closed classifier gate #695

Merged
pdurlej merged 1 commit from codex/687-classifier-policy into main 2026-06-02 15:20:22 +02:00
Collaborator

Canary status: missing — fire canary 3+3 manually before merge

Summary

Adds the Tier-2 autonomy classifier interface for #687 without calling a model or mutating runtime.

This PR keeps the hard architectural guarantee intact: deterministic routing runs first, hard stops never reach the classifier, and missing/invalid/low-confidence classifier output fails closed to ask.
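The ordering guarantee above can be sketched roughly as follows. This is a minimal illustration, not the code in platformctl.autonomy: `route_action`, `HARD_STOP_KINDS`, and the action kinds are hypothetical names, and it assumes a hard stop resolves to `ask` (an operator gate). Validation of the classifier output is abstracted behind a callable that returns `None` when output is missing, invalid, or low-confidence.

```python
from typing import Callable, Optional

# Hypothetical hard-stop kinds; the real router's categories are not shown in this PR.
HARD_STOP_KINDS = frozenset({"drop_database", "delete_volume"})
DECISIONS = frozenset({"allow", "retry", "ask"})

# Assumed shape: the classifier returns a decision string, or None when it has no usable answer.
Classifier = Callable[[str], Optional[str]]


def route_action(kind: str, classifier: Optional[Classifier] = None) -> str:
    """Return 'allow', 'retry', or 'ask'; deterministic checks run before any classifier."""
    if kind in HARD_STOP_KINDS:
        return "ask"  # hard stop: the classifier is never consulted
    if classifier is None:
        return "ask"  # no classifier configured: fail closed
    try:
        verdict = classifier(kind)
    except Exception:
        return "ask"  # classifier error: fail closed
    if isinstance(verdict, str) and verdict in DECISIONS:
        return verdict
    return "ask"  # missing or unrecognized output: fail closed
```

The point of the shape is that `allow` is only reachable through one narrow path, while every other branch returns `ask`.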

Canary Context Pack

Product story

The operator should not be interrupted for every medium-risk agent action, but irreversible actions must still remain deterministic operator gates. This slice gives Codex/agents a policy-as-text classifier packet for medium actions without weakening hard stops.

What changed

  • Added a fail-closed classifier result/response layer in platformctl.autonomy.
  • Added deterministic policy-as-text prompt compilation with policy digesting and secret-like redaction.
  • Added platformctl autonomy ask for local classifier packet rendering and optional JSON classifier-response validation.
  • Added router/classifier/CLI regression tests.
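As a rough illustration of deterministic policy-as-text compilation with a policy digest, a packet builder could look like the sketch below. The function names, the packet keys, and the prompt layout are assumptions made for this example; the actual packet shape rendered by `platformctl autonomy ask` is not reproduced here, and redaction of the action context is assumed to happen before this step.

```python
import hashlib
import json


def normalize_policy(policy_text: str) -> str:
    """Strip trailing whitespace per line so cosmetic edits do not change the digest."""
    return "\n".join(line.rstrip() for line in policy_text.strip().splitlines())


def policy_digest(policy_text: str) -> str:
    """SHA-256 hex digest of the normalized policy text."""
    return hashlib.sha256(normalize_policy(policy_text).encode("utf-8")).hexdigest()


def compile_packet(policy_text: str, action: dict) -> dict:
    """Build a hypothetical classifier packet; identical inputs yield identical output."""
    digest = policy_digest(policy_text)
    # sort_keys makes the rendered action independent of dict insertion order.
    action_json = json.dumps(action, sort_keys=True, ensure_ascii=True)
    prompt = (
        f"POLICY (sha256:{digest})\n{normalize_policy(policy_text)}\n\n"
        f"ACTION\n{action_json}\n\n"
        'Reply with JSON: {"decision": "allow|retry|ask", "confidence": 0.0-1.0}'
    )
    return {"policy_digest": digest, "prompt": prompt}
```

Carrying the digest in the packet lets a later receipt record exactly which policy text a decision was made against.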

Why it changed

Issue #687 implements the tiered autonomy gate in slices. PR #689 added the cascade router and PR #694 added apply sandbox receipt mode. This PR adds only the Tier-2 classifier interface; the external low-cost model it will depend on is deferred to a later slice rather than built now.

Files touched

  • control-plane/platformctl/autonomy.py
  • control-plane/platformctl/cli.py
  • control-plane/platformctl/tests/test_autonomy_router.py

Relevant context

  • state/strategy/autonomy-tiered-execution-design-2026-06-02.md
  • Issue #687
  • PR #689 cascade router
  • PR #694 apply sandbox receipt mode

Runtime evidence

No runtime action, no model call, no Forgejo mutation outside this PR creation. Local validation:

  • PYTHONPATH=control-plane python3 -m pytest control-plane/platformctl/tests/test_autonomy_router.py → 15 passed
  • PYTHONPATH=control-plane python3 -m pytest control-plane/platformctl/tests/test_apply_phase3.py control-plane/platformctl/tests/test_apply_env_file.py control-plane/platformctl/tests/test_autonomy_router.py → 134 passed
  • PYTHONPATH=control-plane python3 -m platformctl.cli validate all --json → exitCode 0

Known constraints

The classifier interface does not call Ollama/OpenAI/Claude. It accepts an optional normalized response artifact and returns ask when none is supplied. Persisting decision receipts into the ADR-0025 memory plane remains a later #687 slice.
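The "optional response artifact, otherwise ask" behaviour might be sketched like this. The JSON field names `decision` and `confidence`, the function name, and the 0.8 default threshold are assumptions for illustration, not the normalized schema defined in the PR.

```python
import json
from typing import Optional

ALLOWED = frozenset({"allow", "retry", "ask"})


def decision_from_artifact(raw: Optional[str], min_confidence: float = 0.8) -> str:
    """Validate a JSON classifier response; anything unusable resolves to 'ask'."""
    if raw is None:
        return "ask"  # no artifact supplied
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return "ask"  # not valid JSON
    if not isinstance(data, dict):
        return "ask"
    decision = data.get("decision")
    confidence = data.get("confidence")
    if not isinstance(decision, str) or decision not in ALLOWED:
        return "ask"
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return "ask"
    # The range check is False for NaN and infinities, so those fail closed too.
    if not (0.0 <= confidence <= 1.0):
        return "ask"
    if confidence < min_confidence:
        return "ask"
    return decision
```

Note that Python's `json.loads` accepts the non-standard literal `NaN` by default, which is one reason the range check is written so that NaN cannot pass it.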

Explicit out-of-scope

  • No live model invocation.
  • No runtime apply.
  • No live Postgres migration.
  • No issue/comment writer.
  • No decision receipt persistence yet.

Requested decision

Approve if the fail-closed classifier packet shape and CLI are acceptable as the Tier-2 interface.

Merge blockers

  • Any path where hard-stop actions reach the classifier.
  • Any path where missing/invalid/low-confidence classifier output returns allow.
  • Any raw secret-like context leaking into the generated classifier prompt.

Spec sources read

  • state/strategy/autonomy-tiered-execution-design-2026-06-02.md — #687 design and classifier contract.
  • control-plane/platformctl/autonomy.py — existing cascade router implementation.
  • control-plane/platformctl/tests/test_autonomy_router.py — existing router regression coverage.
  • control-plane/platformctl/cli.py — CLI registration conventions.
  • control-plane/platformctl/apply.py — previous sandbox receipt event shape and redaction precedent.

Refs #687

feat(autonomy): add fail-closed classifier gate
Some checks failed
canary-required / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 18s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 17s
python-ci / Python 3.11 (pull_request) Successful in 40s
python-ci / Python 3.12 (pull_request) Successful in 41s
python-ci / Python 3.13 (pull_request) Successful in 41s
canary-required / canary (pull_request) Successful in 13s
base-is-main / guard (pull_request) Successful in 1s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 4s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
patchwarden-client-dry-run / dry-run (pull_request) Successful in 17s
patchwarden-pr-sanity / sanity (pull_request) Failing after 2m59s
4d8594af8c
Author
Collaborator

Patchwarden PR sanity

  • Status: advisory_findings
  • PR: 695
  • Commit: 2521359bc913e74787fce7c76279c591cb0ae560
  • Security-sensitive label: present
  • Authority: advisory model review plus deterministic blockers only
  • 3+3 canary: still alive; this does not replace it

Deterministic findings

No deterministic findings.

Model reviewers

global-glm / glm-5.1:cloud

  • Status: ok
  • Verdict: OK
  • Findings: none

global-deepseek / deepseek-v4-pro:cloud

  • Status: ok

  • Verdict: OK

  • medium Incomplete secret redaction may leak sensitive data in classifier prompt

    • Evidence: control-plane/platformctl/autonomy.py: _SECRET_REPLACEMENTS regex patterns cover only Bearer tokens, 40-char hex strings, olostep_ keys, and key=value assignments. Other common secret formats (e.g., AWS keys, JWT, shorter hex tokens) are no
    • Next: Expand redaction patterns to cover additional secret formats or add a prominent warning that the generated prompt may contain sensitive data and must not be logged or stored without further sanitization.
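To make the finding concrete, here is a hedged sketch of what a wider redaction table could look like. The patterns below are illustrative guesses, not the PR's actual `_SECRET_REPLACEMENTS`; they cover the four families named in the evidence plus AWS access key IDs and JWT-shaped tokens, and they will still miss formats that are not listed.

```python
import re

# Hypothetical patterns; order matters because earlier rules rewrite the text first.
_PATTERNS = [
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "[REDACTED_JWT]"),
    (re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY_ID]"),
    (re.compile(r"\bolostep_[A-Za-z0-9_-]+"), "[REDACTED_OLOSTEP_KEY]"),
    (re.compile(r"\b[0-9a-fA-F]{40}\b"), "[REDACTED_HEX]"),
    (re.compile(r"(?i)\b(password|passwd|secret|token|api_key)\s*=\s*\S+"), r"\1=[REDACTED]"),
]


def redact(text: str) -> str:
    """Apply each pattern in turn; unmatched text passes through unchanged."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

A pattern list like this is a deny-list, so it can only reduce leakage; the reviewer's alternative of warning that the prompt may still contain sensitive data applies regardless of how many patterns are added.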

redteam / kimi-k2.6:cloud

  • Status: ok

  • Verdict: NOT_OK

  • blocker NaN confidence bypasses fail-closed check for direct AutonomyClassifierResponse objects

    • Evidence: control-plane/platformctl/autonomy.py: classify_action accepts a pre-constructed AutonomyClassifierResponse and uses it directly without validation (bypassing from_mapping). The confidence gate response.confidence < min_confidence evaluat
    • Next: Add __post_init__ validation to AutonomyClassifierResponse to enforce decision in {allow,retry,ask} and confidence is finite and within [0,1], or explicitly validate response.confidence with math.isfinite before the confidence gate in classify_action.
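The mechanism behind this finding is that every ordered comparison involving NaN is false, so a gate written as `confidence < min_confidence` does not fire for NaN. The sketch below shows that behaviour next to one possible `__post_init__` guard. `ClassifierResponse` is a hypothetical stand-in for AutonomyClassifierResponse, and `naive_gate` is an assumed simplification of the vulnerable check, not the PR's code.

```python
import math
from dataclasses import dataclass

_DECISIONS = frozenset({"allow", "retry", "ask"})


def naive_gate(confidence: float, min_confidence: float = 0.8) -> str:
    """Vulnerable shape: NaN < x is False, so NaN falls through to 'allow'."""
    return "ask" if confidence < min_confidence else "allow"


@dataclass(frozen=True)
class ClassifierResponse:  # hypothetical stand-in for AutonomyClassifierResponse
    decision: str
    confidence: float

    def __post_init__(self) -> None:
        if not isinstance(self.decision, str) or self.decision not in _DECISIONS:
            raise ValueError(f"invalid decision: {self.decision!r}")
        c = self.confidence
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(c, bool) or not isinstance(c, (int, float)):
            raise ValueError(f"confidence must be a number: {c!r}")
        # The range check already rejects NaN and infinities (comparisons with
        # NaN are False); math.isfinite is kept as the explicit guard the
        # reviewer suggests.
        if not (0.0 <= c <= 1.0) or not math.isfinite(c):
            raise ValueError(f"confidence must be finite and within [0, 1]: {c!r}")
```

Validating in the constructor means a directly constructed object gets the same checks as one built through a parsing helper, which is the gap the reviewer describes.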

Policy notes

  • GLM 5.1 + DeepSeek V4 Pro are the operator-required model mix for this bot.
  • Optional red-team model is enabled only when PLATFORMCTL_PR_SANITY_REDTEAM_MODEL is configured.
  • Auto-merge is not enabled here.
codex force-pushed codex/687-classifier-policy from 4d8594af8c
to 11f2ddd12e
All checks were successful
base-is-main / guard (pull_request) Successful in 1s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 3s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
canary-required / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 19s
python-ci / Python 3.12 (pull_request) Successful in 42s
patchwarden-client-dry-run / dry-run (pull_request) Successful in 18s
patchwarden-pr-sanity / sanity (pull_request) Successful in 4m12s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 16s
python-ci / Python 3.11 (pull_request) Successful in 41s
python-ci / Python 3.13 (pull_request) Successful in 42s
canary-required / canary (pull_request) Successful in 13s
2026-06-02 14:59:46 +02:00
codex force-pushed codex/687-classifier-policy from 11f2ddd12e
to 2521359bc9
All checks were successful
canary-required / collect-diff (pull_request) Successful in 4s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 3s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 17s
python-ci / Python 3.11 (pull_request) Successful in 39s
python-ci / Python 3.12 (pull_request) Successful in 40s
python-ci / Python 3.13 (pull_request) Successful in 40s
patchwarden-pr-sanity / sanity (pull_request) Successful in 4m20s
base-is-main / guard (pull_request) Successful in 2s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 16s
canary-required / canary (pull_request) Successful in 12s
patchwarden-client-dry-run / dry-run (pull_request) Successful in 16s
2026-06-02 15:09:17 +02:00
pdurlej approved these changes 2026-06-02 15:20:21 +02:00
pdurlej left a comment

Approved after Patchwarden security-sensitive sanity passed with 12/12 green checks. Automated Codex operator-approved lane; no runtime/destructive action in this PR.

Reference
pdurlej/platform!695