pdurlej/platform

feat(autonomy): add fail-closed classifier gate #695

Merged

pdurlej merged 1 commit from codex/687-classifier-policy into main

2026-06-02 15:20:22 +02:00

codex commented

2026-06-02 14:47:31 +02:00

Collaborator

Canary status: missing — fire canary 3+3 manually before merge

Summary

Adds the Tier-2 autonomy classifier interface for #687 without calling a model or mutating runtime.

This PR keeps the hard architectural guarantee intact: deterministic routing runs first, hard stops never reach the classifier, and missing/invalid/low-confidence classifier output fails closed to ask.
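The ordering above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `platformctl.autonomy`: the function name `gate`, the action kinds, and the `MIN_CONFIDENCE` value are all hypothetical.

```python
from typing import Mapping, Optional

# Hypothetical action kinds and threshold, chosen for illustration only.
HARD_STOP_KINDS = frozenset({"drop_database", "delete_volume"})
LOW_RISK_KINDS = frozenset({"read_status"})
VALID_DECISIONS = frozenset({"allow", "retry", "ask"})
MIN_CONFIDENCE = 0.8


def gate(action_kind: str, classifier_output: Optional[Mapping[str, object]]) -> str:
    """Return 'hard_stop', 'allow', 'retry', or 'ask' for one action."""
    # Tier 1: deterministic routing runs first. Hard stops return here,
    # so classifier output is never consulted for them.
    if action_kind in HARD_STOP_KINDS:
        return "hard_stop"
    if action_kind in LOW_RISK_KINDS:
        return "allow"

    # Tier 2: medium-risk actions. Missing or malformed output fails closed.
    if classifier_output is None:
        return "ask"
    decision = classifier_output.get("decision")
    confidence = classifier_output.get("confidence")
    if not isinstance(decision, str) or decision not in VALID_DECISIONS:
        return "ask"
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return "ask"
    # The chained comparison is False for NaN and infinities, so those
    # values also end up at 'ask' instead of slipping past the threshold.
    if not (MIN_CONFIDENCE <= confidence <= 1.0):
        return "ask"
    return decision
```

The confidence check is written as "accept only if inside the allowed range" rather than "reject if below the threshold", so any value that fails to compare cleanly falls through to `ask`.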

Canary Context Pack

Product story

The operator should not be interrupted for every medium-risk agent action, but irreversible actions must still remain deterministic operator gates. This slice gives Codex/agents a policy-as-text classifier packet for medium actions without weakening hard stops.

What changed

Added a fail-closed classifier result/response layer in platformctl.autonomy.
Added deterministic policy-as-text prompt compilation with policy digesting and secret-like redaction.
Added platformctl autonomy ask for local classifier packet rendering and optional JSON classifier-response validation.
Added router/classifier/CLI regression tests.
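As a rough illustration of deterministic prompt compilation with a policy digest, the sketch below hashes normalized policy text and renders the action with sorted keys. The function name `compile_packet`, the packet fields, and the prompt wording are assumptions made for this example; the actual packet shape in `platformctl.autonomy` may differ.

```python
import hashlib
import json
from typing import Mapping


def compile_packet(policy_text: str, action: Mapping[str, str]) -> dict:
    """Build a deterministic classifier packet from policy text and an action."""
    # Normalize line endings and trailing whitespace so the digest is stable
    # across editors and platforms.
    normalized = "\n".join(line.rstrip() for line in policy_text.strip().splitlines())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    # sort_keys makes the rendered action independent of dict insertion order.
    action_json = json.dumps(dict(action), sort_keys=True)
    prompt = (
        f"POLICY (sha256:{digest})\n{normalized}\n\n"
        f"ACTION\n{action_json}\n\n"
        'Answer with JSON: {"decision": "allow|retry|ask", "confidence": 0.0-1.0}'
    )
    return {"policy_digest": digest, "prompt": prompt}
```

Because the same policy text and action always yield the same digest and prompt, a stored decision can later be tied back to the exact policy version it was made under.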

Why it changed

Issue #687 implements the tiered autonomy gate in slices. PR #689 added the cascade router and PR #694 added apply sandbox receipt mode. This PR adds only the Tier-2 classifier interface; the classifier itself will be an external low-cost model wired in later, not one built here.

Files touched

control-plane/platformctl/autonomy.py
control-plane/platformctl/cli.py
control-plane/platformctl/tests/test_autonomy_router.py

Relevant context

state/strategy/autonomy-tiered-execution-design-2026-06-02.md
Issue #687
PR #689 cascade router
PR #694 apply sandbox receipt mode

Runtime evidence

No runtime action, no model call, no Forgejo mutation outside this PR creation. Local validation:

PYTHONPATH=control-plane python3 -m pytest control-plane/platformctl/tests/test_autonomy_router.py → 15 passed
PYTHONPATH=control-plane python3 -m pytest control-plane/platformctl/tests/test_apply_phase3.py control-plane/platformctl/tests/test_apply_env_file.py control-plane/platformctl/tests/test_autonomy_router.py → 134 passed
PYTHONPATH=control-plane python3 -m platformctl.cli validate all --json → exitCode 0

Known constraints

The classifier interface does not call Ollama/OpenAI/Claude. It accepts an optional normalized response artifact and returns ask when none is supplied. Writing decision receipts into the ADR-0025 memory plane remains a later #687 slice.
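One way to handle an optional response artifact is sketched below. The helper name `load_response_artifact` and its return shape are hypothetical and are not taken from the PR; the sketch only shows how an absent, unreadable, or non-object artifact can be collapsed into a single "no usable response" outcome that the caller maps to ask.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def load_response_artifact(path: Optional[Path]) -> Tuple[Optional[dict], str]:
    """Return (mapping, reason). A None mapping means the caller should answer 'ask'."""
    if path is None:
        return None, "no classifier response supplied"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers missing or unreadable files; ValueError covers
        # invalid JSON (JSONDecodeError) and bad encodings (UnicodeDecodeError).
        return None, f"unreadable classifier response: {type(exc).__name__}"
    if not isinstance(data, dict):
        return None, "classifier response is not a JSON object"
    return data, "loaded"
```

Field-level validation (decision value, confidence range) would still have to run on the returned mapping before any allow is honored.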

Explicit out-of-scope

No live model invocation.
No runtime apply.
No live Postgres migration.
No issue/comment writer.
No decision receipt persistence yet.

Requested decision

Approve if the fail-closed classifier packet shape and CLI are acceptable as the Tier-2 interface.

Merge blockers

Any path where hard-stop actions reach the classifier.
Any path where missing/invalid/low-confidence classifier output returns allow.
Any raw secret-like context leaking into the generated classifier prompt.

Spec sources read

state/strategy/autonomy-tiered-execution-design-2026-06-02.md — #687 design and classifier contract.
control-plane/platformctl/autonomy.py — existing cascade router implementation.
control-plane/platformctl/tests/test_autonomy_router.py — existing router regression coverage.
control-plane/platformctl/cli.py — CLI registration conventions.
control-plane/platformctl/apply.py — previous sandbox receipt event shape and redaction precedent.

Refs #687

codex added 1 commit

2026-06-02 14:47:31 +02:00

feat(autonomy): add fail-closed classifier gate

canary-required / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 18s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 17s
python-ci / Python 3.11 (pull_request) Successful in 40s
python-ci / Python 3.13 (pull_request) Successful in 41s
patchwarden-pr-sanity / sanity (pull_request) Failing after 2m59s
python-ci / Python 3.12 (pull_request) Successful in 41s
canary-required / canary (pull_request) Successful in 13s
base-is-main / guard (pull_request) Successful in 1s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 4s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
patchwarden-client-dry-run / dry-run (pull_request) Successful in 17s

4d8594af8c

codex added the class/security-sensitive label

2026-06-02 14:47:55 +02:00

codex commented

2026-06-02 14:55:10 +02:00

Author

Collaborator

Patchwarden PR sanity

Status: advisory_findings
PR: 695
Commit: 2521359bc913e74787fce7c76279c591cb0ae560
Security-sensitive label: present
Authority: advisory model review plus deterministic blockers only
3+3 canary: still alive; this does not replace it

Deterministic findings

No deterministic findings.

Model reviewers

`global-glm` / `glm-5.1:cloud`

Status: ok
Verdict: OK
Findings: none

`global-deepseek` / `deepseek-v4-pro:cloud`

Status: ok
Verdict: OK
medium Incomplete secret redaction may leak sensitive data in classifier prompt
- Evidence: control-plane/platformctl/autonomy.py: _SECRET_REPLACEMENTS regex patterns cover only Bearer tokens, 40-char hex strings, olostep_ keys, and key=value assignments. Other common secret formats (e.g., AWS keys, JWT, shorter hex tokens) are no
- Next: Expand redaction patterns to cover additional secret formats or add a prominent warning that the generated prompt may contain sensitive data and must not be logged or stored without further sanitization.
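To make the suggestion concrete, here is a hedged sketch of a broader redaction pass. The pattern list below is illustrative only and is not the repository's `_SECRET_REPLACEMENTS`; the names `_PATTERNS` and `redact`, the 32-character hex threshold, and the placeholder strings are all assumptions.

```python
import re

# Illustrative patterns only; NOT the repository's _SECRET_REPLACEMENTS.
_PATTERNS = [
    # Authorization bearer tokens.
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # JWT-shaped strings: three base64url segments, the first starting "eyJ".
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "[REDACTED_JWT]"),
    # AWS access key IDs: AKIA/ASIA prefix plus 16 uppercase alphanumerics.
    (re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"), "[REDACTED_AWS_KEY_ID]"),
    # Hex tokens of 32 or more characters (threshold is an assumption).
    (re.compile(r"\b[0-9a-fA-F]{32,}\b"), "[REDACTED_HEX]"),
    # key=value or key: value pairs whose key ends in a secret-like word.
    (
        re.compile(r"(?i)\b([\w-]*(?:password|secret|token|api[_-]?key))(\s*[=:]\s*)[^\s,;]+"),
        r"\1\2[REDACTED]",
    ),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the redacted text."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Pattern-based redaction stays best-effort: short hex strings and unknown vendor formats still pass through, which is why the reviewer's alternative (warning that the prompt must not be logged or stored unsanitized) remains relevant even with a longer pattern list.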

`redteam` / `kimi-k2.6:cloud`

Status: ok
Verdict: NOT_OK
blocker NaN confidence bypasses fail-closed check for direct AutonomyClassifierResponse objects
- Evidence: control-plane/platformctl/autonomy.py: classify_action accepts a pre-constructed AutonomyClassifierResponse and uses it directly without validation (bypassing from_mapping). The confidence gate response.confidence < min_confidence evaluat
- Next: Add __post_init__ validation to AutonomyClassifierResponse to enforce decision in {allow,retry,ask} and confidence is finite and within [0,1], or explicitly validate response.confidence with math.isfinite before the confidence gate in classify_action.
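The finding rests on a general property of floating-point NaN: every ordered comparison involving NaN evaluates to False, so a gate of the form `confidence < min_confidence` does not reject it. The sketch below demonstrates that and shows the suggested `__post_init__` validation on a simplified stand-in class. The two fields shown are assumptions; the real `AutonomyClassifierResponse` may carry others.

```python
import math
from dataclasses import dataclass

_ALLOWED_DECISIONS = frozenset({"allow", "retry", "ask"})


@dataclass(frozen=True)
class AutonomyClassifierResponse:
    """Simplified stand-in; the real class in autonomy.py may differ."""

    decision: str
    confidence: float

    def __post_init__(self) -> None:
        if not isinstance(self.decision, str) or self.decision not in _ALLOWED_DECISIONS:
            raise ValueError(f"invalid decision: {self.decision!r}")
        if isinstance(self.confidence, bool) or not isinstance(self.confidence, (int, float)):
            raise ValueError("confidence must be a number")
        # The range check already fails for NaN; isfinite makes the intent explicit.
        if not (0.0 <= self.confidence <= 1.0) or not math.isfinite(self.confidence):
            raise ValueError("confidence must be finite and within [0, 1]")


# Why the naive gate is unsafe: NaN is not "less than" any threshold,
# so a reject-if-below check evaluates to False and lets it through.
nan = float("nan")
naive_gate_rejects = nan < 0.8
```

With validation in the constructor, a directly constructed object cannot hold a NaN or out-of-range confidence, so every code path that receives one gets the same guarantee regardless of how the object was built.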

Policy notes

GLM 5.1 + DeepSeek V4 Pro are the operator-required model mix for this bot.
Optional red-team model is enabled only when PLATFORMCTL_PR_SANITY_REDTEAM_MODEL is configured.
Auto-merge is not enabled here.

codex force-pushed codex/687-classifier-policy from 4d8594af8c
to 11f2ddd12e

base-is-main / guard (pull_request) Successful in 1s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 3s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
canary-required / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 19s
python-ci / Python 3.12 (pull_request) Successful in 42s
patchwarden-client-dry-run / dry-run (pull_request) Successful in 18s
patchwarden-pr-sanity / sanity (pull_request) Successful in 4m12s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 16s
python-ci / Python 3.11 (pull_request) Successful in 41s
python-ci / Python 3.13 (pull_request) Successful in 42s
canary-required / canary (pull_request) Successful in 13s

2026-06-02 14:59:46 +02:00

codex force-pushed codex/687-classifier-policy from 11f2ddd12e
to 2521359bc9

canary-required / collect-diff (pull_request) Successful in 4s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 3s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 4s
platformctl plan / auto-apply scope (pull_request) Successful in 17s
python-ci / Python 3.11 (pull_request) Successful in 39s
python-ci / Python 3.12 (pull_request) Successful in 40s
python-ci / Python 3.13 (pull_request) Successful in 40s
patchwarden-pr-sanity / sanity (pull_request) Successful in 4m20s
base-is-main / guard (pull_request) Successful in 2s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 16s
canary-required / canary (pull_request) Successful in 12s
patchwarden-client-dry-run / dry-run (pull_request) Successful in 16s

2026-06-02 15:09:17 +02:00

pdurlej approved these changes

2026-06-02 15:20:21 +02:00

pdurlej left a comment

Approved after Patchwarden security-sensitive sanity passed with 12/12 green checks. Automated Codex operator-approved lane; no runtime/destructive action in this PR.

pdurlej merged commit 3af827d0ec into main

2026-06-02 15:20:22 +02:00

pdurlej referenced this pull request from a commit

2026-06-02 15:20:22 +02:00

Merge pull request #695 from codex/687-classifier-policy

codex referenced this pull request

2026-06-02 15:24:53 +02:00

feat(autonomy): write decision receipts #696