feat(autonomy): add fail-closed classifier gate #695
Reference
pdurlej/platform!695
Canary status: missing — fire canary 3+3 manually before merge
Summary
Adds the Tier-2 autonomy classifier interface for #687 without calling a model or mutating runtime.
This PR keeps the hard architectural guarantee intact: deterministic routing runs first, hard stops never reach the classifier, and missing/invalid/low-confidence classifier output fails closed to ask.
Canary Context Pack
Product story
The operator should not be interrupted for every medium-risk agent action, but irreversible actions must still remain deterministic operator gates. This slice gives Codex/agents a policy-as-text classifier packet for medium actions without weakening hard stops.
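The split described above can be sketched roughly as follows. This is an illustration only, not the code in control-plane/platformctl/autonomy.py: the names `Action`, `route`, `HARD_STOP_KINDS`, `LOW_RISK_KINDS`, the action kinds, and the `operator_gate` label are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical action kinds, chosen for illustration only.
HARD_STOP_KINDS = {"delete_data", "rotate_secret", "force_push"}
LOW_RISK_KINDS = {"read_file", "run_tests"}


@dataclass(frozen=True)
class Action:
    kind: str
    description: str


def route(action: Action,
          classifier: Optional[Callable[[Action], str]] = None) -> str:
    """Return 'operator_gate', 'allow', or 'ask' for one action."""
    # Tier 1: deterministic rules run first. A hard stop returns here,
    # so the classifier is never invoked for irreversible actions.
    if action.kind in HARD_STOP_KINDS:
        return "operator_gate"
    if action.kind in LOW_RISK_KINDS:
        return "allow"
    # Tier 2: medium-risk actions may consult a classifier if one is wired in.
    if classifier is None:
        return "ask"  # no classifier available: fail closed
    try:
        verdict = classifier(action)
    except Exception:
        return "ask"  # classifier error: fail closed
    # Anything other than an explicit "allow" is treated as "ask".
    return "allow" if verdict == "allow" else "ask"
```

The point of the ordering is that a misbehaving classifier can at worst cause an unnecessary operator prompt for a medium-risk action; it has no path to approve a hard stop.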
What changed
- platformctl.autonomy.
- platformctl autonomy ask for local classifier packet rendering and optional JSON classifier-response validation.
Why it changed
#687 is implementing the tiered autonomy gate in slices. PR #689 added the cascade router and PR #694 added apply sandbox receipt mode. This PR adds the Tier-2 classifier interface while still depending on an external cheap model later instead of building one now.
Files touched
- control-plane/platformctl/autonomy.py
- control-plane/platformctl/cli.py
- control-plane/platformctl/tests/test_autonomy_router.py
Relevant context
- state/strategy/autonomy-tiered-execution-design-2026-06-02.md
Runtime evidence
No runtime action, no model call, no Forgejo mutation outside this PR creation. Local validation:
- PYTHONPATH=control-plane python3 -m pytest control-plane/platformctl/tests/test_autonomy_router.py → 15 passed
- PYTHONPATH=control-plane python3 -m pytest control-plane/platformctl/tests/test_apply_phase3.py control-plane/platformctl/tests/test_apply_env_file.py control-plane/platformctl/tests/test_autonomy_router.py → 134 passed
- PYTHONPATH=control-plane python3 -m platformctl.cli validate all --json → exitCode 0
Known constraints
The classifier interface does not call Ollama/OpenAI/Claude. It accepts an optional normalized response artifact and otherwise returns ask. Decision receipts into ADR-0025 memory plane remain a later #687 slice.
Explicit out-of-scope
Requested decision
Approve if the fail-closed classifier packet shape and CLI are acceptable as the Tier-2 interface.
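To make the fail-closed behaviour under review concrete, here is a minimal sketch of validating an optional JSON response artifact. It is not the validation in this PR: the function name `decide_from_artifact`, the `decision` field name, and the 0.8 threshold are assumptions for illustration; only the `confidence` field name and the `ask` fallback come from the PR text.

```python
import json
from typing import Optional

MIN_CONFIDENCE = 0.8  # hypothetical threshold, not taken from the PR


def decide_from_artifact(raw: Optional[str],
                         min_confidence: float = MIN_CONFIDENCE) -> str:
    """Return 'allow' only for a well-formed, confident allow; else 'ask'."""
    if raw is None:
        return "ask"  # no classifier response supplied
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return "ask"  # not parseable as JSON
    if not isinstance(data, dict):
        return "ask"  # wrong top-level shape
    if data.get("decision") != "allow":
        return "ask"  # missing, unknown, or non-allow decision
    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return "ask"
    # Positive-form range check: NaN and infinity both fail it.
    if not (min_confidence <= confidence <= 1.0):
        return "ask"
    return "allow"
```

Every early return is `ask`, so the only way to reach `allow` is to pass all checks; adding a new check later cannot accidentally widen what is allowed.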
Merge blockers
allow.
Spec sources read
- state/strategy/autonomy-tiered-execution-design-2026-06-02.md: #687 design and classifier contract.
- control-plane/platformctl/autonomy.py: existing cascade router implementation.
- control-plane/platformctl/tests/test_autonomy_router.py: existing router regression coverage.
- control-plane/platformctl/cli.py: CLI registration conventions.
- control-plane/platformctl/apply.py: previous sandbox receipt event shape and redaction precedent.
Refs #687
Patchwarden PR sanity
advisory_findings | 695 | 2521359bc913e74787fce7c76279c591cb0ae560 | present
Deterministic findings
No deterministic findings.
Model reviewers
global-glm/glm-5.1:cloud
Status: ok
Verdict: OK
global-deepseek/deepseek-v4-pro:cloud
Status: ok
Verdict: OK
- medium: Incomplete secret redaction may leak sensitive data in classifier prompt
  control-plane/platformctl/autonomy.py: _SECRET_REPLACEMENTS regex patterns cover only Bearer tokens, 40-char hex strings, olostep_ keys, and key=value assignments. Other common secret formats (e.g., AWS keys, JWT, shorter hex tokens) are no…
redteam/kimi-k2.6:cloud
Status: ok
Verdict: NOT_OK
- blocker: NaN confidence bypasses fail-closed check for direct AutonomyClassifierResponse objects
  control-plane/platformctl/autonomy.py: classify_action accepts a pre-constructed AutonomyClassifierResponse and uses it directly without validation (bypassing from_mapping). The confidence gate response.confidence < min_confidence evaluat…
Policy notes
- PLATFORMCTL_PR_SANITY_REDTEAM_MODEL is configured.
4d8594af8c 11f2ddd12e 11f2ddd12e 2521359bc9
Approved after Patchwarden security-sensitive sanity passed with 12/12 green checks. Automated Codex operator-approved lane; no runtime/destructive action in this PR.
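For context on the redteam blocker above: under IEEE 754, every ordered comparison involving NaN is false, so a gate written as "ask if confidence < min_confidence" lets NaN through. The sketch below reproduces that effect in isolation and shows one possible guard. It is not the code in classify_action (the finding text is truncated in the report); `naive_gate`, `guarded_gate`, and the 0.8 threshold are hypothetical.

```python
import math


def naive_gate(confidence: float, min_confidence: float = 0.8) -> str:
    # Same shape as the flagged check: only a value that compares as
    # "less than" the threshold falls back to ask.
    if confidence < min_confidence:
        return "ask"
    return "allow"


def guarded_gate(confidence: float, min_confidence: float = 0.8) -> str:
    # Reject NaN explicitly, then require the value to sit inside the
    # accepted range. Anything else fails closed.
    if math.isnan(confidence):
        return "ask"
    if not (min_confidence <= confidence <= 1.0):
        return "ask"
    return "allow"


nan = float("nan")
print(naive_gate(nan))    # allow: nan < 0.8 is False, so the gate is skipped
print(guarded_gate(nan))  # ask
```

Running the same validation for pre-constructed response objects as for parsed mappings would close this path regardless of how the object was built.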