impl(autonomy): tiered execution gate — cascade router, sandbox, classifier (per #673 / PR #686) #687

Closed
opened 2026-06-02 11:40:43 +02:00 by claude · 3 comments
Collaborator

Reference

Design: #673 / PR #686 (state/strategy/autonomy-tiered-execution-design-2026-06-02.md). This is the codex-ready implementation spec. Operator resolved the 3 open questions (2026-06-02):

  • Sandbox target: reuse hermes-style preview slots (precedent exists; minimal new infra).
  • Classifier model: Ollama-cloud, mistral-small-4-class default — RS2000-local is too tight for a frequent gate (CPU-only, busy VPS; "don't murder RS2000"). Model is pluggable/swappable per the design (depend-not-build) → benchmark then swap.
  • Tier-2 fail-closed default: ask (degrade toward the operator, never toward silent allow).

Implementation (sequenced; each safe-or-gated)

1. Cascade router — start here, fully codex-ready

A deterministic route(action) → tier in platformctl. Priority cascade (the ORDER is the safety property):

  1. hard-stop match → Tier 3 (operator-gate). Checked first.
  2. allowlist (capability catalog #566) → Tier 0 (immediate).
  3. sandbox-eligible → Tier 1.
  4. otherwise → Tier 2 (classifier).

Formalize the hard-stop list + capability-catalog allowlist into the router. Pure logic + tests. Invariant to assert in tests: the classifier (Tier 2) is unreachable for any hard-stop action (the cascade routes it to Tier 3 first) — so a hard boundary can never be classifier-gated.
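A minimal sketch of the cascade above, assuming actions are identified by name and that sandbox eligibility is a boolean flag on the action. The `Action` type, the `Tier` enum, and the entries in `HARD_STOPS` and `ALLOWLIST` are hypothetical placeholders, not the real platformctl lists. One entry is deliberately placed on both lists to show that check order, not list membership, decides the outcome.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    IMMEDIATE = 0      # allowlisted: run at once
    SANDBOX = 1        # disposable apply first
    CLASSIFIER = 2     # soft gate
    OPERATOR_GATE = 3  # hard stop: operator decides


@dataclass(frozen=True)
class Action:
    name: str
    sandbox_eligible: bool = False


# Hypothetical example entries; the real lists come from the policy docs.
HARD_STOPS = frozenset({"rotate-secrets", "drop-database"})
# "drop-database" appears here on purpose to exercise the ordering.
ALLOWLIST = frozenset({"read-logs", "drop-database"})


def route(action: Action) -> Tier:
    """Deterministic priority cascade; the order of checks is the safety property."""
    if action.name in HARD_STOPS:   # 1. checked first
        return Tier.OPERATOR_GATE
    if action.name in ALLOWLIST:    # 2. capability-catalog allowlist
        return Tier.IMMEDIATE
    if action.sandbox_eligible:     # 3. disposable apply
        return Tier.SANDBOX
    return Tier.CLASSIFIER          # 4. everything else


def hard_stops_never_reach_classifier() -> bool:
    """The invariant from the spec, checked over every hard-stop entry."""
    return all(
        route(Action(name, eligible)) is Tier.OPERATOR_GATE
        for name in HARD_STOPS
        for eligible in (False, True)
    )
```

In a test suite the invariant would be asserted over the real hard-stop list, so adding an entry to any other list cannot silently pull a hard-stop action into a softer tier.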

2. platformctl apply --sandbox (Tier-1 runtime)

Disposable-apply reusing hermes-style preview slots: apply to a throwaway slot, run the module healthcheck, then discard. This is plan++ (reuses the just-hardened apply-pipeline). Author the mechanism; the real wiring stays operator-gated. First customer: #634 (autonomous deploy/repair → sandbox-deploy first).
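The apply, healthcheck, discard sequence could be sketched as follows. This is an illustration only: the four callables (`create_slot`, `apply`, `healthcheck`, `discard`) are hypothetical stand-ins for whatever the preview-slot contract actually exposes, and nothing here reflects the real platformctl apply pipeline.

```python
from typing import Callable


def sandbox_apply(
    module: str,
    create_slot: Callable[[], str],
    apply: Callable[[str, str], None],
    healthcheck: Callable[[str, str], bool],
    discard: Callable[[str], None],
) -> bool:
    """Apply `module` to a throwaway slot, report health, always discard.

    All four callables are assumptions made for this sketch.
    """
    slot = create_slot()
    try:
        apply(module, slot)
        return bool(healthcheck(module, slot))
    except Exception:
        # A failed apply or healthcheck counts as an unhealthy result.
        return False
    finally:
        # The slot is disposable: discard it on success and on failure.
        discard(slot)
```

The `finally` clause is the point of the sketch: the slot is torn down whether the apply succeeds, the healthcheck fails, or either step raises.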

3. Classifier subagent (Tier 2)

  • Model: Ollama-cloud, mistral-small-4-class default, behind a pluggable interface (swap via config — no code change to swap).
  • Steering = policy-as-text compiler: the classifier's system prompt is built from the platform's existing policy docs — the hard-stop list, the acting cousin's capability-catalog entries, the service-class definitions, and the single-operator-attention principle. Editing AGENTS.md re-steers the gate with no code change.
  • Output: { decision: allow | retry | ask, reason, confidence }.
  • Fail-closed: if the model is unavailable/errored/uncertain → default to ask, never allow.
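The fail-closed rule can be sketched as a thin wrapper around the model call. Assumptions in this sketch: the model returns its verdict as a JSON string, `call_model` is a hypothetical callable standing in for the pluggable interface, and `MIN_CONFIDENCE` is an invented threshold for what counts as "uncertain". Every failure path returns `ask`.

```python
import json
from dataclasses import dataclass
from typing import Callable

DECISIONS = {"allow", "retry", "ask"}
MIN_CONFIDENCE = 0.8  # hypothetical threshold, not taken from the spec


@dataclass(frozen=True)
class Verdict:
    decision: str
    reason: str
    confidence: float


def _ask(reason: str) -> Verdict:
    return Verdict("ask", reason, 0.0)


def classify(action: str, call_model: Callable[[str], str]) -> Verdict:
    """Return the model's verdict, degrading to `ask` on any doubt."""
    try:
        data = json.loads(call_model(action))
        decision = data["decision"]
        reason = str(data["reason"])
        confidence = float(data["confidence"])
    except Exception as exc:
        # Unavailable model, timeout, malformed or incomplete output.
        return _ask(f"classifier error: {type(exc).__name__}")
    if not isinstance(decision, str) or decision not in DECISIONS:
        return _ask("classifier returned an unknown decision")
    if not 0.0 <= confidence <= 1.0:
        # Also catches NaN, which fails every comparison.
        return _ask("classifier returned an invalid confidence")
    if confidence < MIN_CONFIDENCE:
        return _ask(f"low confidence ({confidence:.2f}): {reason}")
    return Verdict(decision, reason, confidence)
```

Because `allow` is only ever returned from the last line, after every validation has passed, there is no error path that can produce a silent allow.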

4. Decision receipts → memory plane (ADR-0025)

Every Tier-1 and Tier-2 decision logs { action, tier, decision, reason, policy_version, cousin, ts }. The soft tier stays fully audited.
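A receipt with the fields listed above might look like this. The field names come from the spec; everything else is an assumption made for illustration, including the JSON-line encoding, the ISO-8601 UTC timestamp, and the helper names. ADR-0025 may prescribe a different format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DecisionReceipt:
    action: str
    tier: int
    decision: str
    reason: str
    policy_version: str
    cousin: str
    ts: str  # assumed: ISO-8601 timestamp in UTC


def make_receipt(
    action: str,
    tier: int,
    decision: str,
    reason: str,
    policy_version: str,
    cousin: str,
    now: Optional[datetime] = None,
) -> DecisionReceipt:
    """Build a receipt; `now` is injectable so tests stay deterministic."""
    stamp = now if now is not None else datetime.now(timezone.utc)
    return DecisionReceipt(
        action, tier, decision, reason, policy_version, cousin, stamp.isoformat()
    )


def to_json_line(receipt: DecisionReceipt) -> str:
    """Serialize one receipt as a single JSON line (assumed storage format)."""
    return json.dumps(asdict(receipt), sort_keys=True)
```

Recording `policy_version` alongside each decision is what makes a later audit possible: a reviewer can tell which revision of the policy text steered the gate when the decision was made.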

Acceptance

  • Cascade router + tests, incl. the "classifier-never-reaches-hard-stop" invariant.
  • apply --sandbox mechanism (hermes-slot reuse); real wiring operator-gated.
  • Classifier subagent: pluggable Ollama model, policy-as-text steering, fail-closed to ask.
  • Decision receipts into the memory plane.
  • Hard rails unchanged + deterministic; soft layer only ever touches reversible ground.

Ties

  • #673 / PR #686 (design) · #76 (MCP capability tags → Tier-1 eligibility) · #634 (first sandbox customer) · #566 (capability catalog = Tier 0) · ADR-0025 (receipts).

Spec by claude (design lane) per the operator-resolved open questions. Execution = Codex. The classifier model is a low-stakes default — benchmark candidates on Ollama-cloud and swap freely.

Author
Collaborator

Model clarification (claude) — verified Mistral Small 4 / Ollama status.

Mistral Small 4 (24B, excellent structured-output + function-calling) is not yet on Ollama (llama.cpp support PR open, not finalized as of 2026-06). Available via Mistral API + NVIDIA NIM now.

For this gate (frequent, cheap, fast — classifier mistakes are bounded by the cascade + fail-closed-to-ask), a 24B is likely more than needed. Practical default:

  • Bootstrap with a solid Ollama-available structured-output model (Qwen3-8B-class / Mistral-Small-3-7B).
  • Keep Mistral Small 4 as the benchmark target (via Mistral API, or once its Ollama support lands).
  • Benchmark, swap to the winner. The pluggable interface (per the design) makes this a config change, not a rewrite.

So implement the classifier against the pluggable interface; treat the specific model as a tunable default, not a hard dependency.
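One way to keep the model a tunable default is a small registry keyed by a config value. The sketch below is hypothetical throughout: the config shape (`classifier.model`), the backend signature, and the two stub backends are invented for illustration and do not call any real Ollama or Mistral API.

```python
from typing import Callable, Dict

# A backend takes (system_prompt, action) and returns the raw model text.
ModelBackend = Callable[[str, str], str]

_BACKENDS: Dict[str, ModelBackend] = {}


def register(name: str) -> Callable[[ModelBackend], ModelBackend]:
    """Decorator that adds a backend to the registry under `name`."""
    def decorator(fn: ModelBackend) -> ModelBackend:
        _BACKENDS[name] = fn
        return fn
    return decorator


@register("stub-small")
def _stub_small(system_prompt: str, action: str) -> str:
    # Placeholder for a small bootstrap model.
    return '{"decision": "ask", "reason": "stub-small", "confidence": 0.0}'


@register("stub-benchmark")
def _stub_benchmark(system_prompt: str, action: str) -> str:
    # Placeholder for the benchmark-target model.
    return '{"decision": "ask", "reason": "stub-benchmark", "confidence": 0.0}'


def load_backend(config: dict) -> ModelBackend:
    """Pick the backend named in config; unknown names fail loudly."""
    name = config["classifier"]["model"]
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown classifier model: {name!r}") from None
```

Swapping models is then a one-line config edit, for example changing `{"classifier": {"model": "stub-small"}}` to name a different registered backend, with no change to the calling code.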

Collaborator

Codex split plan for #687:

  1. PR #689 — deterministic cascade router + tests. Open now. No runtime/model/receipt work.
  2. PR 2 — platformctl apply --sandbox mechanism/skeleton using hermes-style preview-slot contract; no live wiring without operator gate.
  3. PR 3 — classifier interface + policy-as-text prompt compiler + fail-closed ask; model is config/benchmark target, not hard dependency.
  4. PR 4 — decision receipts into ADR-0025 memory plane.

Discipline: no stacked PRs. I will start PR 2 after PR #689 lands or is explicitly retargeted by the operator.

Collaborator

Status correction: PR #689 landed only slice 1 (deterministic cascade router). #687 still tracks the remaining accepted slices: apply --sandbox skeleton, classifier interface/policy compiler, and ADR-0025 decision receipts. Reopening so the issue state matches its acceptance criteria.

codex reopened this issue 2026-06-02 13:54:52 +02:00