pdurlej/platform

impl(autonomy): tiered execution gate — cascade router, sandbox, classifier (per #673 / PR #686) #687

Closed

opened 2026-06-02 11:40:43 +02:00 by claude · 3 comments

claude commented

2026-06-02 11:40:43 +02:00

Collaborator

Reference

Design: #673 / PR #686 (state/strategy/autonomy-tiered-execution-design-2026-06-02.md). This is the codex-ready implementation spec. Operator resolved the 3 open questions (2026-06-02):

Sandbox target: reuse hermes-style preview slots (precedent exists; minimal new infra).
Classifier model: Ollama-cloud, mistral-small-4-class default — RS2000-local is too tight for a frequent gate (CPU-only, busy VPS; "don't murder RS2000"). Model is pluggable/swappable per the design (depend-not-build) → benchmark then swap.
Tier-2 fail-closed default: ask (degrade toward the operator, never toward silent allow).

Implementation (sequenced; each safe-or-gated)

1. Cascade router — start here, fully codex-ready

A deterministic route(action) → tier in platformctl. Priority cascade (the ORDER is the safety property):

1. hard-stop match → Tier 3 (operator-gate). Checked first.
2. allowlist (capability catalog #566) → Tier 0 (immediate).
3. sandbox-eligible → Tier 1.
4. otherwise → Tier 2 (classifier).

Formalize the hard-stop list + capability-catalog allowlist into the router. Pure logic + tests. Invariant to assert in tests: the classifier (Tier 2) is unreachable for any hard-stop action (the cascade routes it to Tier 3 first) — so a hard boundary can never be classifier-gated.
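
A minimal sketch of the cascade and its invariant, assuming actions are plain string identifiers and the three policy sets are simple in-memory sets. The action names and set contents below are hypothetical placeholders, not the real hard-stop list or capability catalog. One action is deliberately placed in both the hard-stop list and the allowlist to show that the order of checks decides the outcome.

```python
from enum import IntEnum


class Tier(IntEnum):
    IMMEDIATE = 0      # allowlisted, runs right away
    SANDBOX = 1        # disposable apply first
    CLASSIFIER = 2     # soft gate
    OPERATOR_GATE = 3  # hard stop, operator decides


# Hypothetical policy data; the real lists come from the platform's policy docs.
HARD_STOPS = frozenset({"secrets.rotate", "dns.delete"})
ALLOWLIST = frozenset({"logs.read", "dns.delete"})  # overlap is intentional here
SANDBOX_ELIGIBLE = frozenset({"module.deploy"})


def route(action: str) -> Tier:
    """Deterministic priority cascade; the order of checks is the safety property."""
    if action in HARD_STOPS:          # checked first, always
        return Tier.OPERATOR_GATE
    if action in ALLOWLIST:
        return Tier.IMMEDIATE
    if action in SANDBOX_ELIGIBLE:
        return Tier.SANDBOX
    return Tier.CLASSIFIER            # everything else goes to the soft gate


def hard_stops_never_reach_classifier() -> bool:
    """The invariant from the spec, expressed as a checkable predicate."""
    return all(route(a) is Tier.OPERATOR_GATE for a in HARD_STOPS)
```

Because `dns.delete` sits in both sets, a router that checked the allowlist first would route it to Tier 0; the test for the invariant catches exactly that kind of reordering.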

2. `platformctl apply --sandbox` (Tier-1 runtime)

Disposable-apply reusing hermes-style preview slots: apply to a throwaway slot, run the module healthcheck, then discard. This is plan++ (reuses the just-hardened apply-pipeline). Author the mechanism; the real wiring stays operator-gated. First customer: #634 (autonomous deploy/repair → sandbox-deploy first).
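
The apply, healthcheck, discard sequence could be sketched as a context manager that guarantees the discard step. This is an in-memory stand-in only: `PreviewSlot`, `throwaway_slot`, and `sandbox_apply` are hypothetical names, and nothing here reflects the actual hermes preview-slot contract or the `platformctl` apply pipeline.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class PreviewSlot:
    """Stand-in for a disposable preview slot (hypothetical shape)."""
    name: str
    applied: List[str] = field(default_factory=list)
    discarded: bool = False


@contextmanager
def throwaway_slot(name: str) -> Iterator[PreviewSlot]:
    slot = PreviewSlot(name)
    try:
        yield slot
    finally:
        # Discard unconditionally, even if apply or healthcheck raises.
        slot.discarded = True


def sandbox_apply(module: str, healthcheck: Callable[[PreviewSlot], bool]) -> bool:
    """Apply to a throwaway slot, run the healthcheck, then discard the slot.

    Returns the healthcheck verdict; exceptions propagate after the discard.
    """
    with throwaway_slot(f"sandbox-{module}") as slot:
        slot.applied.append(module)   # placeholder for the real apply step
        return bool(healthcheck(slot))
```

Putting the discard in `finally` keeps the slot disposable regardless of how the healthcheck ends, which is the property that makes Tier 1 safe to run without asking.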

3. Classifier subagent (Tier 2)

Model: Ollama-cloud, mistral-small-4-class default, behind a pluggable interface (swap via config — no code change to swap).
Steering = policy-as-text compiler: the classifier's system prompt is built from the platform's existing policy docs — the hard-stop list, the acting cousin's capability-catalog entries, the service-class definitions, and the single-operator-attention principle. Editing AGENTS.md re-steers the gate with no code change.
Output: { decision: allow | retry | ask, reason, confidence }.
Fail-closed: if the model is unavailable/errored/uncertain → default to ask, never allow.
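
A sketch of the fail-closed wrapper and the policy-as-text compiler, under several assumptions: the model is any callable taking `(system_prompt, action)` and returning a JSON string, policy docs arrive as a name-to-text mapping, and "uncertain" means confidence below a threshold. `MIN_CONFIDENCE`, `compile_prompt`, and `classify` are hypothetical names, and the 0.8 threshold is illustrative only.

```python
import json
from typing import Callable, Dict

ALLOWED_DECISIONS = {"allow", "retry", "ask"}
MIN_CONFIDENCE = 0.8  # hypothetical threshold for "uncertain"


def ask(reason: str) -> dict:
    """The fail-closed verdict: degrade toward the operator."""
    return {"decision": "ask", "reason": reason, "confidence": 0.0}


def compile_prompt(policy_docs: Dict[str, str]) -> str:
    """Build the system prompt from policy text, so editing a doc re-steers the gate."""
    return "\n\n".join(
        f"## {name}\n{text.strip()}" for name, text in sorted(policy_docs.items())
    )


def classify(action: str, policy_docs: Dict[str, str],
             model: Callable[[str, str], str]) -> dict:
    try:
        verdict = json.loads(model(compile_prompt(policy_docs), action))
        decision = verdict["decision"]
        confidence = float(verdict["confidence"])
        reason = str(verdict.get("reason", ""))
    except Exception as exc:  # unavailable, errored, or malformed output
        return ask(f"classifier failed: {type(exc).__name__}")
    if decision not in ALLOWED_DECISIONS:
        return ask(f"unknown decision: {decision!r}")
    # Written as a range check so NaN or out-of-range values also fail closed.
    if not (MIN_CONFIDENCE <= confidence <= 1.0):
        return ask("classifier uncertain")
    return {"decision": decision, "reason": reason, "confidence": confidence}
```

Every failure path returns through `ask()`, so there is no branch where a broken or hesitant model yields `allow`.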

4. Decision receipts → memory plane (ADR-0025)

Every Tier-1 and Tier-2 decision logs { action, tier, decision, reason, policy_version, cousin, ts }. The soft tier stays fully audited.
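
The receipt fields listed above map naturally onto a small record type. The sketch below assumes a JSON-lines serialization and an ISO 8601 UTC timestamp; neither is confirmed by the spec, and the actual ADR-0025 memory-plane format may differ. `DecisionReceipt`, `make_receipt`, and `to_jsonl` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DecisionReceipt:
    """One audited gate decision; field names follow the spec."""
    action: str
    tier: int
    decision: str
    reason: str
    policy_version: str
    cousin: str
    ts: str  # assumed: ISO 8601 timestamp in UTC


def make_receipt(action: str, tier: int, decision: str, reason: str,
                 policy_version: str, cousin: str,
                 now: Optional[datetime] = None) -> DecisionReceipt:
    """Stamp the receipt; `now` is injectable so tests stay deterministic."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return DecisionReceipt(action, tier, decision, reason,
                           policy_version, cousin, stamp)


def to_jsonl(receipt: DecisionReceipt) -> str:
    """Serialize as one JSON line (assumed storage format)."""
    return json.dumps(asdict(receipt), sort_keys=True)
```

Recording `policy_version` alongside each decision is what lets a later audit tell whether a verdict was made under the policy text in force at the time.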

Acceptance

Cascade router + tests, incl. the "classifier-never-reaches-hard-stop" invariant.
apply --sandbox mechanism (hermes-slot reuse); real wiring operator-gated.
Classifier subagent: pluggable Ollama model, policy-as-text steering, fail-closed to ask.
Decision receipts into the memory plane.
Hard rails unchanged + deterministic; soft layer only ever touches reversible ground.

Ties

#673 / PR #686 (design) · #76 (MCP capability tags → Tier-1 eligibility) · #634 (first sandbox customer) · #566 (capability catalog = Tier 0) · ADR-0025 (receipts).

Spec by claude (design lane) per the operator-resolved open questions. Execution = Codex. The classifier model is a low-stakes default — benchmark candidates on Ollama-cloud and swap freely.


claude added this to the 10 - Improvements milestone

2026-06-02 11:40:43 +02:00

claude added the agent/codex, status:codex-ready, priority:p2 labels

2026-06-02 11:40:43 +02:00

claude referenced this issue

2026-06-02 11:40:43 +02:00

explore(autonomy): tiered agent-execution gate — sandbox + soft-classifier tiers (Cursor Auto-review-inspired) #673

claude commented

2026-06-02 11:59:42 +02:00

Author

Collaborator

Model clarification (claude) — verified Mistral Small 4 / Ollama status.

Mistral Small 4 (24B, excellent structured-output + function-calling) is not yet on Ollama (llama.cpp support PR open, not finalized as of 2026-06). Available via Mistral API + NVIDIA NIM now.

For this gate (frequent, cheap, fast — classifier mistakes are bounded by the cascade + fail-closed-to-ask), a 24B is likely more than needed. Practical default:

Bootstrap with a solid Ollama-available structured-output model (Qwen3-8B-class / Mistral-Small-3-7B).
Keep Mistral Small 4 as the benchmark target (via Mistral API, or once its Ollama support lands).
Benchmark, swap to the winner. The pluggable interface (per the design) makes this a config change, not a rewrite.

So implement the classifier against the pluggable interface; treat the specific model as a tunable default, not a hard dependency.
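
One way to make the model "a config change, not a rewrite" is a small backend registry keyed by a config value. This is a sketch under assumptions: the config shape (`classifier.backend`, `classifier.options`) and the backend names are hypothetical, and the backends below are stubs that make no network calls and always return `ask`.

```python
import json
from typing import Callable, Dict

# A backend is any callable: (system_prompt, action) -> JSON string.
Backend = Callable[[str, str], str]
_REGISTRY: Dict[str, Callable[[dict], Backend]] = {}


def register(name: str):
    """Decorator that registers a backend factory under a config name."""
    def deco(factory: Callable[[dict], Backend]):
        _REGISTRY[name] = factory
        return factory
    return deco


@register("stub-ask")
def _stub_ask(options: dict) -> Backend:
    def complete(system_prompt: str, action: str) -> str:
        return json.dumps({"decision": "ask", "reason": "stub backend",
                           "confidence": 1.0})
    return complete


@register("named-model")
def _named_model(options: dict) -> Backend:
    model = options.get("model", "unset")  # e.g. the benchmark winner
    def complete(system_prompt: str, action: str) -> str:
        # Placeholder: a real backend would call the configured model here.
        return json.dumps({"decision": "ask", "reason": f"would call {model}",
                           "confidence": 0.0})
    return complete


def load_backend(config: dict) -> Backend:
    """Pick the classifier backend from config; unknown names are an error."""
    section = config.get("classifier", {})
    name = section.get("backend")
    if name not in _REGISTRY:
        raise KeyError(f"unknown classifier backend: {name!r}")
    return _REGISTRY[name](section.get("options", {}))
```

Swapping the bootstrap model for a benchmark winner then touches only the `classifier` section of the config, while the gate logic that calls the backend stays unchanged.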


codex referenced this issue

2026-06-02 12:07:51 +02:00

explore(autonomy): tiered agent-execution gate — sandbox + soft-classifier tiers (Cursor Auto-review-inspired) #673

codex referenced this issue from a pull request that will close it

2026-06-02 12:12:53 +02:00

feat(autonomy): add deterministic cascade router #689

codex commented

2026-06-02 12:13:12 +02:00

Collaborator

Codex split plan for #687:

PR #689 — deterministic cascade router + tests. Open now. No runtime/model/receipt work.
PR 2 — platformctl apply --sandbox mechanism/skeleton using hermes-style preview-slot contract; no live wiring without operator gate.
PR 3 — classifier interface + policy-as-text prompt compiler + fail-closed ask; model is config/benchmark target, not hard dependency.
PR 4 — decision receipts into ADR-0025 memory plane.

Discipline: no stacked PRs. I will start PR 2 after PR #689 lands or is explicitly retargeted by the operator.


pdurlej closed this issue

2026-06-02 12:16:35 +02:00

codex commented

2026-06-02 13:54:52 +02:00

Collaborator

Status correction: PR #689 landed only slice 1 (deterministic cascade router). #687 still tracks the remaining accepted slices: apply --sandbox skeleton, classifier interface/policy compiler, and ADR-0025 decision receipts. Reopening so the issue state matches its acceptance criteria.

codex reopened this issue

2026-06-02 13:54:52 +02:00

codex referenced this issue

2026-06-02 14:01:25 +02:00

feat(apply): add sandbox apply receipt mode #694

codex referenced this issue

2026-06-02 14:47:31 +02:00

feat(autonomy): add fail-closed classifier gate #695

codex referenced this issue from a pull request that will close it

2026-06-02 15:24:53 +02:00

feat(autonomy): write decision receipts #696

pdurlej closed this issue