Reference: pdurlej/platform#687
Design: #673 / PR #686 (state/strategy/autonomy-tiered-execution-design-2026-06-02.md). This is the codex-ready implementation spec. The operator resolved the 3 open questions (2026-06-02):
- mistral-small-4-class default: RS2000-local is too tight for a frequent gate (CPU-only, busy VPS; "don't murder RS2000"). The model is pluggable/swappable per the design (depend-not-build), so benchmark, then swap.
- Fall back to ask: degrade toward the operator, never toward silent allow.

Implementation (sequenced; each step safe or gated)
1. Cascade router — start here, fully codex-ready
A deterministic route(action) → tier in platformctl. Priority cascade (the ORDER is the safety property). Formalize the hard-stop list and the capability-catalog allowlist into the router. Pure logic plus tests. Invariant to assert in tests: the classifier (Tier 2) is unreachable for any hard-stop action, because the cascade routes it to Tier 3 first. A hard boundary can therefore never be classifier-gated.
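A minimal Python sketch of the cascade and its test invariant, assuming actions are plain strings and both lists are simple sets. HARD_STOPS, ALLOWLIST, the action names, and the choice to send allowlisted actions to the Tier-1 sandbox are all hypothetical; none of them come from platformctl.

```python
from enum import IntEnum


class Tier(IntEnum):
    SANDBOX = 1     # Tier-1 runtime (platformctl apply --sandbox)
    CLASSIFIER = 2  # Tier-2 classifier subagent
    OPERATOR = 3    # Tier-3: the operator decides


# Hypothetical placeholder data; the real lists would come from platformctl.
HARD_STOPS = frozenset({"rotate-secret", "delete-volume"})
# "delete-volume" deliberately also appears here to show that cascade order wins.
ALLOWLIST = frozenset({"read-logs", "delete-volume"})


def route(action: str) -> Tier:
    """Deterministic priority cascade: the first matching rule wins."""
    if action in HARD_STOPS:
        return Tier.OPERATOR    # checked first, so never classifier-gated
    if action in ALLOWLIST:
        return Tier.SANDBOX     # assumed mapping for catalog-allowlisted actions
    return Tier.CLASSIFIER      # everything else falls through to Tier 2


def assert_hard_stop_invariant() -> None:
    """Spec invariant: Tier 2 is unreachable for every hard-stop action."""
    for action in HARD_STOPS:
        assert route(action) is Tier.OPERATOR, action
```

Because the hard-stop check runs before anything else, even a misconfigured allowlist that also lists a hard-stop action still routes it to Tier 3.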
2. platformctl apply --sandbox (Tier-1 runtime)
Disposable apply reusing hermes-style preview slots: apply to a throwaway slot, run the module healthcheck, then discard. This is plan++ (it reuses the just-hardened apply pipeline). Author the mechanism; the real wiring stays operator-gated. First customer: #634 (autonomous deploy/repair → sandbox-deploy first).
3. Classifier subagent (Tier 2)
- mistral-small-4-class default, behind a pluggable interface (swap via config; no code change needed to swap).
- Structured output: { decision: allow | retry | ask, reason, confidence }.
- Fail closed: ask, never allow.
4. Decision receipts → memory plane (ADR-0025)
Every Tier-1 and Tier-2 decision logs { action, tier, decision, reason, policy_version, cousin, ts }. The soft tier stays fully audited.
Acceptance
- apply --sandbox mechanism (hermes-slot reuse); real wiring operator-gated.
- Classifier failures resolve to ask.
Ties
Spec by claude (design lane) per the operator-resolved open questions. Execution = Codex. The classifier model is a low-stakes default — benchmark candidates on Ollama-cloud and swap freely.
Model clarification (claude) — verified Mistral Small 4 / Ollama status.
Mistral Small 4 (24B, excellent structured-output + function-calling) is not yet on Ollama (llama.cpp support PR open, not finalized as of 2026-06). Available via Mistral API + NVIDIA NIM now.
For this gate (frequent, cheap, and fast; classifier mistakes are bounded by the cascade plus fail-closed-to-ask), a 24B model is likely more than needed. Practical default:
So implement the classifier against the pluggable interface; treat the specific model as a tunable default, not a hard dependency.
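One way to keep the model a tunable default is to hide it behind a small interface, select the backend by a config key, and fail closed in a single place. This sketch assumes the model returns JSON shaped like { decision, reason, confidence }. Classifier, REGISTRY, load_classifier, the classifier_model config key, the stub backends, and the 0.7 confidence floor are hypothetical names and values, not part of the spec.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Protocol

VALID_DECISIONS = frozenset({"allow", "retry", "ask"})
MIN_ALLOW_CONFIDENCE = 0.7  # hypothetical floor; tune it from the benchmark


@dataclass(frozen=True)
class Verdict:
    decision: str
    reason: str
    confidence: float


class Classifier(Protocol):
    """Any backend (Ollama-cloud, Mistral API, ...) that returns raw JSON text."""

    def classify(self, action: str) -> str: ...


def parse_verdict(raw: str) -> Verdict:
    """Parse model output; malformed, unknown, or unsure output becomes ask."""
    try:
        data = json.loads(raw)
        decision = data["decision"]
        reason = str(data.get("reason", ""))
        confidence = float(data["confidence"])
    except (ValueError, KeyError, TypeError) as exc:
        return Verdict("ask", f"unparseable classifier output: {exc!r}", 0.0)
    if not 0.0 <= confidence <= 1.0:  # also rejects NaN
        return Verdict("ask", f"confidence out of range: {confidence}", 0.0)
    if not isinstance(decision, str) or decision not in VALID_DECISIONS:
        return Verdict("ask", f"unknown decision {decision!r}", confidence)
    if decision == "allow" and confidence < MIN_ALLOW_CONFIDENCE:
        return Verdict("ask", f"low-confidence allow: {reason}", confidence)
    return Verdict(decision, reason, confidence)


def decide(classifier: Classifier, action: str) -> Verdict:
    """Single fail-closed entry point: any backend failure degrades to ask."""
    try:
        raw = classifier.classify(action)
    except Exception as exc:  # timeouts, network errors, backend bugs
        return Verdict("ask", f"classifier error: {exc!r}", 0.0)
    return parse_verdict(raw)


class StaticClassifier:
    """Hypothetical test double that always returns the same raw output."""

    def __init__(self, raw: str) -> None:
        self.raw = raw

    def classify(self, action: str) -> str:
        return self.raw


class UnreachableClassifier:
    """Hypothetical backend that always times out."""

    def classify(self, action: str) -> str:
        raise TimeoutError("backend timed out")


# Hypothetical registry: swapping models is a config change, not a code change.
REGISTRY: Dict[str, Callable[[], Classifier]] = {
    "static-allow": lambda: StaticClassifier(
        '{"decision": "allow", "reason": "read-only", "confidence": 0.95}'
    ),
    "unreachable": UnreachableClassifier,
}


def load_classifier(config: dict) -> Classifier:
    return REGISTRY[config["classifier_model"]]()
```

Swapping the model then means registering a new factory and changing classifier_model; decide() and its fail-closed rules stay untouched.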
Codex split plan for #687:
- platformctl apply --sandbox mechanism/skeleton using the hermes-style preview-slot contract; no live wiring without an operator gate.
- Classifier interface, failing closed to ask; the model is a config/benchmark target, not a hard dependency.
Discipline: no stacked PRs. I will start PR 2 after PR #689 lands or is explicitly retargeted by the operator.
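A possible shape for the apply --sandbox skeleton: every side effect (slot acquisition, apply, healthcheck, discard) is injected as a callable, so the mechanism can be authored and tested while the live wiring stays operator-gated. PreviewSlot, SandboxResult, and apply_sandbox are hypothetical names; the real hermes preview-slot contract is not described in this issue.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PreviewSlot:
    """Hypothetical stand-in for a hermes-style preview slot."""

    name: str
    events: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class SandboxResult:
    slot: str
    healthy: bool
    detail: str


def apply_sandbox(
    module: str,
    acquire: Callable[[], PreviewSlot],
    apply: Callable[[PreviewSlot, str], None],
    healthcheck: Callable[[PreviewSlot, str], bool],
    discard: Callable[[PreviewSlot], None],
) -> SandboxResult:
    """Apply to a throwaway slot, run the module healthcheck, then discard."""
    slot = acquire()
    try:
        apply(slot, module)
        healthy = bool(healthcheck(slot, module))
        detail = "healthcheck passed" if healthy else "healthcheck failed"
        return SandboxResult(slot.name, healthy, detail)
    except Exception as exc:  # a failed apply or healthcheck is a result, not a crash
        return SandboxResult(slot.name, False, f"sandbox error: {exc!r}")
    finally:
        discard(slot)  # disposable: runs on success, failure, and error
```

The finally clause is what makes the slot disposable: it is discarded whether the healthcheck passes, fails, or the apply itself raises.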
Status correction: PR #689 landed only slice 1 (deterministic cascade router). #687 still tracks the remaining accepted slices: apply --sandbox skeleton, classifier interface/policy compiler, and ADR-0025 decision receipts. Reopening so the issue state matches its acceptance criteria.
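For the ADR-0025 decision-receipts slice, a receipt could be a frozen record carrying exactly the fields listed in the spec, serialized as one JSON line per decision. The JSON-lines encoding, the UTC ISO-8601 timestamp, and treating cousin as an opaque string are assumptions; the actual memory-plane write path is not shown.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DecisionReceipt:
    """One audited Tier-1/Tier-2 decision, with the fields named in the spec."""

    action: str
    tier: int
    decision: str
    reason: str
    policy_version: str
    cousin: str  # meaning not defined in this issue; kept as an opaque string
    ts: str      # assumed: UTC ISO-8601 timestamp


def make_receipt(
    action: str,
    tier: int,
    decision: str,
    reason: str,
    policy_version: str,
    cousin: str,
    now: Optional[datetime] = None,
) -> DecisionReceipt:
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return DecisionReceipt(action, tier, decision, reason, policy_version, cousin, ts)


def to_json_line(receipt: DecisionReceipt) -> str:
    """Serialize one receipt as a single JSON line (assumed transport format)."""
    return json.dumps(asdict(receipt), sort_keys=True)
```

Carrying policy_version on every receipt lets a later audit tell which router and classifier policy produced a given soft-tier decision.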