feat(autoheal): runtime-repair evidence evaluator (#68 first slice, refs #65) #70

Merged
pdurlej merged 1 commit from claude/autoheal-runtime-repair-evaluator into main 2026-06-08 18:14:46 +02:00

What

The first useful slice of #68 (supervised runtime repair gate): a read-only Patchwarden evaluator that consumes an Iskra/OpenClaw runtime-evidence bundle and emits a deterministic verdict. It is the policy-gate sibling of the PR-level openclaw-runtime-maintenance-gate that landed in #66.

patchwarden runtime-repair-check --evidence-file bundle.json [--operator-approval]

Never mutates runtime, never calls the network — same discipline as every Patchwarden evaluator (D20).

Boundary (operator decision 2026-06-07)

  • Iskra/OpenClaw owns runtime evidence + execution (probes, simulators, dashboards, apply scripts).
  • Patchwarden owns the gate (which repairs are eligible, what evidence is required, whether mutation may be considered).
  • Operator owns approval — every runtime mutation stays a human decision.

Verdicts (fail-closed)

| Verdict | Exit | When |
|---|---|---|
| eligible_repair_dry_run | 0 | read-only class (e.g. deploy_drift_probe from #65) + all required evidence current-success |
| needs_human | 1 | mutation class + green evidence + operator approval; Patchwarden never auto-authorizes mutation |
| blocked | 1 | evidence missing/stale, OR mutation class without approval, OR unknown repair class |

Key invariant (#68 acceptance): a runtime mutation NEVER reaches eligible_repair_dry_run. Even green + approved, a mutation class is at most needs_human. --operator-approval only lifts a mutation from blocked → needs_human; it never makes Patchwarden say "go". Evidence counts only when status==success and sha==target_sha (stale evidence doesn't count).
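The verdict rules above can be sketched as a pure function over a parsed bundle. This is a minimal illustration, not the code in src/patchwarden/runtime_repair.py: the names sketch_evaluate, SKETCH_POLICIES and Verdict, the bundle keys repair_class, target_sha, evidence and name, the runtime_maintenance class, and the pairing of evidence names to classes are all assumptions made for the example. Only the verdict names, the exit codes, and the status/sha rule come from this PR.

```python
from dataclasses import dataclass

# Hypothetical policy table: repair_class -> (required_evidence, mutation).
# Only "deploy_drift_probe" and the three evidence names appear in the PR;
# "runtime_maintenance" and which evidence each class needs are assumptions.
SKETCH_POLICIES = {
    "deploy_drift_probe": (("runtime-truth-report",), False),
    "runtime_maintenance": (
        ("runtime-truth-report", "openclaw-upgrade-simulator", "autoheal-dashboard"),
        True,
    ),
}

# Exit codes as listed in the verdict table: only the dry-run verdict exits 0.
EXIT_CODES = {"eligible_repair_dry_run": 0, "needs_human": 1, "blocked": 1}


@dataclass(frozen=True)
class Verdict:
    verdict: str
    exit_code: int
    reasons: tuple


def sketch_evaluate(bundle: dict, operator_approval: bool = False) -> Verdict:
    """Fail-closed: every path that is not explicitly allowed ends in 'blocked'."""

    def result(verdict, *reasons):
        return Verdict(verdict, EXIT_CODES[verdict], tuple(reasons))

    policy = SKETCH_POLICIES.get(bundle.get("repair_class"))
    if policy is None:
        return result("blocked", "unknown repair class")
    required, mutation = policy

    # Evidence counts only when status == "success" AND sha == target_sha.
    target_sha = bundle.get("target_sha")
    current = {
        item.get("name")
        for item in bundle.get("evidence", [])
        if item.get("status") == "success"
        and target_sha is not None
        and item.get("sha") == target_sha
    }
    missing = [name for name in required if name not in current]
    if missing:
        return result("blocked", *(f"missing or stale evidence: {n}" for n in missing))

    if not mutation:
        return result("eligible_repair_dry_run")
    if not operator_approval:
        return result("blocked", "mutation class without operator approval")
    # Approval lifts a mutation to needs_human only; it never yields exit 0.
    return result("needs_human", "mutation requires a human decision")
```

In this sketch no branch returns eligible_repair_dry_run once the policy marks a class as a mutation, so the invariant holds by construction rather than by a later check.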

Files

| File | Purpose |
|---|---|
| src/patchwarden/runtime_repair.py | evaluator + REPAIR_CLASS_POLICIES (5 classes: 2 read-only, 3 mutation) |
| src/patchwarden/cli.py | new runtime-repair-check subcommand |
| spec/schemas/runtime-repair-verdict.schema.json + example | output contract |
| docs/operations/openclaw-runtime-repair-evaluator.md | boundary + verdict reference |
| tests/test_runtime_repair.py | 18 tests |

#68 acceptance check

  • Fixture derived from Iskra runtime evidence can be evaluated without touching production (pure function over JSON).
  • Missing runtime-truth-report / openclaw-upgrade-simulator / autoheal-dashboard blocks runtime-maintenance candidates.
  • Runtime mutation remains needs_human even when evidence is green.
  • deploy_drift_probe (#65) is one repair class, not the whole system.
  • Default mode read-only / proposal-only.
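Because the gate is a pure function, the acceptance points above lend themselves to an exhaustive check over every input combination. The following is a hypothetical sketch, not one of the 18 tests in tests/test_runtime_repair.py: decide and check_invariants are illustrative names, and the four booleans are a simplification of the real bundle.

```python
from itertools import product


def decide(known_class: bool, mutation: bool, evidence_green: bool, approved: bool) -> str:
    """Boolean model of the verdict table (a simplification, not the real evaluator)."""
    if not known_class or not evidence_green:
        return "blocked"
    if not mutation:
        return "eligible_repair_dry_run"
    return "needs_human" if approved else "blocked"


def check_invariants() -> int:
    """Enumerate all 2**4 input combinations and assert the acceptance points."""
    checked = 0
    for known, mutation, green, approved in product((False, True), repeat=4):
        verdict = decide(known, mutation, green, approved)
        # A mutation class never reaches the dry-run verdict, approved or not.
        if mutation:
            assert verdict != "eligible_repair_dry_run"
        # Unknown class or missing/stale evidence always blocks.
        if not known or not green:
            assert verdict == "blocked"
        # The dry-run verdict implies a known, read-only class with green evidence.
        if verdict == "eligible_repair_dry_run":
            assert known and green and not mutation
        checked += 1
    return checked
```

Calling check_invariants() walks all 16 combinations and raises AssertionError on the first violation. The same pattern would carry over to the real evaluator if fixtures were generated per repair class.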

Tests

PYTHONPATH=src python3 -m unittest discover tests
Ran 178 tests in 0.180s
OK

160 baseline + 18 new. Example validates against its schema. D20 boundary lint still green; new module has zero merge/approve/network surface.

Out of scope (next slices)

  • No apply/execute path — proposal-only by design.
  • No network/Forgejo writes from the evaluator.
  • Wiring into an iskra-openclaw workflow (separate, like the platform dogfood wiring was).
  • clawsweeper read-only-AI + deterministic-executor patterns (#69 triage) — design fold-in later.

Closes the first slice of #68. Refs #65.

Adds the Patchwarden policy-gate half of the Iskra/OpenClaw auto-heal
boundary: a read-only evaluator that consumes a runtime-evidence bundle
and emits a deterministic verdict. Never mutates runtime, never calls the
network — same discipline as every Patchwarden evaluator (D20).

Boundary (operator decision 2026-06-07): Iskra/OpenClaw owns runtime
evidence + execution; Patchwarden owns the gate; the operator owns
approval of every runtime mutation.

New:
- src/patchwarden/runtime_repair.py — evaluate_runtime_repair():
  REPAIR_CLASS_POLICIES maps repair_class -> (required_evidence, mutation).
  Verdicts: eligible_repair_dry_run (read-only class + green evidence, exit 0),
  needs_human (mutation class + approval + green — never auto-eligible),
  blocked (missing/stale evidence, mutation without approval, or unknown class).
  Evidence counts only when status==success AND sha==target_sha.
- cli.py: new `runtime-repair-check --evidence-file [--operator-approval]`.
- spec/schemas/runtime-repair-verdict.schema.json + example.
- docs/operations/openclaw-runtime-repair-evaluator.md — boundary + verdicts.
- tests/test_runtime_repair.py — 18 tests (all 3 verdicts, fail-closed cases,
  unknown class, payload contract, CLI smoke).

Key invariant (#68 acceptance): a runtime mutation NEVER reaches
eligible_repair_dry_run; even green+approved it is at most needs_human.
deploy_drift_probe (#65) is one read-only repair class, not the whole system.

Tests: PYTHONPATH=src python3 -m unittest discover tests -> 178/178 OK
(160 baseline + 18 new). Example validates against its schema. D20 boundary
clean (no merge/approve/network surface in the new module).

Refs: pdurlej/patchwarden#68, pdurlej/patchwarden#65

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pdurlej deleted branch claude/autoheal-runtime-repair-evaluator 2026-06-08 18:14:46 +02:00