feat(autoheal): runtime-repair evidence evaluator (#68 first slice, refs #65) #70

Merged: pdurlej merged 1 commit from `claude/autoheal-runtime-repair-evaluator` into `main` on 2026-06-08 18:14:46 +02:00.

claude (Collaborator) commented on 2026-06-08 18:04:35 +02:00:

## What

The first useful slice of **#68** (supervised runtime repair gate): a read-only Patchwarden evaluator that consumes an Iskra/OpenClaw **runtime-evidence bundle** and emits a deterministic verdict. It is the policy-gate sibling of the PR-level `openclaw-runtime-maintenance-gate` that landed in #66.

`patchwarden runtime-repair-check --evidence-file bundle.json [--operator-approval]`

It never mutates the runtime and never calls the network: the same discipline as every Patchwarden evaluator (D20).
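The subcommand's contract (read a bundle, print a verdict, exit non-zero unless the verdict is `eligible_repair_dry_run`) can be sketched with `argparse`. This is a minimal sketch, not the code in `src/patchwarden/cli.py`: the `main` signature, the injected `evaluate` callable, and the JSON output format are assumptions.

```python
import argparse
import json
import sys
from typing import Any, Callable, Dict, Sequence

# Hypothetical evaluator signature: (bundle, operator_approval) -> verdict dict.
Evaluator = Callable[[Dict[str, Any], bool], Dict[str, Any]]

# Mirrors the verdict table: only the read-only dry-run verdict exits 0.
EXIT_CODES = {"eligible_repair_dry_run": 0, "needs_human": 1, "blocked": 1}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="patchwarden")
    sub = parser.add_subparsers(dest="command", required=True)
    check = sub.add_parser("runtime-repair-check")
    check.add_argument("--evidence-file", required=True)
    # Boolean switch: omitting it means "no operator approval".
    check.add_argument("--operator-approval", action="store_true")
    return parser


def main(argv: Sequence[str], evaluate: Evaluator) -> int:
    args = build_parser().parse_args(argv)
    # The bundle is only read; nothing is written back or sent anywhere.
    with open(args.evidence_file, encoding="utf-8") as fh:
        bundle = json.load(fh)
    result = evaluate(bundle, args.operator_approval)
    json.dump(result, sys.stdout, indent=2, sort_keys=True)
    sys.stdout.write("\n")
    # An unrecognized verdict string fails closed with exit code 1.
    return EXIT_CODES.get(result.get("verdict"), 1)
```

A console entry point would then be `raise SystemExit(main(sys.argv[1:], evaluate))`. Passing the evaluator in keeps argument parsing and exit-code mapping testable on their own.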

## Boundary (operator decision 2026-06-07)

- **Iskra/OpenClaw** owns runtime evidence + execution (probes, simulators, dashboards, apply scripts).
- **Patchwarden** owns the gate (which repairs are eligible, what evidence is required, whether mutation may be considered).
- **Operator** owns approval: every runtime mutation stays a human decision.

## Verdicts (fail-closed)

| Verdict | Exit | When |
|---|---|---|
| `eligible_repair_dry_run` | 0 | read-only class (e.g. `deploy_drift_probe` from #65) + all required evidence current-success |
| `needs_human` | 1 | mutation class + green evidence + operator approval (Patchwarden never auto-authorizes mutation) |
| `blocked` | 1 | evidence missing/stale, OR mutation class without approval, OR unknown repair class |
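The verdict table reduces to a small fail-closed decision function. This is a minimal sketch assuming the evaluator has already resolved the repair class and checked the evidence, leaving four booleans; the `Verdict` enum and `decide` are hypothetical names, not the code in `runtime_repair.py`.

```python
from __future__ import annotations

from enum import Enum


class Verdict(str, Enum):
    ELIGIBLE_REPAIR_DRY_RUN = "eligible_repair_dry_run"
    NEEDS_HUMAN = "needs_human"
    BLOCKED = "blocked"


def decide(
    known_class: bool,
    mutates_runtime: bool,
    evidence_ok: bool,
    operator_approval: bool,
) -> tuple[Verdict, int]:
    """Map the policy inputs to (verdict, exit code), failing closed."""
    if not known_class or not evidence_ok:
        # Unknown repair class or missing/stale evidence always blocks.
        return Verdict.BLOCKED, 1
    if not mutates_runtime:
        # Read-only class with current-success evidence: dry run only.
        return Verdict.ELIGIBLE_REPAIR_DRY_RUN, 0
    if operator_approval:
        # Approval lifts a mutation to needs_human, never to eligible.
        return Verdict.NEEDS_HUMAN, 1
    # Mutation class without approval.
    return Verdict.BLOCKED, 1
```

Because the inputs are four booleans, a test can enumerate all 16 combinations and assert that none with `mutates_runtime=True` returns `eligible_repair_dry_run`.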

**Key invariant (#68 acceptance):** a runtime mutation NEVER reaches `eligible_repair_dry_run`. Even with green evidence and operator approval, a mutation class is at most `needs_human`: `--operator-approval` only lifts a mutation from `blocked` to `needs_human` and never makes Patchwarden say "go". Evidence counts only when `status == success` **and** `sha == target_sha`; stale evidence does not count.
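The freshness rule is what makes stale evidence fail closed. The sketch below assumes a bundle shaped like `{"target_sha": ..., "evidence": {name: {"status": ..., "sha": ...}}}`; that layout and the `evidence_gaps` helper are hypothetical, since the real bundle format comes from Iskra/OpenClaw and is not shown in this PR.

```python
from typing import Any, Dict, Iterable, Mapping


def evidence_gaps(
    required: Iterable[str], bundle: Mapping[str, Any]
) -> Dict[str, str]:
    """Return {evidence_name: reason} for every required item that does not count.

    An empty result means all required evidence is current-success.
    """
    required = list(required)
    target_sha = bundle.get("target_sha")
    if not target_sha:
        # Without a target SHA nothing can be proven current: fail closed.
        return {name: "no_target_sha" for name in required}
    evidence = bundle.get("evidence") or {}
    gaps: Dict[str, str] = {}
    for name in required:
        entry = evidence.get(name)
        if entry is None:
            gaps[name] = "missing"
        elif entry.get("status") != "success":
            gaps[name] = "not_success"
        elif entry.get("sha") != target_sha:
            # A success recorded against another commit is stale.
            gaps[name] = "stale"
    return gaps
```

Returning a reason per gap, rather than a bare boolean, lets a `blocked` verdict say which evidence needs to be regenerated.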

## Files

| File | Purpose |
|---|---|
| `src/patchwarden/runtime_repair.py` | evaluator + `REPAIR_CLASS_POLICIES` (5 classes: 2 read-only, 3 mutation) |
| `src/patchwarden/cli.py` | new `runtime-repair-check` subcommand |
| `spec/schemas/runtime-repair-verdict.schema.json` + example | output contract |
| `docs/operations/openclaw-runtime-repair-evaluator.md` | boundary + verdict reference |
| `tests/test_runtime_repair.py` | 18 tests |
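`REPAIR_CLASS_POLICIES` is described only as five classes, two read-only and three mutating. One plausible shape is a frozen registry keyed by class name. The sketch below assumes that shape and the `RepairClassPolicy` fields; only `deploy_drift_probe` is a real class name, and the other four names and every evidence list are hypothetical placeholders.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional, Tuple

# Evidence names from the #68 acceptance check.
RUNTIME_EVIDENCE = (
    "runtime-truth-report",
    "openclaw-upgrade-simulator",
    "autoheal-dashboard",
)


@dataclass(frozen=True)
class RepairClassPolicy:
    mutates_runtime: bool
    required_evidence: Tuple[str, ...]


# Hypothetical registry: names other than deploy_drift_probe, and all
# required_evidence tuples, are illustrative placeholders.
REPAIR_CLASS_POLICIES: Mapping[str, RepairClassPolicy] = MappingProxyType({
    "deploy_drift_probe": RepairClassPolicy(False, ("runtime-truth-report",)),
    "example_readonly_probe": RepairClassPolicy(False, ("autoheal-dashboard",)),
    "example_mutation_a": RepairClassPolicy(True, RUNTIME_EVIDENCE),
    "example_mutation_b": RepairClassPolicy(True, RUNTIME_EVIDENCE),
    "example_mutation_c": RepairClassPolicy(True, RUNTIME_EVIDENCE),
})


def policy_for(repair_class: str) -> Optional[RepairClassPolicy]:
    """None means an unknown class, which the evaluator must treat as blocked."""
    return REPAIR_CLASS_POLICIES.get(repair_class)
```

`MappingProxyType` makes the registry read-only at runtime, so callers cannot quietly add a class; an unknown name falls through to `None` and therefore to `blocked`.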

## #68 acceptance check

- ✅ A fixture derived from Iskra runtime evidence can be evaluated without touching production (pure function over JSON).
- ✅ Missing `runtime-truth-report` / `openclaw-upgrade-simulator` / `autoheal-dashboard` evidence blocks runtime-maintenance candidates.
- ✅ Runtime mutation remains `needs_human` even when evidence is green.
- ✅ `deploy_drift_probe` (#65) is one repair class, not the whole system.
- ✅ Default mode is read-only / proposal-only.

## Tests

```
PYTHONPATH=src python3 -m unittest discover tests
Ran 178 tests in 0.180s
OK
```

160 baseline + 18 new. The example validates against its schema. The D20 boundary lint is still green, and the new module has zero merge/approve/network surface.

## Out of scope (next slices)

- No apply/execute path: proposal-only by design.
- No network/Forgejo writes from the evaluator.
- Wiring into an iskra-openclaw workflow (separate, like the platform dogfood wiring was).
- `clawsweeper` read-only-AI + deterministic-executor patterns (#69 triage): design fold-in later.

Closes the first slice of #68. Refs #65.

