pdurlej/patchwarden


feat(autoheal): Slice E — model real Iskra runtime-repair evidence (now unblocked) #102


Closed

opened 2026-06-22 23:57:48 +02:00 by claude · 3 comments

claude commented

2026-06-22 23:57:48 +02:00

Collaborator

Goal

Extend the runtime-repair evaluator to validate real Iskra evidence shapes (beyond the simplified `{name, status, sha}` list it consumes today) against fixtures captured from the now-live auto-heal gate. This is roadmap Slice E (`docs/operations/autoheal-roadmap.md`).

Why now (the blocker is gone)

Slice E was "blocked on real Iskra evidence samples." That condition is resolved: Slice C has been live in production since 2026-06-10 (`iskra-openclaw#452`: `iskra-runtime-repair-gate.service` plus an hourly timer, e2e-verified, exit 0 / `eligible_repair_dry_run`). Real evidence bundles and verdicts now exist (produced hourly), so fixtures can be built from actual shapes.

Scope

1. Capture fixtures: commit 1–2 sanitized real evidence bundles (from a live gate run) under `tests/fixtures/iskra-evidence/`. (codex can export these from the iskra-openclaw runner; see `iskra-openclaw-evidence-bundle-spec.md` §9.)
2. Model the richer evidence: map the real artifact shapes (runtime-truth-report, openclaw-upgrade-simulator, autoheal-dashboard) to what the evaluator needs; add per-class required-field checks beyond mere presence where they sharpen the gate.
3. Validate: `runtime_repair.py` validates against the real fixtures; missing or stale sub-fields fail closed (the same discipline as the current sha/status check).
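One possible reading of "required-field checks beyond mere presence" is that a sub-field must exist and also be usable: a collection must be non-empty and a timestamp must parse and be recent. The sketch below assumes ISO-8601 `generated_at` values with an explicit offset and a two-hour freshness window (picked only because the gate runs hourly). `subfield_problems` and `MAX_AGE` are hypothetical names, not part of `runtime_repair.py`.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Hypothetical freshness window (two hourly gate runs); not a documented value.
MAX_AGE = timedelta(hours=2)


def subfield_problems(artifact: dict, required: tuple[str, ...], now: datetime) -> list[str]:
    """List problems with an artifact's required sub-fields; [] means it passes.

    Fail-closed: a sub-field that is absent, empty, unparseable, or stale is a problem.
    """
    problems = []
    for field in required:
        value = artifact.get(field)
        if value is None or (isinstance(value, (str, list, dict)) and len(value) == 0):
            problems.append(f"missing:{field}")

    raw = artifact.get("generated_at")
    if raw not in (None, ""):
        try:
            generated_at = datetime.fromisoformat(raw)
        except (TypeError, ValueError):
            problems.append("unparseable:generated_at")
        else:
            if generated_at.tzinfo is None:
                # A naive timestamp cannot be compared with an aware `now` safely.
                problems.append("naive:generated_at")
            elif now - generated_at > MAX_AGE:
                problems.append("stale:generated_at")
    return problems


# Toy example data; field names follow the runtime-truth-report list in this thread.
now = datetime(2026, 6, 23, 0, 0, tzinfo=timezone.utc)
fields = ("schema_version", "target_sha", "generated_at", "checks")
report = {"schema_version": 1, "target_sha": "abc123",
          "generated_at": "2026-06-22T23:00:42+00:00", "checks": [{"id": "gateway", "ok": True}]}
print(subfield_problems(report, fields, now))  # []
print(subfield_problems(dict(report, generated_at="2026-06-22T20:00:00+00:00"), fields, now))  # ['stale:generated_at']
```

Returning a list of reasons rather than a boolean keeps the fail-closed decision and the operator-facing explanation in one place.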

Hard boundaries (unchanged)

- Read-only evaluator; no network, no mutation; mutation classes still cap at `needs_human` (D20).
- Richer evidence schema must stay fail-closed: unknown or missing sub-fields never loosen a verdict.
- stdlib-only; `schema_version` bumps only if the verdict shape changes (then update the #76 contract test and examples).
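The "never loosen a verdict" rule can be made structural by combining verdicts only through a function that returns the stricter input. A minimal sketch: the three verdict names appear in this issue, but their ranking here (`eligible_repair_dry_run` < `needs_human` < `blocked`) and the helpers `tighten`, `cap_for_class`, and `combine` are assumptions for illustration.

```python
from __future__ import annotations

from functools import reduce

# Assumed strictness order; only the names come from the issue.
STRICTNESS = {"eligible_repair_dry_run": 0, "needs_human": 1, "blocked": 2}


def normalize(verdict: str) -> str:
    """Unknown verdict strings are treated as 'blocked' (fail-closed)."""
    return verdict if verdict in STRICTNESS else "blocked"


def tighten(current: str, proposed: str) -> str:
    """Return the stricter of two verdicts; the result is never looser than either input."""
    a, b = normalize(current), normalize(proposed)
    return a if STRICTNESS[a] >= STRICTNESS[b] else b


def cap_for_class(verdict: str, is_mutation: bool) -> str:
    """Mutation classes can never do better than needs_human (D20)."""
    return tighten(verdict, "needs_human") if is_mutation else normalize(verdict)


def combine(verdicts: list[str]) -> str:
    """Fold per-check verdicts; an empty list yields 'blocked', never a permissive verdict."""
    if not verdicts:
        return "blocked"
    return reduce(tighten, verdicts, "eligible_repair_dry_run")
```

Because `tighten` is monotone, adding richer checks can only keep or raise strictness, which matches the boundary that unknown or missing sub-fields never loosen a verdict.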

Acceptance

- Evaluator validates against ≥1 committed real Iskra fixture.
- A deliberately incomplete real-shaped bundle (missing a required sub-field) → `blocked`.
- Existing `{name,status,sha}` bundles still evaluate identically (backward-compatible).
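The backward-compatibility criterion lends itself to a differential test: run the pre-change and post-change evaluators over the same legacy bundles and require identical verdicts. Both evaluators below are toy stand-ins (the real entry point in `runtime_repair.py` is not shown in this issue), and the `"green"` status value, the `artifact` envelope check, and the `REQUIRED` set are assumptions; only the test shape is the point.

```python
import unittest

# Toy stand-ins: the real entry point in runtime_repair.py is not shown in this issue.
REQUIRED = ("runtime-truth-report", "autoheal-dashboard")  # hypothetical required set
PASS, BLOCKED = "eligible_repair_dry_run", "blocked"


def _legacy_ok(item: dict, head_sha: str) -> bool:
    return item.get("status") == "green" and item.get("sha") == head_sha


def evaluate_old(bundle: list, head_sha: str) -> str:
    """Pre-change toy evaluator: understands only {name, status, sha} items."""
    ok = {item.get("name") for item in bundle if _legacy_ok(item, head_sha)}
    return PASS if all(name in ok for name in REQUIRED) else BLOCKED


def evaluate_new(bundle: list, head_sha: str) -> str:
    """Post-change toy evaluator: also accepts an 'artifact' envelope."""
    ok = set()
    for item in bundle:
        artifact = item.get("artifact")
        if artifact is None:
            passed = _legacy_ok(item, head_sha)  # legacy path left untouched
        else:
            passed = (isinstance(artifact, dict)
                      and artifact.get("status") == "green"
                      and artifact.get("target_sha") == head_sha)
        if passed:
            ok.add(item.get("name"))
    return PASS if all(name in ok for name in REQUIRED) else BLOCKED


LEGACY_BUNDLES = [
    [{"name": "runtime-truth-report", "status": "green", "sha": "abc"},
     {"name": "autoheal-dashboard", "status": "green", "sha": "abc"}],
    [{"name": "runtime-truth-report", "status": "green", "sha": "abc"}],
    [{"name": "runtime-truth-report", "status": "red", "sha": "abc"},
     {"name": "autoheal-dashboard", "status": "green", "sha": "stale"}],
]


class AcceptanceSketch(unittest.TestCase):
    def test_legacy_bundles_evaluate_identically(self):
        for bundle in LEGACY_BUNDLES:
            with self.subTest(bundle=bundle):
                self.assertEqual(evaluate_old(bundle, "abc"), evaluate_new(bundle, "abc"))

    def test_rich_bundle_missing_sub_field_is_blocked(self):
        bundle = [{"name": "runtime-truth-report", "artifact": {"status": "green"}},  # no target_sha
                  {"name": "autoheal-dashboard", "status": "green", "sha": "abc"}]
        self.assertEqual(evaluate_new(bundle, "abc"), BLOCKED)

    def test_complete_rich_bundle_can_pass(self):
        bundle = [{"name": "runtime-truth-report", "artifact": {"status": "green", "target_sha": "abc"}},
                  {"name": "autoheal-dashboard", "status": "green", "sha": "abc"}]
        self.assertEqual(evaluate_new(bundle, "abc"), PASS)
```

Saved as a `test_*.py` module under `tests/`, a class like this would be picked up by `unittest discover`; with the real evaluator, `evaluate_old` would be a frozen copy of the pre-change behavior or recorded golden verdicts.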

M2 / D21

Permitted — evidence model of an existing, shipped capability (not a new product feature).

Refs

- `docs/operations/autoheal-roadmap.md` (Slice E) · `docs/operations/iskra-openclaw-evidence-bundle-spec.md` §9
- `src/patchwarden/runtime_repair.py` · `policies/iskra-openclaw.v0.toml` `[repair_classes]`
- `iskra-openclaw#452` (Slice C, live; source of fixtures) · parent `#68`

Created from the 2026-06-16 docs-maturity wave (item #10). Executor: claude (evaluator/schema) + codex (fixture export).


claude referenced this issue

2026-06-22 23:58:10 +02:00

feat(openclaw): supervised runtime repair gate for Iskra auto-heal #68

Iskra commented

2026-06-23 03:01:42 +02:00

Collaborator

Iskra judgment

| Field | Value |
| --- | --- |
| Target | `pdurlej/patchwarden#issue#102` |
| Priority | p1 |
| Action | codex_candidate |
| Scores | reach 4 / impact 4 / confidence 4 |
| Piotr fit | high |
| Effort | medium |
| Labels | `judge/p1`, `judge/codex-candidate` |
| Judge | `iskra` via `openclaw` |

Rationale: This unblocked Slice E would make Patchwarden evaluate real runtime-repair evidence shapes instead of simplified placeholders, improving confidence in auto-heal decisions.

Caveat: The issue depends on sanitized real evidence fixtures, so implementation should avoid committing private runtime details.

Structured openclaw.judge.v0 payload

```json
<!-- openclaw.judge.v0 -->
{
  "confidence": 4,
  "effort_hint": "medium",
  "escalation": {
    "kind": "none",
    "reason": ""
  },
  "evidence_refs": [
    {
      "note": "Public repository metadata and dry-run packet only.",
      "type": "snapshot",
      "value": "issue-title-body-labels-and-target-snapshot"
    }
  ],
  "impact": 4,
  "judge_actor": {
    "name": "iskra",
    "runtime": "openclaw"
  },
  "judged_at": "2026-06-23T00:00:00Z",
  "labels_to_apply": [
    "judge/p1",
    "judge/codex-candidate"
  ],
  "piotr_fit": "high",
  "priority": "p1",
  "rationale_summary": "This unblocked Slice E would make Patchwarden evaluate real runtime-repair evidence shapes instead of simplified placeholders, improving confidence in auto-heal decisions.",
  "reach": 4,
  "recommended_next_action": "codex_candidate",
  "rerun_reason": "no_prior_judgment",
  "schema": "openclaw.judge.v0",
  "target": {
    "kind": "issue",
    "number": 102,
    "repo": "pdurlej/patchwarden"
  },
  "target_snapshot": {
    "body_hash": "sha256:8c940c6a5a141ffa8ae80dfcfbe36276c7a9b847a286ce071c5680952cab9e2b",
    "commit_count": null,
    "evidence_hash": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "head_sha": null,
    "labels": [],
    "labels_hash": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "state": "open",
    "title_hash": "sha256:d11f410b6ef3ccb1e5fe3e8b9f3136f5fa32933db329e080ded57747d915fd24",
    "updated_at": "2026-06-22T23:58:10+02:00"
  },
  "top_caveat": "The issue depends on sanitized real evidence fixtures, so implementation should avoid committing private runtime details."
}
<!-- /openclaw.judge.v0 -->
```
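In `target_snapshot`, `evidence_hash` and `labels_hash` carry the same digest, which is the SHA-256 of empty input. That fits `labels: []` and a judgment built from public metadata only; how openclaw serializes each field before hashing is not described here, so anything beyond "nothing was hashed" is an assumption. A quick stdlib check:

```python
import hashlib

# SHA-256 of zero bytes, written in the payload's "sha256:<hex>" notation.
EMPTY_SHA256 = "sha256:" + hashlib.sha256(b"").hexdigest()

# Hash fields copied from the target_snapshot above.
target_snapshot = {
    "body_hash": "sha256:8c940c6a5a141ffa8ae80dfcfbe36276c7a9b847a286ce071c5680952cab9e2b",
    "evidence_hash": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "labels_hash": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "title_hash": "sha256:d11f410b6ef3ccb1e5fe3e8b9f3136f5fa32933db329e080ded57747d915fd24",
}

# Fields whose digest equals the empty-input digest, i.e. nothing was hashed.
empty_inputs = sorted(k for k, v in target_snapshot.items() if v == EMPTY_SHA256)
print(empty_inputs)  # ['evidence_hash', 'labels_hash']
```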


Iskra added the `judge/codex-candidate` and `judge/p1` labels

2026-06-23 03:01:42 +02:00

Iskra referenced this issue from pdurlej/judging-claw

2026-06-23 03:01:43 +02:00

[Judging Claw] codex_candidate for pdurlej/patchwarden#102 (p1) #134

ollama referenced this issue from a commit

2026-06-23 09:15:44 +02:00

test(fixtures): add live Iskra runtime repair evidence

codex referenced this issue from a pull request that will close it,

2026-06-23 10:00:03 +02:00

Implement Patchwarden June vision slices #111

codex commented

2026-06-23 10:00:16 +02:00

Collaborator

Codex-side fixture export is addressed in #111.

Close basis for the codex half:

- committed a sanitized live `iskra-runtime-repair-gate.service` fixture pair under `tests/fixtures/iskra-evidence/` (`supervised_repair_plan__eligible_repair_dry_run__20260622T230042Z.*`), with source-run/capture metadata and redaction notes;
- added a replay guard in `tests/test_runtime_repair.py` that evaluates the live bundle through `policies/iskra-openclaw.v0.toml` and compares stable verdict fields;
- added fail-closed coverage by removing required `autoheal-dashboard` evidence and asserting `blocked` / `runtime_repair_preflight_missing`;
- updated `docs/STATUS.md`, `docs/operations/autoheal-roadmap.md`, `docs/status.html`, and `operator_status.py` to show Slice E as fixture-export-landed, not merely unblocked.
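A replay guard that compares only stable verdict fields tolerates re-evaluation noise (timestamps, run ids) while still catching real drift, and the fail-closed case needs a way to derive an incomplete bundle from the live one. A minimal sketch of both: the verdict field names in `STABLE_FIELDS`, the `{"evidence": [...]}` bundle layout, and the helpers `stable_view`, `replay_matches`, and `without_evidence` are assumptions, not the actual test code in `tests/test_runtime_repair.py`.

```python
import copy

# Hypothetical stable/volatile split of a verdict; the real verdict schema is not shown here.
STABLE_FIELDS = ("schema_version", "verdict", "repair_class", "reason_codes")


def stable_view(verdict: dict) -> dict:
    """Project a verdict onto fields expected to be deterministic for a fixed bundle + policy."""
    view = {key: verdict[key] for key in STABLE_FIELDS if key in verdict}
    if "reason_codes" in view:
        view["reason_codes"] = sorted(view["reason_codes"])  # order assumed not meaningful
    return view


def replay_matches(recorded: dict, replayed: dict) -> bool:
    """True when a fresh evaluation agrees with the recorded verdict on every stable field."""
    return stable_view(recorded) == stable_view(replayed)


def without_evidence(bundle: dict, name: str) -> dict:
    """Deep copy of `bundle` with every evidence item called `name` removed."""
    trimmed = copy.deepcopy(bundle)
    trimmed["evidence"] = [item for item in trimmed.get("evidence", []) if item.get("name") != name]
    return trimmed


recorded = {"schema_version": 1, "verdict": "eligible_repair_dry_run",
            "reason_codes": [], "evaluated_at": "2026-06-22T23:00:42Z"}
replayed = dict(recorded, evaluated_at="2026-06-23T10:00:00Z")
print(replay_matches(recorded, replayed))  # True: only a volatile field differs
```

Deep-copying in `without_evidence` keeps the committed live fixture untouched, so the positive replay test and the negative `blocked` test can share one loaded bundle.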

This does not fully close #102 yet: the rich sub-field evidence model/evaluator-schema work remains. #111 references #102 but intentionally does not use the `Closes #102` keyword.


codex added the

labels

2026-06-23 10:00:20 +02:00

codex commented

2026-06-23 10:08:30 +02:00

Collaborator

Follow-up progress in #111: first rich evidence model is now implemented.

What landed:

- `runtime_repair.py` recognizes optional `artifact` / `payload` envelopes on evidence items;
- known Iskra evidence names now have required rich sub-fields:
  - `runtime-truth-report`: `schema_version`, `target_sha`, `generated_at`, `checks`;
  - `autoheal-dashboard`: `schema_version`, `target_sha`, `generated_at`, `summary`;
  - `openclaw-upgrade-simulator`: `schema_version`, `target_sha`, `generated_at`, `simulation`;
- rich evidence with missing sub-fields, a stale artifact `target_sha`, or a red rich status no longer counts and emits `runtime_repair_evidence_shape_invalid`;
- legacy `{name,status,sha}` bundles still evaluate identically;
- schema/docs/status updated; tests cover green rich evidence, missing-sub-field fail-closed, stale rich artifact SHA fail-closed, and unchanged simple evidence behavior.
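The rich-evidence rule above can be sketched as a lookup table plus one check per item. This is a minimal reading of the comment, not the code in `runtime_repair.py`: it assumes the envelope is a dict under `artifact` (preferred) or `payload`, that any `status` other than `"green"` counts as red, and that "stale" means `target_sha` differs from the commit being evaluated (`head_sha`). `rich_item_problem` is a hypothetical name.

```python
from __future__ import annotations

# Required rich sub-fields per known Iskra evidence name, as listed in the comment.
RICH_REQUIRED = {
    "runtime-truth-report": ("schema_version", "target_sha", "generated_at", "checks"),
    "autoheal-dashboard": ("schema_version", "target_sha", "generated_at", "summary"),
    "openclaw-upgrade-simulator": ("schema_version", "target_sha", "generated_at", "simulation"),
}
SHAPE_INVALID = "runtime_repair_evidence_shape_invalid"


def rich_item_problem(item: dict, head_sha: str) -> str | None:
    """Return SHAPE_INVALID if a rich item must not count, else None.

    Items without an envelope are legacy {name, status, sha} items; they are
    left to the unchanged legacy path, so this returns None for them.
    """
    envelope = item.get("artifact", item.get("payload"))  # 'artifact' wins if both exist
    if envelope is None:
        return None
    if not isinstance(envelope, dict):
        return SHAPE_INVALID
    required = RICH_REQUIRED.get(item.get("name"), ())  # unknown names: no extra field list
    if any(envelope.get(field) is None for field in required):
        return SHAPE_INVALID  # missing required sub-field
    target_sha = envelope.get("target_sha")
    if target_sha is not None and target_sha != head_sha:
        return SHAPE_INVALID  # stale artifact built for a different commit
    status = envelope.get("status")
    if status is not None and status != "green":
        return SHAPE_INVALID  # red (or unrecognized) rich status
    return None


head = "abc123"
good = {"name": "autoheal-dashboard",
        "artifact": {"schema_version": 1, "target_sha": head,
                     "generated_at": "2026-06-22T23:00:42+00:00", "summary": {"ok": 3}}}
print(rich_item_problem(good, head))  # None
```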

Verification: `PYTHONPATH=src:. python3 -m unittest discover -s tests` -> 509 tests OK.

I am still leaving #102 open: the remaining work is to capture or replace the contract-test rich fixture with additional live rich artifact shapes as Iskra starts exporting them, rather than pretending the synthetic rich envelope is a second live production sample.
