feat(autoheal): Slice E — model real Iskra runtime-repair evidence (now unblocked) #102

Closed
opened 2026-06-22 23:57:48 +02:00 by claude · 3 comments
Collaborator

Goal

Extend the runtime-repair evaluator to validate real Iskra evidence shapes — beyond the simplified {name, status, sha} list it consumes today — against fixtures captured from the now-live auto-heal gate. This is roadmap Slice E (docs/operations/autoheal-roadmap.md).

Why now (the blocker is gone)

Slice E was "blocked on real Iskra evidence samples." That condition is resolved: Slice C has been live in production since 2026-06-10 (iskra-openclaw#452: iskra-runtime-repair-gate.service + hourly timer, e2e-verified, exit 0 / eligible_repair_dry_run). Real evidence bundles and verdicts now exist (produced hourly), so we can build fixtures against actual shapes.

Scope

  1. Capture fixtures: commit 1–2 sanitized real evidence bundles (from a live gate run) under tests/fixtures/iskra-evidence/. (codex can export these from the iskra-openclaw runner; see iskra-openclaw-evidence-bundle-spec.md §9.)
  2. Model the richer evidence: map the real artifact shapes (runtime-truth-report, openclaw-upgrade-simulator, autoheal-dashboard) to what the evaluator needs; add per-class required-field checks beyond mere presence where they sharpen the gate.
  3. Validate: runtime_repair.py validates against the real fixtures; missing/stale sub-fields fail closed (same discipline as the current sha/status check).

Hard boundaries (unchanged)

  • Read-only evaluator; no network, no mutation; mutation classes still cap at needs_human (D20).
  • Richer evidence schema must stay fail-closed — unknown/missing sub-fields never loosen a verdict.
  • stdlib-only; schema_version bumps only if the verdict shape changes (then update the #76 contract test + examples).
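
The first two boundaries can be read as a "strictest verdict wins" rule. Below is a minimal sketch of that reading, not the evaluator's actual code. The three verdict names appear in this issue, but their relative order, the helper names (`strictness`, `strictest`, `cap_for_class`), and the `is_mutation_class` flag are assumptions made for illustration.

```python
# Assumed strictness order, most permissive first. Only the three verdict
# names come from this issue; their relative order is an assumption.
VERDICT_ORDER = ("eligible_repair_dry_run", "needs_human", "blocked")


def strictness(verdict: str) -> int:
    """Rank a verdict; anything unrecognized ranks as strict as `blocked`."""
    try:
        return VERDICT_ORDER.index(verdict)
    except ValueError:
        return len(VERDICT_ORDER) - 1


def strictest(*verdicts: str) -> str:
    """Keep the strictest verdict; unrecognized input collapses to `blocked`."""
    worst = max(verdicts, key=strictness)
    return worst if worst in VERDICT_ORDER else "blocked"


def cap_for_class(verdict: str, is_mutation_class: bool) -> str:
    """Mutation classes can never be more permissive than `needs_human`."""
    floor = "needs_human" if is_mutation_class else VERDICT_ORDER[0]
    return strictest(verdict, floor)
```

Under these assumptions, `cap_for_class("eligible_repair_dry_run", True)` yields `needs_human`, and a misspelled or unknown verdict yields `blocked` rather than passing through.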

Acceptance

  • Evaluator validates against ≥1 committed real Iskra fixture.
  • A deliberately-incomplete real-shaped bundle (missing a required sub-field) → blocked.
  • Existing {name,status,sha} bundles still evaluate identically (backward-compatible).

M2 / D21

Permitted — evidence model of an existing, shipped capability (not a new product feature).

Refs

  • docs/operations/autoheal-roadmap.md (Slice E) · docs/operations/iskra-openclaw-evidence-bundle-spec.md §9
  • src/patchwarden/runtime_repair.py · policies/iskra-openclaw.v0.toml [repair_classes]
  • iskra-openclaw#452 (Slice C — live; source of fixtures) · parent #68

Created from the 2026-06-16 docs-maturity wave (item #10). Executor: claude (evaluator/schema) + codex (fixture export).

Collaborator

Iskra judgment

| Field | Value |
| --- | --- |
| Target | `pdurlej/patchwarden#issue#102` |
| Priority | p1 |
| Action | codex_candidate |
| Scores | reach 4 / impact 4 / confidence 4 |
| Piotr fit | high |
| Effort | medium |
| Labels | `judge/p1`, `judge/codex-candidate` |
| Judge | `iskra` via `openclaw` |

Rationale: This unblocked Slice E would make Patchwarden evaluate real runtime-repair evidence shapes instead of simplified placeholders, improving confidence in auto-heal decisions.

Caveat: The issue depends on sanitized real evidence fixtures, so implementation should avoid committing private runtime details.

Structured openclaw.judge.v0 payload
<!-- openclaw.judge.v0 -->
{
  "confidence": 4,
  "effort_hint": "medium",
  "escalation": {
    "kind": "none",
    "reason": ""
  },
  "evidence_refs": [
    {
      "note": "Public repository metadata and dry-run packet only.",
      "type": "snapshot",
      "value": "issue-title-body-labels-and-target-snapshot"
    }
  ],
  "impact": 4,
  "judge_actor": {
    "name": "iskra",
    "runtime": "openclaw"
  },
  "judged_at": "2026-06-23T00:00:00Z",
  "labels_to_apply": [
    "judge/p1",
    "judge/codex-candidate"
  ],
  "piotr_fit": "high",
  "priority": "p1",
  "rationale_summary": "This unblocked Slice E would make Patchwarden evaluate real runtime-repair evidence shapes instead of simplified placeholders, improving confidence in auto-heal decisions.",
  "reach": 4,
  "recommended_next_action": "codex_candidate",
  "rerun_reason": "no_prior_judgment",
  "schema": "openclaw.judge.v0",
  "target": {
    "kind": "issue",
    "number": 102,
    "repo": "pdurlej/patchwarden"
  },
  "target_snapshot": {
    "body_hash": "sha256:8c940c6a5a141ffa8ae80dfcfbe36276c7a9b847a286ce071c5680952cab9e2b",
    "commit_count": null,
    "evidence_hash": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "head_sha": null,
    "labels": [],
    "labels_hash": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "state": "open",
    "title_hash": "sha256:d11f410b6ef3ccb1e5fe3e8b9f3136f5fa32933db329e080ded57747d915fd24",
    "updated_at": "2026-06-22T23:58:10+02:00"
  },
  "top_caveat": "The issue depends on sanitized real evidence fixtures, so implementation should avoid committing private runtime details."
}
<!-- /openclaw.judge.v0 -->
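
One detail in the payload is worth a quick check: `labels_hash` and `evidence_hash` are identical, and the value is the SHA-256 digest of zero bytes. That is consistent with `"labels": []` being hashed as empty input, although how the judge serializes its inputs before hashing is not stated here and is an assumption. A short stdlib check:

```python
import hashlib

# Value copied from "labels_hash" (and "evidence_hash") in the payload above.
labels_hash = "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# SHA-256 of empty input, prefixed the same way as the payload's hashes.
empty_digest = "sha256:" + hashlib.sha256(b"").hexdigest()

print(empty_digest == labels_hash)  # True
```

So these two fields carry no content at the time of judgment; only `body_hash` and `title_hash` distinguish this snapshot.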
Collaborator

Codex-side fixture export is addressed in #111.

Close basis for the codex half:

  • committed a sanitized live iskra-runtime-repair-gate.service fixture pair under tests/fixtures/iskra-evidence/ (supervised_repair_plan__eligible_repair_dry_run__20260622T230042Z.*), with source-run/capture metadata and redaction notes;
  • added a replay guard in tests/test_runtime_repair.py that evaluates the live bundle through policies/iskra-openclaw.v0.toml and compares stable verdict fields;
  • added fail-closed coverage by removing required autoheal-dashboard evidence and asserting blocked / runtime_repair_preflight_missing;
  • updated docs/STATUS.md, docs/operations/autoheal-roadmap.md, docs/status.html, and operator_status.py to show Slice E as fixture-export-landed, not merely unblocked.
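
The replay guard compares "stable verdict fields" so that volatile values such as timestamps do not break the test. A minimal sketch of that comparison follows; the field names in `STABLE_FIELDS`, the `stable_view` helper, and the sample verdict dictionaries are hypothetical and are not taken from tests/test_runtime_repair.py.

```python
from typing import Any

# Hypothetical field names: the comment above says only "stable verdict fields".
STABLE_FIELDS = ("verdict", "repair_class", "reason_codes")


def stable_view(verdict: dict[str, Any]) -> dict[str, Any]:
    """Project a verdict onto its stable fields; a missing field raises KeyError."""
    return {field: verdict[field] for field in STABLE_FIELDS}


# Illustrative data only: a recorded verdict and a fresh replay that differ
# solely in a volatile timestamp.
recorded = {
    "verdict": "eligible_repair_dry_run",
    "repair_class": "supervised_repair_plan",
    "reason_codes": [],
    "evaluated_at": "2026-06-22T23:00:42Z",
}
replayed = dict(recorded, evaluated_at="2026-06-23T08:15:00Z")

assert stable_view(recorded) == stable_view(replayed)
```

Indexing with `verdict[field]` instead of `.get()` is deliberate in this sketch: if a stable field disappears from either side, the guard fails loudly instead of comparing `None` to `None`.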

This does not fully close #102 yet: the rich sub-field evidence model/evaluator-schema work remains. #111 references #102 but intentionally does not use the "Closes #102" keyword.

Collaborator

Follow-up progress in #111: first rich evidence model is now implemented.

What landed:

  • runtime_repair.py recognizes optional artifact / payload envelopes on evidence items;
  • known Iskra evidence names now have required rich sub-fields:
    • runtime-truth-report: schema_version, target_sha, generated_at, checks;
    • autoheal-dashboard: schema_version, target_sha, generated_at, summary;
    • openclaw-upgrade-simulator: schema_version, target_sha, generated_at, simulation;
  • rich evidence with missing sub-fields, stale artifact target_sha, or red rich status no longer counts and emits runtime_repair_evidence_shape_invalid;
  • legacy {name,status,sha} bundles still evaluate identically;
  • schema/docs/status updated; tests cover green rich evidence, missing sub-field fail-closed, stale rich artifact SHA fail-closed, and unchanged simple evidence behavior.
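
The rules listed above can be sketched as a single fail-closed check per evidence item. This is an illustration under stated assumptions, not the code in runtime_repair.py. The required sub-field names and the `runtime_repair_evidence_shape_invalid` reason code come from this comment. The function name, the `expected_sha` parameter, the `"green"` status value, reading the status from the item itself, and treating a rich envelope on an unknown evidence name as invalid are all assumptions.

```python
from typing import Any

# Required rich sub-fields per evidence name, as listed in the comment above.
REQUIRED_SUBFIELDS: dict[str, tuple[str, ...]] = {
    "runtime-truth-report": ("schema_version", "target_sha", "generated_at", "checks"),
    "autoheal-dashboard": ("schema_version", "target_sha", "generated_at", "summary"),
    "openclaw-upgrade-simulator": ("schema_version", "target_sha", "generated_at", "simulation"),
}
SHAPE_INVALID = "runtime_repair_evidence_shape_invalid"
ENVELOPE_KEYS = ("artifact", "payload")


def rich_evidence_problems(item: dict[str, Any], expected_sha: str) -> list[str]:
    """Return reason codes for one item; an empty list means no rich-shape objection."""
    present = [key for key in ENVELOPE_KEYS if key in item]
    if not present:
        # Legacy {name, status, sha} item: left to the existing sha/status check.
        return []
    rich = item[present[0]]
    name = item.get("name")
    required = REQUIRED_SUBFIELDS.get(name) if isinstance(name, str) else None
    # Assumption: an envelope on an unknown name, or a non-dict envelope, fails closed.
    if required is None or not isinstance(rich, dict):
        return [SHAPE_INVALID]
    if any(rich.get(field) is None for field in required):
        return [SHAPE_INVALID]  # missing sub-field
    if rich["target_sha"] != expected_sha:
        return [SHAPE_INVALID]  # stale artifact
    if item.get("status") != "green":  # assumed status vocabulary
        return [SHAPE_INVALID]  # red (or unknown) rich status
    return []
```

Checking `key in item` rather than the truthiness of the envelope matters here: an empty `artifact` must be treated as a malformed rich item, not silently downgraded to the legacy path.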

Verification: PYTHONPATH=src:. python3 -m unittest discover -s tests -> 509 tests OK.

I am still leaving #102 open: the remaining work is to capture additional live rich artifact shapes as Iskra starts exporting them, or to replace the contract-test rich fixture with them, rather than pretending the synthetic rich envelope is a second live production sample.
