feat(v0): review_run invokes ollama_client (live-model path, D20 soft/hard-fail per lane) #49

Merged

pdurlej merged 1 commit from claude/patchwarden-review-run-invokes-ollama into main

2026-05-27 11:24:39 +02:00

claude commented

2026-05-27 11:21:02 +02:00

Collaborator

What

review_run gains a second findings source: live Ollama call. Pass lane + pr_metadata + diff and the module renders the deterministic prompt (PR #47), calls Ollama (PR #48), and embeds the parsed findings + model metadata in the artifact.

patchwarden review-run \
  --target-kind pull_request \
  --target-id forgejo://… \
  --target-sha … \
  --lane-config policies/platform.v0.toml \
  --pr-metadata-file metadata.json \
  --diff-file pr.diff

The file-based path (--findings-file) is unchanged — existing dry-run flows keep working.

Wave position

Step 3/4 toward real Ollama call:

✅ #29 prompt — PR #47 merged
✅ ollama_client — PR #48 merged
✅ THIS PR — review_run integration
⏳ #30 --execute posts findings as Forgejo comments (next thread)

D20 enforcement

The artifact records exactly what happened so the resolver downstream can decide policy without re-asking the model:

runtime_metadata.model_used — actual model that answered (primary or fallback)
runtime_metadata.fell_back — boolean, true iff primary failed and fallback succeeded
runtime_metadata.error_kind — "transport" / "unparseable" only when fail_on_*=False (soft-fail). Absent on success.

Hard vs soft fail per lane:

lane.fail_on_unparseable=True → OllamaUnparseableError propagates → CLI exit 2 (fail-closed for the platform Actions run).
lane.fail_on_unparseable=False → empty findings + error_kind="unparseable" in artifact → exit 0 but operator sees the soft-fail.
Same pattern for fail_on_missing / transport errors.

This honors D20: an unreachable model never silently becomes "no findings" — it becomes either a hard fail or a recorded soft-fail in the audit trail.

Mutual exclusivity validation

Passing --findings-file and --pr-metadata-file/--diff-file raises ValueError (CLI exit 2). No ambiguous "which source won" semantics.
--pr-metadata-file + --diff-file must be paired (you cannot pass only one).
The live-model path requires --lane-config (lane drives model + timeout + fail_on_* policy).

Tests

11 new tests, suite 120/120 green (was 109):

tests/test_review_run.py — 9 new (OllamaPathTests):

happy path → findings + model_used="kimi-k2.6:cloud" + fell_back=False
primary fails → records fell_back=True + model_used=<fallback>
unparseable + fail_on_unparseable=True → raises OllamaUnparseableError
unparseable + fail_on_unparseable=False → empty + error_kind="unparseable"
transport error + fail_on_missing=True → raises OllamaClientError
transport error + fail_on_missing=False → empty + error_kind="transport"
both findings_file and lane inputs → rejects
empty [] from model is valid (no error_kind, exit 0)
the actual rendered prompt reaches the transport (lane name, repo, files visible in body)

tests/test_cli_review_run.py — 2 new:

--pr-metadata-file without --lane-config → exit 2 "require --lane-config"
--diff-file without --pr-metadata-file → exit 2 "must be provided together"

Atomic per ADR-0017

No --execute flag. No Forgejo posting. No CLI changes outside review-run's scope.
No changes to ollama_client.py, prompt.py, or policy_bundle.py.
All existing tests still pass — backward-compatible extension.

Token-accounting note (agent-kanban)

This PR was on Opus (sacred path: review_run is the core artifact builder, changes to it affect every downstream resolver). Opus usage: ~5% weekly Opus. Total wave so far:

#29 prompt (Opus): ~6-7%
ollama_client (Sonnet+Opus verify): ~3%
THIS PR (Opus): ~5%
→ cumulative wave cost: ~14-15% weekly Opus, well within one week as estimated.

## What `review_run` gains a second findings source: live Ollama call. Pass `lane + pr_metadata + diff` and the module renders the deterministic prompt (PR #47), calls Ollama (PR #48), and embeds the parsed findings + model metadata in the artifact. ```bash patchwarden review-run \ --target-kind pull_request \ --target-id forgejo://… \ --target-sha … \ --lane-config policies/platform.v0.toml \ --pr-metadata-file metadata.json \ --diff-file pr.diff ``` The file-based path (`--findings-file`) is unchanged — existing dry-run flows keep working. ## Wave position **Step 3/4** toward real Ollama call: 1. ✅ #29 prompt — PR #47 merged 2. ✅ ollama_client — PR #48 merged 3. ✅ **THIS PR** — review_run integration 4. ⏳ #30 `--execute` posts findings as Forgejo comments (next thread) ## D20 enforcement The artifact records exactly what happened so the resolver downstream can decide policy without re-asking the model: - **`runtime_metadata.model_used`** — actual model that answered (primary or fallback) - **`runtime_metadata.fell_back`** — boolean, true iff primary failed and fallback succeeded - **`runtime_metadata.error_kind`** — `"transport"` / `"unparseable"` only when `fail_on_*=False` (soft-fail). Absent on success. Hard vs soft fail per lane: - `lane.fail_on_unparseable=True` → `OllamaUnparseableError` propagates → CLI exit 2 (fail-closed for the platform Actions run). - `lane.fail_on_unparseable=False` → empty findings + `error_kind="unparseable"` in artifact → exit 0 but operator sees the soft-fail. - Same pattern for `fail_on_missing` / transport errors. This honors **D20**: an unreachable model never silently becomes "no findings" — it becomes either a hard fail or a recorded soft-fail in the audit trail. ## Mutual exclusivity validation - Passing `--findings-file` **and** `--pr-metadata-file/--diff-file` raises `ValueError` (CLI exit 2). No ambiguous "which source won" semantics. - `--pr-metadata-file` + `--diff-file` must be paired (you cannot pass only one). - The live-model path requires `--lane-config` (lane drives model + timeout + fail_on_* policy). ## Tests 11 new tests, suite **120/120 green** (was 109): `tests/test_review_run.py` — 9 new (`OllamaPathTests`): - happy path → findings + `model_used="kimi-k2.6:cloud"` + `fell_back=False` - primary fails → records `fell_back=True` + `model_used=<fallback>` - unparseable + `fail_on_unparseable=True` → raises `OllamaUnparseableError` - unparseable + `fail_on_unparseable=False` → empty + `error_kind="unparseable"` - transport error + `fail_on_missing=True` → raises `OllamaClientError` - transport error + `fail_on_missing=False` → empty + `error_kind="transport"` - both findings_file and lane inputs → rejects - empty `[]` from model is valid (no error_kind, exit 0) - the actual rendered prompt reaches the transport (lane name, repo, files visible in body) `tests/test_cli_review_run.py` — 2 new: - `--pr-metadata-file` without `--lane-config` → exit 2 "require --lane-config" - `--diff-file` without `--pr-metadata-file` → exit 2 "must be provided together" ## Atomic per ADR-0017 - No `--execute` flag. No Forgejo posting. No CLI changes outside `review-run`'s scope. - No changes to `ollama_client.py`, `prompt.py`, or `policy_bundle.py`. - All existing tests still pass — backward-compatible extension. ## Token-accounting note (agent-kanban) This PR was on Opus (sacred path: `review_run` is the core artifact builder, changes to it affect every downstream resolver). Opus usage: ~5% weekly Opus. Total wave so far: - #29 prompt (Opus): ~6-7% - ollama_client (Sonnet+Opus verify): ~3% - **THIS PR (Opus)**: ~5% - → cumulative wave cost: ~14-15% weekly Opus, well within one week as estimated.

Rows
Columns