feat(v0): review_run invokes ollama_client (live-model path, D20 soft/hard-fail per lane) #49

Merged
pdurlej merged 1 commit from claude/patchwarden-review-run-invokes-ollama into main 2026-05-27 11:24:39 +02:00
Collaborator

What

review_run gains a second findings source: live Ollama call. Pass lane + pr_metadata + diff and the module renders the deterministic prompt (PR #47), calls Ollama (PR #48), and embeds the parsed findings + model metadata in the artifact.

patchwarden review-run \
  --target-kind pull_request \
  --target-id forgejo://… \
  --target-sha … \
  --lane-config policies/platform.v0.toml \
  --pr-metadata-file metadata.json \
  --diff-file pr.diff

The file-based path (--findings-file) is unchanged — existing dry-run flows keep working.

Wave position

Step 3/4 toward real Ollama call:

  1. #29 prompt — PR #47 merged
  2. ollama_client — PR #48 merged
  3. THIS PR — review_run integration
  4. #30 --execute posts findings as Forgejo comments (next thread)

D20 enforcement

The artifact records exactly what happened so the resolver downstream can decide policy without re-asking the model:

  • runtime_metadata.model_used — actual model that answered (primary or fallback)
  • runtime_metadata.fell_back — boolean, true iff primary failed and fallback succeeded
  • runtime_metadata.error_kind"transport" / "unparseable" only when fail_on_*=False (soft-fail). Absent on success.

Hard vs soft fail per lane:

  • lane.fail_on_unparseable=TrueOllamaUnparseableError propagates → CLI exit 2 (fail-closed for the platform Actions run).
  • lane.fail_on_unparseable=False → empty findings + error_kind="unparseable" in artifact → exit 0 but operator sees the soft-fail.
  • Same pattern for fail_on_missing / transport errors.

This honors D20: an unreachable model never silently becomes "no findings" — it becomes either a hard fail or a recorded soft-fail in the audit trail.

Mutual exclusivity validation

  • Passing --findings-file and --pr-metadata-file/--diff-file raises ValueError (CLI exit 2). No ambiguous "which source won" semantics.
  • --pr-metadata-file + --diff-file must be paired (you cannot pass only one).
  • The live-model path requires --lane-config (lane drives model + timeout + fail_on_* policy).

Tests

11 new tests, suite 120/120 green (was 109):

tests/test_review_run.py — 9 new (OllamaPathTests):

  • happy path → findings + model_used="kimi-k2.6:cloud" + fell_back=False
  • primary fails → records fell_back=True + model_used=<fallback>
  • unparseable + fail_on_unparseable=True → raises OllamaUnparseableError
  • unparseable + fail_on_unparseable=False → empty + error_kind="unparseable"
  • transport error + fail_on_missing=True → raises OllamaClientError
  • transport error + fail_on_missing=False → empty + error_kind="transport"
  • both findings_file and lane inputs → rejects
  • empty [] from model is valid (no error_kind, exit 0)
  • the actual rendered prompt reaches the transport (lane name, repo, files visible in body)

tests/test_cli_review_run.py — 2 new:

  • --pr-metadata-file without --lane-config → exit 2 "require --lane-config"
  • --diff-file without --pr-metadata-file → exit 2 "must be provided together"

Atomic per ADR-0017

  • No --execute flag. No Forgejo posting. No CLI changes outside review-run's scope.
  • No changes to ollama_client.py, prompt.py, or policy_bundle.py.
  • All existing tests still pass — backward-compatible extension.

Token-accounting note (agent-kanban)

This PR was on Opus (sacred path: review_run is the core artifact builder, changes to it affect every downstream resolver). Opus usage: ~5% weekly Opus. Total wave so far:

  • #29 prompt (Opus): ~6-7%
  • ollama_client (Sonnet+Opus verify): ~3%
  • THIS PR (Opus): ~5%
  • → cumulative wave cost: ~14-15% weekly Opus, well within one week as estimated.
## What `review_run` gains a second findings source: live Ollama call. Pass `lane + pr_metadata + diff` and the module renders the deterministic prompt (PR #47), calls Ollama (PR #48), and embeds the parsed findings + model metadata in the artifact. ```bash patchwarden review-run \ --target-kind pull_request \ --target-id forgejo://… \ --target-sha … \ --lane-config policies/platform.v0.toml \ --pr-metadata-file metadata.json \ --diff-file pr.diff ``` The file-based path (`--findings-file`) is unchanged — existing dry-run flows keep working. ## Wave position **Step 3/4** toward real Ollama call: 1. ✅ #29 prompt — PR #47 merged 2. ✅ ollama_client — PR #48 merged 3. ✅ **THIS PR** — review_run integration 4. ⏳ #30 `--execute` posts findings as Forgejo comments (next thread) ## D20 enforcement The artifact records exactly what happened so the resolver downstream can decide policy without re-asking the model: - **`runtime_metadata.model_used`** — actual model that answered (primary or fallback) - **`runtime_metadata.fell_back`** — boolean, true iff primary failed and fallback succeeded - **`runtime_metadata.error_kind`** — `"transport"` / `"unparseable"` only when `fail_on_*=False` (soft-fail). Absent on success. Hard vs soft fail per lane: - `lane.fail_on_unparseable=True` → `OllamaUnparseableError` propagates → CLI exit 2 (fail-closed for the platform Actions run). - `lane.fail_on_unparseable=False` → empty findings + `error_kind="unparseable"` in artifact → exit 0 but operator sees the soft-fail. - Same pattern for `fail_on_missing` / transport errors. This honors **D20**: an unreachable model never silently becomes "no findings" — it becomes either a hard fail or a recorded soft-fail in the audit trail. ## Mutual exclusivity validation - Passing `--findings-file` **and** `--pr-metadata-file/--diff-file` raises `ValueError` (CLI exit 2). No ambiguous "which source won" semantics. - `--pr-metadata-file` + `--diff-file` must be paired (you cannot pass only one). - The live-model path requires `--lane-config` (lane drives model + timeout + fail_on_* policy). ## Tests 11 new tests, suite **120/120 green** (was 109): `tests/test_review_run.py` — 9 new (`OllamaPathTests`): - happy path → findings + `model_used="kimi-k2.6:cloud"` + `fell_back=False` - primary fails → records `fell_back=True` + `model_used=<fallback>` - unparseable + `fail_on_unparseable=True` → raises `OllamaUnparseableError` - unparseable + `fail_on_unparseable=False` → empty + `error_kind="unparseable"` - transport error + `fail_on_missing=True` → raises `OllamaClientError` - transport error + `fail_on_missing=False` → empty + `error_kind="transport"` - both findings_file and lane inputs → rejects - empty `[]` from model is valid (no error_kind, exit 0) - the actual rendered prompt reaches the transport (lane name, repo, files visible in body) `tests/test_cli_review_run.py` — 2 new: - `--pr-metadata-file` without `--lane-config` → exit 2 "require --lane-config" - `--diff-file` without `--pr-metadata-file` → exit 2 "must be provided together" ## Atomic per ADR-0017 - No `--execute` flag. No Forgejo posting. No CLI changes outside `review-run`'s scope. - No changes to `ollama_client.py`, `prompt.py`, or `policy_bundle.py`. - All existing tests still pass — backward-compatible extension. ## Token-accounting note (agent-kanban) This PR was on Opus (sacred path: `review_run` is the core artifact builder, changes to it affect every downstream resolver). Opus usage: ~5% weekly Opus. Total wave so far: - #29 prompt (Opus): ~6-7% - ollama_client (Sonnet+Opus verify): ~3% - **THIS PR (Opus)**: ~5% - → cumulative wave cost: ~14-15% weekly Opus, well within one week as estimated.
Sign in to join this conversation.
No reviewers
No labels
agent/claude-code
agent/codex
agent/gemini
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
area:business-model
area:competitive
area:discovery
area:forgejo
area:metrics
area:product-strategy
area:v0-core
cagan-grade-approved
client:platform
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
kind:artifact
kind:decision
kind:dogfood
kind:epic
kind:implementation
kind:research
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
priority:p0
priority:p1
priority:p2
priority:p3
ready-for-agent
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:blocked-on-discovery
status:cagan-grade-review-pending
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:needs-operator-decision
status:operator-needed
status:parked
tier:0-anchor
tier:0-platform-substrate
tier:1-core
tier:1-iskra-value-layer
tier:2-supporting
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
wave:1-foundation
wave:2-positioning
wave:3-validation
wave:4-economics
wave:5-operating
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
pdurlej/patchwarden!49
No description provided.