feat(v0): ollama_client (stdlib HTTP plumbing, D20-compliant unparseable handling) #48

Merged

pdurlej merged 1 commit from claude/patchwarden-ollama-client into main

2026-05-27 11:14:13 +02:00

claude commented

2026-05-27 10:47:19 +02:00

Collaborator

What

New patchwarden.ollama_client module — stdlib-only HTTP plumbing for reviewer-lane model calls. POSTs to Ollama /api/chat, parses model output as a JSON array of findings, applies fallback model on failure.

from patchwarden.ollama_client import OllamaRequest, call_ollama

response = call_ollama(OllamaRequest(
    prompt=rendered_prompt,
    primary_model=lane.model,
    fallback_model=lane.fallback_model,
    timeout_seconds=lane.timeout_seconds,
))
# response.findings: list[dict]
# response.model_used / fell_back / raw_content / wall_clock_ms

Wave position

Step 2/4 toward real Ollama call:

✅ #29 prompt — merged in PR #47
✅ THIS PR — ollama_client (transport only, no integration)
⏳ review_run invokes ollama_client (next PR — swaps _load_findings() for "render prompt + call model")
⏳ #30 --execute posts findings as Forgejo comments

D20 enforcement

The module's contract enforces the hybrid review authority boundary at the lowest level:

Unparseable output NEVER silently becomes "no findings". If the model returns prose, an object, a list-of-non-dicts, or anything else, OllamaUnparseableError is raised. The caller is forced to handle it (next PR honors fail_on_unparseable from the lane config).
Empty array [] is valid. A model saying "I have nothing to flag" is legitimate sensor output, not an error.
Fallback on both transport AND unparseable. Operator may provision the fallback specifically because the primary tends to hallucinate, so an unparseable primary justifies trying the fallback once.
OllamaUnparseableError is a subclass of OllamaClientError. The next PR can catch the subclass explicitly before the parent if it needs to distinguish "model is broken" from "network is broken" (it will — for fail_on_unparseable vs fail_on_missing).

Constraints honored

stdlib-only: urllib.request + urllib.error, no requests/httpx.
No state: frozen dataclasses for OllamaRequest / OllamaResponse.
No network in tests: injectable Transport = Callable[[str, bytes, int], bytes]. Default transport is _urllib_transport; tests pass stubs.
Atomic per ADR-0017: no changes to review_run.py, cli.py, or any other module. Pure plumbing.

Tests

17 new tests in tests/test_ollama_client.py. Suite 109/109 green (up from 92):

happy path + empty array valid + raw_content preserved + wall_clock_ms
fallback success on transport error / on unparseable / both fail / no fallback configured
parse validation: object-not-list, list-with-non-dict, bare string all → OllamaUnparseableError
transport contract: timeout propagated, body well-formed, URL is {base}/api/chat
public surface: both error classes importable, subclass relationship documented

Token-accounting note (for agent-kanban)

Wrote this PR with a Sonnet sub-agent (delegated module + tests, then verified + committed myself). Sonnet usage: 42.5k tokens, ~0% of weekly Sonnet limit. Opus usage for delegation+verify+meldunek: ~3% weekly. Total ~3% Opus instead of estimated ~4-5% for doing it directly — small win, and Sonnet handled HTTP boilerplate cleanly.

## What New `patchwarden.ollama_client` module — stdlib-only HTTP plumbing for reviewer-lane model calls. POSTs to Ollama `/api/chat`, parses model output as a JSON array of findings, applies fallback model on failure. ```python from patchwarden.ollama_client import OllamaRequest, call_ollama response = call_ollama(OllamaRequest( prompt=rendered_prompt, primary_model=lane.model, fallback_model=lane.fallback_model, timeout_seconds=lane.timeout_seconds, )) # response.findings: list[dict] # response.model_used / fell_back / raw_content / wall_clock_ms ``` ## Wave position Step 2/4 toward real Ollama call: 1. ✅ #29 prompt — merged in PR #47 2. ✅ **THIS PR** — ollama_client (transport only, no integration) 3. ⏳ `review_run` invokes ollama_client (next PR — swaps `_load_findings()` for "render prompt + call model") 4. ⏳ #30 `--execute` posts findings as Forgejo comments ## D20 enforcement The module's contract enforces the hybrid review authority boundary at the lowest level: - **Unparseable output NEVER silently becomes "no findings".** If the model returns prose, an object, a list-of-non-dicts, or anything else, `OllamaUnparseableError` is raised. The caller is forced to handle it (next PR honors `fail_on_unparseable` from the lane config). - **Empty array `[]` is valid.** A model saying "I have nothing to flag" is legitimate sensor output, not an error. - **Fallback on both transport AND unparseable.** Operator may provision the fallback specifically because the primary tends to hallucinate, so an unparseable primary justifies trying the fallback once. - `OllamaUnparseableError` is a subclass of `OllamaClientError`. The next PR can catch the subclass explicitly *before* the parent if it needs to distinguish "model is broken" from "network is broken" (it will — for `fail_on_unparseable` vs `fail_on_missing`). ## Constraints honored - **stdlib-only**: `urllib.request` + `urllib.error`, no `requests`/`httpx`. - **No state**: frozen dataclasses for `OllamaRequest` / `OllamaResponse`. - **No network in tests**: injectable `Transport = Callable[[str, bytes, int], bytes]`. Default transport is `_urllib_transport`; tests pass stubs. - **Atomic per ADR-0017**: no changes to `review_run.py`, `cli.py`, or any other module. Pure plumbing. ## Tests 17 new tests in `tests/test_ollama_client.py`. Suite **109/109 green** (up from 92): - happy path + empty array valid + raw_content preserved + wall_clock_ms - fallback success on transport error / on unparseable / both fail / no fallback configured - parse validation: object-not-list, list-with-non-dict, bare string all → `OllamaUnparseableError` - transport contract: timeout propagated, body well-formed, URL is `{base}/api/chat` - public surface: both error classes importable, subclass relationship documented ## Token-accounting note (for agent-kanban) Wrote this PR with a Sonnet sub-agent (delegated module + tests, then verified + committed myself). Sonnet usage: 42.5k tokens, ~0% of weekly Sonnet limit. Opus usage for delegation+verify+meldunek: ~3% weekly. Total **~3% Opus instead of estimated ~4-5%** for doing it directly — small win, and Sonnet handled HTTP boilerplate cleanly.

Rows
Columns