feat(v0): ollama_client (stdlib HTTP plumbing, D20-compliant unparseable handling) #48

Merged
pdurlej merged 1 commit from claude/patchwarden-ollama-client into main 2026-05-27 11:14:13 +02:00
Collaborator

What

New patchwarden.ollama_client module — stdlib-only HTTP plumbing for reviewer-lane model calls. POSTs to Ollama /api/chat, parses model output as a JSON array of findings, applies fallback model on failure.

from patchwarden.ollama_client import OllamaRequest, call_ollama

response = call_ollama(OllamaRequest(
    prompt=rendered_prompt,
    primary_model=lane.model,
    fallback_model=lane.fallback_model,
    timeout_seconds=lane.timeout_seconds,
))
# response.findings: list[dict]
# response.model_used / fell_back / raw_content / wall_clock_ms

Wave position

Step 2/4 toward real Ollama call:

  1. #29 prompt — merged in PR #47
  2. THIS PR — ollama_client (transport only, no integration)
  3. review_run invokes ollama_client (next PR — swaps _load_findings() for "render prompt + call model")
  4. #30 --execute posts findings as Forgejo comments

D20 enforcement

The module's contract enforces the hybrid review authority boundary at the lowest level:

  • Unparseable output NEVER silently becomes "no findings". If the model returns prose, an object, a list-of-non-dicts, or anything else, OllamaUnparseableError is raised. The caller is forced to handle it (next PR honors fail_on_unparseable from the lane config).
  • Empty array [] is valid. A model saying "I have nothing to flag" is legitimate sensor output, not an error.
  • Fallback on both transport AND unparseable. Operator may provision the fallback specifically because the primary tends to hallucinate, so an unparseable primary justifies trying the fallback once.
  • OllamaUnparseableError is a subclass of OllamaClientError. The next PR can catch the subclass explicitly before the parent if it needs to distinguish "model is broken" from "network is broken" (it will — for fail_on_unparseable vs fail_on_missing).

Constraints honored

  • stdlib-only: urllib.request + urllib.error, no requests/httpx.
  • No state: frozen dataclasses for OllamaRequest / OllamaResponse.
  • No network in tests: injectable Transport = Callable[[str, bytes, int], bytes]. Default transport is _urllib_transport; tests pass stubs.
  • Atomic per ADR-0017: no changes to review_run.py, cli.py, or any other module. Pure plumbing.

Tests

17 new tests in tests/test_ollama_client.py. Suite 109/109 green (up from 92):

  • happy path + empty array valid + raw_content preserved + wall_clock_ms
  • fallback success on transport error / on unparseable / both fail / no fallback configured
  • parse validation: object-not-list, list-with-non-dict, bare string all → OllamaUnparseableError
  • transport contract: timeout propagated, body well-formed, URL is {base}/api/chat
  • public surface: both error classes importable, subclass relationship documented

Token-accounting note (for agent-kanban)

Wrote this PR with a Sonnet sub-agent (delegated module + tests, then verified + committed myself). Sonnet usage: 42.5k tokens, ~0% of weekly Sonnet limit. Opus usage for delegation+verify+meldunek: ~3% weekly. Total ~3% Opus instead of estimated ~4-5% for doing it directly — small win, and Sonnet handled HTTP boilerplate cleanly.

## What New `patchwarden.ollama_client` module — stdlib-only HTTP plumbing for reviewer-lane model calls. POSTs to Ollama `/api/chat`, parses model output as a JSON array of findings, applies fallback model on failure. ```python from patchwarden.ollama_client import OllamaRequest, call_ollama response = call_ollama(OllamaRequest( prompt=rendered_prompt, primary_model=lane.model, fallback_model=lane.fallback_model, timeout_seconds=lane.timeout_seconds, )) # response.findings: list[dict] # response.model_used / fell_back / raw_content / wall_clock_ms ``` ## Wave position Step 2/4 toward real Ollama call: 1. ✅ #29 prompt — merged in PR #47 2. ✅ **THIS PR** — ollama_client (transport only, no integration) 3. ⏳ `review_run` invokes ollama_client (next PR — swaps `_load_findings()` for "render prompt + call model") 4. ⏳ #30 `--execute` posts findings as Forgejo comments ## D20 enforcement The module's contract enforces the hybrid review authority boundary at the lowest level: - **Unparseable output NEVER silently becomes "no findings".** If the model returns prose, an object, a list-of-non-dicts, or anything else, `OllamaUnparseableError` is raised. The caller is forced to handle it (next PR honors `fail_on_unparseable` from the lane config). - **Empty array `[]` is valid.** A model saying "I have nothing to flag" is legitimate sensor output, not an error. - **Fallback on both transport AND unparseable.** Operator may provision the fallback specifically because the primary tends to hallucinate, so an unparseable primary justifies trying the fallback once. - `OllamaUnparseableError` is a subclass of `OllamaClientError`. The next PR can catch the subclass explicitly *before* the parent if it needs to distinguish "model is broken" from "network is broken" (it will — for `fail_on_unparseable` vs `fail_on_missing`). ## Constraints honored - **stdlib-only**: `urllib.request` + `urllib.error`, no `requests`/`httpx`. - **No state**: frozen dataclasses for `OllamaRequest` / `OllamaResponse`. - **No network in tests**: injectable `Transport = Callable[[str, bytes, int], bytes]`. Default transport is `_urllib_transport`; tests pass stubs. - **Atomic per ADR-0017**: no changes to `review_run.py`, `cli.py`, or any other module. Pure plumbing. ## Tests 17 new tests in `tests/test_ollama_client.py`. Suite **109/109 green** (up from 92): - happy path + empty array valid + raw_content preserved + wall_clock_ms - fallback success on transport error / on unparseable / both fail / no fallback configured - parse validation: object-not-list, list-with-non-dict, bare string all → `OllamaUnparseableError` - transport contract: timeout propagated, body well-formed, URL is `{base}/api/chat` - public surface: both error classes importable, subclass relationship documented ## Token-accounting note (for agent-kanban) Wrote this PR with a Sonnet sub-agent (delegated module + tests, then verified + committed myself). Sonnet usage: 42.5k tokens, ~0% of weekly Sonnet limit. Opus usage for delegation+verify+meldunek: ~3% weekly. Total **~3% Opus instead of estimated ~4-5%** for doing it directly — small win, and Sonnet handled HTTP boilerplate cleanly.
Sign in to join this conversation.
No reviewers
No labels
agent/claude-code
agent/codex
agent/gemini
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
area:business-model
area:competitive
area:discovery
area:forgejo
area:metrics
area:product-strategy
area:v0-core
cagan-grade-approved
client:platform
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
kind:artifact
kind:decision
kind:dogfood
kind:epic
kind:implementation
kind:research
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
priority:p0
priority:p1
priority:p2
priority:p3
ready-for-agent
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:blocked-on-discovery
status:cagan-grade-review-pending
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:needs-operator-decision
status:operator-needed
status:parked
tier:0-anchor
tier:0-platform-substrate
tier:1-core
tier:1-iskra-value-layer
tier:2-supporting
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
wave:1-foundation
wave:2-positioning
wave:3-validation
wave:4-economics
wave:5-operating
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
pdurlej/patchwarden!48
No description provided.