fix(honcho): route gemma through custom provider #372

Merged

pdurlej merged 1 commit from codex/honcho-custom-provider-gemma into main

2026-05-18 09:44:21 +02:00

codex commented

2026-05-18 09:41:08 +02:00

Collaborator

Canary status: missing — module/runtime config fix; rely on required Forgejo checks and operator review before merge

Canary Context Pack

Product story

Iskra reported that Honcho memory recall still timed out after the Gemma/Ollama switch. Runtime investigation showed that Honcho had the Gemma model configured, but its legacy provider settings still selected the native OpenAI provider path, so calls meant for the OpenAI-compatible client never reached Ollama.

What changed

Defaults Honcho's OpenAI-compatible client to Ollama Cloud when OLLAMA_CLOUD_API_KEY is present.
Routes legacy Honcho LLM provider aliases to custom instead of native openai.
Extends the Honcho/Ollama contract test to assert the custom-provider path.
Updates the runbook with the provider alias and runtime override contract.
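
The defaulting and alias routing listed above could be sketched roughly as follows. This is an illustration only, not the compose interpolation this PR actually uses: the variable names `COMPATIBLE_BASE_URL` and `LEGACY_PROVIDER` are hypothetical, and the rule that an explicit base URL wins over the default is an assumption. Only `OLLAMA_CLOUD_API_KEY`, `custom`, `openai`, and `https://ollama.com/v1` come from this PR.

```python
OLLAMA_CLOUD_BASE_URL = "https://ollama.com/v1"  # value quoted under "Runtime evidence"


def resolve_honcho_llm_env(env: dict[str, str]) -> dict[str, str]:
    """Return the effective settings for a hypothetical Honcho env block."""
    resolved = dict(env)
    # Default the OpenAI-compatible client to Ollama Cloud only when a key is
    # present and no explicit base URL was set (this precedence is an assumption).
    if env.get("OLLAMA_CLOUD_API_KEY") and not env.get("COMPATIBLE_BASE_URL"):
        resolved["COMPATIBLE_BASE_URL"] = OLLAMA_CLOUD_BASE_URL
    # Route the legacy alias to the custom client instead of native openai.
    # LEGACY_PROVIDER is a placeholder name, not a real Honcho setting.
    if resolved.get("LEGACY_PROVIDER", "openai") == "openai":
        resolved["LEGACY_PROVIDER"] = "custom"
    return resolved
```

With only `OLLAMA_CLOUD_API_KEY` set, the sketch yields the Ollama Cloud base URL and the `custom` provider; an explicitly configured base URL is left untouched.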

Why it changed

gemma4:31b-cloud must be sent through Honcho's custom OpenAI-compatible client. Sending that model name through the native OpenAI client produces provider not-found/retry errors and timeouts in the Iskra memory recall path.
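
The requirement above can be phrased as a small contract check. The sketch below is not the content of `test_honcho_ollama_contract.py`. It assumes the in-container settings have already been collected into a plain dictionary keyed by role, and the `provider`/`model` field names are hypothetical. The role names and the model name are the ones listed in this PR.

```python
GEMMA_MODEL = "gemma4:31b-cloud"
ROLES = ("summary", "deriver", "dream", "dialectic")  # roles named in the runtime evidence


def find_misrouted_roles(settings: dict[str, dict[str, str]]) -> list[str]:
    """List roles that use the Gemma cloud model without the custom provider."""
    misrouted = []
    for role in ROLES:
        cfg = settings.get(role, {})
        # A role is misrouted when it names the Gemma model but would still
        # go through a provider other than "custom" (e.g. native "openai").
        if cfg.get("model") == GEMMA_MODEL and cfg.get("provider") != "custom":
            misrouted.append(role)
    return misrouted
```

A pytest-style assertion such as `assert find_misrouted_roles(settings) == []` would then fail as soon as any role falls back to the native provider.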

Files touched

compose/apps/compose.yaml
control-plane/platformctl/tests/test_honcho_ollama_contract.py
runbooks/honcho-ollama-gemma-switch.md

Relevant context

#142 RS2000 cutover/Honcho soak thread
#371 Honcho raw memory/tool-result logging privacy follow-up
Honcho/Gemma switch runbook

Runtime evidence

Runtime mitigation already applied on RS2000: provider aliases set to custom, compatible base URL set to https://ollama.com/v1, provider key rendered from Infisical Token Auth.
honcho-api and honcho-deriver are healthy after scoped recreate.
In-container settings show summary/deriver/dream/dialectic providers as custom with model gemma4:31b-cloud.
Direct synthetic Ollama smoke passed earlier for chat, JSON, and tool-call.

Known constraints

Embeddings intentionally remain on OpenAI's text-embedding-3-small until the BGE-M3 vector migration is designed.
Runtime compose.env may still contain old explicit values. The runbook documents how a later override file must be ordered so that it takes precedence over them on existing RS2000 state.
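
The override ordering mentioned above can be illustrated with a minimal sketch, assuming env files are applied in order and a later file replaces values from an earlier one. The parser below is a simplification (no quoting or interpolation), and `HONCHO_PROVIDER` is a hypothetical variable name; the actual file names and load order are the ones documented in the runbook.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def layer_env_files(*texts: str) -> dict[str, str]:
    """Merge env file contents in order; later files override earlier ones."""
    merged: dict[str, str] = {}
    for text in texts:
        merged.update(parse_env(text))
    return merged
```

Under these assumptions, an old `HONCHO_PROVIDER=openai` in the base file is only replaced when the override file is applied after it; swapping the order leaves the stale value in effect.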

Explicit out-of-scope

No BGE-M3 production switch.
No database migration.
No raw memory/tool logging fix beyond documenting #371.

Requested decision

Approve and merge after checks. This PR makes the runtime mitigation durable in desired state.

Merge blockers

A failing test in the Honcho/Ollama contract suite.
Evidence that Honcho custom provider is incompatible with Ollama Cloud.

Spec sources read

compose/apps/compose.yaml — Honcho environment contract
control-plane/platformctl/tests/test_honcho_ollama_contract.py — existing Gemma/Ollama assertions
runbooks/honcho-ollama-gemma-switch.md — operator runbook
Runtime Honcho config/source snippets on RS2000 — to identify provider selection behavior

Tests

control-plane/.venv/bin/python -m pytest control-plane/platformctl/tests/test_honcho_ollama_contract.py
git diff --check


codex added 1 commit

2026-05-18 09:41:08 +02:00

fix(honcho): route gemma through custom provider

base-is-main / guard (pull_request) Successful in 2s

canary-required / collect-diff (pull_request) Successful in 5s

patchwarden-pr-sanity / collect-diff (pull_request) Successful in 3s

platformctl plan / auto-apply scope (pull_request) Successful in 20s

pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 19s