feat(honcho): prepare Ollama Gemma LLM switch #358

Merged

pdurlej merged 1 commit from codex/honcho-gemma-ollama-prep into main

2026-05-18 01:54:24 +02:00

codex commented

2026-05-18 01:44:13 +02:00

Collaborator

Canary status: missing — required Forgejo checks and canary review before production deploy

Summary

Prepare the Honcho LLM-only provider switch from the current OpenAI-style gpt-5.4-mini path to Ollama Cloud gemma4:31b-cloud, while deliberately keeping production embeddings on text-embedding-3-small until the BGE-M3 vector-space migration is designed.

This PR does not mutate production by itself. It prepares desired-state config, backup metadata support, synthetic compatibility smokes, and the operator runbook for the morning deploy.

Canary Context Pack

Product story

Honcho memory work is already sent to external OpenAI-style providers. The owner wants the reasoning/summary/dialectic LLM path moved to the preferred Ollama Cloud provider now, without delaying on the more complex embedding migration.

What changed

Honcho LLM/reasoning defaults now point at gemma4:31b-cloud through OpenAI-compatible per-feature overrides.
LLM endpoint defaults to https://ollama.com/v1; per-feature API_KEY_ENV points at OLLAMA_CLOUD_API_KEY.
Embeddings stay unchanged on text-embedding-3-small; no BGE-M3 production wiring and no EMBEDDING_VECTOR_DIMENSIONS=1024.
DERIVER_FLUSH_ENABLED keeps normal-write fallback for this provider switch.
Added synthetic Ollama/Gemma compatibility smoke with chat, JSON, and tool-call checks.
Added synthetic BGE-M3 embedding smoke for future internal/local endpoint validation only.
Backup-before script now emits protected .metadata.json sidecars with path, size, sha256, class, container, and exit code.
Added Honcho switch runbook and cutover evidence artifact.
Opened #357 for the BGE-M3 vector-space migration design.

Why it changed

Current RS2000 truth shows Honcho already uses external LLM and embedding processing, and the database already has 1536-dimensional vectors. The fast safe target is therefore LLM-only. BGE-M3 is prepared but blocked from production until retrieval can avoid mixing vector spaces.

Files touched

compose/apps/compose.yaml
modules/honcho-api/module.yaml
modules/honcho-deriver/module.yaml
scripts/cutover/backup-before-apply.sh
scripts/cutover/README.md
scripts/honcho/ollama-gemma-compat-smoke.py
scripts/honcho/bge-m3-embedding-smoke.py
runbooks/honcho-ollama-gemma-switch.md
state/cutover/honcho-gemma-ollama-prep.md
control-plane/platformctl/tests/test_honcho_ollama_contract.py

Relevant context

Honcho supports OpenAI-compatible endpoints by keeping transport=openai and setting MODEL_CONFIG__OVERRIDES__BASE_URL / API_KEY_ENV.
Honcho embedding changes are not in-place safe once a database contains existing vectors.
RS2000 evidence: documents.embedding vector(1536) has 26141 rows; message_embeddings.embedding vector(1536) has 13558 rows.

Runtime evidence

Read-only RS2000 checks before this PR:

honcho-api and honcho-deriver envs show LLM paths on gpt-5.4-mini, transport openai.
DERIVER_FLUSH_ENABLED=true is active.
Embeddings use text-embedding-3-small, transport openai.
documents and message_embeddings are both vector(1536), with all counted rows at 1536 dimensions.
No local RS2000 Ollama listener/container was observed on port 11434.

Known constraints

OLLAMA_CLOUD_API_KEY must be provided by runtime/Infisical only; no key value is stored in repo.
Local Mac Infisical CLI lacked an active login session, so the live Ollama Cloud compatibility smoke was not run from this workstation.
The runbook makes the compatibility smoke a pre-deploy gate before Honcho is switched.

Explicit out-of-scope

No production BGE-M3 switch.
No vector schema migration.
No data re-embedding.
No production backup execution from this PR.
No production Honcho deploy from this PR.

Requested decision

Approve the prepared LLM-only provider switch package. Production deploy remains gated on: Ollama/Gemma compatibility smoke, Honcho Postgres/Redis backups, release-root promotion, and sequential Honcho smokes.

Merge blockers

A reviewer finds that Honcho will not honor MODEL_CONFIG__OVERRIDES__API_KEY_ENV on the selected config path.
A reviewer finds BGE-M3 production wiring accidentally enabled.
A reviewer finds the provider switch would drop the current embedding path.
A reviewer finds the backup metadata sidecar leaks backup content or secrets.

Verification

control-plane/.venv/bin/python -m pytest -q control-plane/platformctl/tests/test_honcho_ollama_contract.py control-plane/platformctl/tests/test_validate.py control-plane/platformctl/tests/test_apply_env_file.py control-plane/platformctl/tests/test_forgejo_ci_scripts_contract.py — 50 passed
PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-api/module.yaml — ok
PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-deriver/module.yaml — ok
PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-postgres/module.yaml — ok
PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-redis/module.yaml — ok
bash -n scripts/cutover/backup-before-apply.sh
python3 -m py_compile scripts/honcho/ollama-gemma-compat-smoke.py scripts/honcho/bge-m3-embedding-smoke.py
git diff --check

Notes:

docker compose -f compose/apps/compose.yaml config --quiet cannot be run meaningfully from the Mac without the production env set; it stops on missing required runtime secrets/empty path vars. The new Honcho interpolation is covered by the contract test.
Ollama Cloud live compatibility smoke is prepared but blocked locally by missing Infisical login; run it on RS2000 before deploy per the runbook.

Spec sources read

compose/apps/compose.yaml — Honcho env/config surface.
modules/honcho-api/module.yaml — secret reference update.
modules/honcho-deriver/module.yaml — secret reference update.
scripts/cutover/backup-before-apply.sh and scripts/cutover/README.md — backup-before package.
control-plane/platformctl/tests/test_validate.py, test_apply_env_file.py, test_forgejo_ci_scripts_contract.py — nearby test patterns.
Honcho docs: https://honcho.dev/docs/v3/contributing/configuration.md and https://honcho.dev/docs/v3/contributing/changing-embeddings.md — provider override and embedding migration behavior.
Ollama docs: https://docs.ollama.com/api/openai-compatibility and https://docs.ollama.com/cloud — OpenAI-compatible path target.
BGE-M3 model card: https://huggingface.co/BAAI/bge-m3 — expected 1024-dimensional embedding target.

Refs #357
Refs pdurlej/iskra-openclaw#293

Canary status: missing — required Forgejo checks and canary review before production deploy ## Summary Prepare the Honcho LLM-only provider switch from the current OpenAI-style `gpt-5.4-mini` path to Ollama Cloud `gemma4:31b-cloud`, while deliberately keeping production embeddings on `text-embedding-3-small` until the BGE-M3 vector-space migration is designed. This PR does not mutate production by itself. It prepares desired-state config, backup metadata support, synthetic compatibility smokes, and the operator runbook for the morning deploy. ## Canary Context Pack ### Product story Honcho memory work is already sent to external OpenAI-style providers. The owner wants the reasoning/summary/dialectic LLM path moved to the preferred Ollama Cloud provider now, without delaying on the more complex embedding migration. ### What changed - Honcho LLM/reasoning defaults now point at `gemma4:31b-cloud` through OpenAI-compatible per-feature overrides. - LLM endpoint defaults to `https://ollama.com/v1`; per-feature `API_KEY_ENV` points at `OLLAMA_CLOUD_API_KEY`. - Embeddings stay unchanged on `text-embedding-3-small`; no BGE-M3 production wiring and no `EMBEDDING_VECTOR_DIMENSIONS=1024`. - `DERIVER_FLUSH_ENABLED` keeps normal-write fallback for this provider switch. - Added synthetic Ollama/Gemma compatibility smoke with chat, JSON, and tool-call checks. - Added synthetic BGE-M3 embedding smoke for future internal/local endpoint validation only. - Backup-before script now emits protected `.metadata.json` sidecars with path, size, sha256, class, container, and exit code. - Added Honcho switch runbook and cutover evidence artifact. - Opened #357 for the BGE-M3 vector-space migration design. ### Why it changed Current RS2000 truth shows Honcho already uses external LLM and embedding processing, and the database already has 1536-dimensional vectors. The fast safe target is therefore LLM-only. BGE-M3 is prepared but blocked from production until retrieval can avoid mixing vector spaces. ### Files touched - `compose/apps/compose.yaml` - `modules/honcho-api/module.yaml` - `modules/honcho-deriver/module.yaml` - `scripts/cutover/backup-before-apply.sh` - `scripts/cutover/README.md` - `scripts/honcho/ollama-gemma-compat-smoke.py` - `scripts/honcho/bge-m3-embedding-smoke.py` - `runbooks/honcho-ollama-gemma-switch.md` - `state/cutover/honcho-gemma-ollama-prep.md` - `control-plane/platformctl/tests/test_honcho_ollama_contract.py` ### Relevant context - Honcho supports OpenAI-compatible endpoints by keeping `transport=openai` and setting `MODEL_CONFIG__OVERRIDES__BASE_URL` / `API_KEY_ENV`. - Honcho embedding changes are not in-place safe once a database contains existing vectors. - RS2000 evidence: `documents.embedding vector(1536)` has 26141 rows; `message_embeddings.embedding vector(1536)` has 13558 rows. ### Runtime evidence Read-only RS2000 checks before this PR: - `honcho-api` and `honcho-deriver` envs show LLM paths on `gpt-5.4-mini`, transport `openai`. - `DERIVER_FLUSH_ENABLED=true` is active. - Embeddings use `text-embedding-3-small`, transport `openai`. - `documents` and `message_embeddings` are both `vector(1536)`, with all counted rows at 1536 dimensions. - No local RS2000 Ollama listener/container was observed on port 11434. ### Known constraints - `OLLAMA_CLOUD_API_KEY` must be provided by runtime/Infisical only; no key value is stored in repo. - Local Mac Infisical CLI lacked an active login session, so the live Ollama Cloud compatibility smoke was not run from this workstation. - The runbook makes the compatibility smoke a pre-deploy gate before Honcho is switched. ### Explicit out-of-scope - No production BGE-M3 switch. - No vector schema migration. - No data re-embedding. - No production backup execution from this PR. - No production Honcho deploy from this PR. ### Requested decision Approve the prepared LLM-only provider switch package. Production deploy remains gated on: Ollama/Gemma compatibility smoke, Honcho Postgres/Redis backups, release-root promotion, and sequential Honcho smokes. ### Merge blockers - A reviewer finds that Honcho will not honor `MODEL_CONFIG__OVERRIDES__API_KEY_ENV` on the selected config path. - A reviewer finds BGE-M3 production wiring accidentally enabled. - A reviewer finds the provider switch would drop the current embedding path. - A reviewer finds the backup metadata sidecar leaks backup content or secrets. ## Verification - `control-plane/.venv/bin/python -m pytest -q control-plane/platformctl/tests/test_honcho_ollama_contract.py control-plane/platformctl/tests/test_validate.py control-plane/platformctl/tests/test_apply_env_file.py control-plane/platformctl/tests/test_forgejo_ci_scripts_contract.py` — 50 passed - `PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-api/module.yaml` — ok - `PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-deriver/module.yaml` — ok - `PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-postgres/module.yaml` — ok - `PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-redis/module.yaml` — ok - `bash -n scripts/cutover/backup-before-apply.sh` - `python3 -m py_compile scripts/honcho/ollama-gemma-compat-smoke.py scripts/honcho/bge-m3-embedding-smoke.py` - `git diff --check` Notes: - `docker compose -f compose/apps/compose.yaml config --quiet` cannot be run meaningfully from the Mac without the production env set; it stops on missing required runtime secrets/empty path vars. The new Honcho interpolation is covered by the contract test. - Ollama Cloud live compatibility smoke is prepared but blocked locally by missing Infisical login; run it on RS2000 before deploy per the runbook. ## Spec sources read - `compose/apps/compose.yaml` — Honcho env/config surface. - `modules/honcho-api/module.yaml` — secret reference update. - `modules/honcho-deriver/module.yaml` — secret reference update. - `scripts/cutover/backup-before-apply.sh` and `scripts/cutover/README.md` — backup-before package. - `control-plane/platformctl/tests/test_validate.py`, `test_apply_env_file.py`, `test_forgejo_ci_scripts_contract.py` — nearby test patterns. - Honcho docs: `https://honcho.dev/docs/v3/contributing/configuration.md` and `https://honcho.dev/docs/v3/contributing/changing-embeddings.md` — provider override and embedding migration behavior. - Ollama docs: `https://docs.ollama.com/api/openai-compatibility` and `https://docs.ollama.com/cloud` — OpenAI-compatible path target. - BGE-M3 model card: `https://huggingface.co/BAAI/bge-m3` — expected 1024-dimensional embedding target. Refs #357 Refs pdurlej/iskra-openclaw#293

codex added 1 commit

2026-05-18 01:44:13 +02:00

feat(honcho): prepare Ollama Gemma LLM switch

canary-required / collect-diff (pull_request) Successful in 3s

Details

patchwarden-pr-sanity / collect-diff (pull_request) Successful in 3s

Details

platformctl plan / auto-apply scope (pull_request) Successful in 19s

Details

pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 17s

Details

python-ci / Python 3.11 (pull_request) Successful in 37s