feat(honcho): prepare Ollama Gemma LLM switch #358

Merged
pdurlej merged 1 commit from codex/honcho-gemma-ollama-prep into main 2026-05-18 01:54:24 +02:00

Canary status: missing; Forgejo checks and canary review are required before production deploy.

Summary

Prepare the Honcho LLM-only provider switch from the current OpenAI-style gpt-5.4-mini path to Ollama Cloud gemma4:31b-cloud, while deliberately keeping production embeddings on text-embedding-3-small until the BGE-M3 vector-space migration is designed.

This PR does not mutate production by itself. It prepares desired-state config, backup metadata support, synthetic compatibility smokes, and the operator runbook for the morning deploy.

Canary Context Pack

Product story

Honcho memory work is already sent to external OpenAI-style providers. The owner wants the reasoning/summary/dialectic LLM path moved to the preferred Ollama Cloud provider now, without waiting for the more complex embedding migration.

What changed

  • Honcho LLM/reasoning defaults now point at gemma4:31b-cloud through OpenAI-compatible per-feature overrides.
  • LLM endpoint defaults to https://ollama.com/v1; per-feature API_KEY_ENV points at OLLAMA_CLOUD_API_KEY.
  • Embeddings stay unchanged on text-embedding-3-small; no BGE-M3 production wiring and no EMBEDDING_VECTOR_DIMENSIONS=1024.
  • DERIVER_FLUSH_ENABLED keeps its normal-write fallback during this provider switch.
  • Added synthetic Ollama/Gemma compatibility smoke with chat, JSON, and tool-call checks.
  • Added synthetic BGE-M3 embedding smoke for future internal/local endpoint validation only.
  • Backup-before script now emits protected .metadata.json sidecars with path, size, sha256, class, container, and exit code.
  • Added Honcho switch runbook and cutover evidence artifact.
  • Opened #357 for the BGE-M3 vector-space migration design.
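The backup-before script itself is bash; the Python sketch below only illustrates what a protected sidecar with the listed fields (path, size, sha256, class, container, exit code) could look like. The function name and JSON key names are hypothetical. The file is hashed in chunks so that only the digest, never backup content, reaches the sidecar, and the sidecar is created owner-only.

```python
import hashlib
import json
import os
from pathlib import Path


def write_backup_sidecar(backup: Path, backup_class: str, container: str, exit_code: int) -> Path:
    """Write <backup>.metadata.json next to a finished backup file.

    Hypothetical helper: the key names are illustrative, not the script's schema.
    """
    digest = hashlib.sha256()
    with backup.open("rb") as fh:
        # Stream in 1 MiB chunks; only the digest leaves this loop, never content.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)

    metadata = {
        "path": str(backup),
        "size": backup.stat().st_size,
        "sha256": digest.hexdigest(),
        "class": backup_class,
        "container": container,
        "exit_code": exit_code,
    }
    sidecar = backup.with_name(backup.name + ".metadata.json")
    # Create the file owner-only (0o600) from the start instead of fixing it later.
    fd = os.open(sidecar, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as out:
        json.dump(metadata, out, indent=2, sort_keys=True)
    # The creation mode is ignored if the file already existed, so enforce it.
    os.chmod(sidecar, 0o600)
    return sidecar
```

Keeping the sidecar to metadata only is what the "leaks backup content or secrets" merge blocker checks for: a reviewer can verify integrity via `sha256` and `size` without the sidecar ever holding dump bytes.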

Why it changed

The current RS2000 state shows that Honcho already sends both LLM and embedding work to external providers, and that the database already holds 1536-dimensional vectors. The fastest safe target is therefore an LLM-only switch. BGE-M3 (1024-dimensional) is prepared but blocked from production until retrieval can avoid mixing vector spaces.
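The vector-space constraint can be illustrated with a small guard. This is a hypothetical sketch, not code from this PR: the model names, dimensions, and row counts come from the PR text, while the `EmbeddingSpace` type and `can_switch_in_place` function are invented for illustration. Note that even two models with equal dimensionality produce incomparable vectors, so the guard compares the whole model identity, not just the dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpace:
    model: str
    dimensions: int


# Values from the PR text; the guard itself is a hypothetical illustration.
CURRENT = EmbeddingSpace("text-embedding-3-small", 1536)
BGE_M3 = EmbeddingSpace("BAAI/bge-m3", 1024)


def can_switch_in_place(current: EmbeddingSpace, target: EmbeddingSpace, existing_rows: int) -> bool:
    """Return True only if switching the embedding model cannot mix vector spaces."""
    if existing_rows == 0:
        # An empty table has no old vectors to mix with.
        return True
    # Different models yield different vector spaces even at equal
    # dimensionality, so any model change with existing rows is unsafe.
    return current == target


existing = 26141 + 13558  # documents + message_embeddings rows on RS2000
print(existing)  # 39699
print(can_switch_in_place(CURRENT, BGE_M3, existing))  # False
```

With roughly 40k existing vectors, a BGE-M3 switch implies either re-embedding every row or running two indexes side by side, which is the design question deferred to #357.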

Files touched

  • compose/apps/compose.yaml
  • modules/honcho-api/module.yaml
  • modules/honcho-deriver/module.yaml
  • scripts/cutover/backup-before-apply.sh
  • scripts/cutover/README.md
  • scripts/honcho/ollama-gemma-compat-smoke.py
  • scripts/honcho/bge-m3-embedding-smoke.py
  • runbooks/honcho-ollama-gemma-switch.md
  • state/cutover/honcho-gemma-ollama-prep.md
  • control-plane/platformctl/tests/test_honcho_ollama_contract.py

Relevant context

  • Honcho supports OpenAI-compatible endpoints by keeping transport=openai and setting MODEL_CONFIG__OVERRIDES__BASE_URL / API_KEY_ENV.
  • Changing Honcho's embedding model is not safe to do in place once the database contains existing vectors.
  • RS2000 evidence: documents.embedding vector(1536) has 26141 rows; message_embeddings.embedding vector(1536) has 13558 rows.
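A minimal sketch of how per-feature overrides could be generated for the compose environment. Only the `MODEL_CONFIG__OVERRIDES__BASE_URL` / `API_KEY_ENV` suffixes, `transport=openai`, the model name, the endpoint, and `OLLAMA_CLOUD_API_KEY` come from the PR; the feature prefixes (`DERIVER`, `DIALECTIC`, `SUMMARY`) and the `TRANSPORT` / `MODEL` key names are assumptions and may not match Honcho's actual settings.

```python
# Hypothetical feature prefixes; Honcho's real section names may differ.
FEATURES = ("DERIVER", "DIALECTIC", "SUMMARY")


def ollama_override_env(
    features: tuple[str, ...] = FEATURES,
    model: str = "gemma4:31b-cloud",
    base_url: str = "https://ollama.com/v1",
    key_env: str = "OLLAMA_CLOUD_API_KEY",
) -> dict[str, str]:
    """Build LLM-only override variables; embedding settings are left untouched."""
    env: dict[str, str] = {}
    for feature in features:
        prefix = f"{feature}_MODEL_CONFIG__"
        # Keep the OpenAI-compatible transport and redirect it to Ollama Cloud.
        env[prefix + "TRANSPORT"] = "openai"
        env[prefix + "MODEL"] = model
        env[prefix + "OVERRIDES__BASE_URL"] = base_url
        # The *name* of the variable holding the key, never the key value.
        env[prefix + "OVERRIDES__API_KEY_ENV"] = key_env
    return env
```

Pointing `API_KEY_ENV` at a variable name rather than embedding the key keeps the secret in runtime/Infisical, consistent with the known constraints below.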

Runtime evidence

Read-only RS2000 checks before this PR:

  • honcho-api and honcho-deriver envs show LLM paths on gpt-5.4-mini, transport openai.
  • DERIVER_FLUSH_ENABLED=true is active.
  • Embeddings use text-embedding-3-small, transport openai.
  • documents and message_embeddings are both vector(1536), with all counted rows at 1536 dimensions.
  • No local RS2000 Ollama listener/container was observed on port 11434.

Known constraints

  • OLLAMA_CLOUD_API_KEY must be provided by runtime/Infisical only; no key value is stored in repo.
  • The Infisical CLI on the local Mac had no active login session, so the live Ollama Cloud compatibility smoke was not run from this workstation.
  • The runbook makes the compatibility smoke a pre-deploy gate before Honcho is switched.
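The compatibility smoke's three checks (chat, JSON, tool call) can be sketched against the OpenAI-style response shape (`choices[0].message.content` and `choices[0].message.tool_calls[].function`). This is not the contents of `scripts/honcho/ollama-gemma-compat-smoke.py`; the function names are hypothetical. The validators are pure functions, so they can be exercised on synthetic bodies without network access or a key.

```python
import json
import os
import urllib.request


def chat(base_url: str, payload: dict, key_env: str = "OLLAMA_CLOUD_API_KEY", timeout: float = 60.0) -> dict:
    """POST an OpenAI-style chat completion request and return the parsed body."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Read the key from the runtime environment, never from the repo.
            "Authorization": f"Bearer {os.environ[key_env]}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def check_chat(body: dict) -> bool:
    """Plain chat: the first choice must carry non-empty text."""
    content = body["choices"][0]["message"].get("content")
    return isinstance(content, str) and content.strip() != ""


def check_json(body: dict) -> bool:
    """JSON mode: the first choice's content must parse as JSON."""
    try:
        json.loads(body["choices"][0]["message"]["content"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return False
    return True


def check_tool_call(body: dict, expected_name: str) -> bool:
    """Tool calling: a call to expected_name with JSON-string arguments must exist."""
    calls = body["choices"][0]["message"].get("tool_calls") or []
    for call in calls:
        fn = call.get("function") or {}
        if fn.get("name") == expected_name:
            try:
                json.loads(fn.get("arguments") or "")
            except json.JSONDecodeError:
                return False
            return True
    return False
```

Wired together, a JSON check might look like `check_json(chat("https://ollama.com/v1", {"model": "gemma4:31b-cloud", "messages": [...], "response_format": {"type": "json_object"}}))`, assuming the endpoint honors `response_format`. Running it needs `OLLAMA_CLOUD_API_KEY` in the environment, which is why the live run belongs on RS2000.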

Explicit out-of-scope

  • No production BGE-M3 switch.
  • No vector schema migration.
  • No data re-embedding.
  • No production backup execution from this PR.
  • No production Honcho deploy from this PR.

Requested decision

Approve the prepared LLM-only provider switch package. Production deploy remains gated on: Ollama/Gemma compatibility smoke, Honcho Postgres/Redis backups, release-root promotion, and sequential Honcho smokes.
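The deploy gates are strictly ordered, and a failure at any step should stop everything after it. A minimal sketch of that ordering, assuming each runbook step can be wrapped as a callable returning pass/fail; the gate names mirror the list above, while `run_gates` and the placeholder lambdas are hypothetical.

```python
from typing import Callable

Gate = tuple[str, Callable[[], bool]]


def run_gates(gates: list[Gate]) -> tuple[bool, list[str]]:
    """Run gates in order and stop at the first failure, so later steps never run."""
    passed: list[str] = []
    for name, check in gates:
        if not check():
            return False, passed
        passed.append(name)
    return True, passed


# Placeholder checks; each would wrap the corresponding runbook step.
gates: list[Gate] = [
    ("ollama-gemma-compat-smoke", lambda: True),
    ("honcho-postgres-redis-backups", lambda: True),
    ("release-root-promotion", lambda: False),
    ("sequential-honcho-smokes", lambda: True),
]
ok, done = run_gates(gates)
print(ok, done)  # False ['ollama-gemma-compat-smoke', 'honcho-postgres-redis-backups']
```

Returning the list of passed gates makes it clear how far a failed deploy got, which matters when backups have already run but promotion has not.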

Merge blockers

  • A reviewer finds that Honcho will not honor MODEL_CONFIG__OVERRIDES__API_KEY_ENV on the selected config path.
  • A reviewer finds BGE-M3 production wiring accidentally enabled.
  • A reviewer finds the provider switch would drop the current embedding path.
  • A reviewer finds the backup metadata sidecar leaks backup content or secrets.
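Most of these blockers can be expressed as mechanical checks over a service's resolved environment. The sketch below is a hypothetical illustration, not the contents of `test_honcho_ollama_contract.py`: `EMBEDDING_MODEL` is an assumed key name, and treating any value not starting with `${` as a literal secret is an assumed heuristic. It cannot prove that Honcho actually *honors* `API_KEY_ENV`, which still needs a reviewer or a live smoke.

```python
def contract_violations(env: dict[str, str]) -> list[str]:
    """Return human-readable merge-blocker violations for one service env."""
    problems: list[str] = []
    key_env_vars = [k for k in env if k.endswith("MODEL_CONFIG__OVERRIDES__API_KEY_ENV")]
    if not key_env_vars:
        problems.append("no per-feature API_KEY_ENV override found")
    for k in key_env_vars:
        if env[k] != "OLLAMA_CLOUD_API_KEY":
            problems.append(f"{k} points at {env[k]!r}")
    # EMBEDDING_MODEL is an assumed key name for the embedding path.
    if env.get("EMBEDDING_MODEL") != "text-embedding-3-small":
        problems.append("embedding model changed or missing")
    if env.get("EMBEDDING_VECTOR_DIMENSIONS") == "1024":
        problems.append("BGE-M3 vector dimensions enabled")
    # Desired state should only reference the key via interpolation.
    key_value = env.get("OLLAMA_CLOUD_API_KEY")
    if key_value is not None and not key_value.startswith("${"):
        problems.append("literal Ollama key value in desired state")
    return problems
```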

Verification

  • control-plane/.venv/bin/python -m pytest -q control-plane/platformctl/tests/test_honcho_ollama_contract.py control-plane/platformctl/tests/test_validate.py control-plane/platformctl/tests/test_apply_env_file.py control-plane/platformctl/tests/test_forgejo_ci_scripts_contract.py — 50 passed
  • PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-api/module.yaml — ok
  • PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-deriver/module.yaml — ok
  • PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-postgres/module.yaml — ok
  • PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-redis/module.yaml — ok
  • bash -n scripts/cutover/backup-before-apply.sh
  • python3 -m py_compile scripts/honcho/ollama-gemma-compat-smoke.py scripts/honcho/bge-m3-embedding-smoke.py
  • git diff --check

Notes:

  • docker compose -f compose/apps/compose.yaml config --quiet cannot be run meaningfully from the Mac without the production environment loaded; it fails on missing required runtime secrets and empty path variables. The new Honcho interpolation is covered by the contract test instead.
  • Ollama Cloud live compatibility smoke is prepared but blocked locally by missing Infisical login; run it on RS2000 before deploy per the runbook.
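Because `docker compose config` needs the production environment, an offline test has to resolve interpolation itself. A minimal sketch of that idea, covering only the `${NAME}`, `${NAME:-default}`, and `${NAME:?message}` forms of Compose interpolation (no bare `$NAME`, `$$` escapes, or nested defaults). The variable name `HONCHO_LLM_BASE_URL` is hypothetical and not taken from `compose.yaml`.

```python
import re

# ${NAME}, ${NAME:-default}, ${NAME:?message}: a subset of Compose syntax.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:(?P<op>:-|:\?)(?P<arg>[^}]*))?\}"
)


def interpolate(value: str, env: dict[str, str]) -> str:
    """Resolve Compose-style references in value against a stub env."""

    def replace(match: re.Match) -> str:
        name, op, arg = match.group("name"), match.group("op"), match.group("arg")
        current = env.get(name, "")
        if current:
            return current
        if op == ":-":
            # Unset or empty: fall back to the inline default.
            return arg
        if op == ":?":
            # Unset or empty: Compose aborts with the given message.
            raise ValueError(f"{name}: {arg or 'required variable is missing or empty'}")
        # Plain ${NAME} with no value becomes an empty string.
        return ""

    return _PATTERN.sub(replace, value)
```

For example, `interpolate("${HONCHO_LLM_BASE_URL:-https://ollama.com/v1}", {})` returns `"https://ollama.com/v1"`, which lets a test assert the Ollama default without any production secrets present.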

Spec sources read

  • compose/apps/compose.yaml — Honcho env/config surface.
  • modules/honcho-api/module.yaml — secret reference update.
  • modules/honcho-deriver/module.yaml — secret reference update.
  • scripts/cutover/backup-before-apply.sh and scripts/cutover/README.md — backup-before package.
  • control-plane/platformctl/tests/test_validate.py, test_apply_env_file.py, test_forgejo_ci_scripts_contract.py — nearby test patterns.
  • Honcho docs: https://honcho.dev/docs/v3/contributing/configuration.md and https://honcho.dev/docs/v3/contributing/changing-embeddings.md — provider override and embedding migration behavior.
  • Ollama docs: https://docs.ollama.com/api/openai-compatibility and https://docs.ollama.com/cloud — OpenAI-compatible path target.
  • BGE-M3 model card: https://huggingface.co/BAAI/bge-m3 — expected 1024-dimensional embedding target.

Refs #357
Refs pdurlej/iskra-openclaw#293

Commit da8ae2adf8: feat(honcho): prepare Ollama Gemma LLM switch

All checks were successful:

  • canary-required / collect-diff (pull_request): successful in 3s
  • patchwarden-pr-sanity / collect-diff (pull_request): successful in 3s
  • platformctl plan / auto-apply scope (pull_request): successful in 19s
  • pyfallow / Pyfallow gate (control-plane) (pull_request): successful in 17s
  • python-ci / Python 3.11 (pull_request): successful in 37s
  • python-ci / Python 3.12 (pull_request): successful in 38s
  • python-ci / Python 3.13 (pull_request): successful in 36s
  • canary-required / canary (pull_request): successful in 17s
  • base-is-main / guard (pull_request): successful in 1s
  • patchwarden-pr-sanity / sanity (pull_request): successful in 19s
Reference
pdurlej/platform!358