feat(honcho): prepare Ollama Gemma LLM switch #358
No reviewers
Labels
No labels
W6d-automerge-calibration
agent/claude-code
agent/codex
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
automerge-candidate
class/security-sensitive
cutover-gate
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
iterating
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
large-impact
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
meta
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
needs-operator-decision
needs-triage
not-ready
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
operator-emotional
owner-attention
phase/02
phase/03
priority:p0
priority:p1
priority:p2
priority:p3
proposed
ready-for-agent
ready-for-operator
recovery
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
risk/exposure
risk/process
risk/product
risk/runtime
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:operator-needed
status:parked
tier/full
tier/lite
tier/stacked
tier:0-platform-substrate
tier:1-iskra-value-layer
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
pdurlej/platform!358
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "codex/honcho-gemma-ollama-prep"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Canary status: missing — required Forgejo checks and canary review before production deploy
Summary
Prepare the Honcho LLM-only provider switch from the current OpenAI-style
gpt-5.4-minipath to Ollama Cloudgemma4:31b-cloud, while deliberately keeping production embeddings ontext-embedding-3-smalluntil the BGE-M3 vector-space migration is designed.This PR does not mutate production by itself. It prepares desired-state config, backup metadata support, synthetic compatibility smokes, and the operator runbook for the morning deploy.
Canary Context Pack
Product story
Honcho memory work is already sent to external OpenAI-style providers. The owner wants the reasoning/summary/dialectic LLM path moved to the preferred Ollama Cloud provider now, without delaying on the more complex embedding migration.
What changed
gemma4:31b-cloudthrough OpenAI-compatible per-feature overrides.https://ollama.com/v1; per-featureAPI_KEY_ENVpoints atOLLAMA_CLOUD_API_KEY.text-embedding-3-small; no BGE-M3 production wiring and noEMBEDDING_VECTOR_DIMENSIONS=1024.DERIVER_FLUSH_ENABLEDkeeps normal-write fallback for this provider switch..metadata.jsonsidecars with path, size, sha256, class, container, and exit code.Why it changed
Current RS2000 truth shows Honcho already uses external LLM and embedding processing, and the database already has 1536-dimensional vectors. The fast safe target is therefore LLM-only. BGE-M3 is prepared but blocked from production until retrieval can avoid mixing vector spaces.
Files touched
compose/apps/compose.yamlmodules/honcho-api/module.yamlmodules/honcho-deriver/module.yamlscripts/cutover/backup-before-apply.shscripts/cutover/README.mdscripts/honcho/ollama-gemma-compat-smoke.pyscripts/honcho/bge-m3-embedding-smoke.pyrunbooks/honcho-ollama-gemma-switch.mdstate/cutover/honcho-gemma-ollama-prep.mdcontrol-plane/platformctl/tests/test_honcho_ollama_contract.pyRelevant context
transport=openaiand settingMODEL_CONFIG__OVERRIDES__BASE_URL/API_KEY_ENV.documents.embedding vector(1536)has 26141 rows;message_embeddings.embedding vector(1536)has 13558 rows.Runtime evidence
Read-only RS2000 checks before this PR:
honcho-apiandhoncho-deriverenvs show LLM paths ongpt-5.4-mini, transportopenai.DERIVER_FLUSH_ENABLED=trueis active.text-embedding-3-small, transportopenai.documentsandmessage_embeddingsare bothvector(1536), with all counted rows at 1536 dimensions.Known constraints
OLLAMA_CLOUD_API_KEYmust be provided by runtime/Infisical only; no key value is stored in repo.Explicit out-of-scope
Requested decision
Approve the prepared LLM-only provider switch package. Production deploy remains gated on: Ollama/Gemma compatibility smoke, Honcho Postgres/Redis backups, release-root promotion, and sequential Honcho smokes.
Merge blockers
MODEL_CONFIG__OVERRIDES__API_KEY_ENVon the selected config path.Verification
control-plane/.venv/bin/python -m pytest -q control-plane/platformctl/tests/test_honcho_ollama_contract.py control-plane/platformctl/tests/test_validate.py control-plane/platformctl/tests/test_apply_env_file.py control-plane/platformctl/tests/test_forgejo_ci_scripts_contract.py— 50 passedPYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-api/module.yaml— okPYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-deriver/module.yaml— okPYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-postgres/module.yaml— okPYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate --strict-v2 modules/honcho-redis/module.yaml— okbash -n scripts/cutover/backup-before-apply.shpython3 -m py_compile scripts/honcho/ollama-gemma-compat-smoke.py scripts/honcho/bge-m3-embedding-smoke.pygit diff --checkNotes:
docker compose -f compose/apps/compose.yaml config --quietcannot be run meaningfully from the Mac without the production env set; it stops on missing required runtime secrets/empty path vars. The new Honcho interpolation is covered by the contract test.Spec sources read
compose/apps/compose.yaml— Honcho env/config surface.modules/honcho-api/module.yaml— secret reference update.modules/honcho-deriver/module.yaml— secret reference update.scripts/cutover/backup-before-apply.shandscripts/cutover/README.md— backup-before package.control-plane/platformctl/tests/test_validate.py,test_apply_env_file.py,test_forgejo_ci_scripts_contract.py— nearby test patterns.https://honcho.dev/docs/v3/contributing/configuration.mdandhttps://honcho.dev/docs/v3/contributing/changing-embeddings.md— provider override and embedding migration behavior.https://docs.ollama.com/api/openai-compatibilityandhttps://docs.ollama.com/cloud— OpenAI-compatible path target.https://huggingface.co/BAAI/bge-m3— expected 1024-dimensional embedding target.Refs #357
Refs pdurlej/iskra-openclaw#293