docs(honcho): design embedding-space migration #377

Merged
pdurlej merged 1 commit from codex/domain/honcho-embedding-space-design-v2 into main 2026-05-18 23:18:43 +02:00

Canary status: missing - security-sensitive design/tooling change; full review before merge

Canary Context Pack

Product story

Honcho now runs LLM paths through Gemma/Ollama, but its durable memory embeddings still live in the existing OpenAI 1536-dimensional space. The operator wants a real path toward self-hosted embeddings for mixed Polish/English memory work without breaking current recall.

What changed

  • Added state/cutover/honcho-embedding-space-migration.md with the #357 migration contract.
  • Kept production embeddings on text-embedding-3-small / 1536d.
  • Named BGE-M3 as the first self-host text baseline and Qwen3-Embedding-0.6B as the first challenger.
  • Treated Jina v4 / multimodal as a later extraction/index lane, not part of this text-vector switch.
  • Extended scripts/honcho/bge-m3-embedding-smoke.py so the metadata-only smoke can test candidate models with explicit expected dimensions and endpoint class.
  • Added tests asserting that the smoke blocks non-local endpoints by default and does not emit sentinel text.
  • Linked the migration artifact from the Honcho runbook, closeout plan, and status file.

Why it changed

#357 needed to move from a broad “BGE-M3 migration” idea to an executable embedding-space plan: additive schema, versioned routing, shadow validation, backup-before, rollback, and delayed deletion. The key invariant is that retrieval never mixes vector spaces.
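
The invariant above can be illustrated with a minimal Python sketch. It is not taken from the migration contract: the `EmbeddingSpace` key format, the class names, and the `search` helper are all hypothetical, and a real implementation would most likely enforce the same rule in the database query rather than in application code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpace:
    """Identifies one vector space. The key format is a hypothetical choice."""
    key: str
    dimension: int


@dataclass(frozen=True)
class StoredVector:
    space: EmbeddingSpace
    values: tuple[float, ...]


class MixedSpaceError(ValueError):
    """Raised when two vectors from different embedding spaces are compared."""


def cosine_distance(query: StoredVector, candidate: StoredVector) -> float:
    # Refuse to compute a distance across spaces: the number would be meaningless.
    if query.space != candidate.space:
        raise MixedSpaceError(
            f"cannot compare {query.space.key} with {candidate.space.key}"
        )
    for vec in (query, candidate):
        if len(vec.values) != vec.space.dimension:
            raise ValueError(f"vector length does not match {vec.space.key}")
    dot = sum(a * b for a, b in zip(query.values, candidate.values))
    norm = math.sqrt(sum(a * a for a in query.values)) * math.sqrt(
        sum(b * b for b in candidate.values)
    )
    if norm == 0.0:
        raise ValueError("zero-length vector has no cosine distance")
    return 1.0 - dot / norm


def search(query: StoredVector, rows: list[StoredVector]) -> list[StoredVector]:
    """Rank only the rows that live in the query's embedding space."""
    same_space = [row for row in rows if row.space == query.space]
    return sorted(same_space, key=lambda row: cosine_distance(query, row))
```

With this shape, a query embedded in the active space ignores candidate-space rows during shadow validation, and a direct cross-space comparison fails loudly instead of returning a misleading score.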

Files touched

  • state/cutover/honcho-embedding-space-migration.md
  • scripts/honcho/bge-m3-embedding-smoke.py
  • control-plane/platformctl/tests/test_honcho_ollama_contract.py
  • runbooks/honcho-ollama-gemma-switch.md
  • state/cutover/honcho-closeout-plan.md
  • state/STATUS_NOW.md

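Relevant context

  • Platform issue #357.
  • Honcho closeout plan H4.
  • Existing evidence: documents.embedding vector(1536) with 26141 rows and message_embeddings.embedding vector(1536) with 13558 rows.
  • Active embedding model remains text-embedding-3-small.
  • BGE-M3 model card: https://huggingface.co/BAAI/bge-m3
  • Qwen3-Embedding-0.6B model card: https://huggingface.co/Qwen/Qwen3-Embedding-0.6B
  • Jina Embeddings v4 model card: https://huggingface.co/jinaai/jina-embeddings-v4
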
Runtime evidence

No production mutation was performed. The smoke was exercised only in its safe blocked mode:

scripts/honcho/bge-m3-embedding-smoke.py --endpoint https://example.invalid --json /tmp/honcho-embedding-smoke-blocked-v2.json
exit=2
status=blocked
expected_dimension=1024
model=bge-m3
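
The blocked result above suggests a guard of roughly the following shape. This is a sketch under assumptions, not the code in scripts/honcho/bge-m3-embedding-smoke.py: the function names, the `allow_non_local` parameter, and the choice to treat only loopback hosts as local are hypothetical. Only the exit code 2, the `blocked` status, and the reported model and expected dimension come from the evidence above.

```python
import ipaddress
from urllib.parse import urlparse

EXIT_OK = 0
EXIT_BLOCKED = 2  # matches the exit code reported in the runtime evidence


def endpoint_class(url: str) -> str:
    """Classify an endpoint as 'local' or 'non-local' (hypothetical rule: loopback only)."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return "local"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Any DNS name other than localhost is treated as non-local.
        return "non-local"
    return "local" if address.is_loopback else "non-local"


def plan_smoke(
    url: str,
    model: str,
    expected_dimension: int,
    allow_non_local: bool = False,  # hypothetical opt-in, not a documented flag
) -> tuple[int, dict]:
    """Decide whether the smoke may run; the report carries metadata only."""
    cls = endpoint_class(url)
    blocked = cls != "local" and not allow_non_local
    report = {
        "status": "blocked" if blocked else "ready",
        "endpoint_class": cls,
        "model": model,
        "expected_dimension": expected_dimension,
    }
    return (EXIT_BLOCKED if blocked else EXIT_OK), report
```

The report deliberately omits the endpoint URL and any input text, so a blocked run leaves nothing sensitive in the JSON artifact.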

Known constraints

  • Do not set EMBEDDING_VECTOR_DIMENSIONS=1024 in production Honcho yet.
  • Do not drop, resize, overwrite, or repurpose current 1536d vectors.
  • Do not store raw user messages, prompts, emails, memory snippets, transcripts, or model responses in evidence.
  • Candidate storage must be additive and versioned by embedding-space key.
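
The evidence and versioning constraints above can be sketched as a metadata-only record builder. Everything named here is hypothetical: the allowed key set, the `space_key` format, and both helper functions are illustrations, not the schema defined in the migration contract.

```python
import json

# Hypothetical allow-list: evidence may carry these metadata fields and nothing else.
ALLOWED_KEYS = frozenset(
    {"model", "space_key", "expected_dimension", "observed_dimensions",
     "vector_count", "status"}
)


def build_evidence(
    model: str,
    space_key: str,
    expected_dimension: int,
    vectors: list[list[float]],
) -> dict:
    """Summarise an embedding response without keeping vectors or input text."""
    observed = sorted({len(vector) for vector in vectors})
    return {
        "model": model,
        "space_key": space_key,  # versions the candidate space; format is an assumption
        "expected_dimension": expected_dimension,
        "observed_dimensions": observed,
        "vector_count": len(vectors),
        "status": "pass" if observed == [expected_dimension] else "fail",
    }


def assert_metadata_only(evidence: dict, sentinel: str) -> None:
    """Fail if the record has unexpected fields or leaks the sentinel input."""
    unexpected = set(evidence) - ALLOWED_KEYS
    if unexpected:
        raise ValueError(f"unexpected evidence fields: {sorted(unexpected)}")
    if sentinel in json.dumps(evidence):
        raise ValueError("evidence contains sentinel text")
```

A check like `assert_metadata_only` is the kind of assertion a contract test could run against the smoke's JSON output before it is written to disk.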

Explicit out-of-scope

  • Standing up BGE-M3/Qwen3 runtime endpoints.
  • Adding database migrations.
  • Backfilling candidate vectors.
  • Switching Honcho reads or writes away from the active OpenAI 1536d embedding space.
  • Closing #357; this PR makes #357 executable but does not complete the migration.

Requested decision

Approve the design and metadata-only smoke contract so the next fork can implement E1/E2 without reopening the architecture question.

Merge blockers

  • Any production wiring to BGE-M3/Qwen3 in this PR.
  • Any path that mixes distances from different embedding spaces.
  • Any evidence/logging that can emit private content.
  • Any schema proposal that mutates current 1536d columns in place.

Spec sources read

  • state/cutover/honcho-closeout-plan.md - H4 boundary and closeout definition.
  • state/cutover/honcho-gemma-ollama-prep.md - current Honcho evidence and embedding boundary.
  • runbooks/honcho-ollama-gemma-switch.md - current provider-switch runbook.
  • scripts/honcho/bge-m3-embedding-smoke.py - existing synthetic smoke.
  • control-plane/platformctl/tests/test_honcho_ollama_contract.py - existing Honcho contract tests.
  • Platform issue #357 - acceptance criteria.
  • Current model cards listed above - candidate characteristics and dimensions.

Validation

  • git diff --check - pass.
  • /Users/pd/Developer/iskra-platform-2026-04-30/control-plane/.venv/bin/python -m pytest control-plane/platformctl/tests/test_honcho_ollama_contract.py -q - 7 passed.
  • Safe blocked smoke against non-local endpoint - exit 2 with metadata-only JSON.

Refs #357

docs(honcho): design embedding space migration
All checks were successful
canary-required / collect-diff (pull_request) Successful in 3s
base-is-main / guard (pull_request) Successful in 1s
platformctl plan / auto-apply scope (pull_request) Successful in 20s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 16s
python-ci / Python 3.13 (pull_request) Successful in 39s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 3s
python-ci / Python 3.11 (pull_request) Successful in 37s
python-ci / Python 3.12 (pull_request) Successful in 40s
canary-required / canary (pull_request) Successful in 17s
patchwarden-pr-sanity / sanity (pull_request) Successful in 20s
13f9282685
Reference
pdurlej/platform!377