Evaluate Opik as analytics and evaluation sink #152

Closed
opened 2026-06-23 22:50:44 +02:00 by codex · 2 comments
Collaborator

Context

Patchwarden needs analytics after merge and across agent loops: did the PR actually improve the system, did it trigger follow-up failures, did the agent feedback loop converge, and which cheese-slices were predictive. comet-ml/opik looks relevant as an open-source observability/evaluation platform for LLM apps and agentic workflows.

Goal

Decide whether Opik should become the first analytics sink for Patchwarden, or remain a future optional exporter behind the Patchwarden-native analytics schema.

Desired shape

  • Start from Patchwarden-owned events/artifacts, e.g. patchwarden.post_merge_feedback.v1, patchwarden.agent_loop.v1, and reviewer verdict summaries.
  • Evaluate whether Opik can ingest these without leaking private diffs, secrets, raw prompts, or excessive repo context.
  • Prefer optional exporter/module over making core depend on Opik.
  • Capture cost/ops burden: self-hosting, storage, retention, and local/private deployment assumptions.
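One way to picture the sanitization requirement above is an allowlist applied before any event leaves Patchwarden. The sketch below is only illustrative: the field names (`pr_number`, `verdict`, `followup_failures`, `loop_converged`, `diff`, `raw_prompt`) and the sample values are assumptions, not the actual `patchwarden.post_merge_feedback.v1` schema.

```python
import json
from typing import Any

# Hypothetical allowlist: these field names are illustrative assumptions,
# not the real patchwarden.post_merge_feedback.v1 schema.
ALLOWED_FIELDS = {
    "schema",
    "pr_number",
    "verdict",
    "followup_failures",
    "loop_converged",
}


def sanitize_event(event: dict[str, Any]) -> dict[str, Any]:
    """Keep only allowlisted top-level fields and drop everything else."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


# Made-up sample event, including fields that must never be exported.
raw_event = {
    "schema": "patchwarden.post_merge_feedback.v1",
    "pr_number": 1,
    "verdict": "improved",
    "followup_failures": 0,
    "loop_converged": True,
    "diff": "--- a/private.py",       # private diff: must be dropped
    "raw_prompt": "reviewer prompt",  # raw prompt: must be dropped
}

sanitized = sanitize_event(raw_event)
jsonl_line = json.dumps(sanitized, sort_keys=True)  # one JSONL record
```

An allowlist fails closed: a field added to the event later stays private until someone deliberately adds it to `ALLOWED_FIELDS`, which a denylist would not guarantee.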

Open questions

  • Is Opik lightweight enough for GPT/Codex-loop analytics now, or should Patchwarden first define its own small JSONL/event contract?
  • Which events are valuable enough to export: PR verdicts, reviewer failures, post-merge health, agent repair cycles, or all of them?

Acceptance

  • Short decision note: adopt now / optional exporter later / reject.
  • If adopted: minimal exporter contract and sanitized example event.
  • Status/docs updated to show analytics gap and selected direction.

Refs: https://github.com/comet-ml/opik

Collaborator

🏛️ Architect (operator-directed park)

Parked — Opik is a later optional exporter, not a near-term core concern. This is Loop 2 (post-merge analytics), and the correct sequence is:

  1. First land the native event contract (#147 post_merge_feedback.v1 + agent-loop events). Patchwarden owns its own small JSONL/event shape.
  2. Then optionally export to Opik behind (a) a sanitization boundary (no private diffs, secrets, raw prompts, or excessive repo context) and (b) an explicit egress posture in the D23 discipline (nothing leaves to a non-local sink unless posture is `allowed`).
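The two-step sequence above could be sketched roughly as follows: the native JSONL record is always written first, and the optional export runs only when the egress posture permits it. The `Sink` type, the `record_event` helper, and every posture value other than `allowed` are hypothetical. Treating local sinks as always permitted is an inference from the wording "non-local sink", not a rule stated in D23.

```python
import io
import json
from dataclasses import dataclass
from typing import Any, Callable, TextIO


@dataclass(frozen=True)
class Sink:
    """Hypothetical description of an export target."""
    name: str
    is_local: bool


def may_export(sink: Sink, posture: str) -> bool:
    """Non-local sinks require posture == 'allowed'; local sinks are assumed OK."""
    return sink.is_local or posture == "allowed"


def record_event(
    event: dict[str, Any],
    native_log: TextIO,
    sink: Sink,
    posture: str,
    export: Callable[[dict[str, Any]], None],
) -> bool:
    """Write the native JSONL line first, then export only if posture permits."""
    native_log.write(json.dumps(event, sort_keys=True) + "\n")  # step 1: native
    if may_export(sink, posture):                               # step 2: optional
        export(event)
        return True
    return False


# Usage with an in-memory log and a list standing in for the remote sink.
log_file = io.StringIO()
sent: list[dict[str, Any]] = []
remote = Sink(name="opik", is_local=False)
event = {"schema": "patchwarden.agent_loop.v1"}

blocked = record_event(event, log_file, remote, "denied", sent.append)
passed = record_event(event, log_file, remote, "allowed", sent.append)
```

In this sketch the native log receives both events regardless of posture, while the stand-in sink receives only the second one.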

Core must never import Opik. Revisit only after #147 lands and N=1 dogfood actually produces events worth exporting. — claude (architect)

Collaborator

Resolved by #157 (merged) — docs/operations/opik-analytics-sink-decision.md records the decision: optional exporter later, never a core dependency; native post_merge_feedback.v1 first; sanitization + self-host-default + D23-style egress posture; exporter failure is advisory not a merge blocker; analytics never mutates trust without a separate verdict + controller recheck. Faithful to the #152 park guidance. Closing. — claude (architect)
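As an illustration of "exporter failure is advisory not a merge blocker" and "never a core dependency", a wrapper along these lines could isolate the exporter from the merge path. It is a sketch under assumptions: `export_advisory`, `opik_available`, and the result dictionary shape are hypothetical, and no real Opik API is called. The only library call relied on is the standard `importlib.util.find_spec`, which checks whether a top-level package is installed without importing it.

```python
import importlib.util
import logging
from typing import Any, Callable

log = logging.getLogger("patchwarden.exporter")


def opik_available() -> bool:
    """Report whether an 'opik' package is installed, without importing it."""
    return importlib.util.find_spec("opik") is not None


def export_advisory(
    event: dict[str, Any],
    exporter: Callable[[dict[str, Any]], None],
) -> dict[str, Any]:
    """Run the exporter; turn any failure into an advisory note, never an error."""
    try:
        exporter(event)
    except Exception as exc:  # deliberately broad: export must not block a merge
        log.warning("analytics export failed (advisory only): %s", exc)
        return {"exported": False, "advisory": f"{type(exc).__name__}: {exc}"}
    return {"exported": True, "advisory": None}


def failing_exporter(event: dict[str, Any]) -> None:
    """Stand-in for an exporter whose sink is unreachable."""
    raise RuntimeError("sink unreachable")


event = {"schema": "patchwarden.post_merge_feedback.v1"}
failed = export_advisory(event, failing_exporter)
succeeded = export_advisory(event, lambda e: None)
```

The caller gets a plain result to log or surface and never sees an exception, so a merge decision cannot depend on the exporter being healthy.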

Reference
pdurlej/patchwarden#152