bug(ops): /health endpoint reports ok without verifying upstream usability #59
3 participants
Reference
pdurlej/kan-ductor#59
Source
3+3 review on PR #41 (feat: add MCP ops health and smoke modes), merged 2026-05-10 via chain drain #52.

Problem
The /health endpoint currently returns ok based on local config/liveness checks (process running, env loaded, DB pool initialized). It does NOT verify database reachability, MCP dispatch, or critical upstream access.
So /health: ok may be returned when the system cannot actually serve real traffic. This makes the endpoint unreliable for ops alerting.

Scope
Two paths:
(a) Honest readiness (verify usability):
- … SELECT 1)
- … /ready for full readiness vs /live for liveness
- … docs/ops/health.md
(b) Honest labeling (keep current behavior, fix naming):
- /health → /live (liveness only)
- … /ready for full readiness

Acceptance criteria
- … ok when DB/MCP are degraded

Refs
Codex verification on current origin/main / current working tree after the stabilization train: #59 appears resolved.

Evidence:
- packages/mcp/src/index.ts separates /live and /mcp/live from /ready and /mcp/ready.
- /live is liveness only and reports upstream Kan API as unknown/not probed.
- /ready calls the Kan API health path through the MCP client and returns non-OK when the upstream is unavailable.
- /health and /mcp/health are compatibility readiness endpoints; they also probe the Kan API instead of reporting false OK.
- packages/mcp/src/health.ts sanitizes configured URLs and avoids leaking credentials.
- docs/openclaw-kan-mcp.md plus docs/ops/merge-train-smoke.md describe the /live, /ready, /health split; PRs #151/#152 further clarify diagnostics and smoke modes.

Verification run:
Passed. Recommendation: close #59 as satisfied by current main.
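For readers who have not opened the code, the split described in the evidence above can be sketched roughly as follows. This is a minimal illustration, not the contents of packages/mcp/src/index.ts or packages/mcp/src/health.ts: the type and function names (HealthReport, sanitizeUrl, liveReport, readyReport, checkReady), the 200/503 status mapping, and the choice to strip query strings along with userinfo are all assumptions made for the example.

```typescript
// Illustrative sketch only; every name here is hypothetical and not taken
// from packages/mcp/src/index.ts or packages/mcp/src/health.ts.
type UpstreamState = "ok" | "unavailable" | "not_probed";

interface HealthReport {
  status: "ok" | "degraded";
  httpStatus: 200 | 503;
  upstream: UpstreamState;
  upstreamUrl: string; // always sanitized before it is reported
}

// Drops userinfo and query string so credentials never reach a health payload.
// Stripping the query is an assumption (tokens are sometimes passed there).
function sanitizeUrl(raw: string): string {
  try {
    const url = new URL(raw);
    url.username = "";
    url.password = "";
    url.search = "";
    return url.toString();
  } catch {
    return "<invalid url>";
  }
}

// Liveness: the process is up; the upstream is deliberately not probed.
function liveReport(configuredUrl: string): HealthReport {
  return {
    status: "ok",
    httpStatus: 200,
    upstream: "not_probed",
    upstreamUrl: sanitizeUrl(configuredUrl),
  };
}

// Readiness: derived only from the outcome of a real upstream probe.
function readyReport(configuredUrl: string, upstreamOk: boolean): HealthReport {
  const upstreamUrl = sanitizeUrl(configuredUrl);
  return upstreamOk
    ? { status: "ok", httpStatus: 200, upstream: "ok", upstreamUrl }
    : { status: "degraded", httpStatus: 503, upstream: "unavailable", upstreamUrl };
}

// Runs the caller-supplied probe; any rejection counts as not ready.
async function checkReady(
  configuredUrl: string,
  probe: () => Promise<void>,
): Promise<HealthReport> {
  try {
    await probe();
    return readyReport(configuredUrl, true);
  } catch {
    return readyReport(configuredUrl, false);
  }
}
```

Under these assumptions, a server would answer /live with liveReport and answer /ready (and the compatibility /health) with checkReady, passing a probe that calls the upstream health path. The point of the shape is that an ok readiness result can only come from a probe that actually succeeded.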
{
"confidence": 5,
"effort_hint": "medium",
"escalation": {
"kind": "none",
"reason": ""
},
"evidence_refs": [
{
"note": "Issue reports the ops health endpoint can return ok without verifying upstream usability.",
"type": "forgejo",
"value": "issue-title-body-labels-and-target-snapshot"
},
{
"note": "Body states current health checks may only prove local liveness, not database reachability, MCP dispatch, or critical upstream access.",
"type": "forgejo",
"value": "issue-body-problem"
},
{
"note": "Scope proposes honest readiness checks or relabeling endpoints as liveness/config-only with documentation.",
"type": "forgejo",
"value": "issue-body-scope"
}
],
"impact": 4,
"judge_actor": {
"name": "iskra",
"runtime": "openclaw"
},
"judged_at": "2026-06-09T01:09:00Z",
"labels_to_apply": [
"judge/p1",
"judge/codex-candidate"
],
"piotr_fit": "high",
"priority": "p1",
"rationale_summary": "This is P1 Codex-ready ops reliability work because misleading health checks can hide real service inability and break alerting trust.",
"reach": 4,
"recommended_next_action": "codex_candidate",
"rerun_reason": "no_prior_judgment",
"schema": "openclaw.judge.v0",
"target": {
"kind": "issue",
"number": 59,
"repo": "pdurlej/kan-ductor"
},
"target_snapshot": {
"body_hash": "sha256:1f098bc63f182fe79362a41df5b42252c24660a2af3acb07634b271e59482463",
"commit_count": null,
"evidence_hash": "sha256:f2372711456965cfaf6eac2f26806aef2f599f59ed8f2c3657abeba31b3db67e",
"head_sha": null,
"labels": [
"3plus3-followup",
"priority:p1"
],
"labels_hash": "sha256:eae246ad0747d73dd2fb96aea169d7d74574e5b2de03312edce7ec9f6d87a8f0",
"state": "open",
"title_hash": "sha256:d545bf15dbc576ecdfd7b3c4b7d6c13934ddb7344734b88de550845ef619cf93",
"updated_at": "2026-06-03T10:37:46+02:00"
},
"top_caveat": "Separate liveness from readiness clearly so cheap health checks do not overclaim real traffic readiness."
}