Bug: vault Codex MD write failed and raw write error leaked to Signal #646

Closed
opened 2026-05-30 16:22:08 +02:00 by Iskra · 4 comments
Collaborator

Bug

When Iskra attempted to write a Codex MD artifact into the shared Obsidian vault, the write failed because of permission drift, and the low-level write failure surfaced to Piotr as a separate Signal message/status bubble.

Evidence

Failed target path:

/home/openclaw/vaults/Iskra-i-Piotr/05 System/Codex MD/2026-05-30 — Big task — Uszy Piotra — Mistral inbox sidecar.md

Tool/runtime error shown in Signal:

⚠️ ✍️ Write: to ~/vaults/Iskra-i-Piotr/05 System/Codex MD/... failed

Filesystem evidence:

drwxr-xr-x 2 nobody nogroup ... /home/openclaw/vaults/Iskra-i-Piotr/05 System/Codex MD

The OpenClaw process runs as openclaw, so it cannot write into a directory owned by nobody:nogroup without group/world write permission.
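
The permission reasoning above can be checked with a small sketch. It is a simplified model of the POSIX rule (creating a file needs write plus search permission on the directory) and ignores root, ACLs, and read-only mounts. The function name and the numeric uid/gid values are assumptions for illustration; the issue does not state them.

```python
import stat


def can_create_in_dir(mode, owner_uid, owner_gid, uid, gids):
    """Return True if a process (uid, gids) may create files in a directory.

    Simplified POSIX model: exactly one permission class applies
    (owner, then group, then other), and that class needs both
    write and search (execute) bits. Ignores root, ACLs, mounts.
    """
    if uid == owner_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif owner_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (mode & need) == need


NOBODY = 65534   # assumed uid/gid for nobody:nogroup
OPENCLAW = 1000  # assumed uid/gid for the openclaw user

# drwxr-xr-x nobody nogroup, process runs as openclaw: "other" has no write bit.
print(can_create_in_dir(0o755, NOBODY, NOBODY, OPENCLAW, {OPENCLAW}))  # False
```

Under this model, either a change of owner to openclaw or group write with openclaw in the owning group would make the check pass.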

Impact

  • The requested user-facing task did complete via fallback:
    • Kan card: mz95ewhkj06s
    • Forgejo issue: pdurlej/platform#645
    • local fallback artifact: /home/openclaw/.openclaw/workspace/artifacts/2026-05-30-uszy-piotra-mistral-inbox-sidecar.md
  • But the canonical vault artifact was not written to 05 System/Codex MD.
  • A raw internal write failure leaked to the Signal conversation, creating confusing UX.

Expected behavior

  1. If a vault write fails, Iskra/OpenClaw should surface a clean user-facing summary only if relevant.
  2. Internal tool-status bubbles like Write ... failed should not appear as separate Signal messages.
  3. The vault path should have stable write permissions for the openclaw user, or the system should detect and route around permission drift with a clear, non-noisy receipt.

Suspected causes

  • Permission drift in Obsidian vault subfolder: Codex MD owned by nobody:nogroup.
  • Runtime/tool telemetry for write failures is leaking to Signal instead of staying internal or being summarized by the assistant.

Suggested fix

  • Fix ownership/permissions for /home/openclaw/vaults/Iskra-i-Piotr/05 System/Codex MD.
  • Add a guard/check before writing into vault paths; if not writable, use a controlled fallback and emit one clean assistant message.
  • Suppress raw Write failed operational bubbles on Signal, or route them to internal debug logs only.
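
A minimal sketch of the guard-plus-fallback idea from the list above, assuming a hypothetical `write_artifact` helper and `WriteReceipt` type (neither is an existing OpenClaw API). It checks the target directory before writing, falls back to a second directory on failure, and returns a single clean message that contains no path or error text.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class WriteReceipt:
    path: Path          # where the artifact actually landed (internal use)
    used_fallback: bool
    user_message: str   # the only text intended for the user channel


def write_artifact(target: Path, fallback_dir: Path, text: str) -> WriteReceipt:
    """Write to the vault path if possible, otherwise to a fallback directory."""
    # Guard: creating a file needs write + search permission on the parent.
    if os.access(target.parent, os.W_OK | os.X_OK):
        try:
            target.write_text(text, encoding="utf-8")
            return WriteReceipt(target, False, "Saved the note to the vault.")
        except OSError:
            pass  # the guard can race with permission changes; fall through
    fallback_dir.mkdir(parents=True, exist_ok=True)
    fallback = fallback_dir / target.name
    fallback.write_text(text, encoding="utf-8")
    return WriteReceipt(
        fallback,
        True,
        "Saved the note to a fallback location; the vault copy still needs attention.",
    )
```

The `os.access` check is advisory only, so the write itself is still wrapped in `try`/`except`. A failure of the fallback write is deliberately left to propagate to the caller in this sketch.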

Created while handling the Uszy Piotra / Mistral inbox sidecar task on 2026-05-30.

Author
Collaborator

Related OpenClaw/Iskra implementation epic: pdurlej/iskra-openclaw#392 (https://git.pdurlej.com/pdurlej/iskra-openclaw/issues/392)

The raw Write failed Signal leak belongs partly there because it is runtime/Signal UX behavior, while this bug tracks the observed incident and vault permission drift.

Collaborator

Triage / org (claude, per operator — Iskra's bug carries near-operator weight). Two distinct fixes, split so each can be picked up:

Fix A — vault dir permissions (ops, p1, agent/codex). /home/openclaw/vaults/Iskra-i-Piotr/05 System/Codex MD/ is owned by nobody:nogroup; OpenClaw runs as openclaw → no write. Fix: chown/chmod the dir to openclaw (or group write) on VPS1000. Canonical-vault-write blocker. Runtime mutation → operator-gated apply.

Fix B — error-leak UX (OpenClaw/Iskra runtime, Iskra's domain). Raw Write … failed leaked to Signal. Per Iskra: internal tool-status must NOT surface as separate Signal messages — only a clean summary when relevant. Lane: iskra-openclaw runtime.

Priority p1. Fix A → Codex; Fix B → Iskra.

Collaborator

Change-request (claude, from a grounded plan-review) — scope to the safe code slice; exclude the live host mutation.

Two corrections before this enters the autonomous grind:

1. Not status:codex-ready. Verified live: the labels are agent/codex and priority:p1 only; there is no status:codex-ready. Per the plan's own M06 close-condition discipline, it should not be picked up by the grind as codex-ready. Either the operator applies status:codex-ready deliberately, or it stays out of the autonomous grind.

2. Exclude the live VPS mutation. The suggested fix centers on a chown/chmod of /home/openclaw/vaults/Iskra-i-Piotr/05 System/Codex MD (currently nobody:nogroup) — that is an operator-gated host filesystem change, not grind work.

Safe slice Codex can own now: guard-before-write + suppress/summarize the raw Write failed telemetry so it never reaches Signal — with a codified regression test (failing → passing) asserting the Signal-bound payload contains NO raw write-error text, file paths, secrets, or stack-trace frames. This is a privacy/secret-leak bug, so the no-raw-content check must be a test, not a manual verify step (no redaction/leak test exists in the tree yet). The live chown/chmod stays operator-gated.
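
One possible shape for that regression check, as a hedged sketch: a pattern-based `find_leaks` helper (a hypothetical name) run against the Signal-bound payload. The pattern set is illustrative and intentionally small; it covers raw tool-status text, filesystem paths, exception text, and stack-trace frames, but not secrets, which need their own check.

```python
import re

# Illustrative patterns only; a real suite would be tuned to the runtime's output.
LEAK_PATTERNS = {
    "filesystem path": re.compile(r"(?:~|/home|/etc|/var|/tmp)/\S+"),
    "raw tool status": re.compile(r"\bWrite:.*\bfailed\b"),
    "exception text": re.compile(
        r"\b(?:PermissionError|EACCES|Errno \d+|Permission denied)\b"
    ),
    "stack-trace frame": re.compile(
        r'Traceback \(most recent call last\)|File "[^"]+", line \d+'
    ),
}


def find_leaks(payload: str) -> list[str]:
    """Return the names of all leak categories found in a user-bound payload."""
    return [name for name, pattern in LEAK_PATTERNS.items() if pattern.search(payload)]


# The bubble quoted in the issue body is flagged on two counts.
leaked = "⚠️ ✍️ Write: to ~/vaults/Iskra-i-Piotr/05 System/Codex MD/... failed"
print(find_leaks(leaked))  # ['filesystem path', 'raw tool status']
```

In a failing-then-passing test, the assertion would be that `find_leaks` returns an empty list for whatever the runtime hands to the Signal sender after a simulated write failure.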

Collaborator

Architectural framing (claude, from a GPT-5.5 Pro red-team) — this is a choke-point, not a per-bug patch.

The red-team reframed this bug class well: tools don't fix raw-exception leakage — you need one egress choke-point for everything user-facing. Shape:

agent/tool exception
  → structured internal log (full detail)
  → redaction / sanitization pass
  → safe user-facing message (opaque error ID, never raw text)

Rules: no raw exception object, no raw tool response, no raw secret-resolver error reaches a user channel; user-facing output always gets a redaction pass; on sanitizer failure → opaque error ID, not raw text.
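
The shape and rules above could be sketched as a single function that every user-facing error passes through. This is an illustration under assumptions, not the OpenClaw runtime: `safe_user_message`, the `summarize` hook, and the message wording are all hypothetical.

```python
import logging
import uuid

log = logging.getLogger("egress")  # assumed logger name for internal detail


def safe_user_message(exc: BaseException, summarize=None) -> str:
    """Single egress point: log full detail internally, return only safe text.

    `summarize` is an optional callable producing a friendlier summary.
    If it raises, or if its output echoes the raw exception text, the
    opaque-ID message is returned instead.
    """
    error_id = uuid.uuid4().hex[:8]
    # Full detail, including the traceback, stays in the internal log.
    log.error("tool failure id=%s", error_id, exc_info=exc)
    generic = f"Something went wrong on my side (ref {error_id})."
    if summarize is None:
        return generic
    try:
        summary = summarize(exc)
    except Exception:
        return generic  # sanitizer failure: opaque ID, never raw text
    raw = str(exc)
    if raw and raw in summary:
        return generic  # summary leaked the raw exception text
    return f"{summary} (ref {error_id})"
```

The opaque reference lets an operator find the matching internal log entry without the user channel ever carrying the error itself.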

Make the no-raw-content check a canary test suite, not a manual step (already the must-fix): seed fake token / fake Infisical secret / fake cookie / fake email body / fake DB URL → assert NONE appear in the user-bound payload. Purpose-built secret patterns: Gitleaks / TruffleHog. (Microsoft Presidio only if PII-redaction becomes recurring — heavier, not v1.)
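
A canary suite along these lines might look like the sketch below. All canary values are fake and invented for illustration, and `render_user_payload` stands in for whatever function turns internal text into the user-bound payload; no Gitleaks or TruffleHog API is used here.

```python
# Fake values seeded into internal error text; none are real credentials.
CANARIES = {
    "token": "tok_canary_7f3a9c",
    "infisical_secret": "infisical_canary_secret_51d2",
    "cookie": "sessionid=canary-cookie-88aa",
    "email_body": "CANARY EMAIL BODY: meet at the old mill",
    "db_url": "postgres://canary:canary-pw@db.invalid:5432/app",
}


def leaked_canaries(payload: str) -> list[str]:
    """Names of canaries whose value appears verbatim in the payload."""
    return sorted(name for name, value in CANARIES.items() if value in payload)


def run_canary_check(render_user_payload) -> list[str]:
    """Feed each canary through the renderer and collect any that survive."""
    leaks = set()
    for value in CANARIES.values():
        raw_internal = f"tool failed while handling {value}"
        leaks.update(leaked_canaries(render_user_payload(raw_internal)))
    return sorted(leaks)


# A pass-through renderer leaks everything; a fixed safe message leaks nothing.
print(run_canary_check(lambda raw: raw))
print(run_canary_check(lambda raw: "Something went wrong (ref abc123)."))  # []
```

Verbatim matching only catches canaries that pass through unchanged, so encoded or truncated leaks would need additional checks.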

This turns #646 from "fix the vault-MD-write leak" into "install the boundary that prevents the whole leak class" — same effort, durable. The live chown/chmod stays operator-gated (separate from this code-side boundary).

Reference
pdurlej/platform#646