Tune privacy lint for recurring public-safe judgment blocks #145

Open
opened 2026-06-26 00:45:11 +02:00 by Iskra · 0 comments
Collaborator

Evidence

Production audit for 2026-06-12..2026-06-25 found recurring blocked judgments where privacy lint rejected public-facing fields, mainly evidence_refs[].note and top_caveat.

Observed aggregate, without raw model output or private payloads:

  • 8 blocks with the lint message "$.evidence_refs[2].note appears to contain private or secret material"
  • 6 blocks with the lint message "$.evidence_refs[1].note appears to contain private or secret material"
  • 5 blocks with the lint message "$.top_caveat appears to contain private or secret material"
  • each blocked judgment also produced a safe apply_skipped trace

Example target refs seen repeatedly:

  • pdurlej/iskra-openclaw#issue#256
  • pdurlej/iskra-openclaw#issue#295
  • pdurlej/platform#issue#645
  • pdurlej/platform#issue#723
  • pdurlej/judging-claw#issue#97

Why this is tech debt

The guardrail is right to fail closed, but repeated blocks on public issue metadata make Judging Claw less useful and can cause the same targets to be retried every night.

Acceptance

  • Keep fail-closed behavior for raw memory, raw logs, token values, private chat excerpts, and secret-looking values.
  • Add focused tests using sanitized public issue-like text that currently trips false positives.
  • Improve privacy lint or request rendering so safe public metadata can be judged without lowering the privacy bar.
  • Blocked outcomes must still leave safe artifact and morning-return traces.
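
A minimal sketch of what the focused tests could look like, in Python. Everything here is an assumption made for illustration: the issue does not show the real privacy lint, so `SECRET_PATTERNS`, `lint_field`, `safe_trace`, and the trace fields are hypothetical names, and the patterns are only examples of "secret-looking values". The idea is to pin both sides of the bar: sanitized public issue-like text passes, secret-looking values still fail closed, and a blocked outcome leaves a trace that never echoes the rejected text.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns: the real lint rules are not shown in this issue.
SECRET_PATTERNS = [
    # Token-like strings with a well-known prefix and a long random tail.
    re.compile(r"\b(?:ghp|gho|ghs)_[A-Za-z0-9]{20,}\b"),
    # key=value or key: value assignments for secret-like keys.
    re.compile(r"(?i)\b(?:token|secret|password|api[_-]?key)\s*[:=]\s*\S+"),
    # PEM private key headers.
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


@dataclass(frozen=True)
class LintResult:
    path: str
    blocked: bool
    reason: str


def lint_field(path: str, text: str) -> LintResult:
    """Fail closed: any pattern hit blocks the field."""
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            reason = f"{path} appears to contain private or secret material"
            return LintResult(path, True, reason)
    return LintResult(path, False, "")


def safe_trace(result: LintResult) -> dict:
    """Build a trace from the path and outcome only, never the field text."""
    outcome = "apply_skipped" if result.blocked else "applied"
    return {"path": result.path, "outcome": outcome, "reason": result.reason}


# Sanitized public issue-like text that mentions sensitive words without values.
PUBLIC_NOTES = [
    "Tracks pdurlej/platform#issue#645, which discusses token rotation docs.",
    "Public issue mentions secret scanning and password policy wording.",
]

# Secret-looking values that must stay blocked (fake values, not real secrets).
PRIVATE_NOTES = [
    "token=abc123def456",
    "ghp_" + "A" * 36,
    "-----BEGIN RSA PRIVATE KEY-----",
]

public_results = [lint_field("$.evidence_refs[1].note", t) for t in PUBLIC_NOTES]
private_results = [lint_field("$.top_caveat", t) for t in PRIVATE_NOTES]
traces = [safe_trace(r) for r in private_results]
```

The sketch matches on values rather than on bare keywords, so a note that merely mentions "token" or "secret" is not rejected while an assignment such as `token=...` is. Whether the real false positives come from keyword matching is not established by the audit data above, so treat that as one hypothesis to test.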
Reference
pdurlej/judging-claw#145