Tune privacy lint for recurring public-safe judgment blocks #145

Open

opened 2026-06-26 00:45:11 +02:00 by Iskra · 0 comments

Collaborator

Evidence

A production audit covering 2026-06-12..2026-06-25 found recurring blocked judgments where the privacy lint rejected public-facing fields, mainly evidence_refs[].note and top_caveat.

Observed aggregate, without raw model output or private payloads:

8 blocks: "$.evidence_refs[2].note appears to contain private or secret material"
6 blocks: "$.evidence_refs[1].note appears to contain private or secret material"
5 blocks: "$.top_caveat appears to contain private or secret material"
each blocked judgment also produced a safe apply_skipped trace
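
To see how the blocks distribute across fields rather than across individual array indices, the paths can be grouped after collapsing the index. The following is a minimal sketch, assuming the block reasons are available as plain path strings. The `BLOCK_PATHS` list is rebuilt from the counts above, and `normalize_path` and `count_by_field` are hypothetical helper names, not part of Judging Claw.

```python
import re
from collections import Counter

# Rebuilt from the aggregate counts in this issue; no raw payloads involved.
BLOCK_PATHS = (
    ["$.evidence_refs[2].note"] * 8
    + ["$.evidence_refs[1].note"] * 6
    + ["$.top_caveat"] * 5
)


def normalize_path(path: str) -> str:
    """Collapse concrete array indices so [1] and [2] group under one field."""
    return re.sub(r"\[\d+\]", "[]", path)


def count_by_field(paths):
    """Count blocks per normalized field path (hypothetical helper)."""
    return Counter(normalize_path(p) for p in paths)


counts = count_by_field(BLOCK_PATHS)
for field, n in counts.most_common():
    print(f"{n:2d}  {field}")
```

Grouped this way, the `evidence_refs[].note` field accounts for 14 of the 19 blocks, which suggests it is the first place to look for false positives.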

Example target refs seen repeatedly:

pdurlej/iskra-openclaw#issue#256
pdurlej/iskra-openclaw#issue#295
pdurlej/platform#issue#645
pdurlej/platform#issue#723
pdurlej/judging-claw#issue#97

Why this is tech debt

The guardrail is doing the right thing by failing closed, but repeated blocks against public issue metadata make Judging Claw less useful and can cause the same targets to be retried nightly.

Acceptance

Keep fail-closed behavior for raw memory, raw logs, token values, private chat excerpts, and secret-looking values.
Add focused tests using sanitized public issue-like text that currently trips false positives.
Improve privacy lint or request rendering so safe public metadata can be judged without lowering the privacy bar.
Blocked outcomes must still leave safe artifact and morning-return traces.
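
One possible shape for the focused tests is a small table of cases that any candidate lint must satisfy: public issue-like text passes, secret-looking values still block. This is only a sketch. The real privacy lint rules are not shown in this issue, so `looks_private` below is a stand-in with assumed patterns, and the sample strings (apart from the target ref taken from the evidence above) are invented for illustration and are not the text that actually tripped the lint.

```python
import re

# Stand-in lint (assumption): the real rules are not part of this issue.
SECRET_PATTERNS = [
    # GitHub-style token prefixes such as ghp_ or gho_
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}"),
    # key/value assignments such as "token = ..." or "api_key: ..."
    re.compile(r"(?i)\b(?:token|secret|password|api[_-]?key)\s*[:=]\s*\S+"),
    # PEM private key header
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def looks_private(text: str) -> bool:
    """Return True when the text should be blocked (fail closed)."""
    return any(p.search(text) for p in SECRET_PATTERNS)


# (text, must_block): hypothetical sanitized samples.
CASES = [
    ("See pdurlej/platform#issue#645 for context.", False),
    ("Caveat: the public issue mentions token rotation but includes no values.", False),
    ("token = abc123def456", True),
    ("ghp_" + "a" * 36, True),
    ("-----BEGIN RSA PRIVATE KEY-----", True),
]


def failing_cases(lint, cases=CASES):
    """Return every case where the lint decision differs from the expectation."""
    return [(text, must_block) for text, must_block in cases
            if bool(lint(text)) != must_block]


print(failing_cases(looks_private))
```

Keeping the expectations in one table means a rule change that lets public metadata through is checked against the secret-looking cases in the same run, so the privacy bar cannot drop unnoticed. A lint that blocks everything fails only the two public cases, which mirrors the false positives described above.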
