pdurlej/platform

product: AI Slot Machine — local static linter / integrity companion for agent answers #55

Closed

opened 2026-05-04 13:32:49 +02:00 by Iskra · 1 comment

Iskra commented

2026-05-04 13:32:49 +02:00

Collaborator

Context

Moved from pdurlej/iskra-openclaw#31 because this is a platform/operator microproduct, not primarily an Iskra-specific feature.

Piotr developed the "AI Slot Machine / LLM Slot Machine" idea in Signal voice notes on 2026-05-04.

Initial shape:

a small OSS git repo / microproduct;
MCP, sidecar, plugin, or companion service;
not a "truth machine", but a risk/integrity indicator;
green/yellow/red/slot-machine style badge for AI answers.

Follow-up refinement:

Treat AI Slot Machine as a static gate, something like static code analysis. Running automatically alongside the agent as an MCP, it says whether the AI's answer looks good given the model and context. We focus on Codex models, ChatGPT, and Claude models. It should be deterministic, like code review or linters: it holds the bar, gives feedback automatically, and can write the message itself: "hey, you don't meet this, this, and this, please fix it". We give vibe coders an automatic assistant that guardrails the collaboration. Because it is static and sends nothing anywhere, it works privately.

Core idea

Build a lightweight, local-first AI answer integrity linter.

It should inspect an agent/LLM answer and produce a deterministic risk/integrity verdict, similar to how linters and static analyzers inspect code.

The point is not to prove factual truth. The point is to flag answer-production smells before the user trusts the output.

Positioning

Not:

hallucination oracle;
factual truth machine;
another heavyweight eval framework;
cloud compliance dashboard.

Instead:

local/static answer linter;
operator companion;
"slot machine" risk badge;
guardrail for vibe-coding / agentic coding workflows;
private-by-default, no external calls required in baseline mode.

Deterministic checks / heuristics

Candidate static checks:

unsupported strong factual claims;
claims about files/system state without tool evidence;
claims about past actions without receipts/logs;
"confirmed/done/working" language without evidence;
contradiction between tool errors and final answer;
missing source tags for important state/history claims;
overconfident language under uncertainty;
no tests/lint/build despite code-change claims;
privacy-sensitive claim without explicit source;
stale model/context warnings;
mismatch between requested output format and answer;
apology/hedging loops that hide the actual blocker;
excessive smoothness / sycophancy markers if configured.
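
To make two of the candidate checks above concrete, here is a minimal sketch of what a deterministic rule could look like. Everything in it is an assumption for illustration: the trace format (a list of tool-call dicts with an integer `exit_code`), the `Finding` type, and the rule names are hypothetical, and the keyword list only covers the "confirmed/done/working" wording named above.

```python
import re
from dataclasses import dataclass

# Assumed trace format (not defined in this issue): a list of tool-call
# records, each a dict carrying at least an integer "exit_code".
COMPLETION_CLAIM = re.compile(r"\b(confirmed|done|working)\b", re.IGNORECASE)


@dataclass(frozen=True)
class Finding:
    rule: str      # hypothetical rule identifier
    message: str   # human-readable reason shown next to the badge


def check_completion_claims(answer: str, trace: list) -> list:
    """Flag completion wording when no tool call in the trace succeeded."""
    claims = sorted({m.group(1).lower() for m in COMPLETION_CLAIM.finditer(answer)})
    if not claims:
        return []
    if any(step.get("exit_code") == 0 for step in trace):
        return []
    quoted = ", ".join(f'"{c}"' for c in claims)
    return [Finding("completion-claim-without-evidence",
                    f"says {quoted} but no successful tool call supports it")]


def check_tool_error_contradiction(answer: str, trace: list) -> list:
    """Flag completion wording when the last tool call in the trace failed."""
    if not trace or trace[-1].get("exit_code", 0) == 0:
        return []
    if COMPLETION_CLAIM.search(answer) is None:
        return []
    return [Finding("tool-error-contradiction",
                    "claims success although the last tool call failed")]
```

A keyword match like this will produce false positives (for example "not done yet"), which is acceptable for a smell detector but is a reason to keep each rule individually configurable.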

Output shape

Simple badge + actionable feedback:

🟢 low risk / grounded enough;
🟡 some uncertainty or missing evidence;
🔴 high risk / unsupported claims or tool contradiction;
🎰 slot-machine mode: answer sounds fluent but has poor grounding.

Example:

🎰 SLOT MACHINE
Reasons:
- says "confirmed" but no command/log/source supports it
- references a file path that returned ENOENT
- no verification step after code change
Suggested fix:
- rerun path check
- include exact evidence or mark as hypothesis
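
The report above could be produced by a small pure function. This is only a sketch: `render_report` and the `BADGES` table are hypothetical names, and only the "SLOT MACHINE" label comes from the example; the other three labels are placeholders.

```python
# Badge labels: only "SLOT MACHINE" appears in the example above;
# the other labels are placeholder assumptions.
BADGES = {
    "green": "🟢 LOW RISK",
    "yellow": "🟡 UNCERTAIN",
    "red": "🔴 HIGH RISK",
    "slot": "🎰 SLOT MACHINE",
}


def render_report(verdict: str, reasons: list, fixes: list) -> str:
    """Render a verdict as plain text: badge line, then reasons, then fixes."""
    lines = [BADGES[verdict]]
    if reasons:
        lines.append("Reasons:")
        lines.extend(f"- {reason}" for reason in reasons)
    if fixes:
        lines.append("Suggested fix:")
        lines.extend(f"- {fix}" for fix in fixes)
    return "\n".join(lines)
```

Keeping the renderer free of I/O means the same verdict data could later be serialized differently for an MCP tool or a bot comment.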

Integration targets

Possible MVP targets:

CLI:
- aislot check answer.md --context trace.json
MCP server:
- expose check_answer, check_trace, explain_verdict;
OpenClaw plugin/sidecar:
- inspect agent final replies before delivery;
GitHub/Forgejo bot:
- comment on PR/issue agent outputs;
Codex/Claude Code companion:
- guardrail coding-agent responses.
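
For the CLI target, the command line `aislot check answer.md --context trace.json` could be parsed with the standard library alone. The sketch below is an assumption about the eventual interface, not a spec: `build_parser`, `load_inputs`, and the idea that the trace file holds a JSON list are all hypothetical.

```python
import argparse
import json
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Parser for the hypothetical `aislot check <answer> [--context <trace>]` command."""
    parser = argparse.ArgumentParser(
        prog="aislot", description="Local static linter for agent answers."
    )
    commands = parser.add_subparsers(dest="command", required=True)
    check = commands.add_parser("check", help="lint a saved answer")
    check.add_argument("answer", type=Path, help="file containing the answer text")
    check.add_argument("--context", type=Path, default=None,
                       help="optional tool trace as JSON")
    return parser


def load_inputs(args: argparse.Namespace):
    """Read the answer text and the optional trace from local files only."""
    answer = args.answer.read_text(encoding="utf-8")
    trace = []
    if args.context is not None:
        # Assumed: the trace file contains a JSON list of tool-call records.
        trace = json.loads(args.context.read_text(encoding="utf-8"))
    return answer, trace
```

The MCP tools `check_answer` and `check_trace` could call the same loading and rule code, so the CLI and the server would not drift apart.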

Privacy model

Baseline should be fully local/static:

no external API calls;
no telemetry;
no uploading prompts/answers;
optional config file for local rules;
optional external verifier mode only when explicitly enabled.

This privacy angle is important: a small tool is much easier to trust if it never needs to send the answer or its context anywhere.
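
One way to back the "no external API calls" promise with a test is to make outbound connections fail loudly while the baseline rules run. The sketch below assumes a test-time guard built on `unittest.mock`; `no_network` and `NetworkBlocked` are hypothetical names. It only intercepts `socket.socket.connect` in the current process, so it is a tripwire for tests, not a sandbox.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkBlocked(RuntimeError):
    """Raised when code running in baseline mode tries to open a connection."""


@contextmanager
def no_network():
    """Make socket.socket.connect raise for the duration of the block."""
    def blocked(self, address):
        raise NetworkBlocked(f"baseline mode attempted to connect to {address!r}")

    # patch.object restores the original method when the block exits.
    with mock.patch.object(socket.socket, "connect", blocked):
        yield
```

Wrapping the whole fixture suite in `with no_network():` would turn any accidental HTTP client call inside a rule into a test failure. It does not cover subprocesses or other connection paths such as `connect_ex`.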

Prior art / nearby OSS

A quick search found related but not identical tools:

cisco-open/polygraphLLM — hallucination/factuality library;
KRLabsOrg/LettuceDetect — RAG hallucination detection;
exa-labs/exa-hallucination-detector — claim extraction + web evidence;
HalluciGuard — claim extraction + trust score;
VATBox/llm-confidence — confidence from logprobs.

These are closer to detectors/eval frameworks. The proposed niche is a lighter operator-facing static linter/badge for agent answers.

MVP scope

A tiny first version could be:

input: answer text + optional tool trace JSON;
deterministic rules engine;
verdict: green/yellow/red/slot;
reasons + suggested fix;
no network;
configurable severity thresholds;
examples from Iskra/OpenClaw incidents.
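
The step from rule findings to a verdict with configurable severity thresholds could look like the sketch below. The scoring scheme is an assumption made for illustration: this issue does not define severity values, how "slot" relates to "red", or any of the names used here (`ScoredFinding`, `Thresholds`, `verdict`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredFinding:
    rule: str
    severity: int      # hypothetical scale: 1 = minor, 5 = severe
    grounding: bool    # True when the finding is about missing evidence


@dataclass(frozen=True)
class Thresholds:
    yellow: int = 1                    # total severity that turns the badge yellow
    red: int = 5                       # total severity that turns the badge red
    slot_grounding_findings: int = 2   # grounding findings that trigger slot mode


def verdict(findings: list, thresholds: Thresholds = Thresholds()) -> str:
    """Map findings to green/yellow/red/slot with fixed, order-independent rules."""
    total = sum(f.severity for f in findings)
    grounding_hits = sum(1 for f in findings if f.grounding)
    # Assumed precedence: repeated grounding failures win over plain severity.
    if grounding_hits >= thresholds.slot_grounding_findings:
        return "slot"
    if total >= thresholds.red:
        return "red"
    if total >= thresholds.yellow:
        return "yellow"
    return "green"
```

Because the function depends only on its inputs, the same answer and trace always yield the same badge, which is the property that makes the tool behave like a linter rather than a model.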

Acceptance criteria

Repo skeleton exists with README explaining the concept.
CLI can evaluate a saved answer against static rules.
At least 10 deterministic rules implemented.
Output includes verdict, reasons, and suggested fix.
Baseline mode performs no network calls.
Example fixtures include at least one green, yellow, red, and slot-machine answer.
Future MCP/server integration path is documented.

Why this matters

This directly addresses the recurring trust problem in agent systems: fluent answers can hide weak grounding.

A cheap deterministic linter will not solve truth, but it can catch the obvious operational failures before they reach a human as polished bullshit.

That is very on-brand for OpenClaw/Iskra: evidence discipline over vibes.

Related

#27: OpenClaw upgrade blocked by missing auto-repair/self-healing guardrails
#29: WordPress/blog integration
#30: Rybbit analytics visible to Iskra

Suggested product names

Working names:

AI Slot Machine
LLM Slot Machine
Answer Integrity Companion
SlotCheck
Truthiness Companion
Jednoręki Bandyta AI

claude commented

2026-05-06 00:24:18 +02:00

Collaborator

Operator decision: SPIN OFF (2026-05-06)

Operator-confirmed via chat 2026-05-06: AI Slot Machine moves to a separate repo.

Rationale

This is an OSS microproduct, not platform-internal infrastructure. Mixing it with pdurlej/platform would:

Bloat platform repo with unrelated product code
Mix licensing concerns (OSS-public vs platform-private)
Create confusion for future contributors
Violate platform↔microproject contract (Issue #74) — different scope, different audience

Action

Closing this issue from pdurlej/platform.

Operator-side TODO (not for agent automation):

Create new repo pdurlej/aislot (or chosen name) on Forgejo
Initial commit: README with the concept (copy from this issue body) + LICENSE choice (MIT or Apache 2.0 — operator picks)
Apply governance pattern from #75 (mandatory non-author reviewer + branch protection) to the new repo
Open the same issue in the new repo if active development starts; reference this closed one as origin

What stays in platform

Nothing from #55 carries forward to the platform repo. The aislot design itself (deterministic linter for AI answers, slot-machine risk badge, local-first privacy) is the operator's IP and lives in the spin-off repo.

claude closed this issue

2026-05-06 00:24:18 +02:00