Explore long-running agent harness patterns for BMADX + Heartswarm #2
Context
Piotr flagged this talk as worth turning into exploration work for BMADX + Heartswarm:
project-summaries on 2026-05-22.
Core thesis from the talk: long-running agents do not become reliable through a stronger prompt alone. They need a harness/scaffold around the model: explicit planning, durable state, separate evaluators, verification loops, checkpointing, and trace reading.
Exploration goal
Investigate which of these long-run agent patterns should be adapted into BMADX and/or Heartswarm so multi-hour runs do not lose the plot, rubber-stamp half-done work, or drift after context compaction.
Patterns to evaluate
Planner / generator / evaluator split
Durable run state outside model context
Evaluator as QA harness
Checkpoint + continuation contract
Trace-reading feedback loop
Scaffold retirement for stronger models
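As a starting point for the "durable run state" and "checkpoint + continuation contract" patterns, here is a minimal Python sketch of what a run_state.json file and its resume logic might look like. Everything in it is a hypothetical illustration, including the `RunState` fields and the `save_checkpoint` / `load_checkpoint` helpers. None of it is an existing BMADX or Heartswarm API. It assumes Python 3.9+ and a single writer per run.

```python
from __future__ import annotations

import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class RunState:
    """Hypothetical run_state.json shape; field names are illustrative only."""

    goal: str
    plan: list[str]  # ordered steps produced by the planner
    completed: list[str] = field(default_factory=list)  # steps the evaluator accepted
    notes: list[str] = field(default_factory=list)  # facts that must survive context compaction

    def next_step(self) -> str | None:
        # Continuation contract: a resumed run starts at the first unaccepted step.
        for step in self.plan:
            if step not in self.completed:
                return step
        return None


def save_checkpoint(state: RunState, path: Path) -> None:
    # Write to a temp file in the same directory, then swap it in with os.replace.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(asdict(state), f, indent=2)
    os.replace(tmp_name, path)


def load_checkpoint(path: Path) -> RunState:
    # The state file, not the model's context window, is the source of truth.
    return RunState(**json.loads(path.read_text(encoding="utf-8")))


if __name__ == "__main__":
    ckpt = Path(tempfile.mkdtemp()) / "run_state.json"
    state = RunState(goal="example goal", plan=["design", "implement", "verify"])
    state.completed.append("design")
    save_checkpoint(state, ckpt)
    print(load_checkpoint(ckpt).next_step())  # implement
```

The temp-file-then-rename step keeps the previous checkpoint intact if the process dies mid-write. On POSIX, os.replace within one filesystem is atomic.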
Questions for BMADX / Heartswarm
… run_state.json schema?
… done for different work types: code, research, planning, UX, ops?
Acceptance criteria for this exploration
Operator note
This is exploratory, not an immediate implementation request. The goal is to extract operational principles from the Anthropic talk and adapt them to our actual system, not to cargo-cult its architecture.
Addendum: Tejas Kumar / IBM — Harnesses in AI
Piotr also flagged a second harness talk as valuable for BMADX + Heartswarm:
project-summaries on 2026-05-22.
This talk is useful because it explains harnessing from first principles, not just for long-running agents. Core framing:
Additional patterns to include in the exploration
Stop prompt-hardening when the environment is the problem
Deterministic handlers for stable subproblems
Verify step as a first-class harness primitive
Agent loop is not the harness
Use weaker/cheaper models when the harness is strong
Dynamic/on-the-fly harnesses as future direction
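The sketch below makes two of these patterns concrete: "verify step as a first-class harness primitive" and "deterministic handlers for stable subproblems". It is a minimal Python version of a single harness step. It assumes a hypothetical `call_model` callable, which is injected so a weaker or cheaper model can be swapped in, and a caller-supplied `verify` check. The handler registry and its two entries are illustrative only, not an existing BMADX or Heartswarm interface.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    output: str
    handled_by: str  # "deterministic" or "model"


# Hypothetical registry: stable subproblems are plain functions, not model calls.
DETERMINISTIC_HANDLERS: dict[str, Callable[[str], str]] = {
    "format_json": lambda text: json.dumps(json.loads(text), indent=2, sort_keys=True),
    "count_lines": lambda text: str(len(text.splitlines())),
}


def run_step(
    kind: str,
    payload: str,
    call_model: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> StepResult:
    """One harness step: route the work, produce output, verify before accepting."""
    handler = DETERMINISTIC_HANDLERS.get(kind)
    if handler is not None:
        output = handler(payload)
        # Deterministic output is still verified, but retrying it would not change anything.
        if not verify(kind, output):
            raise RuntimeError(f"deterministic handler for {kind!r} failed verification")
        return StepResult(output, "deterministic")

    for _ in range(max_attempts):
        output = call_model(kind, payload)
        # Acceptance is decided by the separate verify callable, not by the generator.
        if verify(kind, output):
            return StepResult(output, "model")
    raise RuntimeError(f"step {kind!r} failed verification after {max_attempts} attempts")
```

The harness loop here is more than the agent loop. Routing, retry limits and the verify gate all live outside the model, so a step can only complete with output that passed the check.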
Suggested extra acceptance criterion
… prompt, harness, capability, memory, or verification failure, and propose the smallest harness change that would prevent recurrence.
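For the suggested acceptance criterion, here is a minimal Python sketch of how classified failures could be recorded so that trace-reading findings stay comparable across runs. The five categories come from the criterion itself. `FailureReport`, its fields and `tally` are hypothetical names used for illustration, not part of either system.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    # The five failure categories named in the acceptance criterion.
    PROMPT = "prompt"
    HARNESS = "harness"
    CAPABILITY = "capability"
    MEMORY = "memory"
    VERIFICATION = "verification"


@dataclass(frozen=True)
class FailureReport:
    run_id: str  # hypothetical identifier for a traced run
    kind: FailureKind
    evidence: str  # trace excerpt that supports the classification
    smallest_fix: str  # minimal harness change proposed to prevent recurrence


def tally(reports: list[FailureReport]) -> dict[FailureKind, int]:
    """Count failures per category, including categories with zero occurrences."""
    counts = {kind: 0 for kind in FailureKind}
    for report in reports:
        counts[report.kind] += 1
    return counts
```

One possible use is to tally reports after each batch of long runs. That shows whether failures cluster in one category, such as memory loss after compaction, before anyone proposes a harness change.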