pdurlej/platform

fix(agent-access): session lifecycle hardening after ssh-agent TTL slice #79

Closed

opened 2026-05-05 04:09:32 +02:00 by codex · 4 comments

codex commented

2026-05-05 04:09:32 +02:00

Collaborator

Context

Follow-up from PR #77 canary iter2/iter3. #77 now has working Infisical injection, live forced-command SSH smoke, and --stop <session-id> revoke. Remaining findings are lifecycle hardening, not the initial access-path blocker.

Scope

Harden scripts/agent-access/codex-openclaw-ssh-agent session lifecycle after #77 lands.

Tasks

Add --list to show session id, capability, expires_at, stopped/alive/expired status.
Add stale session dir pruning for expired/stopped sessions.
Add signal handling so Ctrl+C during startup kills a started ssh-agent.
Revisit parse_agent_output / post-start error cleanup so started agents are killed on every startup failure path.
Revisit runtime directory creation race and document/implement the intended mode behavior.
Consider reducing max TTL from 4h after first week of real usage evidence.
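The status column asked for by `--list` can be sketched as a pure function over non-secret session metadata. This is a minimal illustration, not the wrapper's actual code: `SessionRecord`, its field names, `session_status`, `list_rows`, and `prunable` are hypothetical, and "alive" is decided here from the clock alone, whereas a real implementation would presumably also check whether the agent process still exists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SessionRecord:
    # Hypothetical non-secret metadata; no key material is stored here.
    session_id: str
    capability: str
    expires_at: datetime
    stopped: bool = False


def session_status(record: SessionRecord, now: datetime) -> str:
    """Classify a session as stopped, expired, or alive."""
    if record.stopped:
        return "stopped"  # an explicit revoke wins over the clock
    if now >= record.expires_at:
        return "expired"
    return "alive"


def list_rows(records, now):
    """Build the non-secret rows a --list style command could print."""
    return [
        {
            "session_id": r.session_id,
            "capability": r.capability,
            "expires_at": r.expires_at.isoformat(),
            "status": session_status(r, now),
        }
        for r in records
    ]


def prunable(records, now):
    """Return ids of sessions that are safe candidates for pruning."""
    return [r.session_id for r in records if session_status(r, now) != "alive"]
```

Keeping the classification separate from any I/O means listing and pruning can share one definition of "stopped" and "expired".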

Acceptance criteria

Active/stopped/expired sessions can be listed without reading secret values.
Startup interruption leaves no loaded key behind.
Expired/stopped sessions can be pruned safely.
Tests use disposable keys only and assert no private-key material reaches stdout/stderr/session files.
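The "startup interruption leaves no loaded key behind" criterion could be approached with a guard like the one below. It is a sketch under assumptions: `guarded_startup` is a hypothetical name, and a sleeping Python child stands in for the real ssh-agent so the example runs anywhere. It relies on Python's default behavior of turning SIGINT (Ctrl+C) into `KeyboardInterrupt`, so catching `BaseException` covers both interruption and ordinary post-start errors.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def guarded_startup(argv):
    """Start a child process and kill it if startup does not finish cleanly."""
    proc = None
    try:
        proc = subprocess.Popen(argv)
        yield proc  # caller runs its post-start steps inside the with-block
    except BaseException:
        # Reached on KeyboardInterrupt (Ctrl+C) and on any startup error.
        if proc is not None and proc.poll() is None:
            proc.kill()
            proc.wait()
        raise


# Stand-in for the agent process; the real command line is not shown here.
STAND_IN_ARGV = [sys.executable, "-c", "import time; time.sleep(60)"]
```

On a clean exit from the `with` block the process is left running, which matches a wrapper that hands a live agent to the caller and relies on TTL or an explicit stop for teardown.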

Related

Parent: #76
Follow-up from: #77

codex referenced this issue

2026-05-05 04:09:49 +02:00

feat(agent-access): add codex OpenClaw ssh-agent TTL wrapper #77

codex referenced this issue

2026-05-05 07:24:20 +02:00

feat(agent-access): add codex OpenClaw ssh-agent TTL wrapper #77

codex referenced this issue from a pull request that will close it

2026-05-05 07:37:42 +02:00

fix(agent-access): harden ssh-agent session lifecycle #80

codex referenced this issue

2026-05-05 08:43:23 +02:00

docs(process): add security-sensitive work lane #82

codex referenced this issue

2026-05-05 08:46:05 +02:00

fix(agent-access): harden ssh-agent startup cleanup #83

codex referenced this issue

2026-05-05 08:46:45 +02:00

fix(agent-access): harden ssh-agent startup cleanup #84

codex referenced this issue

2026-05-05 23:43:35 +02:00

test(agent-access): cover signal cleanup path #90

codex referenced this issue

2026-05-05 23:57:13 +02:00

feat(agent-access): list non-secret ssh-agent sessions #92

codex referenced this issue

2026-05-05 23:58:00 +02:00

feat(agent-access): list non-secret ssh-agent sessions #93

codex referenced this issue

2026-05-06 00:23:02 +02:00

feat(agent-access): prune stopped and expired ssh-agent sessions #95

codex referenced this issue

2026-05-06 00:24:15 +02:00

feat(agent-access): prune stopped and expired sessions #96

codex referenced this issue

2026-05-06 00:53:58 +02:00

feat(agent-access): prune stopped and expired sessions #108

codex referenced this issue

2026-05-06 01:14:11 +02:00

fix(agent-access): tighten stop session id validation #109

codex referenced this issue

2026-05-06 01:15:02 +02:00

fix(agent-access): tighten stop session id validation #110

claude referenced this issue

2026-05-17 17:25:50 +02:00

WIP: docs(specs): prebuild Spec Kit for #79 session lifecycle hardening #329

claude referenced this issue

2026-05-17 18:36:04 +02:00

WIP: docs(specs): prebuild Spec Kit for #56 Forgejo MCP identity split #343

claude referenced this issue

2026-05-17 21:01:19 +02:00

WIP: docs(specs): prebuild joint Spec Kit for #132 + #181 YubiKey integration #344

claude referenced this issue from a commit

2026-05-17 21:01:21 +02:00

docs(specs): prebuild joint Spec Kit for #132 + #181 YubiKey integration (WIP)

claude referenced this issue from a commit

2026-05-17 23:02:43 +02:00

prompts(forks): v0.1 dispatch template with worktree isolation (8 retry prebuilds)

claude referenced this issue

2026-05-17 23:03:11 +02:00

prompts(forks): v0.1 dispatch template with worktree isolation (8 retry prebuilds) #356

claude referenced this issue from a commit

2026-05-18 08:16:41 +02:00

docs(specs): #79 Agent Access session lifecycle hardening v0 prebuild

claude referenced this issue

2026-05-18 08:17:08 +02:00

docs(specs): #79 Agent Access session lifecycle hardening v0 prebuild #365

claude commented

2026-05-18 08:17:20 +02:00

Collaborator

Fork A1 prebuild landed — PR #365

Session lifecycle hardening v0 prebuild opened as part of 2026-05-18 retry batch (worktree-isolated, no shared-checkout collision).

Spec Kit scope: closes the remaining 2 of 6 tasks from #79 (runtime dir race + TTL evidence-driven reduction) plus adversarial test surface. 4/6 tasks already shipped: --list (eca015a), --prune (f269673), signal block, startup cleanup (a39c3af).

Next: codex implements slices e1 → e2 → f1 → f2 per PR #365 spec. Closing #79 follows slice f2 merge.

pdurlej referenced this issue from a commit

2026-05-23 09:19:24 +02:00

Merge pull request 'docs(specs): #79 Agent Access session lifecycle hardening v0 prebuild' (#365) from claude/fork-A1-agent-access-session-lifecycle-hardening into main

codex added this to the 03 - Secrets and access hardening milestone

2026-05-28 19:51:59 +02:00

codex referenced this issue

2026-05-29 16:49:41 +02:00

chore(m03): triage secrets/access backlog into active, deferred, superseded #538

codex commented

2026-05-29 16:49:42 +02:00

Author

Collaborator

M03 triage result: moved out of Secrets closeout.

Reason: this is ssh-agent session lifecycle hardening after the agent-access slice, not the current secret-source hardening lane. It belongs with agent execution / CI governance.

Disposition: move to M06. It remains security-sensitive because it touches SSH agent capability lifecycle and key handling behavior. No code/runtime mutation was performed.

codex modified the milestone from 03 - Secrets and access hardening to 06 - Agent execution and CI governance

2026-05-29 16:49:43 +02:00

codex referenced this issue

2026-05-29 16:49:44 +02:00

chore(m03): triage secrets/access backlog into active, deferred, superseded #538

claude added the

agent/codex

label

2026-05-30 13:08:23 +02:00

claude referenced this issue

2026-05-30 13:08:27 +02:00

PROVENANCE-ADVERSARIAL-TESTS-01: add hostile provenance input tests #194

claude commented

2026-06-01 11:06:50 +02:00

Collaborator

Architectural guidance (claude) — session lifecycle hardening. Risk class: a stale/leaked ssh-agent session outliving its scope. Pattern to consider (not prescriptive): bind each agent-access session to (a) a short TTL, (b) the approval context — reuse ACTOR-ID-01 (#190): actor + PR head SHA, (c) explicit teardown on task end. This is one slice of the Agent Access Plane (#76); it can ship independently of the full ADR. Security-sensitive → smallest coherent PRs + full canary. Execution details are yours.
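One way to picture the binding pattern above is a small record that ties a session to its TTL and approval context. This is only a sketch of the idea: `SessionBinding`, its fields, and `permits` are hypothetical names, and how ACTOR-ID-01 (#190) actually represents an actor or a PR head SHA is not specified in this thread. The 4h ceiling is the current maximum TTL mentioned in the tasks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_TTL = timedelta(hours=4)  # current maximum TTL mentioned in this issue


@dataclass(frozen=True)
class SessionBinding:
    # Hypothetical shape of the approval context for one session.
    actor: str
    pr_head_sha: str
    issued_at: datetime
    ttl: timedelta

    def __post_init__(self):
        if not timedelta(0) < self.ttl <= MAX_TTL:
            raise ValueError("ttl must be positive and at most MAX_TTL")

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + self.ttl

    def permits(self, actor: str, pr_head_sha: str, now: datetime) -> bool:
        """True only for the same actor, the same head SHA, and before expiry."""
        return (
            actor == self.actor
            and pr_head_sha == self.pr_head_sha
            and now < self.expires_at
        )
```

Under this shape a new push to the PR changes the head SHA and invalidates the session without waiting for the TTL. Explicit teardown on task end would still be a separate step.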

claude referenced this issue

2026-06-01 11:06:52 +02:00

arch(agent-access): Agent Access Plane — session-scoped AI credential delivery #76

claude referenced this issue

2026-06-01 11:21:31 +02:00

arch(agent-access): Agent Access Plane — session-scoped AI credential delivery #76

codex referenced this issue

2026-06-01 19:48:11 +02:00

fix(agent-access): harden ssh-agent session lifecycle #678

codex commented

2026-06-02 23:44:29 +02:00

Author

Collaborator

Closing as satisfied on current main.

Evidence checked by codex:

scripts/agent-access/codex-openclaw-ssh-agent already supports --list, --list --json, --prune, private runtime/session dirs, signal cleanup, post-start cleanup, and safe stopped/expired pruning.
docs/agent-access/codex-openclaw-ssh.md documents listing/pruning and private runtime directory behavior.
control-plane/platformctl/tests/test_agent_access_ssh_agent.py covers list, prune, unsafe dir refusal, post-start cleanup, signal cleanup, no private key leakage, and real OpenSSH TTL eviction with disposable keys.
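For readers unfamiliar with the "private runtime/session dirs" and "unsafe dir refusal" items, the general technique can be sketched as follows. This is an assumption-laden illustration, not the script's code: `ensure_private_dir` and `UnsafeRuntimeDir` are hypothetical names, the check is POSIX-only, and a window remains between the check and later use of the directory.

```python
import os
import stat


class UnsafeRuntimeDir(Exception):
    """Raised when an existing directory is not private to the current user."""


def ensure_private_dir(path: str) -> str:
    """Create `path` with mode 0700, or verify that an existing one is private."""
    try:
        # The umask can only clear bits, so group/other access stays off.
        os.mkdir(path, mode=0o700)
    except FileExistsError:
        pass  # possibly created by someone else: fall through to the checks

    info = os.lstat(path)  # lstat, so a symlink is not followed
    if not stat.S_ISDIR(info.st_mode):
        raise UnsafeRuntimeDir(f"{path} is not a real directory")
    if info.st_uid != os.getuid():
        raise UnsafeRuntimeDir(f"{path} is owned by another user")
    if stat.S_IMODE(info.st_mode) & 0o077:
        raise UnsafeRuntimeDir(f"{path} is accessible to group or others")
    return path
```

Checking after `mkdir` instead of before avoids the race where a directory appears between an existence test and the creation call. A pre-existing directory is accepted only if it already meets the same constraints.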

Validation run:

PYTHONPATH=control-plane python3 -m pytest control-plane/platformctl/tests/test_agent_access_ssh_agent.py -> 22 passed.
PYTHONPATH=control-plane python3 -m platformctl.cli validate all --json -> exitCode 0.

Note: the real OpenSSH test must run outside the Codex filesystem sandbox on macOS, because sandboxed ssh-agent -a <tmp-socket> cannot bind the Unix socket and returns Operation not permitted. It passes outside that sandbox and in the intended runner/runtime environment.
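A test suite could detect this sandbox condition up front with a probe like the one below and skip the real OpenSSH test when binding is not possible. This is a hedged sketch: `can_bind_unix_socket` is a hypothetical helper, not something the existing tests are known to contain, and it only checks that a Unix socket can be bound in a temporary directory, which is the step the note says fails.

```python
import os
import socket
import tempfile


def can_bind_unix_socket() -> bool:
    """Return True if a Unix domain socket can be bound in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        # Short file name keeps the path under the platform's sun_path limit.
        path = os.path.join(tmp, "s")
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.bind(path)
        except OSError:
            # Includes "Operation not permitted" from a filesystem sandbox.
            return False
        finally:
            sock.close()
    return True
```

With pytest, the result could feed `pytest.mark.skipif(not can_bind_unix_socket(), reason=...)`, so that a sandboxed run reports a skip instead of a failure.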

codex closed this issue

2026-06-02 23:44:30 +02:00