pdurlej/platform

feat(agent-access): add codex OpenClaw ssh-agent TTL wrapper #77

Merged

pdurlej merged 4 commits from codex/issues/73-agent-access-ssh-ttl into main

2026-05-05 07:22:48 +02:00

codex commented

2026-05-05 03:18:25 +02:00

Collaborator

Canary status: defer_to_issue — accepted by operator and merged; session lifecycle hardening tracked in #79

Canary Context Pack

Product story

The operator needs to delegate real OpenClaw/VPS1000 checks to Codex without pasting SSH keys into chat or leaving private keys on disk. This is the first narrow child of #76 Agent Access Plane: agents receive a bounded SSH capability, not a standing raw credential.

What changed

Added scripts/agent-access/codex-openclaw-ssh-agent, a Python wrapper that starts a dedicated ssh-agent, loads an Infisical-injected key via ssh-add -t, and writes only non-secret session state under ~/.platformctl-runtime/agent-access/sessions/<session-id>/.
Added docs/agent-access/codex-openclaw-ssh.md with operator usage, stop conditions, and runtime layout.
Extended state/runtime-layout.md with the new non-secret agent-access/ session directory.
Added pytest coverage for secret-safety, default 1h TTL, max TTL refusal, private runtime permissions, sanitized child env, real OpenSSH TTL eviction with a disposable key, and --stop revoke without secret env.
Iter2 fixes: ssh_hint now forces IdentityAgent="$SSH_AUTH_SOCK", IdentityFile=none, and IdentitiesOnly=no; wrapper prints revoke_hint; --stop <session-id> kills the session agent and marks metadata; unexpected exceptions now kill started agents; empty ssh-add -l output becomes AgentAccessError.
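
The TTL and session-state handling listed above could be sketched roughly as follows. This is a minimal illustration, not the merged wrapper: `resolve_ttl`, `create_session_dir`, and `write_agent_env` are hypothetical names, the 4h cap is an assumed value (this PR only states a 1h default and that over-limit TTLs are refused), and the `agent.env` format and `uuid`-based session id are assumptions.

```python
import os
import shlex
import uuid
from pathlib import Path

DEFAULT_TTL_SECONDS = 3600      # 1h default, as stated in this PR
MAX_TTL_SECONDS = 4 * 3600      # ASSUMPTION: the PR does not state the real cap


def resolve_ttl(requested=None):
    """Return the lifetime for `ssh-add -t`, refusing values outside 1..MAX."""
    ttl = DEFAULT_TTL_SECONDS if requested is None else int(requested)
    if not 0 < ttl <= MAX_TTL_SECONDS:
        raise ValueError(f"ttl_seconds={ttl} is outside 1..{MAX_TTL_SECONDS}")
    return ttl


def create_session_dir(runtime_root):
    """Create <runtime_root>/agent-access/sessions/<session-id>/ as a private dir."""
    session_dir = Path(runtime_root) / "agent-access" / "sessions" / uuid.uuid4().hex
    session_dir.mkdir(parents=True)
    os.chmod(session_dir, 0o700)  # tighten the leaf dir regardless of umask
    return session_dir


def write_agent_env(session_dir, auth_sock):
    """Write a sourceable agent.env holding only the socket path (no key material)."""
    env_file = Path(session_dir) / "agent.env"
    env_file.write_text(f"export SSH_AUTH_SOCK={shlex.quote(str(auth_sock))}\n")
    os.chmod(env_file, 0o600)
    return env_file
```

Only the leaf session directory is tightened here; a real implementation would also need to make sure the parent runtime directories are not readable by other users.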

Why it changed

#73 identified Codex SSH key delivery as the immediate friction point. Oracle review and product framing confirmed the larger direction (#76), but the safe first slice is still one capability: openclaw.ssh.codex-local via TTL ssh-agent.

Files touched

scripts/agent-access/codex-openclaw-ssh-agent
docs/agent-access/codex-openclaw-ssh.md
control-plane/platformctl/tests/test_agent_access_ssh_agent.py
state/runtime-layout.md

Relevant context

#73: Codex SSH key delivery via ssh-agent with TTL
#76: Agent Access Plane parent issue
#64: Vault -> Infisical migration, dependency but not parent
#56: per-agent Forgejo/MCP identity split, adjacent child
#72: later safe CI/canary Infisical integration; intentionally not touched here
pdurlej/iskra-openclaw#43: broader SecretRef/Agent Vault architecture to migrate/split later
pdurlej/iskra-openclaw#44 and PR #45: existing OpenClaw Path A wrapper and public-key side

Runtime evidence

PYTHONPATH=control-plane pytest -q control-plane/platformctl/tests/test_agent_access_ssh_agent.py => 7 passed.
PYTHONPATH=control-plane pytest -q control-plane/platformctl/tests => 154 passed.
git diff --check HEAD~1 HEAD => pass.
Identity check: the codex Forgejo and git doctor checks pass with --skip infisical --skip review --skip mcp; the full doctor run still fails because the local Infisical token cache is missing, so this PR does not claim full identity-doctor coverage.
Live Infisical validation: /home-platform/agent_access/ssh/openclaw/codex-local in prod injects OPENCLAW_SSH_PRIVATE_KEY; wrapper prints PASS capability=openclaw.ssh.codex-local and loads fingerprint SHA256:unGNI4x7pKVTlxP4Cnrs8RhYmlIvRBcGFqo4qByelFA with ttl_seconds=3600.
Live SSH transport smoke: after sourcing agent.env, ssh-add -l -E sha256 lists the same fingerprint; ssh -o BatchMode=yes -o ForwardAgent=no -o IdentityAgent="$SSH_AUTH_SOCK" -o IdentityFile=none -o IdentitiesOnly=no openclaw@vps1000 'ls /home/openclaw/.local/bin/' succeeds via the forced-command wrapper.
Live OpenClaw canary execution proof: the same access path runs iskra-canary --json --timeout-seconds 30; it returns status=critical from Iskra runtime health, which is not an SSH/access failure.
Live revoke smoke: wrapper prints revoke_hint=scripts/agent-access/codex-openclaw-ssh-agent --stop <session-id>; running it returns PASS stopped_session=<session-id> and subsequent ssh-add -l against that socket fails with Error connecting to agent: No such file or directory.
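
For reference, the identity-isolated SSH invocation used in the transport smoke above can be assembled programmatically. The sketch below is illustrative only: `build_ssh_argv` is a hypothetical helper, not a function from the wrapper. The options are the ones quoted in the evidence: `IdentityAgent` pins the session socket, `IdentityFile=none` stops ssh from loading on-disk keys, and `IdentitiesOnly=no` lets ssh offer the key held by the agent.

```python
import shlex


def build_ssh_argv(auth_sock, target, remote_command):
    """Build an ssh argv that authenticates only through the given agent socket."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never fall back to interactive prompts
        "-o", "ForwardAgent=no",             # do not expose the agent to the remote host
        "-o", f"IdentityAgent={auth_sock}",  # use the session agent, not an ambient one
        "-o", "IdentityFile=none",           # ignore key files on disk
        "-o", "IdentitiesOnly=no",           # allow identities offered by the agent
        target,
        remote_command,
    ]


def render_hint(argv):
    """Render the argv as a copy-pasteable shell line."""
    return shlex.join(argv)
```

Building an argv list and quoting it only when rendering a hint avoids shell interpolation when the command is run through `subprocess`.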

Known constraints

The wrapper expects the private key to arrive through a specific environment variable injected into this wrapper process, for example via infisical run. It does not fetch secrets itself and does not print them.
Child ssh-agent/ssh-add processes receive a sanitized environment; the private-key env var is popped before child execution.
This PR does not modify vps1000, OpenClaw wrapper allowlists, Forgejo PATs, MCP identity, CI, or #72.
The agent.env file is non-secret but access-bearing while the agent TTL is alive; runtime dir permissions, IdentityFile=none, and TTL bound the exposure.
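
The env-sanitizing constraint above might look something like the following. This is a sketch under stated assumptions: `split_secret_env` and `load_key` are hypothetical names, and feeding the key to `ssh-add` on stdin (the `-` argument) is one way to keep it out of argv and off disk, not necessarily what the merged wrapper does.

```python
import subprocess

SECRET_ENV_VAR = "OPENCLAW_SSH_PRIVATE_KEY"  # name taken from the runtime evidence


def split_secret_env(environ, secret_var=SECRET_ENV_VAR):
    """Return (private_key, child_env); child_env never contains the secret."""
    child_env = dict(environ)                 # copy, so the caller's mapping is untouched
    private_key = child_env.pop(secret_var, "")
    if not private_key.strip():
        raise RuntimeError(f"{secret_var} is not set; run under an injector such as infisical run")
    return private_key, child_env


def load_key(private_key, child_env, auth_sock, ttl_seconds):
    """Load the key into the session agent with a lifetime, via stdin only."""
    env = {**child_env, "SSH_AUTH_SOCK": auth_sock}
    key_text = private_key if private_key.endswith("\n") else private_key + "\n"
    subprocess.run(
        ["ssh-add", "-t", str(ttl_seconds), "-"],  # "-" makes ssh-add read the key from stdin
        input=key_text,
        text=True,
        env=env,
        check=True,
        capture_output=True,                       # keep ssh-add chatter out of wrapper output
    )
```

A caller would typically use `key, child_env = split_secret_env(os.environ)` once at startup and pass `child_env` to every child process it spawns.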

Explicit out-of-scope

Agent Access Plane ADR/design doc (#76 follow-up)
Generic capability catalog/resolver
Per-agent Forgejo/MCP identity split (#56)
Forgejo Actions/canary Infisical integration (#72 follow-up)
Any infisical run -- bash or arbitrary command execution with secrets
Writing private keys to $HOME/.ssh, /tmp, dotenv files, repo files, logs, or PR text

Requested decision

Approve this first implementation slice for #73 now that Infisical path validation, live forced-command SSH smoke, explicit identity isolation, and revoke smoke pass.

Merge blockers

Any path where the private key can land in repo, logs, argv, sourced dotenv, $HOME/.ssh, or /tmp.
Any inheritance of the private-key env var into ssh-agent or ssh-add child processes.
Any reviewer requirement that #76 ADR must land before this tactical #73 slice.
Any regression in the live access path, which must remain green: Infisical injection + ssh-agent load + forced-command SSH smoke + --stop revoke.
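
The first blocker above lends itself to a mechanical check in tests. The helper below is a hypothetical sketch (it is not part of this PR's test suite): it scans recorded child argvs and every file under a runtime directory for a disposable test secret.

```python
from pathlib import Path


def find_secret_leaks(secret, argvs, root):
    """Return locations where `secret` appears verbatim in argvs or files under root."""
    leaks = []
    for argv in argvs:
        if any(secret in arg for arg in argv):
            leaks.append(f"argv:{argv[0]}")
    needle = secret.encode()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and needle in path.read_bytes():
            leaks.append(f"file:{path.name}")
    return leaks
```

Such a check only catches verbatim copies of the key, so it complements rather than replaces a review of where the secret flows.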

Spec sources read

PLATFORM_CHARTER.md §runtime state and service identities — runtime/secret boundaries.
state/runtime-layout.md — canonical machine-local runtime layout.
AGENTS.md — identity, canary, and PR body conventions.
control-plane/platformctl/identity/codex.py — codex askpass/worktree identity pattern.
control-plane/platformctl/tests/test_codex_askpass.py — local pytest style for identity/runtime wrappers.
Forgejo issue #73 — direct issue scope and acceptance criteria.
Forgejo issue #76 — parent Agent Access Plane issue created from the Oracle/product framing.

Closes #73

codex added 1 commit

2026-05-05 03:18:25 +02:00

feat(agent-access): add codex openclaw ssh-agent ttl wrapper

canary-required / collect-diff (pull_request) Successful in 3s
python-ci / Python 3.11 (pull_request) Successful in 22s
python-ci / Python 3.12 (pull_request) Successful in 22s
python-ci / Python 3.13 (pull_request) Successful in 22s
canary-required / canary (pull_request) Failing after 1s

1abe456d1e

codex referenced this pull request

2026-05-05 03:21:33 +02:00

docs(decisions): define Agent Access Plane #78

codex added 1 commit

2026-05-05 03:52:59 +02:00

fix(agent-access): force session ssh agent in hints

canary-required / collect-diff (pull_request) Successful in 3s
python-ci / Python 3.11 (pull_request) Successful in 24s
python-ci / Python 3.12 (pull_request) Successful in 25s
python-ci / Python 3.13 (pull_request) Successful in 24s
canary-required / canary (pull_request) Failing after 1s

f5dff161d0

codex added 1 commit

2026-05-05 03:59:20 +02:00

fix(agent-access): add stop path and isolate ssh identities

canary-required / collect-diff (pull_request) Successful in 3s
python-ci / Python 3.11 (pull_request) Successful in 22s
python-ci / Python 3.12 (pull_request) Successful in 23s
python-ci / Python 3.13 (pull_request) Successful in 22s
canary-required / canary (pull_request) Failing after 2s

eff6d5c46a

codex referenced this pull request

2026-05-05 04:09:32 +02:00

fix(agent-access): session lifecycle hardening after ssh-agent TTL slice #79

pdurlej added 1 commit

2026-05-05 07:22:28 +02:00

Merge branch 'main' into codex/issues/73-agent-access-ssh-ttl

canary-required / collect-diff (pull_request) Successful in 3s
python-ci / Python 3.11 (pull_request) Successful in 23s
python-ci / Python 3.12 (pull_request) Successful in 24s
python-ci / Python 3.13 (pull_request) Successful in 24s
canary-required / canary (pull_request) Failing after 2s

e366e65f84

pdurlej merged commit 7fc4062caf into main

2026-05-05 07:22:48 +02:00

pdurlej referenced this pull request from a commit

2026-05-05 07:22:49 +02:00

Merge pull request 'feat(agent-access): add codex OpenClaw ssh-agent TTL wrapper' (#77) from codex/issues/73-agent-access-ssh-ttl into main

codex commented

2026-05-05 07:24:20 +02:00

Author

Collaborator

Post-merge note from Codex: operator accepted defer_to_issue terminal for #77 and merged after live evidence passed.

Evidence before merge:

Infisical prod path injected OPENCLAW_SSH_PRIVATE_KEY.
ssh-agent loaded fingerprint SHA256:unGNI4x7pKVTlxP4Cnrs8RhYmlIvRBcGFqo4qByelFA with 1h TTL.
Forced-command SSH smoke to openclaw@vps1000 succeeded with IdentityAgent="$SSH_AUTH_SOCK" and IdentityFile=none.
--stop <session-id> revoke smoke succeeded.
Remaining lifecycle hardening is tracked in #79.

codex referenced this pull request

2026-05-05 07:37:42 +02:00

fix(agent-access): harden ssh-agent session lifecycle #80