feat(agent-access): add codex OpenClaw ssh-agent TTL wrapper #77

Merged
pdurlej merged 4 commits from codex/issues/73-agent-access-ssh-ttl into main 2026-05-05 07:22:48 +02:00
Collaborator

Canary status: defer_to_issue (accepted by the operator and merged; session lifecycle hardening is tracked in #79)

Canary Context Pack

Product story

The operator needs to delegate real OpenClaw/VPS1000 checks to Codex without pasting SSH keys into chat or leaving private keys on disk. This is the first narrow child of #76 Agent Access Plane: agents receive a bounded SSH capability, not a standing raw credential.

What changed

  • Added scripts/agent-access/codex-openclaw-ssh-agent, a Python wrapper that starts a dedicated ssh-agent, loads an Infisical-injected key via ssh-add -t, and writes only non-secret session state under ~/.platformctl-runtime/agent-access/sessions/<session-id>/.
  • Added docs/agent-access/codex-openclaw-ssh.md with operator usage, stop conditions, and runtime layout.
  • Extended state/runtime-layout.md with the new non-secret agent-access/ session directory.
  • Added pytest coverage for secret-safety, default 1h TTL, max TTL refusal, private runtime permissions, sanitized child env, real OpenSSH TTL eviction with a disposable key, and --stop revoke without secret env.
  • Iteration 2 fixes: ssh_hint now forces IdentityAgent="$SSH_AUTH_SOCK", IdentityFile=none, and IdentitiesOnly=no; the wrapper prints revoke_hint; --stop <session-id> kills the session agent and marks its metadata; unexpected exceptions now kill any started agent; empty ssh-add -l output now raises AgentAccessError.
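A minimal Python sketch of the core load step, assuming the wrapper follows the flow described above: resolve a TTL (1h default, refuse anything above a ceiling), strip OPENCLAW_SSH_PRIVATE_KEY from the child environment, and hand the key to ssh-add -t on stdin. MAX_TTL_SECONDS and every function name here are hypothetical (the PR does not state the real ceiling), and the actual script may be structured differently.

```python
from __future__ import annotations

import subprocess
from typing import Mapping, Optional

# DEFAULT_TTL_SECONDS and SECRET_ENV_VAR come from the PR text; MAX_TTL_SECONDS
# and all function names are hypothetical.
DEFAULT_TTL_SECONDS = 3600
MAX_TTL_SECONDS = 4 * 3600  # assumed ceiling; the real limit is not stated
SECRET_ENV_VAR = "OPENCLAW_SSH_PRIVATE_KEY"


class AgentAccessError(RuntimeError):
    """Raised when the wrapper refuses a request or a step fails."""


def resolve_ttl(requested: Optional[int]) -> int:
    """Return the lifetime for ssh-add -t, defaulting to 1h and refusing overlong TTLs."""
    ttl = DEFAULT_TTL_SECONDS if requested is None else requested
    if ttl <= 0:
        raise AgentAccessError(f"ttl must be positive, got {ttl}")
    if ttl > MAX_TTL_SECONDS:
        raise AgentAccessError(f"ttl {ttl}s exceeds the {MAX_TTL_SECONDS}s maximum")
    return ttl


def sanitized_env(env: Mapping[str, str]) -> dict[str, str]:
    """Copy the environment with the private-key variable removed."""
    child = dict(env)
    child.pop(SECRET_ENV_VAR, None)
    return child


def load_key(env: Mapping[str, str], auth_sock: str, ttl: int) -> None:
    """Add the injected key to the session agent with a lifetime, via stdin only."""
    key = env.get(SECRET_ENV_VAR)
    if not key:
        raise AgentAccessError(f"{SECRET_ENV_VAR} is not set")
    if not key.endswith("\n"):
        key += "\n"  # OpenSSH key parsing can reject input without a final newline
    child_env = sanitized_env(env)
    child_env["SSH_AUTH_SOCK"] = auth_sock  # target the dedicated session agent
    # "-" tells ssh-add to read the key from stdin, so it never reaches argv or disk.
    result = subprocess.run(
        ["ssh-add", "-t", str(ttl), "-"],
        input=key.encode(),
        env=child_env,
        capture_output=True,
    )
    if result.returncode != 0:
        # Report only the exit code; captured output is not echoed.
        raise AgentAccessError(f"ssh-add failed with exit code {result.returncode}")
```

Passing the key on stdin rather than as a temporary file or argument is what keeps it out of argv, $HOME/.ssh, and /tmp in this sketch.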

Why it changed

#73 identified Codex SSH key delivery as the immediate friction point. Oracle review and product framing confirmed the larger direction (#76), but the safe first slice is still one capability: openclaw.ssh.codex-local via TTL ssh-agent.

Files touched

  • scripts/agent-access/codex-openclaw-ssh-agent
  • docs/agent-access/codex-openclaw-ssh.md
  • control-plane/platformctl/tests/test_agent_access_ssh_agent.py
  • state/runtime-layout.md

Relevant context

  • #73: Codex SSH key delivery via ssh-agent with TTL
  • #76: Agent Access Plane parent issue
  • #64: Vault -> Infisical migration, dependency but not parent
  • #56: per-agent Forgejo/MCP identity split, adjacent child
  • #72: later safe CI/canary Infisical integration; intentionally not touched here
  • pdurlej/iskra-openclaw#43: broader SecretRef/Agent Vault architecture to migrate/split later
  • pdurlej/iskra-openclaw#44 and PR #45: existing OpenClaw Path A wrapper and public-key side

Runtime evidence

  • PYTHONPATH=control-plane pytest -q control-plane/platformctl/tests/test_agent_access_ssh_agent.py => 7 passed.
  • PYTHONPATH=control-plane pytest -q control-plane/platformctl/tests => 154 passed.
  • git diff --check HEAD~1 HEAD => pass.
  • Identity check: the codex Forgejo+git doctor passes with --skip infisical --skip review --skip mcp; the full doctor still fails because the local Infisical token cache is missing, so this PR does not claim full identity-doctor coverage.
  • Live Infisical validation: /home-platform/agent_access/ssh/openclaw/codex-local in prod injects OPENCLAW_SSH_PRIVATE_KEY; wrapper prints PASS capability=openclaw.ssh.codex-local and loads fingerprint SHA256:unGNI4x7pKVTlxP4Cnrs8RhYmlIvRBcGFqo4qByelFA with ttl_seconds=3600.
  • Live SSH transport smoke: after sourcing agent.env, ssh-add -l -E sha256 lists the same fingerprint; ssh -o BatchMode=yes -o ForwardAgent=no -o IdentityAgent="$SSH_AUTH_SOCK" -o IdentityFile=none -o IdentitiesOnly=no openclaw@vps1000 'ls /home/openclaw/.local/bin/' succeeds via the forced-command wrapper.
  • Live OpenClaw canary execution proof: the same access path runs iskra-canary --json --timeout-seconds 30; it returns status=critical, which reflects Iskra runtime health rather than an SSH/access failure.
  • Live revoke smoke: wrapper prints revoke_hint=scripts/agent-access/codex-openclaw-ssh-agent --stop <session-id>; running it returns PASS stopped_session=<session-id> and subsequent ssh-add -l against that socket fails with Error connecting to agent: No such file or directory.
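The --stop revoke can be sketched as follows, assuming the wrapper records the agent PID in per-session metadata. The session.json file name, its fields, and stop_session are hypothetical; only the PASS line mirrors the output quoted in the revoke smoke. SIGTERM is the signal ssh-agent -k sends, and ssh-agent removes its socket when it exits, which is consistent with the later "No such file or directory" error.

```python
from __future__ import annotations

import json
import os
import signal
import time
from pathlib import Path

# Hypothetical metadata file; the PR names the sessions directory but not this file.
METADATA_FILE = "session.json"


class AgentAccessError(RuntimeError):
    """Raised when a stop request cannot be honoured."""


def stop_session(sessions_root: Path, session_id: str) -> str:
    """Kill the session's ssh-agent and mark its metadata as stopped."""
    meta_path = sessions_root / session_id / METADATA_FILE
    if not meta_path.is_file():
        raise AgentAccessError(f"unknown session {session_id}")
    meta = json.loads(meta_path.read_text())
    pid = int(meta["agent_pid"])
    # This sketch trusts the recorded PID; it does not guard against PID reuse.
    try:
        os.kill(pid, signal.SIGTERM)  # same signal as `ssh-agent -k`
    except ProcessLookupError:
        pass  # agent already exited; still record the stop
    meta["status"] = "stopped"
    meta["stopped_at"] = int(time.time())
    meta_path.write_text(json.dumps(meta))
    return f"PASS stopped_session={session_id}"
```

A production version would want to confirm that the PID still belongs to the session agent before signalling it.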

Known constraints

  • The wrapper expects the private key to arrive through a specific environment variable injected into this wrapper process, for example via infisical run. It does not fetch secrets itself and does not print them.
  • Child ssh-agent/ssh-add processes receive a sanitized environment; the private-key env var is popped before child execution.
  • This PR does not modify vps1000, OpenClaw wrapper allowlists, Forgejo PATs, MCP identity, CI, or #72.
  • The agent.env file is non-secret but access-bearing while the agent TTL is alive; runtime dir permissions, IdentityFile=none, and TTL bound the exposure.
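One way to keep agent.env access-bearing but tightly scoped, assuming a POSIX host: create each session directory with mode 0700 and write files with mode 0600 using O_EXCL so an existing file is never reused. Only the sessions path and agent.env come from this PR; the session-id format, session.json, and the helper names are illustrative assumptions.

```python
from __future__ import annotations

import json
import os
import secrets
import shlex
from pathlib import Path


def create_session_dir(runtime_root: Path) -> tuple[str, Path]:
    """Create agent-access/sessions/<session-id>/ with owner-only permissions."""
    runtime_root.mkdir(mode=0o700, parents=True, exist_ok=True)
    session_id = secrets.token_hex(8)  # hypothetical id scheme
    current = runtime_root
    for part in ("agent-access", "sessions", session_id):
        current = current / part
        # Shared parents may already exist; the session dir itself must be new.
        current.mkdir(mode=0o700, exist_ok=(part != session_id))
        os.chmod(current, 0o700)  # mkdir's mode is masked by umask, so set it explicitly
    return session_id, current


def write_private(path: Path, text: str) -> None:
    """Create a new file readable and writable only by the owner."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.fchmod(fd, 0o600)  # undo any umask interference
    with os.fdopen(fd, "w") as fh:
        fh.write(text)


def write_session_state(session_dir: Path, auth_sock: str, agent_pid: int,
                        ttl: int, fingerprint: str) -> None:
    """Write only non-secret state: socket path, agent PID, TTL, fingerprint."""
    # agent.env holds no key material, but it grants use of the loaded key
    # while the TTL is alive, hence 0600 inside a 0700 directory.
    write_private(
        session_dir / "agent.env",
        f"SSH_AUTH_SOCK={shlex.quote(auth_sock)}; export SSH_AUTH_SOCK;\n"
        f"SSH_AGENT_PID={agent_pid}; export SSH_AGENT_PID;\n",
    )
    write_private(
        session_dir / "session.json",
        json.dumps({"agent_pid": agent_pid, "ttl_seconds": ttl,
                    "fingerprint": fingerprint, "status": "active"}),
    )
```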

Explicit out-of-scope

  • Agent Access Plane ADR/design doc (#76 follow-up)
  • Generic capability catalog/resolver
  • Per-agent Forgejo/MCP identity split (#56)
  • Forgejo Actions/canary Infisical integration (#72 follow-up)
  • Any infisical run -- bash or arbitrary command execution with secrets
  • Writing private keys to $HOME/.ssh, /tmp, dotenv files, repo files, logs, or PR text

Requested decision

Approve this first implementation slice for #73 now that Infisical path validation, live forced-command SSH smoke, explicit identity isolation, and revoke smoke pass.

Merge blockers

  • Any path where the private key can land in repo, logs, argv, sourced dotenv, $HOME/.ssh, or /tmp.
  • Any inheritance of the private-key env var into ssh-agent or ssh-add child processes.
  • Any reviewer requirement that #76 ADR must land before this tactical #73 slice.
  • Any break in the live access path (Infisical injection, ssh-agent load, forced-command SSH smoke, --stop revoke); all four legs must remain green.
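The ssh-agent load leg of the live path can be checked by parsing ssh-add -l -E sha256 output and failing on an empty listing, matching the iteration 2 change that turns empty ssh-add -l output into AgentAccessError. This is a sketch: the regex and function names are assumptions, and in the test sample only the fingerprint comes from the runtime evidence; the bit count, comment, and key type are illustrative.

```python
from __future__ import annotations

import re


class AgentAccessError(RuntimeError):
    """Raised when the agent does not hold the expected identity."""


# Matches lines shaped like "256 SHA256:<hash> comment (TYPE)".
_FINGERPRINT_RE = re.compile(r"^\d+\s+(SHA256:[A-Za-z0-9+/=]+)(?:\s|$)")


def loaded_fingerprints(ssh_add_output: str) -> list[str]:
    """Extract fingerprints; empty output or 'no identities' is an error."""
    fingerprints = []
    for line in ssh_add_output.splitlines():
        match = _FINGERPRINT_RE.match(line.strip())
        if match:
            fingerprints.append(match.group(1))
    if not fingerprints:
        raise AgentAccessError("ssh-add -l listed no identities")
    return fingerprints


def require_fingerprint(ssh_add_output: str, expected: str) -> None:
    """Fail unless the expected fingerprint is loaded in the session agent."""
    if expected not in loaded_fingerprints(ssh_add_output):
        raise AgentAccessError(f"expected fingerprint {expected} is not loaded")
```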

Spec sources read

  • PLATFORM_CHARTER.md §runtime state and service identities — runtime/secret boundaries.
  • state/runtime-layout.md — canonical machine-local runtime layout.
  • AGENTS.md — identity, canary, and PR body conventions.
  • control-plane/platformctl/identity/codex.py — codex askpass/worktree identity pattern.
  • control-plane/platformctl/tests/test_codex_askpass.py — local pytest style for identity/runtime wrappers.
  • Forgejo issue #73 — direct issue scope and acceptance criteria.
  • Forgejo issue #76 — parent Agent Access Plane issue created from the Oracle/product framing.

Closes #73

feat(agent-access): add codex openclaw ssh-agent ttl wrapper
Some checks failed
canary-required / collect-diff (pull_request) Successful in 3s
python-ci / Python 3.11 (pull_request) Successful in 22s
python-ci / Python 3.12 (pull_request) Successful in 22s
python-ci / Python 3.13 (pull_request) Successful in 22s
canary-required / canary (pull_request) Failing after 1s
1abe456d1e
fix(agent-access): force session ssh agent in hints
Some checks failed
canary-required / collect-diff (pull_request) Successful in 3s
python-ci / Python 3.11 (pull_request) Successful in 24s
python-ci / Python 3.12 (pull_request) Successful in 25s
python-ci / Python 3.13 (pull_request) Successful in 24s
canary-required / canary (pull_request) Failing after 1s
f5dff161d0
fix(agent-access): add stop path and isolate ssh identities
Some checks failed
canary-required / collect-diff (pull_request) Successful in 3s
python-ci / Python 3.11 (pull_request) Successful in 22s
python-ci / Python 3.12 (pull_request) Successful in 23s
python-ci / Python 3.13 (pull_request) Successful in 22s
canary-required / canary (pull_request) Failing after 2s
eff6d5c46a
Merge branch 'main' into codex/issues/73-agent-access-ssh-ttl
Some checks failed
canary-required / collect-diff (pull_request) Successful in 3s
python-ci / Python 3.11 (pull_request) Successful in 23s
python-ci / Python 3.12 (pull_request) Successful in 24s
python-ci / Python 3.13 (pull_request) Successful in 24s
canary-required / canary (pull_request) Failing after 2s
e366e65f84
Author
Collaborator

Post-merge note from Codex: the operator accepted the defer_to_issue terminal state for #77 and merged after the live evidence passed.

Evidence before merge:

  • Infisical prod path injected OPENCLAW_SSH_PRIVATE_KEY.
  • ssh-agent loaded fingerprint SHA256:unGNI4x7pKVTlxP4Cnrs8RhYmlIvRBcGFqo4qByelFA with 1h TTL.
  • Forced-command SSH smoke to openclaw@vps1000 succeeded with IdentityAgent="$SSH_AUTH_SOCK" and IdentityFile=none.
  • --stop <session-id> revoke smoke succeeded.
  • Remaining lifecycle hardening is tracked in #79.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Due date

No due date set.

Dependencies

No dependencies set.

Reference
pdurlej/platform!77