fix(agent-access): harden ssh-agent session lifecycle #678

Merged
pdurlej merged 1 commit from codex/79-agent-access-session-lifecycle into main 2026-06-02 07:35:33 +02:00
Collaborator

Canary status: missing - fire canary 3+3 manually before merge

Summary

Hardens the Codex OpenClaw SSH agent wrapper for issue #79 without touching live SSH, RS2000, Forgejo, Infisical, or production keys.

This PR intentionally uses Refs #79, not Closes #79: the issue's own AC requires f2 after 7 days of real TTL evidence, so full closure would be a false-green.

Refs #79

Canary Context Pack

Product story

Codex/OpenClaw SSH access should be auditable and fail closed. The operator should be able to inspect session lifecycle evidence without exposing private key material or trusting stale runtime directories.

What changed

  • Makes runtime/session directory creation race-safe and requires each directory to be exactly mode 0700 and owned by the current user.
  • Adds non-secret audit.jsonl lifecycle events for created/key_loaded/stopped/crashed.
  • Adds --list --json with duration and exit-reason fields.
  • Adds codex-openclaw-ssh-agent-ttl-evidence for TTL evidence summaries.
  • Expands regression tests for concurrency, JSON list output, audit privacy, and evidence helper behavior.
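The first bullet can be illustrated with a minimal sketch. This is not the code in scripts/agent-access/codex-openclaw-ssh-agent; the names `ensure_private_dir` and `UnsafeRuntimeDir` are hypothetical. It assumes the intended behavior is: create with mode 0700, tolerate losing a creation race, and refuse (never chmod) any directory that is not already exactly 0700 and owned by the current user.

```python
import os
import stat


class UnsafeRuntimeDir(Exception):
    """Raised when a path is not a private, self-owned, exact-0700 directory."""


def ensure_private_dir(path: str) -> str:
    """Create `path` with mode 0700, or accept it only if already compliant.

    Hypothetical helper: it fails closed and never repairs permissions.
    """
    try:
        os.mkdir(path, 0o700)
    except FileExistsError:
        # Another process won the race, or the directory predates this run.
        # Either way it must pass the same validation below.
        pass

    info = os.lstat(path)  # lstat so a symlink is inspected, not followed
    if not stat.S_ISDIR(info.st_mode):
        raise UnsafeRuntimeDir(f"{path}: not a directory")
    if info.st_uid != os.geteuid():
        raise UnsafeRuntimeDir(f"{path}: not owned by the current user")
    if stat.S_IMODE(info.st_mode) != 0o700:
        raise UnsafeRuntimeDir(f"{path}: mode is not exactly 0700")
    return path
```

Validating after both the "created" and "already exists" branches keeps a single code path for the ownership and mode rules, so concurrent callers either all succeed on the same compliant directory or all fail.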

Why it changed

Issue #79 requires stronger lifecycle hardening and queryable evidence before any future TTL reduction decision.
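To make "queryable evidence" concrete, here is a rough sketch of how lifecycle events could be folded into a TTL summary. It does not reproduce codex-openclaw-ssh-agent-ttl-evidence. The event names (created, key_loaded, stopped, crashed) come from this PR, but the record fields `event`, `session`, and `ts` and the function name `summarize_sessions` are assumptions made for illustration.

```python
import json
from typing import Iterable


def summarize_sessions(lines: Iterable[str]) -> dict:
    """Summarize session durations from JSONL audit lines.

    Assumed (hypothetical) record shape:
    {"event": "...", "session": "...", "ts": <epoch seconds>}
    """
    started = {}       # session id -> creation timestamp
    durations = []     # seconds between created and stopped/crashed
    exit_reasons = {}  # terminal event name -> count

    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        event, session, ts = record["event"], record["session"], record["ts"]
        if event == "created":
            started[session] = ts
        elif event in ("stopped", "crashed") and session in started:
            durations.append(ts - started.pop(session))
            exit_reasons[event] = exit_reasons.get(event, 0) + 1

    return {
        "completed": len(durations),
        "still_open": len(started),
        "max_duration_s": max(durations) if durations else None,
        "exit_reasons": exit_reasons,
    }
```

A summary like this is only as good as the evidence window behind it, which is why the f2 decision still waits on 7+ days of real sessions.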

Files touched

  • scripts/agent-access/codex-openclaw-ssh-agent
  • scripts/agent-access/codex-openclaw-ssh-agent-ttl-evidence
  • control-plane/platformctl/tests/test_agent_access_ssh_agent.py
  • docs/agent-access/codex-openclaw-ssh.md

Relevant context

  • docs/specs/agent-access-session-lifecycle-hardening-v0/01-specify.md
  • docs/specs/agent-access-session-lifecycle-hardening-v0/02-plan.md
  • docs/specs/agent-access-session-lifecycle-hardening-v0/03-tasks.md
  • docs/agent-access/codex-openclaw-ssh.md

Runtime evidence

No live runtime action was performed. Verification used fake OpenSSH tools and disposable runtime roots only.

Known constraints

This does not make the f2 TTL reduction decision. That still needs 7+ days of real evidence.

Explicit out-of-scope

  • Live SSH session creation
  • Infisical access
  • RS2000/VPS1000 mutation
  • TTL max reduction
  • Closing #79 before f2 evidence exists

Requested decision

Review whether this safely lands e1/e2/f1 groundwork for #79.

Merge blockers

  • Any private-key material in stdout/stderr/runtime files/audit logs
  • Any path that chmods unsafe existing runtime directories into compliance
  • Any live runtime dependency or mutation
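One way to guard against the first blocker is an allow-list on audit fields, so that nothing outside a fixed non-secret set can reach audit.jsonl. The sketch below is an assumption about how that could look, not the wrapper's actual `write_audit_event`; the field names and the helper `append_audit_event` are hypothetical.

```python
import json
import os
import time

# Hypothetical allow-lists; only the event names are taken from this PR.
ALLOWED_EVENTS = frozenset({"created", "key_loaded", "stopped", "crashed"})
ALLOWED_FIELDS = frozenset({"event", "session", "ts", "exit_reason"})


def append_audit_event(path: str, event: str, session: str, **extra) -> None:
    """Append one non-secret JSON line to `path`, rejecting unknown fields."""
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"unknown audit event: {event!r}")
    record = {"event": event, "session": session, "ts": time.time(), **extra}
    unknown = set(record) - ALLOWED_FIELDS
    if unknown:
        # Fail closed instead of silently dropping or logging the field.
        raise ValueError(f"disallowed audit fields: {sorted(unknown)}")

    data = (json.dumps(record, sort_keys=True) + "\n").encode("utf-8")
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT | os.O_NOFOLLOW
    fd = os.open(path, flags, 0o600)
    try:
        offset = 0
        while offset < len(data):  # os.write may write fewer bytes than asked
            offset += os.write(fd, data[offset:])
    finally:
        os.close(fd)  # the descriptor is released even if a write fails
```

Rejecting unknown fields before the file is opened means a caller that accidentally passes key material gets an exception and the log is left untouched.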

Spec sources read

  • docs/specs/agent-access-session-lifecycle-hardening-v0/01-specify.md - acceptance criteria
  • docs/specs/agent-access-session-lifecycle-hardening-v0/02-plan.md - design choices
  • docs/specs/agent-access-session-lifecycle-hardening-v0/03-tasks.md - slice tasks and sequencing
  • docs/agent-access/codex-openclaw-ssh.md - operator-facing runbook
  • scripts/agent-access/codex-openclaw-ssh-agent - implementation surface
  • control-plane/platformctl/tests/test_agent_access_ssh_agent.py - regression surface

Validation

  • uv run pytest control-plane/platformctl/tests/test_agent_access_ssh_agent.py - 22 passed
  • PYTHONPATH=control-plane uv run --project control-plane python -m platformctl.cli validate all --json - exitCode 0
  • python3 -m py_compile scripts/agent-access/codex-openclaw-ssh-agent scripts/agent-access/codex-openclaw-ssh-agent-ttl-evidence - passed

Notes

uv refreshed tracked control-plane/platformctl.egg-info/* files in the local worktree during validation, but those generated changes are intentionally unstaged and not part of this PR.

fix(agent-access): harden ssh-agent session lifecycle
All checks were successful
canary-required / collect-diff (pull_request) Successful in 3s
platformctl plan / auto-apply scope (pull_request) Successful in 16s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 15s
python-ci / Python 3.11 (pull_request) Successful in 34s
python-ci / Python 3.12 (pull_request) Successful in 36s
python-ci / Python 3.13 (pull_request) Successful in 36s
canary-required / canary (pull_request) Successful in 11s
base-is-main / guard (pull_request) Successful in 1s
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 3s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 3s
patchwarden-client-dry-run / dry-run (pull_request) Successful in 15s
patchwarden-pr-sanity / sanity (pull_request) Successful in 7m7s
a069624495
Author
Collaborator

Patchwarden PR sanity

  • Status: eligible_sanity_clean
  • PR: 678
  • Commit: a0696244954f581070ce2a6dc67879b3af5b171c
  • Security-sensitive label: present
  • Authority: advisory model review plus deterministic blockers only
  • 3+3 canary: still alive; this does not replace it

Deterministic findings

No deterministic findings.

Model reviewers

global-glm / glm-5.1:cloud

  • Status: ok

  • Verdict: OK

  • low Unnecessary exception handler in write_audit_event is dead code

    • Evidence: scripts/agent-access/codex-openclaw-ssh-agent lines ~195-198: except Exception: raise is a no-op that adds noise without changing control flow
    • Next: Remove the try/except block or replace with meaningful cleanup (e.g., os.close(fd) on partial write failure)
  • low TOCTOU window between os.mkdir and post-creation validation

    • Evidence: scripts/agent-access/codex-openclaw-ssh-agent _ensure_parent_exists and create_private_session_dir: os.mkdir at mode 0o700 is followed by a _path_is_owned_and_private check, but an attacker with write access to parent could replace the
    • Next: This is a fundamental filesystem limitation; the post-creation validation is the best practical mitigation. Document this as a known constraint if not already.
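A common way to narrow (not eliminate) the window described in the second finding is to validate an open directory descriptor rather than a path, and then perform later operations relative to that descriptor. The sketch below shows the general technique under the assumption of a POSIX platform where `os.O_DIRECTORY`, `os.O_NOFOLLOW`, and `dir_fd` are available; `open_private_dir` is a hypothetical name and is not taken from the reviewed script.

```python
import os
import stat


def open_private_dir(path: str) -> int:
    """Open `path` as a directory and validate the opened inode itself.

    Returns a descriptor the caller must close. Because the checks use
    fstat on the descriptor, a later swap of the path name cannot change
    which directory was validated.
    """
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW)
    try:
        info = os.fstat(fd)
        if info.st_uid != os.geteuid():
            raise PermissionError(f"{path}: not owned by the current user")
        if stat.S_IMODE(info.st_mode) != 0o700:
            raise PermissionError(f"{path}: mode is not exactly 0700")
    except BaseException:
        os.close(fd)  # real cleanup here, so this handler is not a no-op
        raise
    return fd
```

A caller could then create files with `os.open("audit.jsonl", flags, 0o600, dir_fd=fd)`, so the file lands in the validated directory even if the path is replaced afterwards. An attacker who controls the parent can still interfere before the open, which is why the finding recommends documenting the constraint.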

global-deepseek / deepseek-v4-pro:cloud

  • Status: error
  • Verdict: -
  • Note: ReadTimeout: The read operation timed out
  • Findings: none

redteam / kimi-k2.6:cloud

  • Status: error
  • Verdict: -
  • Note: ReadTimeout: The read operation timed out
  • Findings: none

Policy notes

  • GLM 5.1 + DeepSeek V4 Pro are the operator-required model mix for this bot.
  • Optional red-team model is enabled only when PLATFORMCTL_PR_SANITY_REDTEAM_MODEL is configured.
  • Auto-merge is not enabled here.
pdurlej deleted branch codex/79-agent-access-session-lifecycle 2026-06-02 07:35:33 +02:00
Reference
pdurlej/platform!678