fix(platformctl): set apply transport timeout #647

Merged
pdurlej merged 1 commit from codex/m06-apply-transport-timeout into main 2026-05-30 16:42:29 +02:00
Collaborator

Canary status: missing - fire canary 3+3 manually before merge

Canary Context Pack

Product story

Prevent remote compose hangs from blocking platformctl apply indefinitely by making transport timeouts explicit at the apply call site.

What changed

  • Added DEFAULT_APPLY_TRANSPORT_TIMEOUT_SECONDS.
  • Passed that timeout explicitly to both preflight and remote compose apply transport calls.
  • Updated fake transports and tests to assert timeout propagation.

Why it changed

Issue #202 called out that apply should not rely on implicit transport defaults for mutation-path SSH commands.

Files touched

  • control-plane/platformctl/apply.py
  • control-plane/platformctl/tests/test_apply_phase3.py
  • control-plane/platformctl/tests/test_apply_env_file.py

Relevant context

  • #202 TRANSPORT-TIMEOUT-01
  • M06 apply-pipeline runtime safety hardening

Runtime evidence

No runtime mutation. Local validation only.

Known constraints

This PR does not change TailscaleTransport internals. It makes apply's timeout contract explicit at the call site.

Explicit out-of-scope

  • No SSH calls.
  • No runtime apply.
  • No post-apply smoke logic; that remains #201.

Requested decision

Approve and merge if checks stay green.

Merge blockers

  • Tests fail.
  • Apply transport calls omit explicit timeout.
  • Patchwarden flags a safety regression.

Spec sources read

  • control-plane/platformctl/apply.py - apply transport calls.
  • control-plane/platformctl/transport/tailscale.py - transport run signature and timeout behavior.
  • control-plane/platformctl/tests/test_apply_phase3.py - apply tests and fake transport.
  • control-plane/platformctl/tests/test_apply_env_file.py - env-file apply fake transport.
  • Forgejo issue #202 - acceptance criteria.

Validation

  • PYTHONPATH=control-plane control-plane/.venv/bin/python -m pytest control-plane/platformctl/tests/test_apply_phase3.py control-plane/platformctl/tests/test_apply_env_file.py - 101 passed
  • PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate all --json - pass

Closes #202

Canary status: missing - fire canary 3+3 manually before merge ## Canary Context Pack ### Product story Prevent remote compose hangs from blocking `platformctl apply` indefinitely by making transport timeouts explicit at the apply call site. ### What changed - Added `DEFAULT_APPLY_TRANSPORT_TIMEOUT_SECONDS`. - Passed that timeout explicitly to both preflight and remote compose apply transport calls. - Updated fake transports and tests to assert timeout propagation. ### Why it changed Issue #202 called out that apply should not rely on implicit transport defaults for mutation-path SSH commands. ### Files touched - `control-plane/platformctl/apply.py` - `control-plane/platformctl/tests/test_apply_phase3.py` - `control-plane/platformctl/tests/test_apply_env_file.py` ### Relevant context - #202 TRANSPORT-TIMEOUT-01 - M06 apply-pipeline runtime safety hardening ### Runtime evidence No runtime mutation. Local validation only. ### Known constraints This PR does not change TailscaleTransport internals. It makes apply's timeout contract explicit at the call site. ### Explicit out-of-scope - No SSH calls. - No runtime apply. - No post-apply smoke logic; that remains #201. ### Requested decision Approve and merge if checks stay green. ### Merge blockers - Tests fail. - Apply transport calls omit explicit timeout. - Patchwarden flags a safety regression. ## Spec sources read - `control-plane/platformctl/apply.py` - apply transport calls. - `control-plane/platformctl/transport/tailscale.py` - transport run signature and timeout behavior. - `control-plane/platformctl/tests/test_apply_phase3.py` - apply tests and fake transport. - `control-plane/platformctl/tests/test_apply_env_file.py` - env-file apply fake transport. - Forgejo issue #202 - acceptance criteria. ## Validation - `PYTHONPATH=control-plane control-plane/.venv/bin/python -m pytest control-plane/platformctl/tests/test_apply_phase3.py control-plane/platformctl/tests/test_apply_env_file.py` - 101 passed - `PYTHONPATH=control-plane control-plane/.venv/bin/python -m platformctl.cli validate all --json` - pass Closes #202
fix(platformctl): set apply transport timeout
All checks were successful
patchwarden-client-dry-run / collect-diff (pull_request) Successful in 5s
platformctl plan / auto-apply scope (pull_request) Successful in 25s
pyfallow / Pyfallow gate (control-plane) (pull_request) Successful in 22s
python-ci / Python 3.11 (pull_request) Successful in 48s
python-ci / Python 3.13 (pull_request) Successful in 45s
base-is-main / guard (pull_request) Successful in 1s
patchwarden-pr-sanity / collect-diff (pull_request) Successful in 5s
patchwarden-client-dry-run / dry-run (pull_request) Successful in 23s
canary-required / collect-diff (pull_request) Successful in 5s
python-ci / Python 3.12 (pull_request) Successful in 47s
canary-required / canary (pull_request) Successful in 15s
patchwarden-pr-sanity / sanity (pull_request) Successful in 2m38s
52c5c9a8bc
Author
Collaborator

Patchwarden PR sanity

  • Status: advisory_findings
  • PR: 647
  • Commit: 52c5c9a8bc7651384a89c838fc05af63a2d625d4
  • Security-sensitive label: present
  • Authority: advisory model review plus deterministic blockers only
  • 3+3 canary: still alive; this does not replace it

Deterministic findings

  • info sensitive-path-touched Sensitive path touched — control-plane/platformctl/apply.py
    • Evidence: control-plane/platformctl/apply.py
    • Next: Route through the existing 3+3/risk-tier process; model review remains advisory.

Model reviewers

global-glm / glm-5.1:cloud

  • Status: ok

  • Verdict: OK

  • medium 120s timeout may be insufficient for remote compose operations

    • Evidence: control-plane/platformctl/apply.py line 46: DEFAULT_APPLY_TRANSPORT_TIMEOUT_SECONDS = 120 applied to both preflight (line 853) and remote compose (line 886)
    • Next: Consider monitoring real-world apply durations post-merge. If compose operations exceed 120s in production, either increase the default or make it configurable via env var (e.g., PLATFORMCTL_APPLY_TIMEOUT_SECONDS)
  • low Same timeout for preflight and compose despite different expected durations

    • Evidence: control-plane/platformctl/apply.py lines 850-854 and 882-886: both transport.run() calls use identical DEFAULT_APPLY_TRANSPORT_TIMEOUT_SECONDS
    • Next: Preflight checks should complete in seconds; compose may need minutes. Consider separate timeouts (e.g., DEFAULT_PREFLIGHT_TIMEOUT_SECONDS=30, DEFAULT_COMPOSE_TIMEOUT_SECONDS=120) for clearer intent and better failure diagnostics

global-deepseek / deepseek-v4-pro:cloud

  • Status: ok
  • Verdict: OK
  • Findings: none

redteam / kimi-k2.6:cloud

  • Status: ok

  • Verdict: NOT_OK

  • high Unhandled transport timeout exception crashes apply

    • Evidence: control-plane/platformctl/apply.py adds timeout=DEFAULT_APPLY_TRANSPORT_TIMEOUT_SECONDS to both preflight and remote compose transport.run calls (lines ~850-860 and ~882-892), but both remain inside try blocks that only catch SSHError; no e
    • Next: Add except clauses for timeout exceptions (e.g., TimeoutError or the transport's timeout type) around both transport.run calls, returning structured error results and invoking rollback for the remote path

Policy notes

  • GLM 5.1 + DeepSeek V4 Pro are the operator-required model mix for this bot.
  • Optional red-team model is enabled only when PLATFORMCTL_PR_SANITY_REDTEAM_MODEL is configured.
  • Auto-merge is not enabled here.
<!-- patchwarden-pr-sanity:pdurlej/platform:PR-647 --> # Patchwarden PR sanity - Status: `advisory_findings` - PR: `647` - Commit: `52c5c9a8bc7651384a89c838fc05af63a2d625d4` - Security-sensitive label: `present` - Authority: advisory model review plus deterministic blockers only - 3+3 canary: still alive; this does not replace it ## Deterministic findings - **`info` `sensitive-path-touched`** Sensitive path touched — `control-plane/platformctl/apply.py` - Evidence: `control-plane/platformctl/apply.py` - Next: Route through the existing 3+3/risk-tier process; model review remains advisory. ## Model reviewers ### `global-glm` / `glm-5.1:cloud` - Status: `ok` - Verdict: `OK` - **`medium`** 120s timeout may be insufficient for remote compose operations - Evidence: `control-plane/platformctl/apply.py line 46: DEFAULT_APPLY_TRANSPORT_TIMEOUT_SECONDS = 120 applied to both preflight (line 853) and remote compose (line 886)` - Next: Consider monitoring real-world apply durations post-merge. If compose operations exceed 120s in production, either increase the default or make it configurable via env var (e.g., PLATFORMCTL_APPLY_TIMEOUT_SECONDS) - **`low`** Same timeout for preflight and compose despite different expected durations - Evidence: `control-plane/platformctl/apply.py lines 850-854 and 882-886: both transport.run() calls use identical DEFAULT_APPLY_TRANSPORT_TIMEOUT_SECONDS` - Next: Preflight checks should complete in seconds; compose may need minutes. Consider separate timeouts (e.g., DEFAULT_PREFLIGHT_TIMEOUT_SECONDS=30, DEFAULT_COMPOSE_TIMEOUT_SECONDS=120) for clearer intent and better failure diagnostics ### `global-deepseek` / `deepseek-v4-pro:cloud` - Status: `ok` - Verdict: `OK` - Findings: none ### `redteam` / `kimi-k2.6:cloud` - Status: `ok` - Verdict: `NOT_OK` - **`high`** Unhandled transport timeout exception crashes apply - Evidence: `control-plane/platformctl/apply.py adds timeout=DEFAULT_APPLY_TRANSPORT_TIMEOUT_SECONDS to both preflight and remote compose transport.run calls (lines ~850-860 and ~882-892), but both remain inside try blocks that only catch SSHError; no e` - Next: Add except clauses for timeout exceptions (e.g., TimeoutError or the transport's timeout type) around both transport.run calls, returning structured error results and invoking rollback for the remote path ## Policy notes - GLM 5.1 + DeepSeek V4 Pro are the operator-required model mix for this bot. - Optional red-team model is enabled only when `PLATFORMCTL_PR_SANITY_REDTEAM_MODEL` is configured. - Auto-merge is not enabled here.
pdurlej approved these changes 2026-05-30 16:42:28 +02:00
pdurlej left a comment

Approved by Codex using operator-authorized temporary admin PAT after all checks green.

Approved by Codex using operator-authorized temporary admin PAT after all checks green.
pdurlej deleted branch codex/m06-apply-transport-timeout 2026-05-30 16:42:30 +02:00
Sign in to join this conversation.
No reviewers
No labels
W6d-automerge-calibration
agent/claude-code
agent/codex
agent/hermes
agent/iskra
agent/ollama
agent/patchwarden
automerge-candidate
class/security-sensitive
cutover-gate
dependency/blocked
dependency/blocks-others
dependency/cross-repo
dependency/needs-confirmation
domain:agents
domain:ci
domain:docs
domain:forgejo
domain:infra
domain:memory
domain:runtime
domain:signal
domain:ux
flow/architecture
flow/blocked
flow/deployed
flow/done
flow/implementation
flow/intake
flow/maintained
flow/observed
flow/ready
flow/refining
flow/retired
flow/review
iterating
judge/codex-candidate
judge/hermes-candidate
judge/low-confidence
judge/needs-refinement
judge/operator-needed
judge/p0
judge/p1
judge/p2
judge/p3
judge/park
judge/patchwarden-candidate
judge/stale-priority
kind/adr
kind/bug
kind/chore
kind/feature
kind/infra
kind/ops
kind/refactor
kind/research
large-impact
merge/auto
merge/manual
merge/manual-dependency-conflict
merge/manual-failing-tests
merge/manual-merge-conflict
merge/manual-missing-review
merge/manual-operator-preference
merge/manual-red-zone
merge/manual-security-sensitive
merge/manual-unclear-scope
merge/manual-unknown
meta
mode:operator-only
mode:patchwarden-iskra-approved
mode:safe-auto
needs-operator-decision
needs-triage
not-ready
observed/erroring
observed/needs-followup
observed/pending
observed/retire-candidate
observed/unused
observed/used
operator-emotional
owner-attention
phase/02
phase/03
priority:p0
priority:p1
priority:p2
priority:p3
proposed
ready-for-agent
ready-for-operator
recovery
review:claude-reviewed
review:codex-reviewed
review:dziadek-reviewed
review:needs-human
risk/exposure
risk/process
risk/product
risk/runtime
safety:external-write
safety:no-prod-mutation
safety:prod-impact
safety:secret-touch
size/large
size/medium
size/small
size/tiny
size/unknown
source/adr
source/agent-generated
source/manual
source/operator-chat
source/voice-note
status:blocked
status:codex-ready
status:merged:pending-evidence
status:needs-evidence
status:operator-needed
status:parked
tier/full
tier/lite
tier/stacked
tier:0-platform-substrate
tier:1-iskra-value-layer
tier:2-tools-products-modules
type:bug
type:chore
type:docs
type:feat
type:policy
type:research
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
pdurlej/platform!647
No description provided.