ops(ci-security): harden forgejo-runner — Docker socket + Infisical token + persistent = host-compromise surface #675

Closed
opened 2026-06-01 17:22:48 +02:00 by claude · 0 comments
Collaborator

Finding (Phase-0 security audit, 2026-06-01)

Live infra/forgejo-runner/docker-compose.yml on rs2000:

  • /var/run/docker.sock:/var/run/docker.sock is mounted into the runner → the runner (and any job it runs) has effectively root on the host. Since agents (claude/codex/…) author the code and the workflows that run on this runner, a prompt-injected agent or a malicious dependency in a job inherits host-root.
  • The runner holds an Infisical token (INFISICAL_TOKEN_AUTH_FILE=/data/infisical-token-auth-token) on a host bind-mount (./data:/data) → CI has secret-resolution; a compromised job can read/exfiltrate it.
  • The runner is persistent (not ephemeral) → contamination carries between jobs; caps build-integrity at ~SLSA L2.
  • Jobs run with the docker:host label, i.e. directly in the runner's own environment rather than in a separate per-job container.
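
The first two findings can be checked mechanically. Below is a minimal sketch in Python, assuming the compose service has already been parsed into a plain dict and uses short-syntax volume strings. The function name `audit_service`, the finding labels, and the `RUNNER_SERVICE` dict are illustrative assumptions; only the socket path, the bind-mount, and the environment variable come from this issue.

```python
from typing import Any

DOCKER_SOCKET = "/var/run/docker.sock"

def audit_service(service: dict[str, Any]) -> list[str]:
    """Return finding labels for one parsed compose service (hypothetical helper)."""
    findings: list[str] = []
    bind_targets: list[str] = []

    for volume in service.get("volumes", []):
        # Short syntax is "host:container[:mode]"; named volumes have no path prefix.
        parts = volume.split(":")
        if len(parts) < 2:
            continue
        host_path, container_path = parts[0], parts[1]
        if host_path == DOCKER_SOCKET:
            findings.append("docker-socket-mounted")
        if host_path.startswith((".", "/")):
            bind_targets.append(container_path.rstrip("/"))

    # Flag a token file that lives under any host bind-mount.
    token_file = service.get("environment", {}).get("INFISICAL_TOKEN_AUTH_FILE")
    if token_file and any(token_file.startswith(t + "/") for t in bind_targets):
        findings.append("token-on-bind-mount")

    return findings

# Assumed shape of the runner service, reconstructed from the bullets above.
RUNNER_SERVICE = {
    "volumes": [
        "/var/run/docker.sock:/var/run/docker.sock",
        "./data:/data",
    ],
    "environment": {
        "INFISICAL_TOKEN_AUTH_FILE": "/data/infisical-token-auth-token",
    },
}

print(audit_service(RUNNER_SERVICE))
```

A check like this only covers what is declared in the compose file; it says nothing about the runner's label configuration or about what a job can reach at runtime.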

Why this is the standout finding: the build pipeline is part of the supply chain here, and agents are in that pipeline. Host-root + secret-access from job code is the platform's single highest-value privilege-escalation surface. (Tailnet-gated + single-operator lowers the likelihood — no untrusted external job authors — but prompt-injection / malicious-dependency keep it live.)

Proposed hardening (Codex lane — runtime/critical-infra change = operator-gated)

  1. Remove or proxy the Docker socket. Prefer a socket-proxy (e.g. tecnativa/docker-socket-proxy) exposing only the minimal API the jobs actually need, or a rootless / sysbox runner. If raw socket is truly required for image builds, isolate it to a build-only runner.
  2. Scope the Infisical token tightly — least-privilege to only the paths CI needs, short TTL, ideally per-job rather than a long-lived token on a bind-mount.
  3. Separate build vs apply/deploy runners — the deploy runner (can mutate prod) must not share a trust boundary with the build runner (runs agent-authored job code).
  4. Workspace hygiene — wipe job workspace between runs; no broad host mounts beyond what's needed.
  5. Workflow-change review: changes to .forgejo/workflows/* require operator review (a CI change is a trust-boundary change).
  6. Ephemeral runners only if hygiene can't be made reliable (hygiene-first, ephemeral-later).
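
Item 5 lends itself to a small automated gate. The sketch below assumes the list of changed paths for a pull request is already available (how it is obtained is out of scope here) and uses only the Python standard library. `requires_operator_review` is a hypothetical helper name, not an existing Forgejo feature.

```python
from fnmatch import fnmatchcase

# Glob taken from item 5 above.
WORKFLOW_GLOB = ".forgejo/workflows/*"

def requires_operator_review(changed_paths: list[str]) -> bool:
    """True if any changed path is a workflow file (hypothetical helper).

    fnmatchcase's "*" also matches "/", so files in nested
    subdirectories under .forgejo/workflows/ are caught as well.
    """
    return any(fnmatchcase(path, WORKFLOW_GLOB) for path in changed_paths)

print(requires_operator_review(["src/app.py", ".forgejo/workflows/ci.yml"]))
print(requires_operator_review(["src/app.py", "README.md"]))
```

If the gate itself runs as a workflow on the same runner, a change to that workflow could disable it, so the result is best treated as a hint for the operator rather than as the enforcement mechanism.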

Acceptance

  • Runner no longer exposes the raw Docker socket to job code (removed, proxied, or isolated to a build-only runner).
  • Infisical token scoped least-privilege + short-TTL.
  • Build vs apply/deploy runner separation evaluated + documented.
  • Workflow-change-review expectation documented.
  • No runtime change applied without the operator gate.
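
The token acceptance item ("least-privilege + short-TTL") could be expressed as a testable policy. This is a sketch under stated assumptions: `TokenGrant`, its fields, the secret paths, and the 900-second ceiling are illustrative placeholders, not Infisical's API or values chosen in this issue.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TokenGrant:
    """Hypothetical description of what a CI token is allowed to do."""
    ttl_seconds: int
    secret_paths: frozenset[str]

def grant_violations(
    grant: TokenGrant,
    needed_paths: frozenset[str],
    max_ttl_seconds: int,
) -> list[str]:
    """List the ways a grant exceeds the least-privilege / short-TTL policy."""
    violations: list[str] = []
    if grant.ttl_seconds > max_ttl_seconds:
        violations.append("ttl-too-long")
    # Any path the token can read but CI does not need is excess privilege.
    extra = grant.secret_paths - needed_paths
    if extra:
        violations.append("excess-paths:" + ",".join(sorted(extra)))
    return violations

# Placeholder values: a day-long token that can also read a path CI does not need.
long_lived = TokenGrant(ttl_seconds=86_400, secret_paths=frozenset({"/ci", "/prod"}))
print(grant_violations(long_lived, needed_paths=frozenset({"/ci"}), max_ttl_seconds=900))
```

The actual TTL ceiling and the set of paths CI needs are operator decisions; the sketch only shows the shape of the comparison.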

Ties

  • Maturity roadmap Phase 5 (runner hygiene → practical SLSA-L2) — PR #674.
  • #673 (sandbox tier complements: contain job execution).
  • Found in the Phase-0 audit; first item of the post-audit round of fixes.

Authored by claude (audit/design lane). The HOW + the gated apply are Codex's lane.

Reference
pdurlej/platform#675