ops(secrets): migrate deploy-runner from direct PAT to Infisical Token Auth on machine identity #265
Reference
pdurlej/platform#265
Context
The RS2000 deploy runner currently holds a direct codex PAT in /var/lib/forgejo-deploy-runner/platformctl-deploy.env as a temporary state, per Pan Herbatka Recovery Plan § 3 Call 3 (PR #250 merged) and the agreed cutover trade-off: ship cutover first, fix auth method later.
Cutover ⚛️ achieved 2026-05-13 (run #621). Time to close the temporary state.
Operator preference (2026-05-13 evening, supersedes Pan Herbatka's earlier recommendation)
The operator chooses Token Auth as the primary path. Universal Auth remains available as a fallback if Token Auth proves operationally problematic. Pan Herbatka's earlier framing of "Token Auth is wrong method" is withdrawn: the root cause of the morning failure was likely ACL/scope on the existing token, not the method class.
What we know about morning Token Auth attempt (2026-05-13)
- 2cf935a4-b0d1-45b8-97f5-3957db7e5ee0 had Token Auth method configured
- 403 Forbidden for /api/v3/secrets/raw and /api/v4/secrets/{name}
Likely root cause: ACL/permission scope on the machine identity needs proper configuration for the target secret path (/home-platform/forgejo_accounts/p+codex@durlej.me). The 403 wasn't "wrong method" — it was "correct method, missing read permission on this secret path".
Evidence: state/codex-prep/rs2000-closeout-handover-2026-05-12.md § What failed > 1.
Target state — Token Auth on machine identity
Machine identity (operator provisioned, scoped):
- 2cf935a4-b0d1-45b8-97f5-3957db7e5ee0
- 24324af9-adb3-4604-a7f2-d37243d76204
- prod
- /home-platform/forgejo_accounts/p+codex@durlej.me
Flow (per docs/ci/runner-contract.md + existing scripts/forgejo/deploy-runner-install-infisical-token-auth):
- … /home-platform/forgejo_accounts (or narrower if Infisical supports per-secret ACL)
- … ' "" | scripts/forgejo/deploy-runner-install-infisical-token-auth
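
The distinction drawn above between "wrong method" and "correct method, missing read permission" follows general HTTP status semantics, and can be sketched as a small triage helper. This is only an illustration: the names `Diagnosis` and `diagnose_secret_read` are hypothetical, and the mapping assumes the Infisical API uses 401 and 403 in the conventional way, which should be confirmed against its documentation. A 403 can also be produced by something in front of the API (a proxy or firewall), so the result is a hint, not a verdict.

```python
from enum import Enum


class Diagnosis(Enum):
    OK = "ok"
    BAD_CREDENTIALS = "bad-credentials"        # token not accepted at all
    MISSING_PERMISSION = "missing-permission"  # token accepted, path not readable
    OTHER = "other"


def diagnose_secret_read(status_code: int) -> Diagnosis:
    """Map the HTTP status of a secret read attempt to a likely cause.

    Assumption: the API follows conventional HTTP semantics
    (401 = not authenticated, 403 = authenticated but not authorized).
    """
    if 200 <= status_code < 300:
        return Diagnosis.OK
    if status_code == 401:
        return Diagnosis.BAD_CREDENTIALS
    if status_code == 403:
        # Points at ACL/scope on the identity (or an intermediary blocking
        # the request), not at the auth method class.
        return Diagnosis.MISSING_PERMISSION
    return Diagnosis.OTHER


if __name__ == "__main__":
    for code in (200, 401, 403, 500):
        print(code, diagnose_secret_read(code).value)
```

Under these assumptions, a 403 suggests reviewing the identity's permissions on the secret path before switching auth methods.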
Title changed from "ops(secrets): migrate deploy-runner from direct PAT to Infisical Universal Auth machine identity" to "ops(secrets): migrate deploy-runner from direct PAT to Infisical Token Auth on machine identity"
Infisical Token Auth — installed, verification smoke blocked — 2026-05-14 09:40 CEST
Status: installed on RS2000; soak not started yet
Evidence
- /var/lib/forgejo-deploy-runner/infisical-token-auth-token
- 600 forgejo-deploy:forgejo-deploy 333 bytes
Findings
- https://infisical.pdurlej.com returns HTTP 403 from RS2000, while the same token works from the Mac and through the local Infisical container endpoint on RS2000.
- apply.py prefers direct PAT before Infisical when both are configured, so a smoke before #273 would be a false positive.
Open PRs before smoke retry
- … platformctl apply prefer Infisical Token Auth before direct PAT and add source markers.
- … docs/ci/runner-contract.md.
- … STATUS_NOW.md so the operator sees the real next gate.
Next
After #273 merges:
- … platformctl-auto-apply.yml with module=matrix-well-known;
- … forgejo_token_source=infisical-token-auth in logs;
Issue #265 should remain open until soak passes and the direct PAT is removed.
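
The ordering problem described in the findings (direct PAT winning over Infisical when both are configured) can be illustrated with a minimal sketch of a token resolver that tries the Infisical path first and reports which source it used. This is not the actual apply.py code: `resolve_forgejo_token`, `format_source_marker`, and the injected `fetch_from_infisical` callable are hypothetical, and treating `PLATFORMCTL_FORGEJO_TOKEN` as the direct PAT variable is an assumption. Only the marker values `infisical-token-auth` and `direct-env-fallback` come from this issue.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional, Tuple


def resolve_forgejo_token(
    env: Mapping[str, str],
    token_file: Path,
    fetch_from_infisical: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Return (forgejo_token, source), preferring Infisical Token Auth.

    fetch_from_infisical is a hypothetical callable that exchanges the
    machine-identity token for the Forgejo secret and returns None on
    failure; exceptions it raises are not caught here.
    """
    if token_file.is_file():
        machine_token = token_file.read_text().strip()
        if machine_token:
            secret = fetch_from_infisical(machine_token)
            if secret:
                return secret, "infisical-token-auth"

    # Assumed name of the direct PAT variable; used only as a fallback.
    direct = env.get("PLATFORMCTL_FORGEJO_TOKEN", "").strip()
    if direct:
        return direct, "direct-env-fallback"

    raise RuntimeError("no Forgejo token source configured")


def format_source_marker(source: str) -> str:
    """Build the log marker; the token value itself is never logged."""
    return f"forgejo_token_source={source}"


if __name__ == "__main__":
    # Demo with a missing token file: falls back to the environment value.
    token, source = resolve_forgejo_token(
        {"PLATFORMCTL_FORGEJO_TOKEN": "example-pat"},
        Path("/nonexistent/infisical-token-auth-token"),
        lambda machine_token: None,
    )
    print(format_source_marker(source))
```

Logging only the source marker lets a smoke run prove which path was taken without exposing any secret value.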
Infisical Token Auth — installed + verified — 2026-05-14 09:44 CEST
Status: soak period start
Evidence
- /var/lib/forgejo-deploy-runner/infisical-token-auth-token, mode 0600, owner forgejo-deploy:forgejo-deploy, size 333 bytes; runner active.
- apply.py now prefers Infisical Token Auth over direct PAT after #273; direct PAT remains as explicit soak fallback.
- matrix-well-known, run #729/API #869, success, commit cec1037b511b544f0c7db5ee4ad51bd2ad73e582.
- forgejo_token_source=infisical-token-auth twice; no direct-env-fallback marker observed.
Artifact summary
- auto-apply-scope.json: allowed matrix-well-known, no blocked modules.
- matrix-well-known.plan.stdout.json: status=in-sync, exitCode=0.
- matrix-well-known.apply.stdout.json: status=noop, exitCode=0, approved_pr=273.
- matrix-well-known.health.json: status=OK, exitCode=0.
- state/modules/matrix-well-known.status.json: status=noop, exitCode=0.
Token tracking
Next
- … forgejo_token_source=infisical-token-auth.
Infisical Token Auth soak — 3/3 no-op smokes via Infisical path — 2026-05-14 09:55 CEST
Status: soak evidence threshold met for day 1; direct PAT still preserved until 2026-05-21+
Smoke evidence
- matrix-well-known — run #729/API #869
  - forgejo_token_source=infisical-token-auth twice
  - in-sync, exitCode 0
  - noop, exitCode 0
  - OK, exitCode 0
- dashboard — run #736/API #876
  - forgejo_token_source=infisical-token-auth twice
  - in-sync, exitCode 0
  - noop, exitCode 0
  - OK, exitCode 0
- matrix-well-known — run #737/API #877
  - forgejo_token_source=infisical-token-auth twice
  - in-sync, exitCode 0
  - noop, exitCode 0
  - OK, exitCode 0
No forgejo_token_source=direct-env-fallback marker was observed in these runs.
Runtime sanity
Separate reliability finding
- … waiting with task_id=0 and required restarting only forgejo-deploy-runner.service.
Soak state
Codex live audit: deploy-runner Token Auth and canary bridge are separate — 2026-05-18 16:56 CEST
This issue has earlier comments showing deploy-runner Token Auth install/soak evidence. Current live state shows a separate blocker remains for the canary path:
- … PLATFORMCTL_CANARY_MODE=infisical-machine and PLATFORMCTL_CANARY_ENV=/data/platformctl/canary.env.
- /data/platformctl/canary.env is missing.
- … 401 Invalid credentials.
Recommendation: keep the deploy-runner token migration evidence, but do not treat it as proof that the canary/provider secrets bridge is solved. The canary bridge needs its own machine identity/readiness proof before canary-required can become real 3+3 enforcement.
W4a read-only evidence is now in PR #436.
Summary, metadata-only:
- platformctl-auto-apply.yml logs checked over the last 48h: 16;
- forgejo_token_source=infisical-token-auth: 16 logs;
- forgejo_token_source=direct-env-fallback: 0 logs;
- … 600, owner forgejo-deploy:forgejo-deploy, size 333 bytes;
- … PLATFORMCTL_FORGEJO_TOKEN.
Conclusion: Infisical Token Auth is live and used, but the direct fallback still exists. The recommended next gate is w4a-remove-direct-pat-approved for a narrow backup/remove/restart/smoke/rollback step. No secret values were read or printed.
W4a direct PAT fallback removed — 2026-05-24 19:14 CEST
Role: executor
Status: runtime removal complete; report PR opened: #437
Evidence
- w4a-remove-direct-pat-approved
- PLATFORMCTL_FORGEJO_TOKEN
- /var/lib/forgejo-deploy-runner/platformctl-deploy.env.w4a-20260524T170930Z.bak
- forgejo-deploy-runner.service
- platformctl-auto-apply.yml run #1948 for matrix-well-known → success
- forgejo_token_source=infisical-token-auth
- 0
- 0
- 0
Next
Merge #437 to record the evidence in-repo. This issue can close after that report merges. #274 remains open for token rotation tracking.
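
The per-log marker counts used as evidence in this thread (for example, 16 logs with infisical-token-auth and 0 with direct-env-fallback) could be reproduced with a metadata-only scan along the following lines. This is a hedged sketch, not the tooling actually used: `count_token_sources` is a hypothetical helper, and it assumes the marker appears in logs literally as `forgejo_token_source=<value>`. It counts logs in which a marker appears at least once, so a run that prints the marker twice still counts as one log.

```python
import re
from collections import Counter
from typing import Iterable

# Matches only the marker value; no secret material is captured.
MARKER_RE = re.compile(r"forgejo_token_source=([a-z0-9-]+)")


def count_token_sources(log_texts: Iterable[str]) -> Counter:
    """Count, per marker value, the number of logs containing it."""
    counts: Counter = Counter()
    for text in log_texts:
        # set() collapses repeated markers within a single log.
        for source in set(MARKER_RE.findall(text)):
            counts[source] += 1
    return counts


if __name__ == "__main__":
    sample_logs = [
        "plan forgejo_token_source=infisical-token-auth\n"
        "apply forgejo_token_source=infisical-token-auth\n",
        "plan forgejo_token_source=infisical-token-auth\n",
    ]
    counts = count_token_sources(sample_logs)
    print(counts["infisical-token-auth"], counts["direct-env-fallback"])
```

Because `Counter` returns 0 for absent keys, a missing `direct-env-fallback` marker reads naturally as a zero count.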