C1: SHA-pin tooling dependencies (2026-04 cycle) (#344)

## Summary

Monthly tooling dependency refresh, with a one-time conversion from version-tag pins (`rev = "vX.Y.Z"`, `image:tag`, `>=`) to SHA / digest pins everywhere.

## Changes

- **prek hooks**: all `rev = "vX.Y.Z"` → commit SHA + `# vX.Y.Z` comment. Bumped trufflehog (3.94.0→3.95.2), kingfisher (1.91.0→1.97.0), ruff (0.15.7→0.15.12), shfmt (3.13.0→3.13.1), prettier (3.8.1→3.8.3), actionlint (1.7.11→1.7.12).
- **fly/Dockerfile**: tag pins → `image@sha256:...` digest pins. Bumped nginx (1.29.6→1.30.0-alpine), tailscale (v1.94.1→v1.94.2 — still inside the safe pre-1.96.5 range), alloy (v1.14.1→v1.16.0).
- **mise-tasks**: PEP 723 inline deps converted from `>=` to `==` (PEP 508 doesn't support hashes inline). All scripts pinned to current latest: rich 15.0.0, typer 0.25.0, pyyaml 6.0.3, httpx 0.28.1.
- **prek `additional_dependencies`**: ansible-lint==26.4.0, ansible-core==2.20.5.
- **taplo-lint**: pass `--no-schema`. Upstream's `--default-schema-catalogs` returns a format taplo v0.9.3 can't parse — we don't validate against TOML schemas anyway, so this turns off the broken catalog fetch.
- **docs/update-tooling-dependencies**: documents the SHA-pin convention, `docker buildx imagetools inspect` for digest lookup, and `prek clean` before re-verifying (cache grows to several GiB).

Forgejo workflow `actions/checkout@v6.0.2` was already at the latest SHA — no change.

## Test plan

- [x] `prek run --all-files` passes after `prek clean`
- [x] `deploy-fly` workflow builds and deploys the new fly image on merge
- [x] `fly status -a blumeops-proxy` healthy after deploy
- [x] Spot-check a few mise tasks (`mise run blumeops-tasks`, `mise run docs-check-links`) to confirm pinned deps resolve cleanly

Reviewed-on: #344
Erich Blume 2026-04-30 16:51:43 -07:00
commit f6e392b80c
29 changed files with 174 additions and 47 deletions


@ -0,0 +1,108 @@
---
title: Rotate the Fly.io API Token
modified: 2026-04-30
last-reviewed: 2026-04-30
tags:
- how-to
- fly-io
- secrets
---
# Rotate the Fly.io API Token
How to rotate the Fly.io API token used to deploy [[flyio-proxy]]. The token lives in 1Password at `op://blumeops/fly.io admin/add more/deploy-token` and is consumed by [`mise run fly-deploy`](../../../mise-tasks/fly-deploy) and the `deploy-fly` Forgejo workflow (via the `FLY_DEPLOY_TOKEN` secret).
## When to rotate
- Every 75 days (Todoist recurring task)
- After any compromise / accidental disclosure
- If `fly deploy` starts returning auth errors
Fly.io tokens default to a 20-year expiry, but a short rotation cadence limits the blast radius of an undetected leak. Token expiry is set to **90 days** (longer than the rotation window), leaving a 15-day buffer if a rotation is delayed.
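
The cadence arithmetic above can be sketched as a small date calculation. This is a minimal illustration, not part of the repo: the function name `schedule` and the returned keys are hypothetical, and only the 75-day cadence and the 2160-hour expiry come from this document.

```python
from datetime import date, timedelta

ROTATION_DAYS = 75   # rotation cadence stated in this document
EXPIRY_HOURS = 2160  # value passed to --expiry in the procedure


def schedule(minted_on: date) -> dict:
    """Return the rotate-by date, expiry date, and buffer for a token minted on `minted_on`."""
    expiry_days = EXPIRY_HOURS // 24  # 2160h is exactly 90 days
    rotate_by = minted_on + timedelta(days=ROTATION_DAYS)
    expires_on = minted_on + timedelta(days=expiry_days)
    return {
        "rotate_by": rotate_by,
        "expires_on": expires_on,
        "buffer_days": (expires_on - rotate_by).days,
    }


if __name__ == "__main__":
    for key, value in schedule(date(2026, 4, 30)).items():
        print(f"{key}: {value}")
```

For a token minted on 2026-04-30 this gives a rotate-by date of 2026-07-14 and an expiry of 2026-07-29, which is the 15-day buffer described above.
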
## Scope
Use **`fly tokens create org`**, not `deploy`.
| Scope | What it grants | Practical blast radius (this org) |
|-------|---------------|-----------------------------------|
| `deploy` | Manage one app and its resources | Same single-app surface as `org` for current setup |
| `org` | Manage one org and its resources | Adds: ability to create new apps (billing abuse) and read org-level metadata |
| `readonly` | Read one org | Not enough to deploy |
| Personal access token | Full account | Excessive |
The personal Fly org currently contains a single app (`blumeops-proxy`), so the marginal blast radius of `org` over `deploy` is small. The benefit of `org` is that `fly status` works without a `Metrics token unavailable: ... context canceled` warning. That warning happens because `fly status` always tries to fetch org-level metrics-token info, and an app-scoped `deploy` token can't query the org. The warning is benign but persistent and could mask a real future failure.
If a second Fly app is ever added to this org, reconsider — at that point the marginal scope cost of `org` grows.
## Procedure
### 1. Authenticate flyctl interactively
```fish
fly auth login
```
(Browser-based. Required to mint a new token, since the existing deploy token can't create tokens.)
### 2. Mint the new token
```fish
fly tokens create org \
--org personal \
--name "blumeops-proxy deploy $(date +%Y-%m-%d)" \
--expiry 2160h
```
(`2160h` = 90 days, paired with the 75-day rotation cadence for a 15-day buffer. Capture the output — it's the only time the token is shown.)
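
If this step is ever scripted, the command could be assembled as below. This is a sketch under assumptions: the helper name `mint_command` is hypothetical, and it only uses the subcommand and flags shown in the block above (`--org`, `--name`, `--expiry`). It builds the argument list and prints it; it does not call `fly`.

```python
import shlex
from datetime import date


def mint_command(today: date, org: str = "personal", expiry_hours: int = 2160) -> list[str]:
    """Build the argv for the mint step, using only the flags shown in the procedure."""
    return [
        "fly", "tokens", "create", "org",
        "--org", org,
        # date.isoformat() yields YYYY-MM-DD, the same shape as `date +%Y-%m-%d`.
        "--name", f"blumeops-proxy deploy {today.isoformat()}",
        "--expiry", f"{expiry_hours}h",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for review before running it by hand.
    print(shlex.join(mint_command(date.today())))
```

Keeping the expiry as an integer number of hours makes it easy to check against the rotation cadence before the token is minted.
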
### 3. Update 1Password
```fish
op item edit on5slfaygtdjrxmdwezyhfmqsq 'add more.deploy-token=<paste-new-token>' --vault vg6xf6vvfmoh5hqjjhlhbeoaie
```
### 4. Sync to Forgejo Actions
The `deploy-fly` workflow reads the same token from a Forgejo Actions secret named `FLY_DEPLOY_TOKEN`, populated by the `forgejo_actions_secrets` ansible role:
```fish
mise run provision-indri -- --tags forgejo_actions_secrets
```
### 5. Verify
```fish
mise run fly-deploy
```
A successful deploy confirms the new token works locally. Watch for the metrics-token warning — it should be **absent** with an `org`-scoped token. If still present, the rotation produced a `deploy`-scoped token by mistake.
Then trigger the CI workflow (push a no-op commit touching `fly/`, or dispatch manually) to confirm Forgejo Actions has the new secret.
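
The warning check in this step can be automated by scanning captured command output. The sketch below is illustrative only: the function names are hypothetical, and it matches just the stable prefix of the warning because the middle of the message is elided in this document.

```python
# Stable prefix of the warning quoted in this document; the rest of the
# message is elided in the source, so only the prefix is matched.
WARNING_PREFIX = "Metrics token unavailable"


def has_metrics_warning(command_output: str) -> bool:
    """Return True if any output line contains the metrics-token warning."""
    return any(WARNING_PREFIX in line for line in command_output.splitlines())


def verdict(command_output: str) -> str:
    """Summarize what the presence or absence of the warning suggests."""
    if has_metrics_warning(command_output):
        return "warning present: token is likely deploy-scoped, re-mint with org scope"
    return "warning absent: consistent with an org-scoped token"


if __name__ == "__main__":
    import sys

    # Usage: pipe combined stdout and stderr of the deploy or status command in.
    print(verdict(sys.stdin.read()))
```

The absence of the warning is only consistent with an `org`-scoped token; it does not prove the scope, so `fly tokens list` remains the authoritative check.
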
### 6. Revoke the old token
```fish
fly tokens list
fly tokens revoke <old-token-id>
```
## Debugging
### `fly deploy` returns "unauthorized"
The token is invalid (expired, revoked, or wrongly scoped). Repeat the procedure to mint a replacement.
### `Metrics token unavailable: ... context canceled` after rotation
The new token was created with `deploy` scope, not `org`. Either accept it (cosmetic) or re-mint with `fly tokens create org`.
### Forgejo Actions deploy fails but local works
The Forgejo secret wasn't synced. Re-run `mise run provision-indri -- --tags forgejo_actions_secrets` and confirm the secret value in Forgejo matches 1Password.
## Related
- [[flyio-proxy]] — Service reference card
- [[manage-flyio-proxy]] — Day-to-day operations and Tailscale auth-key rotation (separate 90-day rotation)
- [[expose-service-publicly]] — Full setup architecture


@ -28,33 +28,45 @@ Out of scope: ArgoCD-deployed service images, Ansible role versions, NixOS flake
### 1. Check prek hook versions
For each repo in `prek.toml` with a `rev =` value, check the upstream GitHub releases page for a newer tag. Update each `rev` to the latest release tag. Also check `additional_dependencies` entries for PyPI version bumps.
Verify after updating:
For each repo in `prek.toml` with a `rev =` value, check the upstream GitHub releases page for a newer tag. Update each `rev` to the **commit SHA** of the latest release with a trailing `# vX.Y.Z` comment (matches the `additional_dependencies` and Forgejo workflow pinning style). Also check `additional_dependencies` entries for PyPI version bumps and pin them with `==`.
```fish
git ls-remote --tags https://github.com/<owner>/<repo>.git 'refs/tags/v*' | sort -t/ -k3 -V | tail -5
```
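
The output of the command above can be turned into a pin line mechanically. The following is a minimal sketch, assuming the usual `git ls-remote` layout of `<sha><TAB><ref>` per line and plain `vX.Y.Z` tags; the helper name `latest_pin` and the exact `rev = "..." # vX.Y.Z` layout are assumptions, so adjust them to match `prek.toml`.

```python
import re

# One ls-remote line: 40-hex SHA, a tab, then the tag ref. Annotated tags also
# produce a peeled "^{}" entry whose SHA is the commit the tag points at.
TAG_LINE = re.compile(r"^([0-9a-f]{40})\trefs/tags/(v\d+\.\d+\.\d+)(\^\{\})?$")


def latest_pin(ls_remote_output: str) -> str:
    """Return a pin line for the highest vX.Y.Z tag in `git ls-remote --tags` output."""
    commits: dict[str, str] = {}
    for line in ls_remote_output.splitlines():
        match = TAG_LINE.match(line.strip())
        if not match:
            continue  # skip pre-releases and tags that are not plain vX.Y.Z
        sha, tag, peeled = match.groups()
        # Prefer the peeled entry: the plain entry of an annotated tag is the
        # tag object, not the commit.
        if peeled or tag not in commits:
            commits[tag] = sha
    if not commits:
        raise ValueError("no vX.Y.Z tags found")
    newest = max(commits, key=lambda t: tuple(int(part) for part in t[1:].split(".")))
    return f'rev = "{commits[newest]}" # {newest}'


if __name__ == "__main__":
    import sys

    print(latest_pin(sys.stdin.read()))
```

Comparing versions as integer tuples avoids the string-sort trap where `v1.7.9` would rank above `v1.7.12`.
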
Clear the prek cache before verifying — it can grow to several GiB (one venv per hook per version) and old cached environments can mask resolution failures or stale catalogs:
```fish
prek clean
prek run --all-files
```
### 2. Check Fly.io Dockerfile pins
Review `fly/Dockerfile` for pinned image tags:
Review `fly/Dockerfile` for pinned image digests. Each `FROM` and `COPY --from=` uses `image@sha256:...` digest pinning with a comment line above documenting the human-readable version.
- **nginx** — check [Docker Hub](https://hub.docker.com/_/nginx) for latest stable alpine tag
- **grafana/alloy** — check [GitHub releases](https://github.com/grafana/alloy/releases)
- **tailscale/tailscale** — uses `stable` rolling tag, no action needed
- **tailscale/tailscale** — pinned to a known-good version. Do not bump to v1.96.5 or later (MagicDNS regression breaks the proxy boot)
To resolve a tag to a digest:
```fish
docker buildx imagetools inspect docker.io/<image>:<tag>
# Use the top-level "Digest:" line (multi-arch index) — not the per-platform sub-digest
```
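
Picking the right digest out of the inspect output can be sketched as follows. This assumes the plain-text layout in which top-level fields such as `Digest:` start at column 0 and per-platform entries are indented under `Manifests:`; the function names are hypothetical, and the layout should be confirmed against the installed buildx version.

```python
def index_digest(inspect_output: str) -> str:
    """Return the top-level (multi-arch index) digest from the inspect text output."""
    for line in inspect_output.splitlines():
        # Top-level fields are not indented; per-platform entries are, so a
        # column-0 match skips any platform-specific digests.
        if line.startswith("Digest:"):
            digest = line.split(":", 1)[1].strip()
            if digest.startswith("sha256:"):
                return digest
    raise ValueError("no top-level Digest: line found")


def pinned_reference(image: str, inspect_output: str) -> str:
    """Build the image@sha256:... form used for digest pinning."""
    return f"{image}@{index_digest(inspect_output)}"


if __name__ == "__main__":
    import sys

    # Usage: pipe the inspect output in and pass the image name as the argument.
    print(pinned_reference(sys.argv[1], sys.stdin.read()))
```

The human-readable tag is lost once the digest is substituted, which is why the convention keeps a version comment on the line above each pin.
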
After updating, the deploy-fly workflow will build and deploy on merge to main. Verify with `fly status -a blumeops-proxy` after deploy.
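
The digest-pinning convention for `FROM` and `COPY --from=` lines can also be checked before merging. The sketch below is a simple line-based scan, not a full Dockerfile parser: the function name `unpinned_images` is hypothetical, and it assumes one instruction per line with no line continuations.

```python
import re

# Image reference in `FROM [--platform=...] <ref>` and `COPY --from=<ref>` lines.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
COPY_FROM_RE = re.compile(r"^\s*COPY\s+.*--from=(\S+)", re.IGNORECASE)
ALIAS_RE = re.compile(r"\s+AS\s+(\S+)", re.IGNORECASE)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(dockerfile_text: str) -> list[str]:
    """Return image references that are not pinned by sha256 digest."""
    stages: set[str] = set()
    problems: list[str] = []
    for line in dockerfile_text.splitlines():
        alias = None
        match = FROM_RE.match(line)
        if match:
            alias = ALIAS_RE.search(line)
        else:
            match = COPY_FROM_RE.match(line)
            if not match:
                continue
        ref = match.group(1)
        # Earlier build stages (by name or index) and `scratch` are not images.
        is_stage = ref in stages or ref.isdigit() or ref == "scratch"
        if not is_stage and not DIGEST_RE.search(ref):
            problems.append(ref)
        if alias:
            stages.add(alias.group(1))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: pass the Dockerfile path, e.g. fly/Dockerfile.
    with open(sys.argv[1], encoding="utf-8") as handle:
        for ref in unpinned_images(handle.read()):
            print(f"not digest-pinned: {ref}")
```

An empty result means every external image reference carries a digest; it does not verify that the digest matches the version named in the comment above it.
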
### 3. Normalize mise task dependency bounds
### 3. Pin mise task dependencies
Mise tasks use `uv run --script` with inline PEP 723 dependency metadata. Check that lower bounds are consistent across all scripts:
Mise tasks use `uv run --script` with inline PEP 723 dependency metadata. All packages are pinned with `==` (PEP 508 doesn't support hashes inline). Check that pinned versions are consistent across all scripts:
```fish
grep -r 'dependencies' mise-tasks/ | grep '# dependencies'
```
Ensure all scripts using the same package agree on the minimum version. When a package has a new major or breaking minor release, bump the lower bound across all scripts at once.
For each package in use (`httpx`, `rich`, `typer`, `pyyaml`), pick the latest PyPI version and update every script in lockstep — divergence between scripts is the failure mode this catches. Bump everything together; don't leave one script behind.
### 4. Pin Forgejo workflow action versions


@ -76,6 +76,10 @@ The auth key expires every 90 days. To rotate:
2. Re-run setup to stage the new secret: `mise run fly-setup`
3. Deploy to pick up the new secret: `mise run fly-deploy`
## Rotate Fly.io API Token
See [[rotate-fly-deploy-token]] for the full rotation procedure (75-day cadence, `org`-scoped).
## Troubleshooting
**502 Bad Gateway on fresh deploy**: MagicDNS may not be ready when nginx starts. The `start.sh` script polls `nslookup` before launching nginx, but if it still fails, check that `tailscale status` is healthy inside the container.