C0: split gandi-operations docs; add dns-acme-cleanup mise task

Splits the nebulous gandi-operations how-to into two single-topic cards
(manage-eblu-me-dns, rotate-gandi-pat) and adds a mise task for the
recurring _acme-challenge TXT cleanup needed due to a value-comparison
bug in libdns/gandi v1.1.0 that prevents certmagic's cleanup phase from
removing presented TXT values.

The gandi reference card is updated to drop the false "different
credential from Pulumi PAT" claim — verified during the 2026-04-27
incident that Caddy and Pulumi share a single PAT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Erich Blume 2026-04-27 09:48:46 -07:00
commit 005e2a03ed
10 changed files with 315 additions and 159 deletions

@@ -1,90 +0,0 @@
---
title: Gandi Operations
modified: 2026-02-17
last-reviewed: 2026-02-17
tags:
- how-to
- dns
- pulumi
---
# Gandi Operations
How to manage DNS records and cycle the Gandi API token.
## Prerequisites
- Pulumi CLI installed (`brew install pulumi`)
- Access to 1Password blumeops vault (for PAT)
- On the tailnet (Pulumi resolves indri's IP via MagicDNS)
## Preview and Apply DNS Changes
```bash
# Preview changes (always do this first)
mise run dns-preview
# Apply changes
mise run dns-up
```
Both tasks fetch the Gandi PAT from 1Password automatically.
To run Pulumi directly:
```bash
export GANDI_PERSONAL_ACCESS_TOKEN=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/mco6ka3dc3rmw7zkg2dhia5d2m/pat")
cd pulumi/gandi
pulumi preview
pulumi up --yes
```
## Cycle the Gandi PAT
The Gandi Personal Access Token has a maximum lifetime of 90 days. Currently set to 30 days as a security compromise, though shorter may be appropriate given infrequent use.
### 1. Create a new PAT
Go to the [Gandi admin console](https://admin.gandi.net/organizations/1db8d76a-f729-11ed-b8d1-00163e94b645/account/pat) and create a new token:
- **Name:** `blumeops-pulumi` (or similar)
- **Expiration:** 30 days (max 90; shorter is fine if you run this rarely)
- **Required permission:** Manage domain name technical configurations
- **Also enable:** See and renew domain names
Copy the new PAT to your clipboard.
### 2. Update 1Password
With the new PAT on your clipboard:
```bash
op item edit mco6ka3dc3rmw7zkg2dhia5d2m pat="$(pbpaste)" --vault vg6xf6vvfmoh5hqjjhlhbeoaie
```
### 3. Delete the old PAT
Return to the Gandi admin console and delete the previous token.
### 4. Verify
```bash
mise run dns-preview
```
A successful preview confirms the new PAT is working.
## Break-Glass Override
If MagicDNS is unavailable and Pulumi can't resolve indri's IP, set the target IP manually. Find indri's current Tailscale IP via `tailscale status` or the admin console:
```bash
export BLUMEOPS_REVERSE_PROXY_IP=<indri-tailscale-ip>
mise run dns-up
```
## Related
- [[gandi]] - DNS configuration reference
- [[caddy]] - Reverse proxy (also uses a Gandi token for TLS)
- [[update-tailscale-acls]] - Similar Pulumi workflow for Tailscale

@@ -0,0 +1,52 @@
---
title: Manage eblu.me DNS Records
modified: 2026-04-27
last-reviewed: 2026-04-27
tags:
- how-to
- dns
- pulumi
---
# Manage eblu.me DNS Records
How to add, change, and apply DNS records for `eblu.me` via [[pulumi]].
## Prerequisites
- Pulumi CLI installed (`brew install pulumi`)
- 1Password access (`blumeops` vault) — Pulumi reads the Gandi PAT from there
- On the tailnet — Pulumi resolves [[indri]]'s IP via MagicDNS at apply time
## Preview and apply
```bash
mise run dns-preview # always do this first
mise run dns-up # apply
```
Both fetch the PAT from 1Password automatically. The Pulumi program is in `pulumi/gandi/`; stack is `eblu-me`.
## Adding a record
Edit `pulumi/gandi/__main__.py` and add a `gandi.livedns.Record(...)`. The stack config (`Pulumi.eblu-me.yaml`) only holds `domain` and `subdomain`; everything else is in the program.
After editing, preview, then apply.
## Break-glass: override the indri target IP
The target IP of the wildcard `*.ops.eblu.me` record is resolved from `indri.tail8d86e.ts.net` via MagicDNS at apply time. If MagicDNS is unavailable, override it:
```bash
export BLUMEOPS_REVERSE_PROXY_IP=<indri-tailscale-ip>
mise run dns-up
```
Find the IP via `tailscale status` or the Tailscale admin console.
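The override works as a simple precedence rule: an explicit IP wins, otherwise MagicDNS is asked. The Pulumi program's code is not part of this commit, so the sketch below is a hypothetical illustration of that precedence only; the function name `resolve_reverse_proxy_ip` and the injectable `resolver` parameter are assumptions added for testability.

```python
import os
import socket
from collections.abc import Callable, Mapping

# MagicDNS name and override variable are taken from the docs above.
INDRI_MAGICDNS = "indri.tail8d86e.ts.net"


def resolve_reverse_proxy_ip(
    env: Mapping[str, str] = os.environ,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Hypothetical sketch: return the IP the *.ops wildcard should point at."""
    # Break-glass path: a non-blank override skips DNS entirely.
    override = env.get("BLUMEOPS_REVERSE_PROXY_IP", "").strip()
    if override:
        return override
    # Normal path: resolve indri's Tailscale IP via MagicDNS (requires the tailnet).
    return resolver(INDRI_MAGICDNS)
```

Injecting `resolver` keeps the sketch testable off the tailnet; in normal use both arguments are left at their defaults.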
## Related
- [[gandi]] — Gandi reference card
- [[rotate-gandi-pat]] — Rotate the PAT shared with [[caddy]]
- [[pulumi]] — Pulumi tooling reference
- [[routing]] — Service URLs and routing architecture

@@ -144,6 +144,6 @@ Trigger a manual sync on one mirror to confirm the new PAT works:
 ## Related
 - [[forgejo]] — Forgejo service reference
-- [[gandi-operations]] — Similar PAT rotation workflow for Gandi DNS
+- [[rotate-gandi-pat]] — Similar PAT rotation workflow for Gandi DNS
 - [[spork-strategy]] — floating-branch soft-fork strategy explanation
 - [[create-a-spork]] — create a spork on top of a mirror

@@ -0,0 +1,125 @@
---
title: Rotate the Gandi PAT
modified: 2026-04-27
last-reviewed: 2026-04-27
tags:
- how-to
- dns
- secrets
---
# Rotate the Gandi PAT
How to rotate the Gandi Personal Access Token. **One PAT** is shared by [[caddy]] (TLS via ACME DNS-01) and Pulumi (DNS records). It lives in 1Password at `op://blumeops/gandi - blumeops/pat`.
## When to rotate
- Every 60 days (Todoist recurring task)
- After any compromise / accidental disclosure
- Whenever Gandi starts rejecting the PAT (see [Debugging](#debugging))
Gandi caps PAT lifetime at 90 days; rotating at 60 leaves a 30-day buffer.
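The cadence arithmetic as a small sketch (the helper name `rotation_schedule` is hypothetical): a PAT created with the 90-day maximum and rotated at 60 days always leaves 30 days of slack before Gandi expires it.

```python
from datetime import date, timedelta

GANDI_MAX_LIFETIME = timedelta(days=90)  # Gandi's PAT lifetime cap
ROTATION_CADENCE = timedelta(days=60)    # Todoist recurring-task interval


def rotation_schedule(created: date) -> tuple[date, date, int]:
    """Return (rotate_by, expires_on, buffer_days) for a PAT created on `created`."""
    rotate_by = created + ROTATION_CADENCE
    expires_on = created + GANDI_MAX_LIFETIME
    return rotate_by, expires_on, (expires_on - rotate_by).days


rotate_by, expires_on, buffer_days = rotation_schedule(date(2026, 4, 27))
print(f"rotate by {rotate_by}, expires {expires_on}, buffer {buffer_days} days")
# rotate by 2026-06-26, expires 2026-07-26, buffer 30 days
```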
## Prerequisites
- Access to the [Gandi PAT admin console](https://admin.gandi.net/organizations/1db8d76a-f729-11ed-b8d1-00163e94b645/account/pat)
- 1Password (`blumeops` vault)
- Ability to run `mise run provision-indri` (ssh to [[indri]] + 1Password biometric)
## Procedure
### 1. Create a new PAT in Gandi
In the [Gandi PAT console](https://admin.gandi.net/organizations/1db8d76a-f729-11ed-b8d1-00163e94b645/account/pat), create a token:
- **Name:** `blumeops`
- **Expiration:** **90 days** (the max — paired with the 60-day rotation cadence)
- **Permissions:**
  - Manage domain name technical configurations *(required — DNS records and ACME TXT writes)*
  - See and renew domain names
Other permissions are not used.
Copy the new PAT to your clipboard.
### 2. Update 1Password
```bash
op item edit mco6ka3dc3rmw7zkg2dhia5d2m pat="$(pbpaste)" --vault vg6xf6vvfmoh5hqjjhlhbeoaie
```
### 3. Push to indri
The PAT lives in two places: 1Password (read by Pulumi at runtime) and `~/.config/caddy/gandi-token` on indri (read by Caddy at startup). The 1Password edit only updates the first.
```bash
mise run provision-indri --tags caddy
```
This re-fetches the PAT from 1Password, writes it to indri, and restarts Caddy. Caddy will renew any due certificates within minutes.
### 4. Verify
```bash
mise run dns-preview
```
A successful preview confirms Pulumi can use the PAT.
```bash
ssh indri 'tail -50 ~/Library/Logs/mcquack.caddy.err.log' \
| grep -E "obtained|renew|error"
```
Expect to see no `LiveDNS returned a 403` lines, and either no renewal activity (if no certs were due) or `certificate obtained successfully`.
### 5. Delete the old PAT in Gandi
Return to the Gandi PAT console and delete the previous token.
### 6. Clean up orphan ACME records
Each successful Caddy renewal leaves orphan `_acme-challenge.ops` TXT records in the zone (a bug in `libdns/gandi` v1.1.0; see the script docstring). Run the cleanup on the same cadence as rotation:
```bash
mise run dns-acme-cleanup --dry-run
mise run dns-acme-cleanup
```
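The bug comes down to an exact string comparison between an unquoted value and a quoted one. `libdns/gandi` is written in Go, so the Python below only illustrates the mismatch and is not its code; the function names are hypothetical, and stripping a single pair of surrounding quotes is an assumption about how Gandi stores TXT values.

```python
def unquote_txt(value: str) -> str:
    """Strip one pair of surrounding double quotes (assumed Gandi storage format)."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def exact_match_deletions(presented: list[str], stored: list[str]) -> list[str]:
    # Naive cleanup: compare strings as-is, so "tok-A" never equals '"tok-A"'.
    return [s for s in stored if s in presented]


def normalized_deletions(presented: list[str], stored: list[str]) -> list[str]:
    # Normalize both sides before comparing, so the presented value is found.
    wanted = {unquote_txt(p) for p in presented}
    return [s for s in stored if unquote_txt(s) in wanted]


stored = ['"tok-A"', '"tok-B"']  # values as read back from the zone
presented = ["tok-A"]             # value the ACME client asks to remove
print(exact_match_deletions(presented, stored))  # []
print(normalized_deletions(presented, stored))   # ['"tok-A"']
```

`dns-acme-cleanup` sidesteps value comparison entirely by deleting the whole `_acme-challenge.ops` TXT rrset.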
## Debugging
### Caddy logs `LiveDNS returned a 403`
The PAT is invalid (expired, revoked, or insufficient scope). **Gandi returns 403 — not 401 — for an expired PAT**, which can read as a permissions issue. The most common cause is plain expiry. Rotate.
### `mise run dns-preview` returns 403
Same root cause — Pulumi and Caddy share this PAT.
### After a fresh PAT, Caddy still fails
Check that the value on indri matches 1Password:
```bash
diff <(ssh indri 'cat ~/.config/caddy/gandi-token') \
<(op read 'op://blumeops/gandi - blumeops/pat')
```
If they differ, `mise run provision-indri --tags caddy` was skipped or failed.
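When the two values differ, `diff` prints both of them, which leaves the PAT in terminal scrollback. The following sketch performs the same comparison but reports only match or mismatch. It reuses the indri path and 1Password reference shown above. The helper names are hypothetical, and the sketch assumes `ssh indri` and `op read` work from your shell the same way they do for the `diff` command.

```python
import hmac
import subprocess

OP_PAT_REF = "op://blumeops/gandi - blumeops/pat"
# The remote command is one string so the remote shell expands `~`.
INDRI_READ_CMD = ["ssh", "indri", "cat ~/.config/caddy/gandi-token"]
OP_READ_CMD = ["op", "read", OP_PAT_REF]


def tokens_match(a: str, b: str) -> bool:
    """Compare two tokens, ignoring surrounding whitespace, in constant time."""
    return hmac.compare_digest(a.strip().encode(), b.strip().encode())


def read_stdout(cmd: list[str]) -> str:
    # check=True raises CalledProcessError if ssh or op fails.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def check_drift() -> bool:
    """Return True if indri's Caddy token matches 1Password; never prints either value."""
    ok = tokens_match(read_stdout(INDRI_READ_CMD), read_stdout(OP_READ_CMD))
    print("match" if ok else "MISMATCH: re-run `mise run provision-indri --tags caddy`")
    return ok
```

Call `check_drift()` from a Python session on a machine that can reach indri.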
Confirm the new PAT works against Gandi directly:
```bash
curl -s -o /dev/null -w "HTTP %{http_code}\n" \
-H "Authorization: Bearer $(op read 'op://blumeops/gandi - blumeops/pat')" \
https://api.gandi.net/v5/livedns/domains/eblu.me
```
`200` = healthy. `403` = scope or expiry. `401` = malformed token.
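The status-code reading above can also be scripted. Below is a stdlib-only sketch that sends the same GET as the `curl` command and maps the result using the three codes listed above. The function names are hypothetical. Any other code is reported as unexpected rather than guessed at.

```python
import urllib.error
import urllib.request

LIVEDNS_DOMAIN_URL = "https://api.gandi.net/v5/livedns/domains/eblu.me"

DIAGNOSES = {
    200: "healthy",
    401: "malformed token",
    403: "scope or expiry (Gandi returns 403 for an expired PAT)",
}


def diagnose(status: int) -> str:
    return DIAGNOSES.get(status, f"unexpected HTTP {status}")


def probe_status(token: str, url: str = LIVEDNS_DOMAIN_URL, timeout: float = 15.0) -> int:
    """GET the domain with the PAT as a bearer token and return the HTTP status."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the status code is still available.
        # Network failures (URLError) are not caught and propagate to the caller.
        return err.code
```

Usage: `print(diagnose(probe_status(token)))`, with `token` read from 1Password as in the `curl` example.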
## Related
- [[gandi]] — Gandi reference card
- [[manage-eblu-me-dns]] — DNS records workflow (separate operation, same PAT)
- [[caddy]] — Reverse proxy that uses the PAT for TLS
- [[mise-tasks]] — `dns-acme-cleanup`, `provision-indri`, `dns-preview` reference

@@ -1,7 +1,7 @@
 ---
 title: Gandi
-modified: 2026-04-09
-last-reviewed: 2026-04-09
+modified: 2026-04-27
+last-reviewed: 2026-04-27
 tags:
 - infrastructure
 - networking
@@ -20,12 +20,11 @@ DNS hosting provider for the `eblu.me` domain, managed via Pulumi IaC.
 | **Provider** | Gandi LiveDNS |
 | **IaC** | `pulumi/gandi/` |
 | **Stack** | `eblu-me` |
+| **PAT** | `op://blumeops/gandi - blumeops/pat` |
 ## What It Does
-Gandi hosts the DNS records that make `*.ops.eblu.me` resolve to [[indri]]'s Tailscale IP (`indri.tail8d86e.ts.net`). Since Tailscale IPs are not publicly routable, this gives services real DNS names while keeping them private to the tailnet.
-The target IP is resolved dynamically from `indri.tail8d86e.ts.net` at deploy time, so if indri's Tailscale IP changes, re-running the deployment is sufficient.
+Gandi hosts the DNS records that make `*.ops.eblu.me` resolve to [[indri]]'s Tailscale IP. Since Tailscale IPs are not publicly routable, this gives services real DNS names while keeping them private to the tailnet. The target IP is resolved dynamically from `indri.tail8d86e.ts.net` at deploy time.
 ## DNS Records
@@ -46,38 +45,25 @@ Both records point to [[indri]], which runs [[caddy]] as the reverse proxy for a
 | `cv.eblu.me` | CNAME | `blumeops-proxy.fly.dev` | 300s |
 | `forge.eblu.me` | CNAME | `blumeops-proxy.fly.dev` | 300s |
-Public CNAMEs point to [[flyio-proxy]] on Fly.io. See [[expose-service-publicly]] for adding new public services.
-See [[routing]] for the full service URL map.
+Public CNAMEs point to [[flyio-proxy]] on Fly.io. See [[expose-service-publicly]] for adding new public services. See [[routing]] for the full service URL map.
-## Pulumi Configuration
-The Pulumi program lives in `pulumi/gandi/`:
-- `__main__.py` - Creates A and CNAME records via `pulumiverse_gandi`
-- `Pulumi.eblu-me.yaml` - Stack config (domain, subdomain)
-Stack config values:
-| Key | Value |
-|-----|-------|
-| `blumeops-dns:domain` | `eblu.me` |
-| `blumeops-dns:subdomain` | `ops` |
-A break-glass override is available via the `BLUMEOPS_REVERSE_PROXY_IP` environment variable, which bypasses dynamic IP resolution.
 ## TLS Integration
-[[caddy]] uses Gandi's API separately (via `GANDI_BEARER_TOKEN`) for ACME DNS-01 challenges to obtain a wildcard Let's Encrypt certificate for `*.ops.eblu.me`. This is a different credential from the Pulumi PAT.
+[[caddy]] uses this same Gandi PAT for ACME DNS-01 challenges to obtain a wildcard Let's Encrypt certificate for `*.ops.eblu.me`. Caddy reads the PAT from `~/.config/caddy/gandi-token` on [[indri]], populated by ansible from 1Password.
 ## Authentication
-Gandi requires a Personal Access Token (PAT) for API access. PATs have a maximum lifetime of 90 days (currently set to 30). See [[gandi-operations]] for deployment and PAT cycling instructions.
+One Gandi Personal Access Token, shared by Pulumi and Caddy. Gandi caps PATs at 90 days; rotate every 60 days via [[rotate-gandi-pat]].
+## ACME Challenge Cleanup
+Caddy's renewal flow leaves `_acme-challenge.ops` TXT orphans in the zone — a value-comparison bug in `libdns/gandi` v1.1.0 makes the cleanup phase a no-op. Run `mise run dns-acme-cleanup` periodically; doing so alongside PAT rotation works well.
 ## Related
-- [[gandi-operations]] - PAT cycling and deployment how-to
-- [[routing]] - Service URLs and routing architecture
-- [[caddy]] - Reverse proxy using Gandi for TLS
-- [[tailscale]] - Tailnet networking
-- [[indri]] - Server hosting Caddy (DNS target)
+- [[manage-eblu-me-dns]] — Add/change DNS records via Pulumi
+- [[rotate-gandi-pat]] — Rotate the shared Gandi PAT
+- [[routing]] — Service URLs and routing architecture
+- [[caddy]] — Reverse proxy using this PAT for TLS
+- [[tailscale]] — Tailnet networking
+- [[indri]] — Server hosting Caddy (DNS target)

@@ -39,6 +39,7 @@ Run `mise tasks --sort name` for the live list with descriptions.
 | `fly-shutoff` | Emergency shutoff: stop all Fly.io proxy machines |
 | `dns-preview` | Preview DNS changes with [[pulumi]] |
 | `dns-up` | Apply DNS changes with [[pulumi]] |
+| `dns-acme-cleanup` | Delete orphaned `_acme-challenge.ops` TXT records (libdns/gandi v1.1.0 workaround) |
 | `tailnet-preview` | Preview Tailscale ACL changes with [[pulumi]] |
 | `tailnet-up` | Apply Tailscale ACL changes with [[pulumi]] |

@@ -49,7 +49,8 @@ mise run tailnet-up # Apply ACL/tag changes
 ## Related
-- [[gandi-operations]] — DNS PAT rotation and Pulumi workflow
+- [[manage-eblu-me-dns]] — DNS records workflow
+- [[rotate-gandi-pat]] — Rotate the Gandi PAT
 - [[update-tailscale-acls]] — ACL editing and Pulumi workflow
 - [[gandi]] — DNS hosting
 - [[tailscale]] — Tailnet configuration

mise-tasks/dns-acme-cleanup Executable file
@@ -0,0 +1,112 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = ["httpx>=0.28.1", "rich>=14.0.0", "typer>=0.24.0"]
# ///
#MISE description="Delete orphaned ACME challenge TXT records in eblu.me"
#USAGE flag "--dry-run" help="List orphans without deleting"
"""Clean up orphaned _acme-challenge TXT records in the eblu.me zone.

Workaround for libdns/gandi v1.1.0: its DeleteRecords compares unquoted
certmagic values to Gandi-quoted stored values, so cleanup is a silent
no-op. Without this script, the rrset grows by ~2 values per successful
Caddy renewal cycle.

In healthy steady state these records should be absent. Run alongside
PAT rotation, or any time after Caddy ACME activity.
"""

import os
import subprocess
from typing import Annotated

import httpx
import typer
from rich.console import Console
from rich.table import Table

DOMAIN = "eblu.me"
RRSET = "_acme-challenge.ops"
GANDI_API = "https://api.gandi.net/v5/livedns"
OP_PAT_REF = "op://blumeops/gandi - blumeops/pat"


def resolve_token(console: Console) -> str:
    env_token = os.environ.get("GANDI_PERSONAL_ACCESS_TOKEN", "").strip()
    if env_token:
        return env_token
    console.print("[dim]Reading Gandi PAT from 1Password...[/dim]")
    try:
        result = subprocess.run(
            ["op", "read", OP_PAT_REF],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError) as e:
        console.print(f"[red]Failed to read PAT from 1Password:[/red] {e}")
        raise typer.Exit(1)


app = typer.Typer(add_completion=False)


@app.command()
def main(
    dry_run: Annotated[
        bool,
        typer.Option("--dry-run", help="List orphans without deleting"),
    ] = False,
) -> None:
    """Delete orphan _acme-challenge TXT records in eblu.me."""
    console = Console()
    token = resolve_token(console)
    url = f"{GANDI_API}/domains/{DOMAIN}/records/{RRSET}/TXT"
    headers = {"Authorization": f"Bearer {token}"}

    with httpx.Client(timeout=15, headers=headers) as client:
        resp = client.get(url)
        if resp.status_code == 404:
            console.print(
                f"[green]Clean — {RRSET}.{DOMAIN} TXT rrset is absent.[/green]"
            )
            raise typer.Exit(0)
        resp.raise_for_status()
        values = resp.json().get("rrset_values", [])
        if not values:
            console.print(
                f"[green]Clean — {RRSET}.{DOMAIN} TXT rrset is empty.[/green]"
            )
            raise typer.Exit(0)

        table = Table(title=f"Orphan ACME challenge values: {RRSET}.{DOMAIN}")
        table.add_column("#", justify="right")
        table.add_column("Value")
        for i, v in enumerate(values, 1):
            table.add_row(str(i), v)
        console.print(table)
        console.print(f"\n[bold]{len(values)}[/bold] orphan(s).")

        if dry_run:
            console.print("\n[dim]Dry run — no records deleted.[/dim]")
            raise typer.Exit(0)

        del_resp = client.delete(url)
        if del_resp.status_code == 204:
            console.print(
                f"[green]Deleted {RRSET}.{DOMAIN} TXT "
                f"({len(values)} values).[/green]"
            )
        else:
            console.print(
                f"[red]Delete failed: HTTP {del_resp.status_code}[/red]\n"
                f"{del_resp.text[:300]}"
            )
            raise typer.Exit(1)


if __name__ == "__main__":
    app()

@@ -27,50 +27,19 @@ pulumi stack select eblu-me # or: pulumi stack init eblu-me
 ## Authentication
-This project requires a Gandi Personal Access Token (PAT) with LiveDNS permissions.
+This project uses a Gandi Personal Access Token (PAT) shared with Caddy. See the [Gandi reference card](../../docs/reference/infrastructure/gandi.md) and [Rotate the Gandi PAT](../../docs/how-to/configuration/rotate-gandi-pat.md).
-**The PAT expires every 30 days and must be cycled manually.** The mise tasks handle fetching the PAT from 1Password:
-### Cycling the PAT
-1. Go to [Gandi PAT Management](https://admin.gandi.net/organizations/1db8d76a-f729-11ed-b8d1-00163e94b645/account/pat)
-2. Create a new PAT:
-   - Name: `blumeops-pulumi` (or similar)
-   - Expiration: 30 days (maximum is 90; shorter is fine if used rarely)
-   - Permissions required:
-     - **Manage domain name technical configurations** (required for DNS records)
-     - See and renew domain names
-   - Optional permissions (enabled but not strictly required):
-     - See & download SSL certificates
-     - Manage Cloud resources
-     - See Cloud resources
-     - View Organization
-     - Deploy Web Hosting instances
-     - Manage Web Hosting instances
-     - See and renew Web Hosting instances
-3. Update 1Password:
-   ```bash
-   # Update the existing item with the new PAT value
-   op item edit mco6ka3dc3rmw7zkg2dhia5d2m pat="<NEW_PAT_VALUE>" --vault vg6xf6vvfmoh5hqjjhlhbeoaie
-   ```
-4. Delete the old PAT from Gandi admin console
-### Running with Authentication
-The mise task handles fetching the PAT from 1Password:
 ```bash
-mise run dns-up # Preview and apply changes
 mise run dns-preview # Preview only
+mise run dns-up # Preview and apply
 ```
 Or manually:
 ```bash
-export GANDI_PERSONAL_ACCESS_TOKEN=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/mco6ka3dc3rmw7zkg2dhia5d2m/pat")
+export GANDI_PERSONAL_ACCESS_TOKEN=$(op read "op://blumeops/gandi - blumeops/pat")
 pulumi up
 ```

@@ -8,7 +8,7 @@ This program manages DNS records for blumeops infrastructure:
 Authentication:
 Set GANDI_PERSONAL_ACCESS_TOKEN environment variable.
-See docs/how-to/gandi-operations.md for PAT management instructions.
+See docs/how-to/configuration/rotate-gandi-pat.md for PAT management.
 """
 import os