---
title: Tailscale Operator
modified: 2026-06-09
last-reviewed: 2026-06-09
tags:
- kubernetes
- tailscale
---
# Tailscale Kubernetes Operator
The Tailscale operator enables Kubernetes services to be exposed directly on the Tailscale network via Ingress resources.
## Quick Reference
| Property | Value |
|----------|-------|
| **Namespace** | `tailscale` |
| **Upstream** | `mirrors/tailscale` on forge (static manifest, pinned `v1.94.2`) |
| **ArgoCD Apps** | `tailscale-operator` (indri/minikube), `tailscale-operator-ringtail` (ringtail/k3s) |
The operator runs on **both** clusters: indri's minikube and ringtail's k3s.
Both apps layer on the shared `tailscale-operator-base` kustomize directory
(operator manifest, `ProxyClass`, `dnsconfig`); each cluster supplies its own
`ProxyGroup` (indri: 2 replicas, ringtail: 1) and OAuth `ExternalSecret`. See
[[ringtail]] and [[migrate-wave1-ringtail]] for the ongoing migration of k8s
workloads onto ringtail.
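
As a rough illustration of the per-cluster `ProxyGroup`, a minimal sketch for indri might look like the following. The resource name `ingress` matches the shared ProxyGroup named under How It Works; everything else is an assumption based on the upstream `tailscale.com/v1alpha1` CRD rather than a copy of the actual overlay.

```yaml
# Sketch only: the real overlay may set additional fields.
apiVersion: tailscale.com/v1alpha1
kind: ProxyGroup
metadata:
  name: ingress        # shared ProxyGroup referenced by Ingresses
spec:
  type: ingress        # this group serves Ingress traffic
  replicas: 2          # indri; the ringtail overlay would use 1
```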
## Local Images
Both the operator and the proxy run locally-built images from the forge
mirror (`mirrors/tailscale`), not Docker Hub:
| Image | Build | Used by |
|-------|-------|---------|
| `blumeops/tailscale-operator` | `containers/tailscale-operator/` (`container.py` for indri/arm64; `default.nix` with a `-nix` tag for ringtail/amd64) | operator Deployment, via each overlay's `images:` override |
| `blumeops/tailscale` | `containers/tailscale/` (same dual build) | `ProxyClass` proxy pods, via a strategic-merge patch in each overlay |
The ProxyClass image must be set with a **patch**, not kustomize's `images:`
directive. That directive only rewrites standard container fields, not
custom-resource fields like `ProxyClass.spec.statefulSet.pod.tailscaleContainer.image`.
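
The two mechanisms can be sketched side by side in an overlay's `kustomization.yaml`. This is an illustrative fragment, not the repository's actual file: the base path, registry host (`registry.example`), image tags, upstream image name, and the ProxyClass name (`example-proxyclass`) are all hypothetical placeholders.

```yaml
# Sketch of an overlay kustomization; all names marked below are assumptions.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../tailscale-operator-base          # hypothetical relative path to the shared base

# images: rewrites container image fields in standard workloads (the operator Deployment).
images:
  - name: tailscale/k8s-operator        # assumed image name in the base manifest
    newName: registry.example/blumeops/tailscale-operator   # hypothetical registry host
    newTag: v1.94.2                     # hypothetical tag format

# images: does not reach into custom resources, so the ProxyClass needs a patch.
patches:
  - patch: |-
      apiVersion: tailscale.com/v1alpha1
      kind: ProxyClass
      metadata:
        name: example-proxyclass        # must match the ProxyClass name in the base
      spec:
        statefulSet:
          pod:
            tailscaleContainer:
              image: registry.example/blumeops/tailscale:v1.94.2   # hypothetical
```

Running `kustomize build` on the overlay and checking both image fields in the output is a quick way to confirm the patch matched.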
The `dnsconfig` nameserver image (`tailscale/k8s-nameserver:stable`) is still
upstream, which is a known follow-up.
## Rollout Safety (device identity)
Proxy and operator tailnet identity lives in Kubernetes state Secrets in the
`tailscale` namespace, not in pods or images. An image swap rolls the
Deployment/StatefulSets, but pods re-authenticate with their existing node
keys, so devices keep their names. Shadow devices (`foo-1` suffixes) appear only
when a pod registers *fresh* while a stale device record still holds the name
(for example, after deleted state Secrets or a cluster rebuild). When rolling out image changes:
1. Never delete the `tailscale` namespace state Secrets.
2. Verify after sync: pods healthy, device names unchanged in the admin
console, `mise run services-check` green.
3. If a collision does occur: delete the stale device in the admin console
AND the affected state Secret, then restart the pod (see
[[rebuild-minikube-cluster]]).
## How It Works
Ingresses use a shared ProxyGroup (`ingress`) rather than per-service Tailscale nodes. When you create an Ingress with `ingressClassName: tailscale`:
1. Operator configures the shared ProxyGroup pods to serve the new Ingress
2. Service gets a VIP (Virtual IP) address on the tailnet
3. Service becomes accessible at `<hostname>.tail8d86e.ts.net`
4. TLS is handled automatically via Tailscale
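
A minimal Ingress that would trigger the steps above might look like this sketch. The Service name, namespace, and port are hypothetical, and binding to the shared ProxyGroup through the `tailscale.com/proxy-group` annotation is an assumption based on the upstream operator's conventions rather than something taken from this repository.

```yaml
# Sketch only: names, namespace, and port are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus                       # hypothetical
  namespace: monitoring                  # hypothetical
  annotations:
    tailscale.com/proxy-group: ingress   # assumed binding to the shared ProxyGroup
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: prometheus                   # hypothetical Service
      port:
        number: 9090                     # hypothetical port
  tls:
    - hosts:
        - prometheus                     # served as prometheus.tail8d86e.ts.net
```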
Two requirements for VIP routing to work:
1. Tailnet clients must have `--accept-routes` enabled to route to VIP addresses.
2. Ingress rules must **not** set an explicit `host:` field. The ProxyGroup
proxy receives the FQDN as the `Host` header (e.g.
`prometheus.tail8d86e.ts.net`), which won't match a short name. Use
`host: "*"` or omit `host:` entirely.
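
For Ingresses that use `rules:` instead of a default backend, the second requirement amounts to leaving the `host:` key out of each rule. The fragment below is a sketch with a hypothetical Service name and port.

```yaml
# Sketch of a rules block with no host: key, so the FQDN Host header still matches.
spec:
  ingressClassName: tailscale
  rules:
    - http:                          # note: no host: entry on this rule
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus     # hypothetical Service
                port:
                  number: 9090       # hypothetical port
```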
Services can be individually tagged (e.g., `tag:flyio-target`) via Ingress annotations to control which ACL grants apply. See [[expose-service-publicly]] for the tagging workflow.
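
As a sketch, the annotation goes in the Ingress metadata. The tag list shown is the one used for Ingresses that opt in to the fly.io proxy; other services would carry whichever tags their ACL grants require.

```yaml
# Fragment of an Ingress; only the tagging annotation is shown.
metadata:
  annotations:
    tailscale.com/tags: "tag:k8s,tag:flyio-target"
```

The operator's OAuth client must be scoped to assign every tag listed here; otherwise the operator cannot apply the tag to the proxy nodes.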
## Limitations
Services exposed via Tailscale Ingress are **not accessible** from:
- Other Kubernetes pods (they're not Tailscale clients)
- Docker containers on indri
For pod-to-service communication, use [[routing|Caddy]] (`*.ops.eblu.me`) instead.
## Related
- [[tailscale]] - Network configuration
- [[routing]] - Service routing options
- [[apps]] - Application registry