Commit graph

447 commits

Author SHA1 Message Date
07fb48626d Add Authentik SSO integration for Jellyfin (#239)
## Summary
- Add Authentik OIDC provider + application for Jellyfin via blueprint (all authenticated users allowed, no policy binding)
- Wire `jellyfin-client-secret` through ExternalSecret and Authentik worker deployment
- Install [jellyfin-plugin-sso](https://github.com/9p4/jellyfin-plugin-sso) v4.0.0.3 via Ansible, with OIDC config template
- Authentik `admins` group maps to Jellyfin administrator role
- Local login left enabled; SSO is additive

## Deployment and Testing
- [ ] Sync ArgoCD `authentik` app on branch — verify provider + application appear in Authentik admin
- [ ] `mise run provision-indri -- --tags jellyfin --check --diff` (dry run)
- [ ] `mise run provision-indri -- --tags jellyfin` (deploy plugin + config)
- [ ] Test SSO flow: `https://jellyfin.ops.eblu.me/sso/OID/start/authentik`
- [ ] Verify `eblume` account auto-links via `preferred_username` match
- [ ] Verify admins group → Jellyfin admin
- [ ] Reset ArgoCD app revision to main after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/239
2026-02-21 20:05:44 -08:00
e1c2892878 Fix container tags deleted during old-tag cleanup
Five container manifests were removed when deleting old-style tags
(shared digests). Rebuild on a72a0d8 and update references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 16:26:29 -08:00
a72a0d8e8e Update all container images to new upstream-version tagging scheme (#238)
## Summary
- Updates all 15 container image references across 14 ArgoCD manifest files
- Migrates from old internal `vX.Y.Z` tags to new `v<upstream-version>-<sha>` format
- Covers: authentik, cv, devpi, forgejo-runner, homepage, kiwix-serve, kubectl, miniflux, navidrome, ntfy, quartz, teslamate, transmission

## Deployment and Testing
- [ ] Sync all ArgoCD apps on branch revision
- [ ] Verify all services come up healthy
- [ ] Merge and re-sync on main
- [ ] Clean up old-style tags from zot registry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/238
2026-02-21 15:58:11 -08:00
55d31c9c0b Docs pass: update zot Mikado chain for completion
- harden-zot-registry: fix Authentik hostname, check off all
  verified items, add metrics config to "what was done"
- enforce-tag-immutability: fix admins permissions (was missing
  update)
- agent-change-process: clarify that requires: is permanent and
  status: active is the only completion marker
- zot reference: update modified date
- wire-ci-registry-auth fragment: add metrics fix
- Remove stale harden-zot-mikado-cards.ai.md planning fragment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 15:32:34 -08:00
8775c3841a Allow anonymous access to zot /metrics endpoint
Add accessControl.metrics.users with empty string to allow
unauthenticated Prometheus/Alloy scraping. Zot represents
anonymous users with an empty username internally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 12:37:59 -08:00
ff63679efb Enable zot registry auth + wire CI credentials (#237)
## Summary

- Enable OIDC + API key authentication on zot registry with three-tier accessControl
  - `anonymousPolicy: ["read"]` — anyone can pull
  - `artifact-workloads` group: `["read", "create"]` — CI push, no overwrite/delete
  - `admins` group: `["read", "create", "update", "delete"]` — break-glass
- Wire both CI push paths (Dagger and Nix/skopeo) with `ZOT_CI_API_KEY` credentials
- Add `artifact-workloads` PolicyBinding in Authentik blueprint for zot app access
- Add `ZOT_CI_API_KEY` to Forgejo Actions secrets via existing ansible role

Completes the `wire-ci-registry-auth` and `harden-zot-registry` Mikado cards.

## Manual Deployment Steps (after merge)

1. Deploy Authentik blueprint: `argocd app sync authentik`
2. In Authentik admin UI: set a password for the `zot-ci` service account
3. Deploy zot config: `mise run provision-indri -- --tags zot`
4. Log in to `https://registry.ops.eblu.me` as `zot-ci` via OIDC → generate API key
5. Store API key in 1Password as `zot-ci-apikey` in blumeops vault
6. Sync Forgejo secrets: `mise run provision-indri -- --tags forgejo_actions_secrets`
7. Trigger a test container build to verify CI push
8. Verify anonymous pull: `curl -sf https://registry.ops.eblu.me/v2/_catalog`

## Uncertainties

- **Zot `accessControl` group matching with OIDC:** Groups from Authentik's `profile` scope claim should map to zot policy groups, but the exact claim-to-group matching needs runtime verification
- **`http.auth.apikey: true`:** This config key is documented but needs verification against the specific zot version built from source on indri
- **API key permissions:** Need to confirm zot API keys inherit the generating user's group for accessControl evaluation

## Test Plan

- [ ] `mise run provision-indri -- --check --diff --tags zot` shows expected config changes
- [ ] Anonymous pull works after deploy
- [ ] Unauthenticated push fails (401)
- [ ] OIDC browser login redirects to Authentik and back
- [ ] API key push works after key generation
- [ ] CI push succeeds with both Dagger and skopeo paths
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/237
2026-02-21 12:20:29 -08:00
30a7c4de9b Close register-zot-oidc-client Mikado card
Completed in PR #236. Updated card to reflect what was actually
implemented, including deviations (worker env var wiring, manual
service account setup).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 08:49:32 -08:00
21b6533aea Register Zot as OIDC client in Authentik (#236)
## Summary
- Add Authentik blueprint (`zot.yaml`) with OAuth2 provider, application, `artifact-workloads` group, and `zot-ci` service account
- Wire `zot-client-secret` through ExternalSecret → worker Deployment env var → blueprint `!Env`
- Add Ansible pre_task to fetch OIDC secret from 1Password (item ID `oor7os5kapczgpbwv7obkca4y4`)
- Add `oidc-credentials.json.j2` template and deploy task in zot role (with `when` guard)

## Manual Steps Required Before Deploy
1. Generate client secret: `openssl rand -hex 32`
2. Store in 1Password: add field `zot-client-secret` to "Authentik (blumeops)" item in vault `blumeops`

## What This Does NOT Do
- Does NOT modify `config.json.j2` (that's the root goal `harden-zot-registry`)
- Does NOT wire CI auth (that's `wire-ci-registry-auth`)
- Does NOT set service account password or API keys (manual post-deploy)

## Verification
After ArgoCD sync:
- [ ] Authentik admin UI shows "Zot Registry" application
- [ ] OIDC discovery at `https://authentik.ops.eblu.me/application/o/zot/.well-known/openid-configuration` returns valid JSON
- [ ] Blueprint status is `successful`
- [ ] `artifact-workloads` group exists with `zot-ci` service account

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/236
2026-02-21 08:45:06 -08:00
04e036c603 Fold enforce-tag-immutability into harden-zot-registry (#235)
## Summary

- Removed `status: active` from `enforce-tag-immutability` card — its requirements are folded into the parent `harden-zot-registry` goal's `accessControl` configuration
- Updated `harden-zot-registry` with three-tier access control spec (anonymous read, artifact-workloads read+create, admins full)
- Added `artifact-workloads` group creation step to `register-zot-oidc-client`
- Added service account context to `wire-ci-registry-auth`

## Rationale

Tag immutability requires authentication to be meaningful. Without auth, everyone is anonymous and gets the same policy. Rather than client-side push checks, the registry enforces immutability server-side: CI gets `["read", "create"]` (no update/delete), so pushing an existing tag is rejected by zot itself.

## Test plan

- [ ] `mise run docs-check-links` passes
- [ ] `mise run docs-mikado` shows enforce-tag-immutability as resolved

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/235
2026-02-21 08:05:16 -08:00
64691da4fb Complete adopt-commit-based-container-tags Mikado card
All 14 containers now have commit-SHA-based tags in the registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:28:45 -08:00
96a2d420fb Update install-dagger-on-nix-runner card with actual resolution
Dagger can't run on the bare nix runner (needs container runtime).
Used nix eval directly instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:23:06 -08:00
b8bc0bffd8 Update ringtail flake.lock (remove dagger input)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:21:36 -08:00
02f2be5523 Use nix eval instead of dagger for nix runner version extraction
Dagger CLI needs a container runtime (Docker/containerd) to start its
engine, which the bare nix runner doesn't have. Use nix eval directly
instead — it's already available and more appropriate for a nix host.

Reverts the dagger flake input since it's not usable here.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:21:16 -08:00
de037ae51f Update ringtail flake.lock with dagger/nix input
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:15:19 -08:00
a2d2808270 Add dagger CLI to nix runner via github:dagger/nix flake
Dagger was removed from nixpkgs due to trademark concerns. Use the
official dagger/nix flake as a flake input instead, passing the package
through via specialArgs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:14:46 -08:00
e0d5f28147 Add dagger to nix-container-builder runner (#234)
## Summary
- Add `dagger` to `hostPackages` for the ringtail nix-container-builder runner
- Needed for `dagger call nix-version` fallback in the nix build workflow (authentik)
- `hostPackages` is scoped to the runner's systemd unit PATH, not system-wide
- Marks `install-dagger-on-nix-runner` Mikado card complete

## Deployment and Testing
- [ ] Merge, then `mise run provision-ringtail`
- [ ] `mise run container-build-and-release authentik` to verify nix build succeeds

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/234
2026-02-20 23:09:01 -08:00
a68a170a10 Add install-dagger-on-nix-runner Mikado card (#233)
## Summary
- New Mikado card: the ringtail nix-container-builder runner lacks dagger, which the nix workflow needs for `dagger call nix-version` (authentik version extraction fallback)
- Re-opens `adopt-commit-based-container-tags` with this new prerequisite
- All other containers (11 Dockerfile-only, nettest + ntfy with nix) build fine — only authentik's nix build is blocked

## Deployment and Testing
- Docs only, no deployment needed

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/233
2026-02-20 23:03:12 -08:00
ffa8727660 Adopt commit-based container tags (#232)
## Summary
- Replace git-tag-triggered container builds with path-based triggers on main and workflow_dispatch
- Image tags now encode upstream app version + commit SHA (`vX.Y.Z-<sha>`) for full traceability
- Replace `container-tag-and-release` task with `container-build-and-release` (dispatches workflows via Forgejo API)
- Update dagger `publish()` to accept `commit_sha` parameter
- Update all docs and references to the new workflow

## Deployment and Testing
- [ ] Merge to main
- [ ] `mise run container-build-and-release <name>` for each container to populate new-format tags
- [ ] Verify tags in registry via `mise run container-list`
- [ ] Existing images untouched — old tags remain available

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/232
2026-02-20 22:56:20 -08:00
0e2c10176d Harden zot registry, pt 1 (#231)
## Summary
- Enable OIDC + API key authentication on zot with anonymous pull preserved
- Enforce tag immutability for version tags
- Adopt commit-SHA-based container image tagging

Details in the [[harden-zot-registry]] Mikado chain (`mise run docs-mikado harden-zot-registry`).

## Test plan
- [ ] Anonymous pull still works
- [ ] Unauthenticated push fails (401)
- [ ] CI container builds pass with new auth and tagging
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/231
2026-02-20 22:50:01 -08:00
6d7071e5ec Add commit-based container tagging prereq to harden-zot-registry chain (#230)
## Summary

- New Mikado card: `adopt-commit-based-container-tags` — replaces git-tag-triggered container builds with path-based main-branch triggers and manual workflow dispatch
- Image tags become `vX.Y.Z-<sha>` (with `-main` suffix for main branch builds, `-nix` for Nix builds), tying versions to the actual bundled app version and exact source commit
- `container-tag-and-release` mise task to be renamed to `container-build-and-release`, triggering workflow dispatch with the current HEAD SHA
- Added as soft prereq to `harden-zot-registry` Mikado chain

## Test plan

- [x] Pre-commit hooks pass (docs-check-index, docs-check-links, etc.)
- [ ] Review card content for completeness

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/230
2026-02-20 18:26:27 -08:00
379bcb98af Create C2 Mikado cards for harden-zot-registry (#229)
## Summary
- Replace the old pre-Mikado plan doc (`docs/how-to/plans/harden-zot-registry.md`) with a proper C2 Mikado chain in `docs/how-to/zot/`
- Root goal: `harden-zot-registry` — enable OIDC + API key auth on zot with anonymous pull preserved
- Three leaf prereqs: `register-zot-oidc-client`, `wire-ci-registry-auth`, `enforce-tag-immutability`
- Add Zot section to `how-to.md` index, remove plan entry from plans index
- All doc checks pass (`docs-check-links`, `docs-check-index`, `docs-mikado`)

## Changes
- **New:** `docs/how-to/zot/harden-zot-registry.md` — C2 Mikado root goal
- **New:** `docs/how-to/zot/register-zot-oidc-client.md` — Register OIDC client in Authentik
- **New:** `docs/how-to/zot/wire-ci-registry-auth.md` — Wire CI push paths with registry auth
- **New:** `docs/how-to/zot/enforce-tag-immutability.md` — Prevent version tag overwrites
- **Deleted:** `docs/how-to/plans/harden-zot-registry.md` — Old plan doc (content absorbed into Mikado cards)
- **Updated:** `docs/how-to/how-to.md` — Add Zot section, remove plan entry
- **Updated:** `docs/how-to/plans/plans.md` — Remove plan entry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/229
2026-02-20 17:56:25 -08:00
cd50c1454a Integrate Forgejo with Authentik OIDC (#228)
## Summary

- Refactor Authentik blueprints: extract shared `admins` group into `common.yaml`, add `groups` scope mapping to all providers for group-based admin propagation
- Add Forgejo OAuth2 provider and application blueprint (`forgejo.yaml`)
- Add `forgejo-client-secret` to ExternalSecret and worker deployment env
- Configure Forgejo `[oauth2_client]` with `ACCOUNT_LINKING=login` to safely link existing accounts
- Update documentation (forgejo.md, authentik.md, federated-login.md)

## Deployment and Testing

After merge, deployment requires these steps in order:

1. **Authentik (ArgoCD):**
   - `argocd app set authentik --revision feature/forgejo-authentik-oidc && argocd app sync authentik`
   - Verify: Forgejo app/provider visible in Authentik admin UI
   - Verify: Grafana SSO still works (blueprint refactor)

2. **Forgejo app.ini (Ansible):**
   - `mise run provision-indri -- --tags forgejo --check --diff` (dry run)
   - `mise run provision-indri -- --tags forgejo` (apply, restarts Forgejo)

3. **Create Forgejo auth source (CLI on indri):**
   ```
   ssh indri 'sudo -u forgejo /opt/homebrew/bin/forgejo admin auth add-oauth \
     --name authentik \
     --provider openidConnect \
     --key forgejo \
     --secret "$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/Authentik (blumeops)/forgejo-client-secret")" \
     --auto-discover-url https://authentik.ops.eblu.me/application/o/forgejo/.well-known/openid-configuration \
     --scopes "openid email profile groups" \
     --group-claim-name groups \
     --admin-group admins'
   ```

4. **Link eblume account:** Sign in with Authentik on Forgejo, confirm link with local password

5. **Verify:** `tea repo list`, Forgejo Actions, local password break-glass

After merge: `argocd app set authentik --revision main && argocd app sync authentik`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/228
2026-02-20 17:39:50 -08:00
e0c6b7df99 Add Authentik to homepage dashboard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 13:03:48 -08:00
71cb256527 Deploy Authentik identity provider (C2 Mikado) (#227)
## Summary
C2 Mikado chain for deploying Authentik as the SSO identity provider, replacing Dex.

This PR will evolve over multiple sessions. Each iteration adds documentation (prerequisite cards) and eventually code as leaf nodes are resolved.

## Current Mikado State
- **Goal:** `deploy-authentik` (active)
- **Leaf prerequisites:**
  - `build-authentik-container` — Build Nix container image
  - `provision-authentik-database` — Create PostgreSQL database on CNPG cluster
  - `create-authentik-secrets` — Create 1Password item with credentials

## Process refinements
- Updated agent-change-process with lessons from first attempt: reset code before committing cards, open PRs early

## Test plan
- [ ] `mise run docs-mikado` shows correct dependency chain
- [ ] Leaf nodes can be worked independently
- [ ] Container builds on ringtail
- [ ] Authentik starts and reaches healthy state
- [ ] Forgejo OAuth2 connector works

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/227
2026-02-20 12:55:59 -08:00
174c6414ac Convert deploy-authentik plan to C2 Mikado chain (#226)
## Summary
- Strip detailed phase instructions from deploy-authentik plan (400→50 lines)
- Retain architecture decisions (ringtail, CNPG on indri, Nix containers, kustomize, Tailscale+Caddy) and open questions
- Add `status: active` frontmatter — now visible as a root goal in `mise run docs-mikado`
- Update plans index to reflect Active (C2) status

This is the first real use of the C2 Mikado chain system from #225. Future sessions will discover prerequisites, create sub-cards with `requires`, and work leaf nodes first.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/226
2026-02-20 08:22:19 -08:00
c1f7b2a9a3 Add agent change process (C0/C1/C2) and docs-mikado tool (#225)
## Summary
- Introduce C0/C1/C2 change classification based on the Mikado method, where documentation cards serve as persistent context for agents across sessions
- Add `docs-mikado` mise task to visualize active Mikado dependency chains from `status: active` and `requires` frontmatter fields
- Rename `zk-docs` task to `ai-docs`

## Changes
- **New:** `docs/how-to/agent-change-process.md` — methodology card
- **New:** `mise-tasks/docs-mikado` — Python uv script for dependency graph visualization
- **Renamed:** `mise-tasks/zk-docs` → `mise-tasks/ai-docs`
- **Updated:** `CLAUDE.md` — added Change Classification section, updated references
- **Updated:** `ai-assistance-guide.md`, `exploring-the-docs.md`, `how-to.md` — updated references and index

## Verification
- [x] `mise run ai-docs` works
- [x] `mise run docs-mikado` runs (no active chains yet, as expected)
- [x] `docs-check-links` — all valid
- [x] `docs-check-index` — all indexed
- [x] `docs-check-frontmatter` — all valid
- [x] All pre-commit hooks pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/225
2026-02-20 08:15:20 -08:00
c3748a0638 Add Authentik deployment plan (#224)
## Summary
- New plan document at `docs/how-to/plans/deploy-authentik.md` covering full Authentik deployment
- 6 phases: Helm analysis, prerequisites (CNPG/Redis/1Password), Nix containers, kustomize manifests, networking, monitoring
- Authentik replaces Dex as the identity provider for central user management and multi-protocol SSO
- Updated plans index and how-to index

## Deployment and Testing
- [x] Pre-commit hooks pass (docs-check-links, docs-check-index, docs-check-frontmatter)
- [ ] Review plan content for accuracy and completeness
- No deployment needed — documentation only

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/224
2026-02-20 07:06:56 -08:00
Forgejo Actions
18f1ac61fc Update docs release to v1.10.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-19 20:45:43 -08:00
d21798b1f3 Document Dex OIDC and add services-check integration (#223) v1.10.0
## Summary
- Create Dex reference card (`docs/reference/services/dex.md`) with quick reference, architecture, identity source, storage, OIDC clients, secrets, and endpoints
- Write federated login explanation article (`docs/explanation/federated-login.md`) covering the Dex + Forgejo two-layer auth model, login flow, and break-glass access
- Add Dex to `services-check` (HTTP health endpoint + k3s pod check)
- Update Grafana docs with new Authentication section documenting SSO via Dex
- Update Forgejo docs with OAuth2 Provider section documenting its role as upstream identity source
- Add Dex to ringtail workloads table and reference service index
- Move `adopt-oidc-provider` plan to `completed/` with final design reflecting actual implementation

## Test plan
- [ ] `mise run services-check` passes (includes new Dex checks)
- [ ] `docs-check-links` passes (all wiki-links resolve)
- [ ] `docs-check-index` passes (new docs are indexed)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/223
2026-02-19 20:44:23 -08:00
0cdc143227 Deploy Dex OIDC identity provider with Grafana SSO (#222)
## Summary
- Deploys Dex OIDC identity provider on ringtail k3s cluster as central authentication service
- Integrates Grafana as first SSO client via `auth.generic_oauth`
- Uses Kubernetes CRD storage backend (no PVC needed)
- All secrets (bcrypt hash, client secrets) injected via ExternalSecrets from 1Password item "Dex (blumeops)"
- NixOS-built container image via `containers/dex/default.nix`

## Pre-requisites (manual, before deployment)
1. Create 1Password item "Dex (blumeops)" in `blumeops` vault with fields:
   - `password`: strong generated password for Dex login
   - `static-password-hash`: bcrypt hash of above (`htpasswd -BnC 10 eblume`, copy hash after `eblume:`)
   - `grafana-client-secret`: random 32-char hex (`openssl rand -hex 16`)
2. Build container: `mise run container-tag-and-release dex v1.0.0`

## Deployment sequence
1. Build container: `mise run container-tag-and-release dex v1.0.0`
2. Deploy Caddy: `mise run provision-indri -- --tags caddy`
3. Sync ArgoCD: `argocd app sync apps` → `argocd app sync dex`
4. Verify Dex: `curl https://dex.ops.eblu.me/.well-known/openid-configuration`
5. Sync Grafana: `argocd app sync grafana-config` → `argocd app sync grafana`
6. Test SSO: Visit `https://grafana.ops.eblu.me/login`, click "Sign in with Dex"

## Verification
- [ ] Container image exists: `mise run container-list` shows `dex:v1.0.0-nix`
- [ ] `curl https://dex.ops.eblu.me/.well-known/openid-configuration` returns valid OIDC discovery
- [ ] `curl https://dex.ops.eblu.me/healthz` returns healthy
- [ ] Grafana login shows "Sign in with Dex" button alongside local login
- [ ] OIDC flow: click Dex → enter credentials → redirect back → logged in as Admin
- [ ] Break-glass: local admin login still works
- [ ] `mise run services-check` passes

## Files changed
| File | Action | Purpose |
|------|--------|---------|
| `containers/dex/default.nix` | Create | NixOS container build |
| `argocd/apps/dex.yaml` | Create | ArgoCD app targeting ringtail |
| `argocd/manifests/dex/*` (8 files) | Create | K8s manifests (RBAC, ExternalSecret, Deployment, Service, Ingress) |
| `argocd/manifests/grafana-config/external-secret-dex-oauth.yaml` | Create | Grafana OIDC client secret |
| `argocd/manifests/grafana-config/kustomization.yaml` | Modify | Add new ExternalSecret resource |
| `argocd/manifests/grafana/values.yaml` | Modify | Add `auth.generic_oauth` config + envFromSecrets |
| `ansible/roles/caddy/defaults/main.yml` | Modify | Add `dex.ops.eblu.me` reverse proxy entry |
| `docs/changelog.d/feature-dex-oidc.feature.md` | Create | Changelog fragment |

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/222
2026-02-19 20:24:24 -08:00
b876e39981 Replace Homepage Helm chart with kustomize manifests and custom Dockerfile (#221)
## Summary
- Replace third-party Helm chart (jameswynn/homepage v2.1.0, pinned at app v1.2.0) with plain kustomize manifests and a custom Dockerfile building from forge mirror at v1.10.1
- Adds Dockerfile (`containers/homepage/`) with multi-stage build (node:22-slim builder, node:22-alpine runtime)
- Creates kustomize manifests: Deployment, Service, ConfigMap (6 config files), ServiceAccount, ClusterRole, ClusterRoleBinding
- Keeps existing ingress-tailscale.yaml and all 6 ExternalSecret resources unchanged
- Updates ArgoCD app definition from multi-source Helm to single directory source

## Prerequisite
- Homepage source mirrored at forge.ops.eblu.me/eblume/homepage.git 
- Container must be built and pushed before syncing: `mise run container-release homepage v1.10.1`

## Deployment and Testing
- [ ] Build and push container image: `mise run container-release homepage v1.10.1`
- [ ] Branch-test via ArgoCD: `argocd app set homepage --revision feature/homepage-kustomize && argocd app sync homepage`
- [ ] Verify dashboard loads at go.ops.eblu.me / go.tail8d86e.ts.net
- [ ] Verify k8s autodiscovery works (services appear on dashboard)
- [ ] Verify widgets load (weather, Forgejo, Jellyfin, etc.)
- [ ] After merge: `argocd app set homepage --revision main && argocd app sync homepage`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/221
2026-02-19 18:29:19 -08:00
869f6bd20d Review: update-documentation doc (#220)
## Summary
- Add missing workflow step 8: Fly.io proxy cache purge after deploy
- Remove misleading "Directory" column from changelog fragment types table (all fragments use flat `<name>.<type>.md` pattern, not subdirectories)
- Stamp `last-reviewed: 2026-02-19`

## Review notes
Verified all claims against actual workflow YAML, Dagger pipeline, ArgoCD manifests, towncrier config, and Quartz config files. Everything else checks out.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/220
2026-02-19 17:40:05 -08:00
cabd0bc9cf Update Frigate zone masks and expand alert notifications (#219)
## Summary
- Synced driveway_entrance zone coordinates from live Frigate config (adjusted mask boundaries)
- Added `inertia: 3` and `loitering_time: 0` to driveway_entrance zone
- Expanded review alerts to require either `driveway_entrance` or `driveway` zone (was entrance only)
- Updated frigate-notify config to allow alerts from both `driveway_entrance` and `driveway` zones

## Deployment and Testing
- [ ] Merge and sync frigate ArgoCD app on ringtail
- [ ] Sync frigate-notify (restart pod to pick up ConfigMap change)
- [ ] Verify alerts fire for person/car in driveway zone

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/219
2026-02-19 17:32:02 -08:00
291fff345c Fix services-check and update docs for Frigate migration to ringtail (#218)
## Summary
- Move mosquitto, ntfy, frigate, frigate-notify pod checks from `minikube-indri` to `k3s-ringtail` context in `services-check`
- Add `nvidia-device-plugin` pod check for ringtail k3s
- Rename "Kubernetes pods" section to "Indri minikube pods" for clarity
- Update 8 documentation files to reflect the migration completed in PRs #216/#217

## Files Changed
| File | Change |
|------|--------|
| `mise-tasks/services-check` | Move 4 pod checks to k3s-ringtail, add nvidia-device-plugin |
| `docs/reference/services/frigate.md` | Image→tensorrt, detector→ONNX/CUDA, shm→512Mi |
| `docs/reference/infrastructure/ringtail.md` | List actual k3s workloads |
| `docs/reference/infrastructure/indri.md` | Note frigate migration |
| `docs/explanation/architecture.md` | Add ringtail to diagram + compute layer |
| `docs/reference/kubernetes/cluster.md` | Note two clusters, add k3s section |
| `docs/reference/reference.md` | Update frigate/ntfy location |
| `docs/how-to/plans/completed/operationalize-reolink-camera.md` | Add post-completion migration note |
| `CLAUDE.md` | Add k3s-ringtail context guidance |

## Test plan
- [ ] `mise run services-check` — all checks pass
- [ ] Review each doc for accuracy against deployed state

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/218
2026-02-19 14:38:21 -08:00
d5d32fe91f Port Frigate NVR to ringtail k3s with GPU acceleration (#217)
## Summary

- Enable NVIDIA container toolkit on ringtail NixOS and configure k3s containerd with nvidia runtime
- Add NVIDIA device plugin ArgoCD app (RuntimeClass + DaemonSet) to expose `nvidia.com/gpu` resources
- Re-target Frigate from indri minikube (arm64, ZMQ detector) to ringtail k3s (x86_64, TensorRT/ONNX)
- Switch Frigate image to `-tensorrt` variant with GPU resource limits and increased shared memory

## Manual Prerequisites

1. **NFS access**: Verify ringtail can mount `sifaka:/volume1/frigate`
   ```fish
   ssh ringtail 'sudo mount -t nfs sifaka:/volume1/frigate /mnt/storage1 && ls /mnt/storage1 && sudo umount /mnt/storage1'
   ```
2. **YOLO model**: Verify `/volume1/frigate/models/yolov9m.onnx` exists on sifaka

## Deployment Steps

1. Provision ringtail: `mise run provision-ringtail`
2. Sync ArgoCD apps: `argocd app sync apps --prune`
3. Deploy NVIDIA device plugin: `argocd app sync nvidia-device-plugin`
4. Verify GPU: `kubectl --context=k3s-ringtail get nodes -o json | jq '.items[].status.capacity'`
5. Deploy Frigate: `argocd app sync frigate`

## Verification

- [ ] `nvidia.com/gpu: 1` visible in node capacity
- [ ] Frigate pod running with GPU allocated
- [ ] Frigate UI loads at `https://nvr.ops.eblu.me`
- [ ] Detector shows ONNX/TensorRT on System page
- [ ] Camera feed with bounding boxes in live view
- [ ] TensorRT engine build completes (watch logs on first start)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/217
2026-02-19 14:27:04 -08:00
16a4a9a616 Port Mosquitto and ntfy to ringtail k3s, retire Apple Silicon Detector (#216)
## Summary
- Delete `ansible/roles/frigate_detector/` and remove from indri playbook — the Apple Silicon Detector is retired
- Move Mosquitto (MQTT) ArgoCD app from indri minikube to ringtail k3s
- Move ntfy ArgoCD app from indri minikube to ringtail k3s
- Update Frigate docs to reflect detector removal and planned RTX 4080 migration
- Manifests are reused as-is (same `argocd/manifests/mosquitto/` and `argocd/manifests/ntfy/`), just pointed at ringtail

## Deployment

After merge:
1. Sync indri ArgoCD `apps` app with prune to remove old mosquitto/ntfy apps:
   ```
   argocd app sync apps --prune
   ```
2. Sync new ringtail apps:
   ```
   argocd app sync mosquitto-ringtail
   argocd app sync ntfy-ringtail
   ```
3. Manually clean up the detector LaunchAgent on indri:
   ```
   ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist'
   ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist'
   ```

## Notes
- Frigate on indri will lose MQTT/ntfy connectivity — this is expected (user confirmed no downtime concerns)
- ntfy Tailscale Ingress hostname `ntfy` will transfer from indri ProxyGroup to ringtail ProxyGroup
- Caddy on indri proxies `ntfy.ops.eblu.me` → `ntfy.tail8d86e.ts.net`, so no Caddy changes needed
- Frigate + frigate-notify will be ported to ringtail in a follow-up PR

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/216
2026-02-19 11:22:44 -08:00
61ca1ca305 Deploy Tailscale operator on ringtail k3s cluster (#215)
## Summary
- Extract shared Tailscale operator resources (CRDs, RBAC, Deployment, ProxyClass, DNSConfig) into `tailscale-operator-base/` so both clusters reference the same manifests
- Add `tailscale-operator-ringtail/` overlay with 1-replica ProxyGroup and ExternalSecret for the shared OAuth client
- Add ArgoCD Application targeting `ringtail.tail8d86e.ts.net:6443`
- Update `.yamllint.yaml` ignore path for the moved `operator.yaml`

## Deployment and Testing
- [ ] Sync `apps` app to pick up the new Application definition
- [ ] `argocd app sync tailscale-operator-ringtail`
- [ ] Verify ExternalSecret syncs: `kubectl --context=k3s-ringtail -n tailscale get externalsecret`
- [ ] Verify operator pod runs: `kubectl --context=k3s-ringtail -n tailscale get pods`
- [ ] Verify ProxyGroup ready: `kubectl --context=k3s-ringtail -n tailscale get proxygroups`
- [ ] Verify indri operator still works: `argocd app diff tailscale-operator`
- [ ] Check Tailscale admin for new operator device with `tag:k8s-operator`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/215
2026-02-19 09:33:05 -08:00
695089499e Nix container build for nettest (#214)
## Summary
- Add `containers/nettest/default.nix` using `dockerTools.buildLayeredImage` with curl, jq, dnsutils, cacert, and bash — equivalent to the existing Dockerfile
- Update `container-tag-and-release` to require `--nix` or `--dockerfile` flag when both build types exist for a container
- Update `container-list` to show `[dockerfile+nix]` label when both exist

## Deployment and Testing
- [ ] SSH to ringtail, run `nix build -f containers/nettest/default.nix -o result` to verify the nix expression builds
- [ ] Tag `nettest-nix-v1.0.0`, confirm `build-container-nix` workflow runs on `nix-container-builder` runner and pushes to registry
- [ ] Smoke test on ringtail k3s: `kubectl run nettest --image=registry.ops.eblu.me/blumeops/nettest:v1.0.0 --restart=Never && kubectl logs nettest`
- [ ] Verify `mise run container-list` shows `[dockerfile+nix]` for nettest
- [ ] Verify `mise run container-tag-and-release nettest v1.1.0` prompts for build type

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/214
2026-02-19 08:42:58 -08:00
b475a1fcd7 Fix 1Password secret tasks always reporting changed in ringtail playbook (#213)
## Summary
- Replace `changed_when: true` with `register` + output inspection on the two 1Password secret tasks in `ringtail.yml`
- Tasks now correctly report `ok` when the secret content hasn't changed, and `changed` only when `kubectl apply` outputs `configured` or `created`

## Test plan
- [ ] Run `mise run provision-ringtail` twice — second run should show both tasks as `ok` not `changed`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/213
2026-02-19 07:25:24 -08:00
8f89239c78 Inhibit idle lock for fullscreen windows on ringtail (#212)
## Summary
- Adds `inhibit_idle fullscreen` window commands to sway config on ringtail
- Covers both Wayland-native (`app_id`) and XWayland (`class`) windows
- Prevents swayidle from locking the screen during gamepad-only gaming sessions where controller input isn't detected by the Wayland idle tracker

## Notes
This is a blanket fullscreen inhibit. A more targeted approach (daemon monitoring `/dev/input` gamepad events) may be desired later to allow idle lock during long-running fullscreen apps like Factorio.

## Deployment and Testing
- [ ] `mise run provision-ringtail` to deploy
- [ ] Run a fullscreen app and verify swayidle doesn't lock after 15 minutes
- [ ] Verify lock still activates when no fullscreen window is present

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/212
2026-02-19 07:20:05 -08:00
9829a6f971 Add screen lock and idle management to ringtail (#211)
## Summary
- Configure **swayidle** to lock screen (swaylock) after 15 minutes of inactivity
- Turn off display (DPMS) after 60 minutes, auto-restore on activity
- **swaylock** themed with Catppuccin Macchiato to match existing Sway config
- Add `Mod4+l` keybinding for manual screen lock
- Add PAM service for swaylock authentication
- Disable system suspend/hibernate entirely (workstation should never sleep)

## What changes
All changes in `nixos/ringtail/configuration.nix`:
- `security.pam.services.swaylock` — required for swaylock to authenticate on NixOS
- `systemd.sleep.extraConfig` — blocks all sleep/hibernate modes
- `programs.swaylock` (home-manager) — lock screen appearance config
- `services.swayidle` (home-manager) — idle timeout daemon with lock + DPMS events
- New keybinding `Mod4+l` for manual lock

## Deployment and Testing
- [ ] `mise run provision-ringtail`
- [ ] Verify swayidle is running: `systemctl --user status swayidle`
- [ ] Test manual lock with `Super+l`
- [ ] Verify display DPMS off after idle (can lower timeout temporarily to test)
- [ ] Confirm machine does not suspend: `systemctl status sleep.target`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/211
2026-02-19 06:46:37 -08:00
630ebcd12d Add ringtail DeviceTags and homelab-to-homelab SSH rule (#210)
## Summary
- Add `ringtail` DeviceTags Pulumi resource with `tag:homelab` + `tag:blumeops` (matching indri/sifaka pattern)
- Remove the bootstrap `ringtail_key` auth key — ringtail is already on the tailnet
- Add SSH ACL rule allowing `tag:homelab` → `tag:homelab` SSH, unblocking cross-host management (e.g., ringtail running ansible against indri)

## Deployment and Testing
- [ ] `mise run tailnet-preview` — dry run, confirm diff
- [ ] `mise run tailnet-up` — apply
- [ ] From ringtail: `ssh indri 'hostname'` — should succeed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/210
2026-02-18 21:48:11 -08:00
aa04618829 Fix k3s health check to use explicit KUBECONFIG path
k3s kubectl on ringtail needs KUBECONFIG set since the eblume
user doesn't have it in their default environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:26:00 -08:00
1f2134bf0a Fix provision-ringtail ls-remote matching with mirror refs
git ls-remote returns multiple lines when a mirror ref exists
(e.g. refs/remotes/remote_mirror_*/main). Take only the first
line to avoid a false mismatch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:22:46 -08:00
918df9e642 Add k3s, 1Password Connect, and systemd nix-container-builder to ringtail (#209)
## Summary

  Extends ringtail from a desktop/gaming NixOS box into an infrastructure node with a k3s cluster, secrets management, and a Forgejo Actions
  runner for building containers with Nix.

  ### K3s cluster
  - Single-node k3s with Traefik/ServiceLB/metrics-server disabled (minimal footprint)
  - TLS SAN set to `ringtail.tail8d86e.ts.net` so ArgoCD on indri can manage it via Tailscale
  - Containerd registry mirrors pull through Zot on indri (`k3s-registries.yaml`)
  - Tailscale interface added to `trustedInterfaces` for cross-node ArgoCD access
  - `kubectl` added to system packages

  ### 1Password Connect + External Secrets Operator
  - Four new ArgoCD apps targeting `k3s-ringtail`: `1password-connect-ringtail`, `external-secrets-crds-ringtail`, `external-secrets-ringtail`,
  `external-secrets-config-ringtail`
  - Reuses the same Helm charts/values as indri, just pointed at ringtail's k3s API server
  - Bootstrap secrets (`op-credentials`, `onepassword-token`) provisioned by Ansible pre_tasks via `op read`, then applied to the `1password`
  namespace in post_tasks

  ### Systemd Forgejo Actions runner
  - Native `services.gitea-actions-runner` with `forgejo-runner` package — no DinD, no k8s pod, runs directly on the NixOS host
  - Label `nix-container-builder:host` — jobs execute on the host with `nix`, `skopeo`, `nodejs`, etc. in PATH
  - Registration token fetched from 1Password (`Forgejo Secrets/runner_reg`) by Ansible and written to `/etc/forgejo-runner/token.env`
  - Runner's dynamic user (`gitea-runner`) added to `nix.settings.trusted-users` for nix daemon access

  ### Nix container build workflow
  - New `.forgejo/workflows/build-container-nix.yaml` triggers on `*-nix-v[0-9]*` tags (e.g. `nettest-nix-v1.0.0`)
  - Builds with `nix build -f containers/<name>/default.nix`, pushes to Zot via `skopeo copy`
  - Existing Dockerfile workflow guarded with `if: !contains(github.ref_name, '-nix-v')` to avoid double-triggering

  ### Mise task updates
  - `container-tag-and-release` auto-detects `default.nix` vs `Dockerfile` and uses the appropriate tag format (`-nix-v` vs `-v`)
  - `container-list` shows build type indicator (`[nix]` / `[dockerfile]`)

  ## Post-merge

  1. `mise run provision-ringtail` — deploys k3s token, runner token, NixOS rebuild
  2. Register k3s cluster in ArgoCD (first time only):
     ```fish
     ssh ringtail 'sudo cat /etc/rancher/k3s/k3s.yaml' | \
       sed 's|127.0.0.1|ringtail.tail8d86e.ts.net|' > /tmp/k3s-ringtail.yaml
     set -x KUBECONFIG /tmp/k3s-ringtail.yaml
     argocd cluster add default --name k3s-ringtail
  3. Sync ArgoCD apps in order: 1password-connect-ringtail -> external-secrets-crds-ringtail -> external-secrets-ringtail ->
  external-secrets-config-ringtail
  4. Verify runner: ssh ringtail 'systemctl status gitea-runner-nix-container-builder'
  5. Check Forgejo admin panel for ringtail-nix-builder runner online
  6. Test: create containers/<name>/default.nix, tag with <name>-nix-v0.1.0

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/209
2026-02-18 21:15:30 -08:00
535f897054 Polish ringtail NixOS config and add documentation (#208)
## Summary
- Fix Super+Return keybinding to launch wezterm in sway
- Set fish as default login shell
- Remove `initialPassword` (real password already set)
- Add 1Password CLI + GUI, chezmoi, and dev tool packages (neovim, eza, fd, fzf, zoxide, starship, atuin, bat, ripgrep)
- Add ringtail reference card, update host inventory and reference index
- Changelog fragment

## Post-merge deployment
- `mise run provision-ringtail` to rebuild NixOS
- On ringtail: launch 1Password GUI, enable CLI integration (Settings > Developer > CLI integration)
- Chezmoi needs `.chezmoiignore` updates in the dotfiles repo (separate task)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/208
2026-02-18 17:53:47 -08:00
b76f2314c2 Add force: true to ringtail git task
nixos-rebuild can dirty the tree (e.g. flake.lock updates), which
blocks the Ansible git module. Force ensures we always reset to the
upstream state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 09:32:23 -08:00
7bf46f4e28 Add flake.lock for ringtail NixOS config
Prevents 'Git tree is dirty' warnings during nixos-rebuild.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 09:31:21 -08:00
5a087c10df Fix deprecated greetd.tuigreet package reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 09:30:01 -08:00
4b7491c58f Add python3 to ringtail for Ansible compatibility
NixOS doesn't include Python by default. Ansible needs it on the
managed host for module execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 09:29:09 -08:00