blumeops/docs/how-to/plans/harden-zot-registry.md
Erich Blume 651fed8f1a Transcribe backlog tasks into plan documents (#151)
2026-02-11 11:47:23 -08:00


title: Plan: Harden Zot Registry
tags: how-to, plans, zot, registry, security
Plan: Harden Zot Registry

Status: Planned (not yet executed)
Sequence: Execute after adopt-dagger-ci and adopt-oidc-provider. The Dagger migration will change how images are built and pushed, and the OIDC provider supplies the identity layer that zot's auth and API key features depend on.

Background

Zot is the BlumeOps OCI container registry, running natively on indri. It serves two roles: a pull-through cache for upstream registries (Docker Hub, GHCR, Quay) and the private image store for blumeops/* images.

Currently, zot has no authentication — the security boundary is the Tailscale ACL. This was an acceptable starting point, but has two gaps:

  1. Any tailnet client can push images — there's no distinction between pull (which k8s pods need) and push (which only CI should do). A compromised service or misconfigured pod could overwrite production images.
  2. Tags are mutable — pushing the same tag twice silently overwrites the previous image. There's no protection against accidental or malicious tag clobbering.

Goals

  • Authenticated push — only CI (Forgejo Actions / Dagger) can push images; all other clients are pull-only
  • Tag immutability — once a version tag is pushed, it cannot be overwritten
  • No disruption to pulls — k8s pods and pull-through caching continue to work without authentication
  • Minimal complexity — use zot's built-in OIDC and API key features with the BlumeOps identity provider

Current State

Push Mechanism

Images are currently pushed via the composite action at .forgejo/actions/build-push-image/action.yaml:

  1. docker buildx build creates the image
  2. docker save exports to a tarball
  3. skopeo copy pushes to registry.ops.eblu.me (no credentials needed)

The action pushes two tags per build: a version tag (e.g., v1.2.0) and the git commit SHA.

Zot Configuration

The config template (ansible/roles/zot/templates/config.json.j2) has no accessControl or http.auth section. The HTTP listener binds to 0.0.0.0:5050 with no TLS (Caddy terminates TLS at registry.ops.eblu.me).

Plan

1. Add Authentication for Push (OIDC + API Keys)

Zot supports native OIDC authentication with a built-in API key feature designed for exactly this use case. The approach:

  1. OIDC for browser login — zot delegates authentication to the BlumeOps OIDC provider (see adopt-oidc-provider). Human users log in via browser redirect.
  2. API keys for CI — after logging in via OIDC, generate a scoped API key for Forgejo CI / Dagger. API keys are zot-native tokens (zak_...) that work with docker login, skopeo, and Dagger's with_registry_auth(). They can be scoped to specific repositories and given expiration dates.
  3. Access control: anonymousPolicy allows unauthenticated pull; push requires authentication.
{
  "http": {
    "auth": {
      "openid": {
        "providers": {
          "oidc": {
            "name": "BlumeOps",
            "credentialsFile": "/Users/erichblume/.config/zot/oidc-credentials.json",
            "issuer": "https://dex.ops.eblu.me",
            "scopes": ["openid", "profile", "email"]
          }
        }
      },
      "apikey": true
    },
    "accessControl": {
      "repositories": {
        "**": {
          "anonymousPolicy": ["read"],
          "defaultPolicy": ["read", "create", "update"],
          "policies": [
            {
              "users": ["eblume"],
              "actions": ["read", "create", "update", "delete"]
            }
          ]
        }
      },
      "adminPolicy": {
        "users": ["eblume"],
        "actions": ["read", "create", "update", "delete"]
      }
    }
  }
}

The OIDC credentials file (client ID and secret) is deployed by Ansible from 1Password — never committed to the repo.
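
One way to guard the "anonymous clients are read-only" property is to lint the rendered config before zot is restarted. The following is a minimal sketch, not part of the existing role: the helper name anonymous_write_repos and the EXAMPLE document are hypothetical, and it inspects only the anonymousPolicy field from the config shown above.

```python
import json

# Actions that would let a client change registry contents.
WRITE_ACTIONS = {"create", "update", "delete"}


def anonymous_write_repos(config: dict) -> list[str]:
    """Return repository patterns whose anonymousPolicy grants a write action."""
    repos = (
        config.get("http", {})
        .get("accessControl", {})
        .get("repositories", {})
    )
    return sorted(
        pattern
        for pattern, policy in repos.items()
        if WRITE_ACTIONS & set(policy.get("anonymousPolicy", []))
    )


# Trimmed copy of the accessControl section from the plan (illustrative only).
EXAMPLE = json.loads("""
{"http": {"accessControl": {"repositories": {"**": {
  "anonymousPolicy": ["read"],
  "defaultPolicy": ["read", "create", "update"]
}}}}}
""")

print(anonymous_write_repos(EXAMPLE))  # prints []
```

An empty list means no repository pattern lets anonymous clients push or delete; any pattern in the result would be a reason to fail the deploy.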

CI push flow after setup:

  1. Log in to zot UI via browser (OIDC redirect to Dex)
  2. Generate an API key: POST /zot/auth/apikey with label forgejo-ci, scoped to blumeops/**
  3. Store the key in 1Password (op://blumeops/zot-ci-apikey/credential)
  4. CI uses the key: docker login -u eblume -p zak_... registry.ops.eblu.me

This ensures:

  • k8s pods, minikube containerd, and pull-through caching all continue to work anonymously (read-only)
  • Push requires a valid API key tied to an OIDC identity
  • No standalone password files (htpasswd) to manage — identity flows from the central IdP
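
Step 2 of the CI push flow could be scripted roughly as follows. This is a sketch under assumptions: the JSON field names (label, scopes, expirationDate) should be verified against the zot documentation for the deployed version, and the helper build_apikey_request is hypothetical. The authenticated browser session that the endpoint requires is deliberately left out, so the request is built but never sent.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_apikey_request(base_url, label, scopes, valid_days, now=None):
    """Build (but do not send) a POST to zot's API key endpoint.

    Field names in the body are assumptions; check them against the zot docs.
    """
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(days=valid_days)
    body = {
        "label": label,
        "scopes": scopes,
        "expirationDate": expires.isoformat(timespec="seconds"),
    }
    return urllib.request.Request(
        f"{base_url}/zot/auth/apikey",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_apikey_request(
    "https://registry.ops.eblu.me", "forgejo-ci", ["blumeops/**"], valid_days=365
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the label, scope, and expiry choices reviewable before any key is minted.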

2. Enforce Tag Immutability

Zot does not have a built-in tag immutability feature at the registry level. Options to consider during execution:

  • Registry-side: Check if newer zot versions (post-2.1) have added immutability policies. If so, configure in config.json.
  • Push-side enforcement: The simpler approach — check whether a tag already exists before pushing. The current build-push-image action (and its eventual Dagger replacement) should query the registry API (GET /v2/<name>/tags/list) and fail the build if the version tag already exists. Commit SHA tags are inherently unique and don't need this check.

The push-side approach is pragmatic: it prevents accidental overwrites in the normal CI flow. Combined with authenticated push, a tag can only be overwritten by someone with CI credentials who deliberately bypasses the check.
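
The push-side check described above might look like the sketch below. It is an illustration rather than the planned implementation: the function names and the version-tag pattern are assumptions, the tags/list response is assumed to fit in a single page, and a gap remains between the check and the push in which another job could create the same tag.

```python
import json
import re
import urllib.error
import urllib.request

# Assumed convention: version tags look like v1.2.0; commit SHA tags do not.
VERSION_TAG = re.compile(r"^v\d+\.\d+\.\d+")


def needs_immutability_check(tag: str) -> bool:
    """Only version tags are guarded; SHA tags are unique by construction."""
    return VERSION_TAG.match(tag) is not None


def tag_in_listing(listing: dict, tag: str) -> bool:
    """Check a /v2/<name>/tags/list response body for the tag."""
    return tag in (listing.get("tags") or [])


def fetch_tag_listing(registry: str, name: str) -> dict:
    """Fetch the tag list anonymously; an unknown repository has no tags."""
    url = f"https://{registry}/v2/{name}/tags/list"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return {"name": name, "tags": []}
        raise


def assert_pushable(registry, name, tag, fetch=fetch_tag_listing):
    """Abort the build if a guarded tag already exists in the registry."""
    if needs_immutability_check(tag) and tag_in_listing(fetch(registry, name), tag):
        raise SystemExit(f"refusing to overwrite existing tag {name}:{tag}")
```

A CI step would call assert_pushable("registry.ops.eblu.me", "blumeops/image", tag) once per tag before publishing; the fetch parameter exists so the logic can be exercised without a live registry.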

See: .forgejo/actions/build-push-image/action.yaml — this is where the pre-push tag check would be added in the current workflow. After adopt-dagger-ci, the equivalent check goes in the Dagger Container.publish() wrapper.

3. Update Ansible Role

The ansible/roles/zot/ role needs:

  • New template: oidc-credentials.json.j2 (client ID and secret for the Dex OIDC client)
  • Updated config template: config.json.j2 gains http.auth (openid + apikey) and accessControl sections
  • Updated config template: config.json.j2 gains externalUrl set to https://registry.ops.eblu.me (required for OIDC callback redirects behind Caddy)
  • New variables: zot_oidc_client_id and zot_oidc_client_secret sourced from 1Password in the playbook's pre_tasks
  • Handler: restart zot LaunchAgent after config changes (already exists)

4. Update CI Push Credentials

After adopt-dagger-ci, the Dagger module will use the zot API key for registry auth:

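import os
from dagger import dag

# Sketch: runs inside an async Dagger function where `container` is the built image.
api_key = dag.set_secret("registry-api-key", os.environ["ZOT_CI_API_KEY"])
# with_registry_auth() returns a new Container, so its result must be the one published.
authed = container.with_registry_auth("registry.ops.eblu.me", "eblume", api_key)
await authed.publish("registry.ops.eblu.me/blumeops/image:tag")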

5. Update Minikube Containerd Config

The minikube containerd config (ansible/roles/minikube/tasks/main.yml) currently talks to zot without credentials. Since anonymous pull remains allowed, no changes are needed for containerd.

Execution Steps

  1. Prerequisite: OIDC provider is running (see adopt-oidc-provider)

    • Dex (or chosen provider) is deployed and serving https://dex.ops.eblu.me
    • A zot OIDC client is registered with the provider
  2. Update Ansible role

    • Add OIDC credentials template
    • Update config.json.j2 with auth (openid + apikey) and access control
    • Store OIDC client credentials in 1Password
    • Test with mise run provision-indri -- --tags zot --check --diff
  3. Deploy and verify pulls still work

    • mise run provision-indri -- --tags zot
    • Verify anonymous pull: curl -sf https://registry.ops.eblu.me/v2/_catalog
    • Verify unauthenticated push fails: skopeo copy ... docker://registry.ops.eblu.me/blumeops/test:fail (should get 401)
  4. Set up OIDC login and generate CI API key

    • Log in to zot UI via browser (OIDC flow through Dex)
    • Generate an API key for CI use, store in 1Password
    • Verify authenticated push works: docker login -u eblume -p zak_... registry.ops.eblu.me
  5. Add tag immutability check to push workflow

    • Add pre-push tag existence check to Dagger module (or build-push-image action)
    • Test by attempting to push an existing tag
  6. Update documentation

    • Update docs/reference/services/zot.md security model section
    • Add changelog fragment

Verification Checklist

  • Anonymous pull works (k8s pods, containerd, curl)
  • Pull-through caching still works (pull an uncached image from docker.io)
  • Unauthenticated push is rejected (401)
  • OIDC browser login works (redirect to Dex and back)
  • API key generation works from zot UI
  • Authenticated push with API key succeeds
  • Pushing a duplicate version tag fails (immutability check)
  • Pushing a new commit SHA tag succeeds
  • Grafana dashboard still shows zot metrics
  • mise run services-check passes
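
The first and third checklist items can be probed with plain HTTP status codes. The sketch below is one possible shape, with assumptions stated: the helper names are hypothetical, the push probe uses the OCI blob-upload endpoint as a stand-in for a real push, and blumeops/test is the throwaway repository name from the execution steps.

```python
import urllib.error
import urllib.request

REGISTRY = "https://registry.ops.eblu.me"


def status_of(method: str, url: str) -> int:
    """Return the HTTP status for an unauthenticated request."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def evaluate(pull_status: int, push_status: int) -> list[str]:
    """Compare observed statuses with the expected security posture."""
    failures = []
    if pull_status != 200:
        failures.append(f"anonymous pull returned {pull_status}, expected 200")
    if push_status != 401:
        failures.append(f"unauthenticated push returned {push_status}, expected 401")
    return failures


def run_checks(repo: str = "blumeops/test") -> list[str]:
    """Probe the live registry; an empty list means both checks passed."""
    pull = status_of("GET", f"{REGISTRY}/v2/_catalog")
    push = status_of("POST", f"{REGISTRY}/v2/{repo}/blobs/uploads/")
    return evaluate(pull, push)
```

Calling run_checks() after provisioning should return an empty list; any returned message names the check that regressed.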

Open Questions

  • Immutability granularity: Should immutability apply only to semver tags (v*) or also to commit SHA tags? SHA tags are unique by nature, so immutability is only meaningful for version tags.
  • API key rotation: API keys can have expiration dates. Decide on a rotation policy — e.g., annual expiry with a reminder, or no expiry with manual rotation.

Reference Pattern Files

File                                           Purpose
ansible/roles/zot/templates/config.json.j2     Current zot config (no auth)
ansible/roles/zot/tasks/main.yml               Zot deployment tasks
ansible/roles/zot/defaults/main.yml            Zot default variables
.forgejo/actions/build-push-image/action.yaml  Current image push workflow (skopeo)
ansible/roles/minikube/tasks/main.yml          Containerd registry mirror config
docs/reference/services/zot.md                 Zot reference documentation