From f05e5cccdffa5f95bba1e50356b1db7efabe58bc Mon Sep 17 00:00:00 2001
From: Erich Blume
Date: Mon, 23 Feb 2026 15:06:00 -0800
Subject: [PATCH] Review Grafana: replace Helm upgrade plan with C2 Mikado chain (#258)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

- Delete the old 3-phase Helm chart upgrade plan (predates Mikado system)
- Create C2 Mikado chain with goal card `upgrade-grafana` and two leaf prereqs:
  - `kustomize-grafana-deployment` — convert Helm to kustomize manifests
  - `build-grafana-container` — home-built Grafana 12.x image (no upstream containers)
- Record first-ever Grafana review: currently at v11.4.0 on Helm chart 8.8.2
- Update service-versions.yaml, how-to index, and plans index

## Service Review Findings

- Grafana is healthy and synced in ArgoCD
- Running v11.4.0, latest upstream is 12.3.3
- Breaking changes for 12.x are low-risk (React panels only, UIDs compliant)
- PVC is disposable — dashboards and datasources are all config-provisioned

## Deployment and Testing

- [ ] No deployment needed — documentation-only change
- [ ] `docs-check-links` passes
- [ ] `docs-check-index` passes

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/258
---
 .../review-grafana-mikado-chain.doc.md        |   1 +
 .../how-to/grafana/build-grafana-container.md |  34 +++
 .../grafana/kustomize-grafana-deployment.md   |  43 ++++
 docs/how-to/grafana/upgrade-grafana.md        |  72 ++++++
 docs/how-to/how-to.md                         |  10 +-
 docs/how-to/plans/plans.md                    |   1 -
 .../plans/upgrade-grafana-helm-chart.md       | 233 ------------------
 service-versions.yaml                         |   6 +-
 8 files changed, 162 insertions(+), 238 deletions(-)
 create mode 100644 docs/changelog.d/review-grafana-mikado-chain.doc.md
 create mode 100644 docs/how-to/grafana/build-grafana-container.md
 create mode 100644 docs/how-to/grafana/kustomize-grafana-deployment.md
 create mode 100644 docs/how-to/grafana/upgrade-grafana.md
 delete mode 100644 docs/how-to/plans/upgrade-grafana-helm-chart.md

diff --git a/docs/changelog.d/review-grafana-mikado-chain.doc.md b/docs/changelog.d/review-grafana-mikado-chain.doc.md
new file mode 100644
index 0000000..cd5f956
--- /dev/null
+++ b/docs/changelog.d/review-grafana-mikado-chain.doc.md
@@ -0,0 +1 @@
+Replace Grafana Helm upgrade plan with C2 Mikado chain for upgrading to 12.x with kustomize and home-built containers.
diff --git a/docs/how-to/grafana/build-grafana-container.md b/docs/how-to/grafana/build-grafana-container.md
new file mode 100644
index 0000000..ccec654
--- /dev/null
+++ b/docs/how-to/grafana/build-grafana-container.md
@@ -0,0 +1,34 @@
+---
+title: Build Grafana Container
+status: active
+modified: 2026-02-23
+tags:
+  - how-to
+  - grafana
+  - containers
+---
+
+# Build Grafana Container
+
+Build a home-built Grafana 12.x container image and publish to the forge registry.
+
+## Context
+
+Grafana currently uses the upstream `docker.io/grafana/grafana:11.4.0` image via the Helm chart. Per supply-chain policy, this should be replaced with a locally built image pushed to `forge.ops.eblu.me/eblume/grafana`.
+
+## Steps
+
+1. Add a Grafana container build to Dagger (or Nix, following existing patterns)
+2. Base on the official Grafana source or use a Nix derivation
+3. Tag and push to `forge.ops.eblu.me/eblume/grafana:`
+4. Add to `mise run container-list` inventory
+
+## Reference
+
+- Follow [[build-container-image]] for the standard container build workflow
+- See existing container builds in `.dagger/` for patterns
+- The k8s-sidecar image (`quay.io/kiwigrid/k8s-sidecar`) is a secondary concern — address after the main Grafana image
+
+## Related
+
+- [[upgrade-grafana]] — Goal card
diff --git a/docs/how-to/grafana/kustomize-grafana-deployment.md b/docs/how-to/grafana/kustomize-grafana-deployment.md
new file mode 100644
index 0000000..f25d14d
--- /dev/null
+++ b/docs/how-to/grafana/kustomize-grafana-deployment.md
@@ -0,0 +1,43 @@
+---
+title: Kustomize Grafana Deployment
+status: active
+modified: 2026-02-23
+tags:
+  - how-to
+  - grafana
+---
+
+# Kustomize Grafana Deployment
+
+Convert Grafana from a Helm chart deployment to plain Kustomize manifests.
+
+## Context
+
+Grafana is currently deployed via ArgoCD using a Helm chart (`grafana-8.8.2`) from a forge mirror. The chart produces: Deployment, Service, PVC, ConfigMaps (grafana.ini, datasources), RBAC resources, and a sidecar container for dashboard provisioning.
+
+## Steps
+
+1. Template the current Helm chart to see what it produces:
+   ```fish
+   # From the forge mirror, or use argocd app manifests
+   argocd app manifests grafana > /tmp/grafana-helm-output.yaml
+   ```
+2. Create Kustomize equivalents in `argocd/manifests/grafana/`:
+   - `kustomization.yaml`
+   - `deployment.yaml` — Grafana container + k8s-sidecar container
+   - `service.yaml`
+   - `pvc.yaml` — Reuse existing 1Gi PVC
+   - `configmap.yaml` — grafana.ini and datasource provisioning
+   - `rbac.yaml` — ClusterRole, ClusterRoleBinding, Role, RoleBinding
+3. Update `argocd/apps/grafana.yaml` to use a single kustomize source instead of the Helm multi-source
+4. Remove the Helm values.yaml (replaced by the kustomize manifests)
+
+## Notes
+
+- The existing PVC must not be deleted during the transition — ensure the kustomize PVC matches the existing one's name
+- The sidecar (`quay.io/kiwigrid/k8s-sidecar`) should also be replaced with a home-built image eventually, but is lower priority — focus on Grafana itself first
+- Preserve all existing config: Authentik OIDC, datasources, dashboard sidecar labels
+
+## Related
+
+- [[upgrade-grafana]] — Goal card
diff --git a/docs/how-to/grafana/upgrade-grafana.md b/docs/how-to/grafana/upgrade-grafana.md
new file mode 100644
index 0000000..4044150
--- /dev/null
+++ b/docs/how-to/grafana/upgrade-grafana.md
@@ -0,0 +1,72 @@
+---
+title: Upgrade Grafana
+status: active
+requires:
+  - kustomize-grafana-deployment
+  - build-grafana-container
+modified: 2026-02-23
+tags:
+  - how-to
+  - grafana
+  - observability
+---
+
+# Upgrade Grafana
+
+Upgrade Grafana from 11.4.0 (Helm chart 8.8.2) to 12.x, converting from Helm to Kustomize with a home-built container image.
+
+## Current State
+
+| Property | Value |
+|----------|-------|
+| **Helm chart** | `grafana-8.8.2` (from forge mirror of `grafana/helm-charts`) |
+| **Grafana app** | 11.4.0 |
+| **Deployment** | Helm via ArgoCD multi-source |
+| **Namespace** | `monitoring` |
+| **Storage** | SQLite on 1Gi PVC |
+
+Datasources: [[prometheus]], [[loki]], PostgreSQL (TeslaMate). Dashboard ConfigMaps provisioned via sidecar.
+
+## Target State
+
+- Grafana 12.x running from a home-built container (`forge.ops.eblu.me/eblume/grafana`)
+- Kustomize manifests in `argocd/manifests/grafana/` (no Helm chart dependency)
+- ArgoCD app simplified to a single source (kustomize directory)
+- All existing datasources, dashboards, and Authentik OIDC intact
+
+## Grafana 12 Breaking Changes
+
+- **Angular plugin removal:** All AngularJS panels force-migrated to React. Our dashboards already use only React panels — no action needed.
+- **Datasource UID format enforcement:** UIDs must be alphanumeric + dash/underscore, ≤40 chars. Our UIDs (`prometheus`, `loki`, `TeslaMate`) are compliant.
+- **Annotation table migration:** Full rewrite of the `annotation` table (adds `dashboard_uid` column). Small SQLite DB — should be fast. PVC is disposable if anything goes wrong.
+
+Overall risk: **Low.**
+
+## Execution
+
+Once both prerequisites are complete:
+
+1. Update `argocd/apps/grafana.yaml` to point at the kustomize directory (single source, remove Helm multi-source)
+2. Update `argocd/manifests/grafana/` with the kustomize manifests using the home-built image
+3. Deploy on branch, verify with checklist below
+4. Update `service-versions.yaml` to the new version and today's date
+
+The SQLite PVC is disposable — dashboards are provisioned from ConfigMaps and datasources from config. No backup needed.
+
+## Verification Checklist
+
+- [ ] Pod running: `kubectl --context=minikube-indri -n monitoring get pods -l app.kubernetes.io/name=grafana`
+- [ ] UI loads at `https://grafana.ops.eblu.me`
+- [ ] Admin login works
+- [ ] Authentik OIDC login works
+- [ ] Datasources healthy: Prometheus, Loki, TeslaMate (Settings → Datasources → Test each)
+- [ ] Key dashboards render: macOS System, Services Health, TeslaMate Overview
+- [ ] Sidecar loaded all dashboard ConfigMaps
+- [ ] `mise run services-check` passes
+- [ ] No errors in pod logs
+
+## Related
+
+- [[grafana]] — Service reference card
+- [[build-grafana-container]] — Prereq: build the container image
+- [[kustomize-grafana-deployment]] — Prereq: create kustomize manifests
diff --git a/docs/how-to/how-to.md b/docs/how-to/how-to.md
index 5777a94..8ff7aeb 100644
--- a/docs/how-to/how-to.md
+++ b/docs/how-to/how-to.md
@@ -61,7 +61,7 @@ Migration and transition plans for upcoming infrastructure changes.
 | [[adopt-dagger-ci]] | Adopt Dagger as CI/CD build engine |
 | [[upstream-fork-strategy]] | Stacked-branch forking strategy for upstream projects |
 | [[adopt-oidc-provider]] | Deploy OIDC identity provider for SSO across services |
-| [[upgrade-grafana-helm-chart]] | Upgrade Grafana Helm chart from 8.8.2 to 11.x |
+| [[upgrade-grafana]] | Upgrade Grafana to 12.x with kustomize and home-built container |
 | [[operationalize-reolink-camera]] | Cloud-free NVR with Frigate and ring buffer recording |
 
 ## Ringtail
@@ -95,6 +95,14 @@ Mikado chain for deploying Authentik. Track progress with `mise run docs-mikado
 - [[create-authentik-secrets]]
 - [[migrate-grafana-to-authentik]]
 
+## Grafana
+
+Mikado chain for upgrading Grafana to 12.x with kustomize and home-built containers. Track progress with `mise run docs-mikado upgrade-grafana`.
+
+- [[upgrade-grafana]]
+- [[kustomize-grafana-deployment]]
+- [[build-grafana-container]]
+
 ## Forgejo Runner
 
 Mikado chain for upgrading the k8s forgejo-runner daemon from v6.3.1 to v12.x. Track progress with `mise run docs-mikado upgrade-k8s-runner`.
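The `requires:` frontmatter in `upgrade-grafana.md` turns this chain into a small dependency graph: the goal card only becomes workable once both leaf prereqs are done. The Python sketch below illustrates that rule under stated assumptions. The `CARDS` dict and the `ready_cards` helper are hypothetical; this is not the implementation behind `mise run docs-mikado`, and the `done` set stands in for however completion is actually recorded.

```python
"""Hypothetical sketch: which cards in a Mikado chain can be worked on next."""

# Card name -> prerequisite card names, mirroring the `requires:` frontmatter
# added in this patch. Leaf cards have no prerequisites.
CARDS: dict[str, list[str]] = {
    "upgrade-grafana": ["kustomize-grafana-deployment", "build-grafana-container"],
    "kustomize-grafana-deployment": [],
    "build-grafana-container": [],
}


def ready_cards(goal: str, cards: dict[str, list[str]], done: set[str]) -> list[str]:
    """Return cards reachable from `goal` that are not done but whose prereqs all are."""
    ready: list[str] = []
    seen: set[str] = set()
    stack = [goal]
    while stack:
        card = stack.pop()
        if card in seen:
            continue
        seen.add(card)
        prereqs = cards.get(card, [])
        # Descend into prerequisites so the whole chain under the goal is visited.
        stack.extend(prereqs)
        if card not in done and all(p in done for p in prereqs):
            ready.append(card)
    return sorted(ready)


if __name__ == "__main__":
    # Nothing done yet: only the two leaf prereqs are actionable.
    print(ready_cards("upgrade-grafana", CARDS, done=set()))
    # Both leaves done: the goal card becomes actionable.
    print(ready_cards("upgrade-grafana", CARDS,
                      done={"kustomize-grafana-deployment", "build-grafana-container"}))
```

With nothing done, the two leaves `build-grafana-container` and `kustomize-grafana-deployment` are returned; once both are in `done`, only `upgrade-grafana` is. Walking the graph from the goal, rather than scanning every card, keeps the answer scoped to a single chain.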
diff --git a/docs/how-to/plans/plans.md b/docs/how-to/plans/plans.md
index 0f295f2..ea3ef7e 100644
--- a/docs/how-to/plans/plans.md
+++ b/docs/how-to/plans/plans.md
@@ -18,6 +18,5 @@ Plans differ from regular how-to guides in that they describe work that has been
 | [[add-unifi-pulumi-stack]] | Abandoned | Add Pulumi IaC for UniFi Express 7 (provider bugs — see doc) |
 | [[upstream-fork-strategy]] | Planned | Stacked-branch forking strategy for tracking upstream projects |
 | [[adopt-oidc-provider]] | Completed | Deploy OIDC identity provider for SSO across services |
-| [[upgrade-grafana-helm-chart]] | Planned | Upgrade Grafana Helm chart from 8.8.2 to 11.x (3 phases) |
 | [[deploy-authentik]] | Completed | Deploy Authentik IdP — Mikado chain tracked in `how-to/authentik/` |
 | [[operationalize-reolink-camera]] | Planned | Cloud-free NVR with Frigate, object detection, and ring buffer recording to sifaka |
diff --git a/docs/how-to/plans/upgrade-grafana-helm-chart.md b/docs/how-to/plans/upgrade-grafana-helm-chart.md
deleted file mode 100644
index 207725c..0000000
--- a/docs/how-to/plans/upgrade-grafana-helm-chart.md
+++ /dev/null
@@ -1,233 +0,0 @@
----
-title: "Plan: Upgrade Grafana Helm Chart"
-modified: 2026-02-17
-tags:
-  - how-to
-  - plans
-  - grafana
-  - observability
----
-
-# Plan: Upgrade Grafana Helm Chart
-
-> **Status:** Planned (not yet executed)
-> **Phases:** 3 (execute sequentially, each as a separate PR)
-
-## Background
-
-Grafana is deployed via ArgoCD at Helm chart version **8.8.2** (Grafana app ~11.5.x). The latest chart is **11.1.7** (Grafana app 12.3.3). The chart has moved from the original `grafana/helm-charts` repository (deprecated 2026-01-30) to `grafana-community/helm-charts`.
-
-### Current Deployment
-
-| Property | Value |
-|----------|-------|
-| **Chart version** | `grafana-8.8.2` |
-| **Grafana app** | ~11.5.x |
-| **Source** | Forge mirror `eblume/grafana-helm-charts` of `grafana/helm-charts` |
-| **ArgoCD app** | `argocd/apps/grafana.yaml` |
-| **Values** | `argocd/manifests/grafana/values.yaml` |
-| **Namespace** | `monitoring` |
-| **Storage** | SQLite on 1Gi PVC |
-
-Datasources: [[prometheus]], [[loki]], PostgreSQL (TeslaMate). 32 dashboard ConfigMaps provisioned via sidecar.
-
-### Breaking Change Summary
-
-| Boundary | What Changes | Impact on Us |
-|----------|-------------|--------------|
-| **Chart 9.0 (Grafana 12.0)** | Angular plugins removed, datasource UID format enforced, annotation table migration | **Main risk** — but dashboards already use React panels and UIDs are compliant |
-| Chart 10.0 | Alert file templating via `tpl` | None — we don't use file-based alerts |
-| Chart 11.0 | Min K8s version raised to 1.25, removed old API version fallbacks | None — minikube is modern |
-| **Repo migration** | Chart moved to `grafana-community/helm-charts` | Must update forge mirror for 11.x |
-
-### Grafana 12.0 Application Changes (Detail)
-
-- **Angular plugin removal:** All AngularJS panels are force-migrated to React at load time. Our dashboards already use only React panel types (`timeseries`, `stat`, `gauge`, `table`, `geomap`, `barchart`, `bargauge`, `logs`, `piechart`, `state-timeline`). No action needed.
-- **Datasource UID format enforcement (`failWrongDSUID`):** UIDs must be alphanumeric + dash/underscore, ≤40 chars. Our UIDs (`prometheus`, `loki`, `TeslaMate`) are compliant. Built-in references like `"-- Grafana --"` in dashboard JSON are handled internally and unaffected.
-- **Annotation table migration:** Upgrading to 12.x triggers a full-table rewrite of the `annotation` table (adds `dashboard_uid` column). For our small SQLite database this should be fast, but back up the PVC first.
-- **`editors_can_admin` removed:** We don't use this setting. No action needed.
-
-Overall risk: **Low.** No `values.yaml` changes required across any phase.
-
-### Forge Mirror Situation
-
-The forge mirror `eblume/grafana-helm-charts` tracks `https://github.com/grafana/helm-charts`. Forgejo mirrors are managed via its built-in async mirror framework — you cannot manually push tags. To update the mirror upstream or pick up new tags, **delete and re-create the mirror** in Forgejo.
-
-- The old repo (`grafana/helm-charts`) contains tags through `grafana-10.5.15`
-- The new repo (`grafana-community/helm-charts`) contains tags for 11.x+
-- The community repo was forked from the original, so it should also contain all historical tags
-
----
-
-## Phase 1: Upgrade to Chart 8.15.0 (Grafana 11.6.1)
-
-**Goal:** Validate the upgrade mechanism with zero breaking changes. Stay on Grafana 11.x.
-
-### Pre-flight
-
-1. **Sync forge mirror** to pick up tag `grafana-8.15.0`:
-   - Go to forge.ops.eblu.me → `eblume/grafana-helm-charts` → Settings → Mirror
-   - Trigger a sync, or if the tag is already present, skip this step
-   - Verify tag `grafana-8.15.0` exists in the forge repo tags list
-   - If the tag is not present and sync doesn't fetch it, delete the mirror and re-create it from `https://github.com/grafana/helm-charts` (this should pull all tags including 8.15.0)
-
-### Steps
-
-1. Create feature branch `upgrade/grafana-8.15.0`
-2. Edit `argocd/apps/grafana.yaml`:
-   - Change `targetRevision: grafana-8.8.2` → `targetRevision: grafana-8.15.0`
-3. Add changelog fragment `docs/changelog.d/upgrade-grafana-8.15.0.infra.md`
-4. Commit, push, create PR via `tea pr create`
-5. **Deploy on branch:**
-   ```fish
-   argocd app set grafana --revision upgrade/grafana-8.15.0
-   argocd app diff grafana
-   argocd app sync grafana
-   ```
-6. **Verify** (see Verification Checklist below)
-7. After merge:
-   ```fish
-   argocd app set grafana --revision main
-   argocd app sync grafana
-   ```
-
-### Files Modified
-
-- `argocd/apps/grafana.yaml` (targetRevision)
-- `docs/changelog.d/upgrade-grafana-8.15.0.infra.md` (new)
-
----
-
-## Phase 2: Upgrade to Chart 9.4.5 (Grafana 12.1.1)
-
-**Goal:** Cross the Grafana 11→12 boundary. This is the main breaking change phase — triggers Angular removal, UID enforcement, and annotation table migration.
-
-### Pre-flight
-
-1. **Back up Grafana PVC** before upgrading:
-   ```fish
-   kubectl --context=minikube-indri -n monitoring exec deploy/grafana -- \
-       sqlite3 /var/lib/grafana/grafana.db ".backup '/var/lib/grafana/grafana-backup.db'"
-   ```
-2. Verify forge mirror has tag `grafana-9.4.5` (should exist — it's in the old repo)
-
-### Steps
-
-1. Create feature branch `upgrade/grafana-9.4.5`
-2. Edit `argocd/apps/grafana.yaml`:
-   - Change `targetRevision: grafana-8.15.0` → `targetRevision: grafana-9.4.5`
-3. Add changelog fragment `docs/changelog.d/upgrade-grafana-9.4.5.infra.md`
-4. Commit, push, create PR
-5. **Deploy on branch:**
-   ```fish
-   argocd app set grafana --revision upgrade/grafana-9.4.5
-   argocd app sync grafana
-   ```
-6. **Thorough verification:**
-   - Watch pod logs for annotation table migration messages:
-     ```fish
-     kubectl --context=minikube-indri -n monitoring logs -f deploy/grafana | head -100
-     ```
-   - Verify all 3 datasources connect: Grafana UI → Settings → Datasources → Test each
-   - Spot-check key dashboards: macOS System, Services Health, TeslaMate Overview
-   - Check pod logs for UID format errors (unlikely but possible)
-   - Run `mise run services-check`
-7. **If issues:** Restore backup and roll back `targetRevision`:
-   ```fish
-   kubectl --context=minikube-indri -n monitoring exec deploy/grafana -- \
-       cp /var/lib/grafana/grafana-backup.db /var/lib/grafana/grafana.db
-   # Then set targetRevision back to grafana-8.15.0 and sync
-   ```
-8. After successful verification, clean up backup:
-   ```fish
-   kubectl --context=minikube-indri -n monitoring exec deploy/grafana -- \
-       rm /var/lib/grafana/grafana-backup.db
-   ```
-9. After merge: set revision to main and sync
-
-### Files Modified
-
-- `argocd/apps/grafana.yaml` (targetRevision)
-- `docs/changelog.d/upgrade-grafana-9.4.5.infra.md` (new)
-
----
-
-## Phase 3: Upgrade to Chart 11.1.7 (Grafana 12.3.3)
-
-**Goal:** Get to the latest chart from the new community repository. No new breaking changes — this is a repo migration + version bump.
-
-### Pre-flight: Update Forge Mirror
-
-The `grafana-community/helm-charts` repo is a fork of the original, so it should contain all historical tags plus the new 11.x tags.
-
-1. **Delete the existing forge mirror** `eblume/grafana-helm-charts`
-2. **Re-create it** as a mirror of `https://github.com/grafana-community/helm-charts`
-3. Wait for the initial mirror sync to complete
-4. Verify tag `grafana-11.1.7` exists in the forge repo tags list
-5. Also verify that older tags (e.g., `grafana-9.4.5`) are still present — if the community repo doesn't carry old tags, we need to handle that before proceeding
-
-> **Fallback:** If the community repo doesn't have old tags, create a second forge mirror (e.g., `eblume/grafana-community-helm-charts`) and update the ArgoCD app's `repoURL` to point to it.
-
-### Steps
-
-1. Create feature branch `upgrade/grafana-11.1.7`
-2. Edit `argocd/apps/grafana.yaml`:
-   - Change `targetRevision: grafana-9.4.5` → `targetRevision: grafana-11.1.7`
-   - Update comment: `# Chart mirrored from https://github.com/grafana-community/helm-charts to forge`
-   - If a new mirror repo was created (fallback), also update `repoURL`
-3. Edit `argocd/manifests/grafana/values.yaml`:
-   - Update comment: `# Chart: https://github.com/grafana-community/helm-charts/tree/main/charts/grafana`
-4. Update `docs/reference/services/grafana.md`:
-   - Note chart version and new upstream source
-5. Update `docs/reference/services/forgejo.md`:
-   - Update mirror reference if repo name changed
-6. Add changelog fragment `docs/changelog.d/upgrade-grafana-11.1.7.infra.md`
-7. Commit, push, create PR
-8. Deploy on branch and verify (standard check — no new breaking changes)
-9. After merge: set revision to main and sync
-
-### Files Modified
-
-- `argocd/apps/grafana.yaml` (targetRevision, comment, possibly repoURL)
-- `argocd/manifests/grafana/values.yaml` (comment only)
-- `docs/reference/services/grafana.md`
-- `docs/reference/services/forgejo.md` (if mirror name changed)
-- `docs/changelog.d/upgrade-grafana-11.1.7.infra.md` (new)
-
----
-
-## Verification Checklist (All Phases)
-
-After each phase:
-
-- [ ] Pod is running: `kubectl --context=minikube-indri -n monitoring get pods -l app.kubernetes.io/name=grafana`
-- [ ] UI loads at `https://grafana.ops.eblu.me` and/or `https://grafana.tail8d86e.ts.net`
-- [ ] Admin login works
-- [ ] Datasources healthy: Settings → Datasources → Test each (Prometheus, Loki, TeslaMate)
-- [ ] Key dashboards render: macOS System, Services Health, TeslaMate Overview
-- [ ] Sidecar loaded all 32 dashboard ConfigMaps (check dashboard list)
-- [ ] `mise run services-check` passes
-- [ ] No errors in pod logs: `kubectl --context=minikube-indri -n monitoring logs deploy/grafana --tail=50`
-
-## Open Questions
-
-- **Forge mirror tags:** Will the `grafana-community/helm-charts` mirror include all historical tags from the original repo? Verify during Phase 3 pre-flight. If not, use the fallback approach (separate mirror).
-- **Chart pinning strategy:** After reaching 11.1.7, decide whether to track the latest tag going forward or continue pinning to specific versions. Pinning is safer for GitOps.
-
-## Reference Files
-
-| File | Purpose |
-|------|---------|
-| `argocd/apps/grafana.yaml` | ArgoCD Application (chart source + version) |
-| `argocd/apps/grafana-config.yaml` | ArgoCD Application (dashboards, ingress, secrets) |
-| `argocd/manifests/grafana/values.yaml` | Helm values |
-| `argocd/manifests/grafana-config/` | ConfigMaps, ExternalSecrets, Ingress |
-| `docs/reference/services/grafana.md` | Service reference card |
-| `docs/reference/services/forgejo.md` | Forge mirror inventory |
-
-## Related
-
-- [[grafana]] — Service reference card
-- [[prometheus]] — Metrics datasource
-- [[loki]] — Logs datasource
-- [[apps]] — ArgoCD application inventory
diff --git a/service-versions.yaml b/service-versions.yaml
index f73424b..69020e6 100644
--- a/service-versions.yaml
+++ b/service-versions.yaml
@@ -82,10 +82,10 @@ services:
 
   - name: grafana
     type: argocd
-    last-reviewed: null
-    current-version: null
+    last-reviewed: 2026-02-23
+    current-version: "v11.4.0"
     upstream-source: https://github.com/grafana/grafana/releases
-    notes: Deployed via Helm chart
+    notes: Helm chart 8.8.2; Mikado chain to upgrade to 12.x with kustomize
 
   - name: cloudnative-pg
     type: argocd
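The Grafana 12 datasource UID rule summarized in `upgrade-grafana.md` (alphanumeric plus dash/underscore, at most 40 characters) can be checked mechanically before cutting over. The following is a minimal Python sketch that assumes the rule is exactly as summarized there. The regex and the `uid_is_valid` helper are illustrative and are not Grafana's own validation code.

```python
"""Sketch: check datasource UIDs against the Grafana 12 format rule as
summarized in upgrade-grafana.md (an assumption, not Grafana's own code)."""
import re

# ASCII letters, digits, dash, underscore; 1 to 40 characters long.
UID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,40}")


def uid_is_valid(uid: str) -> bool:
    """Return True if the entire UID matches the assumed format rule."""
    return UID_PATTERN.fullmatch(uid) is not None


if __name__ == "__main__":
    # The three datasource UIDs named in this patch.
    for uid in ["prometheus", "loki", "TeslaMate"]:
        print(f"{uid}: {'ok' if uid_is_valid(uid) else 'INVALID'}")
```

All three UIDs from this patch (`prometheus`, `loki`, `TeslaMate`) pass. A built-in reference such as `-- Grafana --` would fail this pattern because it contains spaces. The deleted plan notes that Grafana handles those references internally, so they should be left out of a check like this.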