11 KiB
Phase 2: Grafana Migration (Pilot)
Goal: Migrate Grafana as lowest-risk pilot service
Status: Complete (2026-01-19)
Prerequisites: Phase 1 complete
Overview
This phase migrates Grafana from Homebrew/Ansible on indri to Kubernetes, establishing the pattern for future service migrations. Additionally, we establish the pattern of mirroring Helm chart repositories to forge for resilience and GitOps consistency.
Key Decisions
Helm Chart Mirroring
Problem: P1 uses external Helm repos which creates external dependencies.
Solution: Mirror Helm chart Git repositories to forge, reference charts from git path.
ArgoCD auto-detects Helm charts when a directory contains Chart.yaml. No build step needed.
| Chart | Upstream Git Repo | Forge Mirror | Chart Path |
|---|---|---|---|
| cloudnative-pg | github.com/cloudnative-pg/charts |
forge/eblume/cloudnative-pg-charts |
charts/cloudnative-pg/ |
| grafana | github.com/grafana/helm-charts |
forge/eblume/grafana-helm-charts |
charts/grafana/ |
Database Storage
Use SQLite with 1Gi PVC (not k8s PostgreSQL). Grafana stores minimal persistent data and dashboards are git-provisioned.
Datasource URLs
From k8s pods, use host.containers.internal to reach indri services:
- Prometheus:
http://host.containers.internal:9090 - Loki:
http://host.containers.internal:3100(requires ansible change to bind 0.0.0.0)
Ingress
Tailscale Ingress with Let's Encrypt TLS (following ArgoCD pattern), with crio-compat proxy class.
Secrets Management
Admin password stored in 1Password, injected manually via op inject. Future: migrate to External Secrets Operator or similar.
Prerequisites
0.1 Mirror Helm Chart Repos to Forge
User action: Create mirrors in forge:
-
CloudNativePG charts (fix existing P1 app):
- Mirror:
https://github.com/cloudnative-pg/charts - To:
forge.tail8d86e.ts.net/eblume/cloudnative-pg-charts
- Mirror:
-
Grafana helm-charts (new):
- Mirror:
https://github.com/grafana/helm-charts - To:
forge.tail8d86e.ts.net/eblume/grafana-helm-charts
- Mirror:
0.2 Update Loki to Bind 0.0.0.0
File: ansible/roles/loki/templates/loki-config.yaml.j2
Add under server::
http_listen_address: 0.0.0.0
Deploy: mise run provision-indri -- --tags loki
Steps
1. Fix CloudNativePG to Use Forge Mirror
Update argocd/apps/cloudnative-pg.yaml to use forge-mirrored chart:
sources:
- repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/cloudnative-pg-charts.git
targetRevision: cloudnative-pg-0.23.0 # git tag
path: charts/cloudnative-pg
helm:
releaseName: cloudnative-pg
valueFiles:
- $values/argocd/manifests/cloudnative-pg/values.yaml
- repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
targetRevision: main
ref: values
2. Create Grafana Helm Values
File: argocd/manifests/grafana/values.yaml
admin:
existingSecret: grafana-admin
userKey: admin-user
passwordKey: admin-password
persistence:
enabled: true
type: pvc
size: 1Gi
grafana.ini:
server:
root_url: https://grafana.tail8d86e.ts.net
analytics:
check_for_updates: false
reporting_enabled: false
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
uid: prometheus
url: http://host.containers.internal:9090
isDefault: true
editable: false
- name: Loki
type: loki
access: proxy
uid: loki
url: http://host.containers.internal:3100
editable: false
sidecar:
dashboards:
enabled: true
label: grafana_dashboard
labelValue: "1"
service:
type: ClusterIP
port: 80
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
3. Create Grafana ArgoCD Application
File: argocd/apps/grafana.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: grafana
namespace: argocd
spec:
project: default
sources:
- repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/grafana-helm-charts.git
targetRevision: grafana-8.8.2
path: charts/grafana
helm:
releaseName: grafana
valueFiles:
- $values/argocd/manifests/grafana/values.yaml
- repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
targetRevision: main
ref: values
destination:
server: https://kubernetes.default.svc
namespace: monitoring
syncPolicy:
syncOptions:
- CreateNamespace=true
4. Create Grafana Config Application
File: argocd/apps/grafana-config.yaml
Deploys Tailscale Ingress and Dashboard ConfigMaps from argocd/manifests/grafana-config/.
5. Create Grafana Config Manifests
Directory: argocd/manifests/grafana-config/
Contents:
kustomization.yamlingress-tailscale.yaml- Tailscale Ingress forgrafana.tail8d86e.ts.netsecret-admin.yaml.tpl- Admin password template (1Password-backed)README.md- Notes on secrets managementdashboards/configmap-*.yaml- 9 dashboard ConfigMaps
Ingress:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: grafana-tailscale
namespace: monitoring
annotations:
tailscale.com/proxy-class: "crio-compat"
spec:
ingressClassName: tailscale
defaultBackend:
service:
name: grafana
port:
number: 80
tls:
- hosts:
- grafana
Secret template (secret-admin.yaml.tpl):
# Apply: op inject -i secret-admin.yaml.tpl | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
name: grafana-admin
namespace: monitoring
type: Opaque
stringData:
admin-user: admin
admin-password: {{ op://vg6xf6vvfmoh5hqjjhlhbeoaie/oxkcr3xtxnewy7noep2izvyr6y/password }}
Dashboard ConfigMaps: Convert each JSON from ansible/roles/grafana/files/dashboards/ to ConfigMap with label grafana_dashboard: "1".
6. Deploy to Kubernetes
# Create namespace and secret
ki create namespace monitoring
op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | ki apply -f -
# Push changes and sync
argocd app sync grafana
argocd app sync grafana-config
7. Tailscale Service Cutover
Remove svc:grafana from ansible/roles/tailscale_serve/defaults/main.yml, then:
mise run provision-indri -- --tags tailscale-serve
8. Stop Brew Grafana
ssh indri 'brew services stop grafana'
9. Retire Ansible Grafana Role
Once k8s Grafana is verified working:
-
Remove role from playbook - Delete grafana role entry from
ansible/playbooks/indri.yml -
Delete the role directory -
rm -rf ansible/roles/grafana/ -
Update zk documentation - Note in
~/code/personal/zk/1767747119-YCPO.mdthat Grafana is now k8s-hosted
New Files
| Path | Purpose |
|---|---|
argocd/apps/grafana.yaml |
Grafana Helm chart Application |
argocd/apps/grafana-config.yaml |
Grafana config Application |
argocd/manifests/grafana/values.yaml |
Helm values |
argocd/manifests/grafana-config/kustomization.yaml |
Kustomize config |
argocd/manifests/grafana-config/ingress-tailscale.yaml |
Tailscale Ingress |
argocd/manifests/grafana-config/secret-admin.yaml.tpl |
Admin password template |
argocd/manifests/grafana-config/README.md |
Secrets management notes |
argocd/manifests/grafana-config/dashboards/configmap-*.yaml |
9 dashboard ConfigMaps |
Modified Files
| Path | Change |
|---|---|
argocd/apps/cloudnative-pg.yaml |
Switch to forge-mirrored chart |
ansible/roles/loki/templates/loki-config.yaml.j2 |
Add http_listen_address: 0.0.0.0 |
ansible/roles/tailscale_serve/defaults/main.yml |
Remove svc:grafana |
ansible/playbooks/indri.yml |
Remove grafana role |
Deleted Files
| Path | Reason |
|---|---|
ansible/roles/grafana/ |
Replaced by k8s deployment |
Verification
- Loki accessible from k8s pods
- Prometheus accessible from k8s pods
- Grafana pod running in
monitoringnamespace - Grafana Ingress active
- https://grafana.tail8d86e.ts.net loads
- All 9 dashboards visible
- Prometheus datasource queries work
- Loki datasource queries work
Rollback
- Re-add
svc:grafanato ansible tailscale_serve mise run provision-indri -- --tags tailscale-serve,grafanaargocd app delete grafana grafana-config --cascade
Implementation Notes
Added during implementation for retrospective review
SSH Credential Management
Issue: Initial plan used HTTPS URLs for forge-mirrored Helm chart repos, but ArgoCD in cluster couldn't resolve forge.tail8d86e.ts.net (MagicDNS not available inside cluster).
Solution: Use SSH URLs for all forge repos. Created a credential template (repo-creds-forge) that matches all repos under ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/ using URL prefix matching. This allows a single SSH key (added to Forgejo user, not as deploy key) to work for all repos.
SSH Host Key for ArgoCD
Issue: ArgoCD's known_hosts didn't include indri's SSH host key, causing knownhosts: key is unknown errors.
Solution: Added argocd-ssh-known-hosts-cm.yaml as a kustomize patch to include indri's host key alongside the upstream defaults.
Gotcha: Kustomize patches must not specify namespace - the namespace transformation happens after patch matching. Our patch had namespace: argocd which caused "no matches for Id" errors until removed.
Tailscale Hostname Cutover
Issue: After removing svc:grafana from ansible's tailscale_serve config, the k8s Ingress still got a numbered hostname (grafana-1.tail8d86e.ts.net).
Solution: The old svc:grafana service remained registered in Tailscale admin console even after clearing its serve config. Manual deletion in Tailscale admin console was required to free the grafana hostname for the k8s Ingress to claim. After deletion, recreating the Ingress picked up the correct hostname.
ArgoCD Workflow Decision
During implementation, we established the pattern for GitOps workflow:
- All apps target
mainbranch (not feature branches) - Manual sync policy on workload apps = merge doesn't auto-deploy
- Workflow: feature branch → PR → merge to main →
argocd app sync <name> - For testing: temporarily set one app to feature branch via
argocd app set --revision
This avoids the friction of switching targetRevision in manifests during development.
Bootstrap Dependencies
Some resources must be applied manually before ArgoCD can manage itself:
- SSH known_hosts - chicken-and-egg: ArgoCD can't sync the config that adds the host key
- Credential secrets -
repo-creds-forgemust exist before ArgoCD can pull from forge
These are documented in argocd/manifests/argocd/README.md as bootstrap steps.
Actual Versions Used
- Grafana Helm chart:
grafana-8.8.2(tag in grafana-helm-charts repo) - CloudNativePG Helm chart:
cloudnative-pg-v0.23.0(tag in cloudnative-pg-charts repo) - Grafana version: 11.4.0