| title | modified | last-reviewed | tags |
|---|---|---|---|
| Observability Stack | 2026-04-06 | 2026-04-06 | |
Building the Observability Stack
Audiences: Replicator
Prerequisites: kubernetes-bootstrap, argocd-config
This tutorial walks through deploying metrics, logs, and dashboards for your homelab — because you can't fix what you can't see.
The Stack
A complete observability solution has three pillars plus a collection layer:
| Component | Purpose | BlumeOps Uses |
|---|---|---|
| Metrics | Numeric measurements over time | prometheus |
| Logs | Text output from applications | loki |
| Dashboards | Visualization and alerting | grafana |
| Collection | Gathering and forwarding data | alloy |
BlumeOps deploys all of these as plain kustomize manifests managed by ArgoCD — no Helm charts. See no-helm-policy for the rationale and observability for the full reference.
Step 1: Create the Monitoring Namespace
ArgoCD can create this automatically via CreateNamespace=true in the Application spec, but if you're bootstrapping manually:
kubectl create namespace monitoring
Step 2: Deploy Prometheus
Prometheus collects and stores metrics. BlumeOps runs it as a StatefulSet with local persistent storage.
Write the Manifests
Create argocd/manifests/prometheus/ with:
- kustomization.yaml: references the manifests and patches the container image
- statefulset.yaml: a single-replica StatefulSet with a 20Gi PVC for /prometheus
- configmap.yaml: the prometheus.yml scrape configuration
- service.yaml: exposes port 9090 within the cluster
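As a rough sketch of how the four files above could be tied together, a kustomization.yaml might look like the following. The file names come from the list above; setting `namespace` here rather than in each manifest is an assumption, not something the BlumeOps repository is known to do.

```yaml
# kustomization.yaml (illustrative sketch, not the BlumeOps file)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Assumption: the namespace is applied here instead of per manifest.
namespace: monitoring

# The three manifests described in the list above.
resources:
  - statefulset.yaml
  - configmap.yaml
  - service.yaml
```

Running `kubectl kustomize argocd/manifests/prometheus` locally is a quick way to check that the directory renders before ArgoCD picks it up.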
Key StatefulSet settings:
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.retention.time=3650d"
- "--web.enable-remote-write-receiver"
- "--web.enable-lifecycle"
The remote-write-receiver flag is important — it lets alloy push metrics into Prometheus from both the host and in-cluster collectors.
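To show where those arguments sit, here is a minimal StatefulSet sketch built from the details above (single replica, 20Gi PVC mounted at /prometheus, port 9090). The object names, the `prometheus-config` ConfigMap name, the label values, the `:kustomized` placeholder tag, and the added `--storage.tsdb.path` flag are all assumptions made for illustration.

```yaml
# statefulset.yaml (illustrative sketch under the assumptions stated above)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app.kubernetes.io/name: prometheus
spec:
  serviceName: prometheus
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  template:
    metadata:
      labels:
        app.kubernetes.io/name: prometheus
    spec:
      containers:
        - name: prometheus
          # Assumed placeholder tag; the kustomize images: entry replaces it.
          image: registry.ops.eblu.me/blumeops/prometheus:kustomized
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            # Added for this sketch so the TSDB lands on the mounted PVC.
            - "--storage.tsdb.path=/prometheus"
            - "--storage.tsdb.retention.time=3650d"
            - "--web.enable-remote-write-receiver"
            - "--web.enable-lifecycle"
          ports:
            - name: http
              containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
            - name: data
              mountPath: /prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config  # hypothetical ConfigMap name
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```

A StatefulSet with a volumeClaimTemplate keeps the same PVC across pod restarts, which suits a single-replica Prometheus with long retention.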
Tag the Image
Use your local container registry and the :kustomized sentinel pattern:
# kustomization.yaml
images:
  - name: registry.ops.eblu.me/blumeops/prometheus
    newTag: v3.10.0-abcdef0
See build-container-image for how to build and tag images.
Create the ArgoCD Application
Add argocd/apps/prometheus.yaml:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
    path: argocd/manifests/prometheus
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
Verify
kubectl -n monitoring get pods -l app.kubernetes.io/name=prometheus
Step 3: Deploy Loki
Loki aggregates logs — think Prometheus, but for log lines instead of metrics.
Write the Manifests
Create argocd/manifests/loki/ with a StatefulSet, ConfigMap, and Service similar to Prometheus. Loki listens on port 3100 (HTTP) and 9096 (gRPC).
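A Service exposing both ports could be sketched as follows. The Service name `loki` matches the data source URL used later in this tutorial; the selector label is an assumption modelled on the Prometheus label used in the verify step.

```yaml
# service.yaml (illustrative sketch)
apiVersion: v1
kind: Service
metadata:
  name: loki
  namespace: monitoring
spec:
  selector:
    app.kubernetes.io/name: loki  # assumed label; match your StatefulSet
  ports:
    - name: http
      port: 3100
      targetPort: 3100
    - name: grpc
      port: 9096
      targetPort: 9096
```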
The config file (loki-config.yaml) defines storage, compaction, and retention. For a homelab, a simple single-binary mode with local filesystem storage works well — no need for S3 or distributed mode.
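For orientation, a single-binary configuration with filesystem storage might look roughly like this. It follows the shape of the upstream Loki 3.x local example rather than the BlumeOps file; the `/loki` paths, the schema start date, and the 31-day retention are assumptions to adjust.

```yaml
# loki-config.yaml (illustrative sketch for Loki 3.x, single-binary mode)
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /loki          # assumed mount point of the PVC
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory         # no external ring store for a single instance
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules

schema_config:
  configs:
    - from: 2024-01-01        # assumed start date
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem

limits_config:
  retention_period: 744h      # assumed: 31 days
```

Retention only takes effect when the compactor has `retention_enabled: true`; setting `retention_period` alone does not delete anything.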
Create the ArgoCD Application
Same pattern as Prometheus — point to argocd/manifests/loki, target monitoring namespace.
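Spelled out, that would be the Prometheus Application with the name and path swapped. The file name argocd/apps/loki.yaml is an assumption by analogy with the Prometheus one.

```yaml
# argocd/apps/loki.yaml (assumed file name, mirroring prometheus.yaml)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: loki
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
    path: argocd/manifests/loki
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```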
Step 4: Deploy Grafana
Grafana provides dashboards, visualization, and alerting.
Write the Manifests
Grafana has more moving parts than Prometheus or Loki:
- Deployment with a PVC for /var/lib/grafana
- ConfigMap containing grafana.ini, datasources.yaml, and alerting.yaml
- Dashboard ConfigMaps labeled grafana_dashboard: "1"; a sidecar container watches for these and auto-loads them
- ExternalSecret for the admin password (from 1Password via external-secrets)
Configure data sources declaratively in the ConfigMap:
# datasources.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus.monitoring.svc:9090
    isDefault: true
  - name: Loki
    type: loki
    url: http://loki.monitoring.svc:3100
Secrets
Grafana's admin password and any OAuth credentials (for authentik SSO) should come from 1Password via ExternalSecret — never hardcode passwords in manifests. See external-secrets and security-model.
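An ExternalSecret for the admin password might be sketched like this. The store name, the 1Password item key, the property, and the resulting Secret name are all hypothetical. Depending on the External Secrets Operator version, the apiVersion may need to be `external-secrets.io/v1` instead.

```yaml
# externalsecret.yaml (illustrative sketch; all names are hypothetical)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: grafana-admin
  namespace: monitoring
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword          # hypothetical store name
  target:
    name: grafana-admin        # Kubernetes Secret created by the operator
  data:
    - secretKey: admin-password
      remoteRef:
        key: grafana           # hypothetical 1Password item
        property: password     # hypothetical field on that item
```

The Grafana Deployment can then read the generated Secret through an environment variable such as `GF_SECURITY_ADMIN_PASSWORD`, so the password never appears in the repository.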
Expose via Caddy
BlumeOps exposes Grafana at grafana.ops.eblu.me through caddy on indri, which reverse-proxies to the Kubernetes service via its Tailscale Ingress endpoint. This is the standard pattern for all services — see routing for details.
Step 5: Deploy Alloy
Grafana Alloy is a unified telemetry collector that replaces multiple agents (Promtail, node_exporter, etc.). BlumeOps runs Alloy in two places — it is not optional; it's the glue that connects everything.
In-Cluster (DaemonSet)
Create argocd/manifests/alloy-k8s/ with:
- DaemonSet: runs on every node, mounts /var/log read-only for pod log access
- ServiceAccount + RBAC: needs pod list/watch for Kubernetes discovery
- ConfigMap: the config.alloy file defining:
  - Kubernetes pod log discovery and collection
  - Service health probes (blackbox-style checks for key services)
  - Remote write to Prometheus (/api/v1/write) and Loki (/loki/api/v1/push)
The DaemonSet goes in a dedicated alloy namespace, separate from monitoring.
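The log-collection part of that ConfigMap could look roughly like the sketch below, which embeds an Alloy pipeline in a YAML block scalar. The ConfigMap name, the component labels, and the in-cluster URLs are assumptions (BlumeOps may route through different endpoints), and the health probes are left out.

```yaml
# configmap.yaml (illustrative sketch; names and URLs are assumptions)
apiVersion: v1
kind: ConfigMap
metadata:
  name: alloy-config           # hypothetical name
  namespace: alloy
data:
  config.alloy: |
    // Discover pods through the Kubernetes API (needs the RBAC above).
    discovery.kubernetes "pods" {
      role = "pod"
    }

    // Attach namespace/pod labels and derive the on-disk log path.
    discovery.relabel "pod_logs" {
      targets = discovery.kubernetes.pods.targets

      rule {
        source_labels = ["__meta_kubernetes_namespace"]
        target_label  = "namespace"
      }
      rule {
        source_labels = ["__meta_kubernetes_pod_name"]
        target_label  = "pod"
      }
      rule {
        source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
        separator     = "/"
        target_label  = "__path__"
        replacement   = "/var/log/pods/*$1/*.log"
      }
    }

    // Expand the glob against the read-only /var/log mount.
    local.file_match "pod_logs" {
      path_targets = discovery.relabel.pod_logs.output
    }

    // Tail the matched files and ship them to Loki.
    loki.source.file "pod_logs" {
      targets    = local.file_match.pod_logs.targets
      forward_to = [loki.write.default.receiver]
    }

    loki.write "default" {
      endpoint {
        url = "http://loki.monitoring.svc:3100/loki/api/v1/push"
      }
    }

    // Metrics components (probes, scrapes) would forward to this receiver.
    prometheus.remote_write "default" {
      endpoint {
        url = "http://prometheus.monitoring.svc:9090/api/v1/write"
      }
    }
```

Because each DaemonSet pod only sees its own node's /var/log, the file glob naturally limits collection to local pods even though discovery returns every pod in the cluster.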
On the Host (Ansible)
For metrics and logs from native services (Forgejo, Zot, Caddy, Borgmatic), Alloy runs directly on indri as a macOS LaunchAgent, managed by ansible.
The host Alloy collects:
- System metrics via prometheus.exporter.unix
- Logs from Homebrew services and LaunchAgents
- Optional: PostgreSQL metrics, container registry metrics
It pushes to the same Prometheus and Loki endpoints via *.ops.eblu.me.
What You Now Have
- Prometheus scraping metrics from all services
- Loki aggregating logs from all pods and host services
- Grafana with declarative dashboards and data sources
- Alloy collecting from both Kubernetes and the host
- A foundation for alerting via Grafana Unified Alerting
Adding Alerts
BlumeOps uses Grafana Unified Alerting (not Prometheus Alertmanager). Alerts are defined declaratively in alerting.yaml within the Grafana ConfigMap. Notifications go to ntfy — a self-hosted push notification service.
Example alert categories:
- Service probe failures (is Grafana/Prometheus/Loki reachable?)
- Pod readiness (are pods healthy?)
- Metrics freshness (is data still flowing?)
- Storage and resource thresholds
See observability for the full alerting reference.
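As a hedged illustration of the provisioning format, a single service-probe rule in alerting.yaml might look like the following. The group, folder, and rule names are hypothetical; `probe_success` assumes the probes export the usual blackbox metric; and `datasourceUid: prometheus` assumes the Prometheus data source has been given that uid, which the datasources.yaml shown earlier does not set.

```yaml
# alerting.yaml (illustrative sketch; names, metric, and uid are assumptions)
apiVersion: 1
groups:
  - orgId: 1
    name: service-probes        # hypothetical group name
    folder: Alerts              # hypothetical folder
    interval: 1m
    rules:
      - uid: probe-failure
        title: Service probe failing
        condition: C            # refId of the expression that decides firing
        for: 5m
        data:
          # Query A: latest probe result per target (1 = up, 0 = down).
          - refId: A
            relativeTimeRange:
              from: 300
              to: 0
            datasourceUid: prometheus   # assumed data source uid
            model:
              refId: A
              expr: probe_success       # assumed metric name
              instant: true
          # Expression C: fire when A drops below 1.
          - refId: C
            datasourceUid: __expr__
            model:
              refId: C
              type: threshold
              expression: A
              conditions:
                - evaluator:
                    type: lt
                    params: [1]
        noDataState: NoData
        execErrState: Error
        labels:
          severity: critical
        annotations:
          summary: A service health probe has failed for 5 minutes.
```

Contact points and notification policies (for example, the ntfy route) are provisioned through separate top-level keys in the same format and are not shown here.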
Adding Dashboards
Import community dashboards or create custom ones. BlumeOps uses a sidecar pattern — any ConfigMap in the monitoring namespace with the label grafana_dashboard: "1" is automatically loaded by Grafana's sidecar container.
Create dashboard ConfigMaps in argocd/manifests/grafana-config/dashboards/:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-my-service
  labels:
    grafana_dashboard: "1"
data:
  my-service.json: |
    { ... dashboard JSON ... }
Next Steps
- Set up authentik SSO for Grafana login (see federated-login)
- Create custom dashboards for your services
- Configure alerting rules and notification channels
- Add service-specific metrics exporters
Related
- observability — Full observability reference
- no-helm-policy — Why kustomize instead of Helm
- alloy — Alloy collector reference
- prometheus — Prometheus reference
- loki — Loki reference
- grafana — Grafana reference
- routing — Service routing and exposure