blumeops/docs/zk/1768506761-GHUW.md
Erich Blume 278f231563 Switch to title-based wiki-links (Quartz resolves via frontmatter title and aliases)
2026-02-03 15:48:15 -08:00


id: 1768506761-GHUW
tags: blumeops

Grafana Alloy Management Log

Grafana Alloy is a unified observability collector with two deployments:

  1. Indri (host) - System metrics and service logs from macOS host
  2. Kubernetes (DaemonSet) - Automatic pod log collection and service health probes

Service Details

  • Binary: ~/.local/bin/alloy (built from source with CGO_ENABLED=1)
  • Config: ~/.config/grafana-alloy/config.alloy
  • Data: ~/.local/share/grafana-alloy/
  • Logs: ~/Library/Logs/mcquack.alloy.{out,err}.log
  • Managed via: mcquack LaunchAgent (mcquack.eblume.alloy)

Why built from source? The Homebrew bottle is built with CGO_ENABLED=0, so it uses Go's pure-Go DNS resolver. That resolver reads /etc/resolv.conf directly and ignores the macOS /etc/resolver/* files, which breaks Tailscale MagicDNS hostname resolution. Building with CGO_ENABLED=1 makes Alloy use the macOS native resolver instead.

What Alloy Collects

Metrics

  • System metrics via prometheus.exporter.unix (same metrics as node_exporter)
  • Textfile collector reads from /opt/homebrew/var/node_exporter/textfile/
    • minikube.prom - Minikube cluster status
    • borgmatic.prom - Backup status metrics
    • zot.prom - Container registry metrics
    • jellyfin.prom - Jellyfin media server metrics
  • Zot registry metrics scraped from http://localhost:5050/metrics
  • Metrics pushed to Prometheus (k8s) via remote_write at https://prometheus.tail8d86e.ts.net/api/v1/write
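
The textfile collector only picks up complete *.prom files in Prometheus exposition format, so a producer script should write atomically. A minimal sketch of such a writer follows; the function name write_prom and the example metric are hypothetical, and only the directory path comes from this card.

```shell
#!/bin/sh
# Sketch: publish one gauge to the textfile collector directory.
# TEXTFILE_DIR defaults to the path listed in this card; write_prom and the
# metric names passed to it are illustrative, not part of the real roles.
TEXTFILE_DIR="${TEXTFILE_DIR:-/opt/homebrew/var/node_exporter/textfile}"

write_prom() {
    # usage: write_prom <file-stem> <metric-name> <value> [help text]
    stem=$1; metric=$2; value=$3; help=${4:-"No help provided"}
    # The temp file lives in the same directory and does not end in .prom,
    # so the collector ignores it until the rename below.
    tmp=$(mktemp "$TEXTFILE_DIR/.$stem.XXXXXX") || return 1
    {
        printf '# HELP %s %s\n' "$metric" "$help"
        printf '# TYPE %s gauge\n' "$metric"
        printf '%s %s\n' "$metric" "$value"
    } > "$tmp"
    # A rename within one directory is atomic, so a scrape never sees a
    # half-written file.
    mv "$tmp" "$TEXTFILE_DIR/$stem.prom"
}
```

For example, write_prom demo demo_up 1 "Demo gauge" would produce demo.prom with one HELP line, one TYPE line, and one sample.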

Logs

Alloy collects logs from all services on Indri:

Brew services:

  • forgejo
  • tailscale

mcquack LaunchAgents:

  • alloy (stdout/stderr)
  • borgmatic (stdout/stderr)
  • zot (stdout/stderr)
  • jellyfin (stdout/stderr)

Logs are pushed to Loki (k8s) at https://loki.tail8d86e.ts.net/loki/api/v1/push.
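
To check the Loki push path independently of Alloy, a hand-built test line can be posted to the same endpoint. The sketch below only builds the JSON body in the shape the Loki push API expects; the helper name loki_payload and the smoke-test job label are hypothetical, and it assumes the endpoint accepts unauthenticated pushes from the tailnet.

```shell
#!/bin/sh
# Sketch: build a Loki push-API JSON body for a single log line.
loki_payload() {
    # usage: loki_payload <job> <timestamp-ns> <message>
    # No JSON escaping is done here, so the message must not contain
    # double quotes or backslashes.
    printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' "$1" "$2" "$3"
}

# Example invocation (not run here; assumes no auth is required):
#   ts="$(date +%s)000000000"   # seconds padded to nanoseconds
#   loki_payload smoke-test "$ts" "hello from indri" |
#     curl -sS -H 'Content-Type: application/json' --data-binary @- \
#       https://loki.tail8d86e.ts.net/loki/api/v1/push
```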

Useful Commands

# Check service status
ssh indri 'launchctl list | grep alloy'

# View alloy logs
ssh indri 'tail -f ~/Library/Logs/mcquack.alloy.err.log'

# Restart service
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.alloy.plist && launchctl load ~/Library/LaunchAgents/mcquack.eblume.alloy.plist'
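
The launchctl list output has three columns (PID, last exit status, label), with "-" in the PID column when the job is loaded but not running. A small sketch that turns that into a readable state is shown below; agent_state is a hypothetical helper name, not part of mcquack.

```shell
#!/bin/sh
# Sketch: summarize one LaunchAgent's state from `launchctl list` output.
agent_state() {
    # usage: launchctl list | agent_state <label>
    awk -v label="$1" '
        $3 == label {
            found = 1
            if ($1 == "-") print "stopped (last exit status " $2 ")"
            else           print "running (pid " $1 ")"
        }
        END { if (!found) print "not loaded" }
    '
}

# Example invocation (not run here):
#   ssh indri 'launchctl list' | agent_state mcquack.eblume.alloy
```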

Building from Source

Alloy must be built with CGO enabled so that it uses the macOS native DNS resolver (required for Tailscale MagicDNS):

# On gilbert (dev workstation):
git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/alloy.git ~/code/3rd/alloy
cd ~/code/3rd/alloy && mise use go@1.25 node yarn
mise x -- make alloy
scp ~/code/3rd/alloy/build/alloy indri:~/.local/bin/alloy

Then run Ansible to deploy the config and LaunchAgent.
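
Before copying a fresh build to Indri, it can be worth confirming that CGO was actually enabled. Go binaries normally embed their build settings, which go version -m prints as tab-separated "build" lines; the sketch below extracts the CGO_ENABLED value from that output. The helper name cgo_setting is hypothetical, and it assumes the build did not strip the embedded build info.

```shell
#!/bin/sh
# Sketch: report the CGO_ENABLED setting recorded in a Go binary.
cgo_setting() {
    # usage: go version -m <binary> | cgo_setting
    awk '
        $1 == "build" && $2 ~ /^CGO_ENABLED=/ {
            sub(/^CGO_ENABLED=/, "", $2)
            print $2
            found = 1
        }
        END { if (!found) print "unknown" }
    '
}

# Example invocation (not run here):
#   go version -m ~/code/3rd/alloy/build/alloy | cgo_setting
```

A result of 1 suggests the native resolver path is available; 0 or unknown means the binary should be rebuilt or inspected before deploying.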

Ansible Management (Indri)

Alloy on Indri is managed via Ansible in 1767747119-YCPO.

mise run provision-indri -- --tags alloy

Kubernetes Alloy (alloy-k8s)

A separate Alloy DaemonSet runs in k8s for:

  • Automatic pod log collection - discovers and collects logs from all pods
  • Service health probes - HTTP blackbox probes for k8s services

Service Details (k8s)

  • Namespace: alloy
  • Image: grafana/alloy:v1.8.2
  • ArgoCD app: alloy-k8s
  • Manifests: argocd/manifests/alloy-k8s/

What k8s Alloy Collects

Pod logs (automatic discovery):

  • All pods in all namespaces via loki.source.kubernetes
  • Labels: namespace, pod, container, node

Service health probes:

  • miniflux, kiwix, transmission, devpi, argocd
  • Metrics: probe_success, probe_duration_seconds
  • Labels: job="integrations/blackbox/<service>"
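
The job label pattern above makes it easy to query a single service's probe result. The sketch below builds the PromQL selector for a given service; probe_expr is a hypothetical helper, and the commented curl call assumes the Prometheus query API is reachable at the same host used for remote_write, without authentication.

```shell
#!/bin/sh
# Sketch: build the PromQL selector for one service's blackbox probe.
probe_expr() {
    # usage: probe_expr <service>
    printf 'probe_success{job="integrations/blackbox/%s"}\n' "$1"
}

# Example invocation (not run here; host and lack of auth are assumptions):
#   curl -sS --get --data-urlencode "query=$(probe_expr miniflux)" \
#     https://prometheus.tail8d86e.ts.net/api/v1/query
```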

Useful Commands (k8s Alloy)

# View alloy-k8s logs
kubectl --context=minikube-indri -n alloy logs -f daemonset/alloy

# Check running config
kubectl --context=minikube-indri -n alloy get configmap alloy-config -o yaml

# Sync from ArgoCD
argocd app sync alloy-k8s

Log

Thu Jan 22 2026 (later)

  • Added Alloy k8s DaemonSet for automatic pod log collection
  • Logs from all k8s pods now forwarded to Loki with automatic discovery
  • Added service health probes for miniflux, kiwix, transmission, devpi, argocd
  • New "Services Health" Grafana dashboard shows probe metrics
  • Deleted stale textfile metrics (devpi.prom, transmission.prom) from indri
  • Deleted stale data directories (/opt/homebrew/var/loki, /opt/homebrew/var/prometheus)

Thu Jan 22 2026

  • Rebuilt from source with CGO_ENABLED=1 - required for Tailscale MagicDNS resolution
  • Migrated from Homebrew to mcquack LaunchAgent management
  • Updated remote_write to push to k8s Prometheus at prometheus.tail8d86e.ts.net
  • Updated log push to k8s Loki at loki.tail8d86e.ts.net
  • Removed prometheus/loki log collection (now running in k8s)
  • Binary now at ~/.local/bin/alloy, config at ~/.config/grafana-alloy/
  • Added build instructions to ansible role defaults

Tue Jan 20 2026

  • Removed devpi log collection (devpi migrated to k8s)
  • Removed devpi.prom textfile collection (metrics role retired)
  • Removed grafana log collection (grafana migrated to k8s in P2)

Thu Jan 15 2026

  • Initial setup replacing node_exporter
  • Configured metrics push via remote_write to Prometheus
  • Configured log collection for all services, forwarding to Loki

Fri Jan 30 2026

  • Removed Plex log and metrics collection (replaced by Jellyfin)
  • Added Jellyfin log collection via mcquack LaunchAgent logs
  • Added jellyfin.prom textfile metrics

Thu Jan 15 2026 (later)

  • Added Plex Media Server log collection (removed 2026-01-30)
  • Added plex.prom metrics from plex_metrics role (removed 2026-01-30)