# Phase 0: Foundation (Complete)

**Goal**: Container registry + minikube cluster without disrupting existing services

**Status**: Complete

---

## Important: Tailscale Service Creation Order

> **WARNING**: You MUST create services in the Tailscale admin console BEFORE running `tailscale serve` commands via ansible. If you run `tailscale serve --service svc:foo` before the service exists in the admin console, the local config will be in a bad state.
>
> To fix a misconfigured service:
> ```bash
> tailscale serve --service svc:foo reset
> ```
> Then create the service in admin console and try again.

---

## Step 0.1: Update Pulumi ACLs (BEFORE Tailscale serve)

**Files to modify:**
- `pulumi/policy.hujson`

**Changes:**

1. Add new tag to `tagOwners` section (around line 104, after `"tag:feed"`):
```hujson
"tag:registry": ["autogroup:admin", "tag:blumeops"],
```

2. Add test cases to `tests` section:
   - Update Erich's accept list (around line 111) to include registry:
   ```hujson
   "accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443"],
   ```
   - Update Allison's deny list (around line 117) to deny registry:
   ```hujson
   "deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443"],
   ```

**Note:**
- No member grant needed - admins have full access via wildcard, members don't need registry
- `tag:k8s` is added later in Phase 1 when the Tailscale Kubernetes Operator is deployed
- Zot supports htpasswd auth if we later need finer-grained control

**Testing:**
```bash
mise run tailnet-preview   # Review changes - should show new tag
mise run tailnet-up        # Apply changes
```

**Implementation Details:**
- Also need to add `"tag:registry"` to indri's tags in `pulumi/__main__.py` (the `DeviceTags` resource), not just define it in `policy.hujson`. The policy file defines the tag ownership rules, but the device tags are managed separately in the Python code.
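
Since the tag has to exist in both files, a quick pre-flight check before `mise run tailnet-preview` can catch a missed edit. The following is a minimal sketch: `tag_in_both` is a hypothetical helper name, and it does a plain text search for the quoted tag rather than parsing HuJSON or Python.

```bash
# Hypothetical helper: succeed only if the quoted tag string appears in both files.
# This is a plain text search, not a HuJSON or Python parse.
tag_in_both() {
    tag="$1"; policy="$2"; program="$3"
    grep -qF "\"$tag\"" "$policy" && grep -qF "\"$tag\"" "$program"
}

# Usage (paths from this step):
#   tag_in_both tag:registry pulumi/policy.hujson pulumi/__main__.py \
#     || echo "tag:registry is missing from one of the files"
```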

---

## Step 0.2: Create Tailscale Services in Admin Console (MANUAL)

> **CRITICAL**: Do this BEFORE running any ansible that calls `tailscale serve`

1. Go to https://login.tailscale.com/admin/services
2. Create service `registry` with:
   - Port: 443 (HTTPS)
   - Host: indri

**Implementation Details:**
- Tag is applied to indri via Pulumi in Step 0.1, not manually in admin console.

**Verification:**
```bash
# Service should appear (even if not yet serving)
tailscale status | grep registry
```

---

## Step 0.3: Create Zot Registry Ansible Role

**Note:** Zot is NOT in homebrew (no formula or tap). Clone to `~/code/3rd/` on indri and build from source (requires Go).

**Prerequisites on indri (ALREADY COMPLETED):**
```bash
# Clone zot from forge mirror (use localhost:3001 - hairpinning doesn't work on indri)
ssh indri 'git clone http://localhost:3001/eblume/zot.git ~/code/3rd/zot'

# Set up Go via mise (creates mise.toml in repo directory)
ssh indri 'cd ~/code/3rd/zot && mise use go@1.25'

# Build (creates bin/zot-darwin-arm64, ~183MB)
ssh indri 'cd ~/code/3rd/zot && mise x -- make binary'

# Verify binary exists
ssh indri 'ls -la ~/code/3rd/zot/bin/zot-darwin-arm64'
```

**Build verified:** Binary at `~/code/3rd/zot/bin/zot-darwin-arm64` (183MB, ARM64 native).

**New files:**
```
ansible/roles/zot/
├── defaults/main.yml
├── tasks/main.yml
├── templates/
│   ├── config.json.j2
│   └── zot.plist.j2
└── handlers/main.yml
```

**Key configuration (defaults/main.yml):**
```yaml
zot_repo_dir: "/Users/erichblume/code/3rd/zot"
zot_binary: "{{ zot_repo_dir }}/bin/zot-darwin-arm64"
zot_data_dir: "/Users/erichblume/zot"
zot_config_dir: "/Users/erichblume/.config/zot"
zot_port: 5000  # changed to 5050 during implementation (macOS AirPlay Receiver uses 5000)
zot_log_dir: "/Users/erichblume/Library/Logs"

# Pull-through cache registries (on-demand sync)
zot_sync_registries:
  - name: docker.io
    url: https://registry-1.docker.io
  - name: ghcr.io
    url: https://ghcr.io
  - name: quay.io
    url: https://quay.io
```

**Zot config.json template** (key sections):
```json
{
  "storage": {
    "rootDirectory": "/Users/erichblume/zot"
  },
  "http": {
    "address": "0.0.0.0",
    "port": "5000"
  },
  "extensions": {
    "sync": {
      "enable": true,
      "registries": [
        {
          "urls": ["https://registry-1.docker.io"],
          "content": [{"prefix": "**"}],
          "onDemand": true,
          "tlsVerify": true
        },
        {
          "urls": ["https://ghcr.io"],
          "content": [{"prefix": "**"}],
          "onDemand": true,
          "tlsVerify": true
        },
        {
          "urls": ["https://quay.io"],
          "content": [{"prefix": "**"}],
          "onDemand": true,
          "tlsVerify": true
        }
      ]
    }
  }
}
```

**Two modes of operation:**

1. **Pull-through cache** (automatic): When you pull `registry.tail8d86e.ts.net/docker.io/library/nginx:latest`, Zot fetches from Docker Hub and caches locally. Subsequent pulls are local.

2. **Private images** (manual push): Push your own images to any path NOT matching a sync prefix:
   ```bash
   # From gilbert (after building)
   podman push myapp:v1 registry.tail8d86e.ts.net/blumeops/myapp:v1
   ```

**Namespace convention:**
- `registry.tail8d86e.ts.net/docker.io/*` → cached from Docker Hub
- `registry.tail8d86e.ts.net/ghcr.io/*` → cached from GHCR
- `registry.tail8d86e.ts.net/quay.io/*` → cached from Quay
- `registry.tail8d86e.ts.net/blumeops/*` → private images (built by you/Woodpecker)
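
The namespace convention can be applied mechanically to an upstream image reference. The sketch below assumes Docker's usual normalization rules (a bare name lives under `docker.io/library/`, and a first path component containing a dot or colon is a registry host). `cache_ref` is a hypothetical helper name, and only the three upstreams configured above would actually be cached.

```bash
REGISTRY_HOST="registry.tail8d86e.ts.net"

# Map an upstream image reference to its path on the pull-through cache.
cache_ref() {
    ref="$1"
    case "$ref" in
        */*) first="${ref%%/*}" ;;
        *)   first="" ;;
    esac
    case "$first" in
        *.*|*:*|localhost) ;;                   # already names a registry host
        "") ref="docker.io/library/$ref" ;;     # bare name such as nginx:latest
        *)  ref="docker.io/$ref" ;;             # user/image on Docker Hub
    esac
    printf '%s/%s\n' "$REGISTRY_HOST" "$ref"
}

# Usage:
#   podman pull "$(cache_ref nginx:latest)"
```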

**LaunchAgent template (zot.plist.j2):**
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "...">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>mcquack.eblume.zot</string>
    <key>ProgramArguments</key>
    <array>
        <!-- ABSOLUTE PATH to built binary in ~/code/3rd/zot -->
        <string>{{ zot_binary }}</string>
        <string>serve</string>
        <string>{{ zot_config_dir }}/config.json</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>{{ zot_log_dir }}/mcquack.zot.out.log</string>
    <key>StandardErrorPath</key>
    <string>{{ zot_log_dir }}/mcquack.zot.err.log</string>
</dict>
</plist>
```

**Handlers (handlers/main.yml):**
```yaml
- name: Restart zot
  ansible.builtin.shell: |
    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist 2>/dev/null || true
    launchctl load ~/Library/LaunchAgents/mcquack.eblume.zot.plist
  changed_when: true
```

**Tasks should notify handler on config change:**
```yaml
- name: Deploy zot config
  ansible.builtin.template:
    src: config.json.j2
    dest: "{{ zot_config_dir }}/config.json"
  notify: Restart zot
```

**Testing (after deploying role):**
```bash
# Check LaunchAgent is running
ssh indri 'launchctl list | grep zot'

# Check zot is responding
ssh indri 'curl -s http://localhost:5000/v2/_catalog'
# Expected: {"repositories":[]}

# Check logs for errors
ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log'

# Test pull-through cache via curl (podman not installed until Step 0.8)
ssh indri 'curl -s http://localhost:5000/v2/docker.io/library/alpine/manifests/latest -H "Accept: application/vnd.oci.image.manifest.v1+json"'
# Should return manifest JSON (triggers cache fetch from Docker Hub)
ssh indri 'curl -s http://localhost:5000/v2/_catalog'
# Expected: {"repositories":["docker.io/library/alpine"]}
```

**Implementation Details:**
- Changed port from 5000 to 5050 because macOS ControlCenter (AirPlay Receiver) uses port 5000 by default.
- Fixed sync config: use `"content": [{"prefix": "**", "destination": "/{{ registry.name }}"}]` instead of `"prefix": "{{ registry.name }}/**"`. The destination rewrites the local path, while prefix `**` matches all upstream repos.
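
To make the corrected sync entry concrete, the sketch below prints one registry entry in the shape described above. The field layout is taken from the config template in this step plus the `destination` fix; it has not been checked against Zot's schema here, and `sync_entry` is a hypothetical helper name.

```bash
# Print one Zot sync registry entry as JSON. "destination" is the local path
# prefix; the "**" prefix matches every upstream repository.
sync_entry() {
    name="$1"; url="$2"
    cat <<EOF
{
  "urls": ["$url"],
  "content": [{"prefix": "**", "destination": "/$name"}],
  "onDemand": true,
  "tlsVerify": true
}
EOF
}

# Usage:
#   sync_entry docker.io https://registry-1.docker.io
```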

---

## Step 0.4: Add Zot to Tailscale Serve

**Files to modify:**
- `ansible/roles/tailscale_serve/defaults/main.yml`

**Changes:**
```yaml
# Add to tailscale_serve_services list
- name: svc:registry
  https:
    port: 443
    upstream: http://localhost:5000  # changed to 5050 during implementation
```

**Testing:**
```bash
# Deploy tailscale serve config
mise run provision-indri -- --tags tailscale-serve

# Verify from gilbert (not indri - hairpinning doesn't work)
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["docker.io/library/alpine"]} (from Step 0.3 test)

# Test private image push from gilbert
podman pull alpine:latest
podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1
podman push registry.tail8d86e.ts.net/blumeops/test:v1
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]}
```

**Implementation Details:**
- Changed upstream port from 5000 to 5050 (see Step 0.3 implementation details).
- After running `tailscale serve`, the service must be approved in Tailscale admin console at https://login.tailscale.com/admin/services before it becomes accessible.
- Podman needed on gilbert for testing - added to Brewfile. Requires `podman machine init && podman machine start` after install.

---

## Step 0.5: Create Zot Metrics Role

**New files:**
```
ansible/roles/zot_metrics/
├── defaults/main.yml
├── tasks/main.yml
├── templates/
│   ├── zot-metrics.sh.j2
│   └── zot-metrics.plist.j2
└── handlers/main.yml
```

**Metrics script pattern (zot-metrics.sh.j2):**
```bash
#!/bin/bash
# Collect Zot registry metrics for Prometheus textfile collector
set -euo pipefail

METRICS_FILE="/opt/homebrew/var/node_exporter/textfile/zot.prom"
TEMP_FILE="${METRICS_FILE}.tmp"

# Check if zot is up
if curl -sf http://localhost:5000/v2/_catalog > /dev/null 2>&1; then
    echo "zot_up 1" > "$TEMP_FILE"
else
    echo "zot_up 0" > "$TEMP_FILE"
fi

mv "$TEMP_FILE" "$METRICS_FILE"
```

**Note:** Start with just `zot_up` for now. Additional metrics (storage usage, cache stats) can be added later after reviewing zot's metrics endpoint.
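
When more metrics are added, it may help to factor the write-then-rename step into a helper and to emit `HELP`/`TYPE` metadata alongside each value. This is a sketch only: the function names are hypothetical and the help text is illustrative.

```bash
# Write stdin to a temp file next to the target, then rename, so a scrape
# never reads a half-written file.
write_metrics() {
    metrics_file="$1"
    tmp_file="${metrics_file}.tmp"
    cat > "$tmp_file" && mv "$tmp_file" "$metrics_file"
}

# Emit zot_up with HELP/TYPE metadata in Prometheus text exposition format.
zot_up_metric() {
    printf '# HELP zot_up Whether the Zot registry answered /v2/_catalog (1 = up).\n'
    printf '# TYPE zot_up gauge\n'
    printf 'zot_up %s\n' "$1"
}

# Usage:
#   zot_up_metric 1 | write_metrics /opt/homebrew/var/node_exporter/textfile/zot.prom
```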

**Testing:**
```bash
# Deploy metrics role
mise run provision-indri -- --tags zot_metrics

# Check metrics file exists and is updated
ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/zot.prom'
# Expected: zot_up 1

# Verify metrics appear in Prometheus (after a scrape cycle)
curl -s "http://indri:9090/api/v1/query?query=zot_up" | jq '.data.result[0].value[1]'
# Expected: "1"
```

---

## Step 0.6: Add Zot Log Collection to Alloy

**Files to modify:**
- `ansible/roles/alloy/defaults/main.yml`

**Changes:**
Add to the `alloy_mcquack_logs` list:
```yaml
  - path: /Users/erichblume/Library/Logs/mcquack.zot.out.log
    service: zot
    stream: stdout
  - path: /Users/erichblume/Library/Logs/mcquack.zot.err.log
    service: zot
    stream: stderr
```

**Testing:**
```bash
# Deploy alloy config (handler restarts alloy automatically if config changed)
mise run provision-indri -- --tags alloy

# Wait a minute, then check Loki for zot logs
# In Grafana Explore, query: {service="zot"}
```

---

## Step 0.7: Update indri-services-check Script

**Files to modify:**
- `mise-tasks/indri-services-check`

**Changes to add:**
```bash
# Add after existing service checks (around line 55)
check_service "zot" "ssh indri 'launchctl list | grep zot | grep -v \"^-\"'"
check_service "zot-metrics" "ssh indri 'launchctl list | grep zot-metrics | grep -v \"^-\"'"

# Add to HTTP endpoints section (around line 65)
check_http "Zot Registry" "http://indri:5000/v2/_catalog"

# Add metrics file check
check_service "Zot metrics" "ssh indri 'test -f /opt/homebrew/var/node_exporter/textfile/zot.prom'"
```

**Testing:**
```bash
# Run the health check
mise run indri-services-check

# Expected output includes:
# zot...               OK
# zot-metrics...       OK
# Zot Registry...      OK
# Zot metrics...       OK
```

**Implementation Details:**
- Used Tailscale service URL (`https://registry.tail8d86e.ts.net/v2/_catalog`) instead of internal endpoint to verify full path works.
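
The `check_service` and `check_http` helpers are used above without being shown. For readers without the script at hand, they could look roughly like the sketch below. These definitions are assumptions for illustration, not the script's actual code; the output format only approximates the expected output listed above.

```bash
# Hypothetical definitions; the real script's versions are not shown in this plan.
# Run a command string and print "name... OK" or "name... FAILED".
check_service() {
    name="$1"; cmd="$2"
    if sh -c "$cmd" >/dev/null 2>&1; then
        printf '%-22s OK\n' "$name..."
        return 0
    fi
    printf '%-22s FAILED\n' "$name..."
    return 1
}

# Treat any HTTP error status or connection failure as a failed check.
check_http() {
    check_service "$1" "curl -sf --max-time 5 '$2'"
}

# Usage:
#   check_http "Zot Registry" "https://registry.tail8d86e.ts.net/v2/_catalog"
```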

---

## Step 0.8: Install and Configure Podman on Indri

**New files:**
```
ansible/roles/podman/
├── tasks/main.yml
└── handlers/main.yml
```

**Tasks (tasks/main.yml):**
```yaml
- name: Install podman via homebrew
  community.general.homebrew:
    name: podman
    state: present

- name: Initialize podman machine (if not exists)
  ansible.builtin.command:
    cmd: podman machine init --cpus 4 --memory 8192 --disk-size 220
  register: podman_init
  changed_when: podman_init.rc == 0
  failed_when: podman_init.rc not in [0, 125]  # 125 = already exists

- name: Start podman machine
  ansible.builtin.command:
    cmd: podman machine start
  register: podman_start
  changed_when: "'started successfully' in podman_start.stdout"
  failed_when: podman_start.rc not in [0, 125]  # 125 = already running
```

**Testing:**
```bash
# Deploy podman role
mise run provision-indri -- --tags podman

# Verify podman is working
ssh indri 'podman info'
ssh indri 'podman run --rm hello-world'
```

**Implementation Details:**
- **KNOWN ISSUE**: `podman machine init` and `podman machine start` have reliability issues when run via Ansible/SSH. The machine sometimes gets stuck in "Starting" state due to a race condition (see https://github.com/containers/podman/issues/16945). Apple Hypervisor may also require GUI session context.
- **WORKAROUND**: If the machine fails to start via Ansible, manually run on indri:
  ```bash
  podman machine rm -f podman-machine-default
  podman machine init --cpus 4 --memory 8192 --disk-size 220
  podman machine start
  ```
- LaunchAgent approach was attempted but didn't resolve the issue reliably.
- TODO: Investigate proper automation solution for reliable podman machine management.
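
One possible direction for the TODO is to stop trusting the exit status of `podman machine start` and instead poll until the machine actually answers. The sketch below is a generic polling helper; using `podman info` as the readiness probe is an assumption (it is the same command the testing section uses), and the timeout values are illustrative.

```bash
# Poll a command until it succeeds or the timeout (in seconds) runs out.
wait_until() {
    timeout="$1"; interval="$2"; shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Assumed probe: succeeds once podman can reach the machine.
machine_ready() { podman info >/dev/null 2>&1; }

# Usage on indri:
#   podman machine start; wait_until 120 5 machine_ready || echo "machine did not come up"
```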

---

## Step 0.9: Install and Configure Minikube

**New files:**
```
ansible/roles/minikube/
├── defaults/main.yml
├── tasks/main.yml
└── handlers/main.yml
```

**Defaults:**
```yaml
minikube_cpus: 4
minikube_memory: 8192
minikube_disk_size: "200g"
minikube_driver: podman
minikube_container_runtime: cri-o
```

**Note on storage:** The disk-size is for node-local storage only (container images, emptyDir, local PVs). Pods can also mount external storage:
- **hostPath** - indri filesystem (e.g., `~/transmission/` for kiwix ZIM files)
- **NFS** - sifaka volumes (Synology supports NFS natively, easiest for k8s)
- **SMB/CIFS** - requires csi-driver-smb; sifaka currently uses SMB for desktop mounts

**Tasks:**
```yaml
- name: Install minikube via homebrew
  community.general.homebrew:
    name: minikube
    state: present

- name: Check if minikube cluster exists
  ansible.builtin.command:
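    # Wrap the Go template in raw tags so Ansible's Jinja2 does not try to render it
    cmd: minikube status --format='{% raw %}{{.Host}}{% endraw %}'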
  register: minikube_status
  changed_when: false
  failed_when: false

- name: Start minikube cluster
  ansible.builtin.command:
    cmd: >
      minikube start
      --driver={{ minikube_driver }}
      --container-runtime={{ minikube_container_runtime }}
      --cpus={{ minikube_cpus }}
      --memory={{ minikube_memory }}
      --disk-size={{ minikube_disk_size }}
  when: minikube_status.rc != 0 or 'Running' not in minikube_status.stdout
```

**Testing:**
```bash
# Deploy minikube role
mise run provision-indri -- --tags minikube

# Verify cluster is running
ssh indri 'minikube status'
# Expected: host: Running, kubelet: Running, apiserver: Running

# Test kubectl access from indri
ssh indri 'kubectl get nodes'
# Expected: minikube   Ready    control-plane   ...
```

**Implementation Details:**
- Changed `minikube_memory` from 8192 to 7800 because podman machine reports slightly less available memory (7908MB) due to VM overhead. Minikube rejects memory requests exceeding what podman reports.
- Deployed with Kubernetes v1.34.0 and CRI-O 1.24.6.
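
The memory adjustment can be expressed as a small calculation: take what podman reports, leave some headroom, and never request more than that. The margin of 100 MB and the rounding to a multiple of 100 are illustrative assumptions chosen so that the numbers above (8192 requested, 7908 reported) yield 7800. `clamp_memory_mb` is a hypothetical helper name.

```bash
# Largest memory request (MB) that fits under what podman reports, after
# subtracting a margin and rounding down to a multiple of 100.
clamp_memory_mb() {
    requested="$1"; available="$2"; margin="${3:-100}"
    limit=$(( (available - margin) / 100 * 100 ))
    if [ "$requested" -le "$limit" ]; then
        echo "$requested"
    else
        echo "$limit"
    fi
}

# Usage:
#   clamp_memory_mb 8192 7908
```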

---

## Step 0.10: Configure Kubeconfig on Gilbert

**Goal**: Enable `kubectl` and `k9s` on gilbert to connect to the minikube cluster running on indri.

**Considerations:**
- Minikube runs inside a podman VM on indri, so the API server isn't directly exposed on indri's network interface
- Admin users have full Tailscale access to indri via `autogroup:admin → * → *`
- Be careful not to overwrite existing work kubeconfigs

**Possible approaches:**
1. SSH tunneling to forward the API server port
2. `minikube tunnel` running on indri (exposes LoadBalancer services)
3. Configure minikube with `--apiserver-names=indri` at cluster creation time
4. Use `kubectl` via SSH wrapper: `ssh indri kubectl ...`

**Verification:**
```bash
# From gilbert, these should work:
kubectl get nodes
kubectl get namespaces
k9s  # Should show the minikube cluster
```

The exact approach will be determined during implementation based on what works best with the podman driver.

**Implementation Details:**

Chose **Option 3: Recreate cluster with `--apiserver-names`** after researching alternatives:

1. **SSH tunneling** - Requires keeping a tunnel running or complex on-demand setup
2. **SOCKS5 proxy with kubeconfig `proxy-url`** - Kubeconfig supports `proxy-url: socks5://localhost:1080` per-context, but still requires managing the proxy
3. **`--apiserver-names` + `--listen-address`** - Native minikube support, cleanest solution

**Cluster Setup:** Recreated the minikube cluster with additional flags:
```bash
minikube delete
minikube start \
  --driver=podman \
  --container-runtime=cri-o \
  --cpus=4 --memory=7800 --disk-size=200g \
  --apiserver-names=indri \
  --listen-address=0.0.0.0
```

- `--apiserver-names=indri` adds "indri" to the API server certificate SAN
- `--listen-address=0.0.0.0` tells podman to expose the API port on all interfaces
- API server port is dynamic (check with `kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"` on indri)
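
Because the port is dynamic, it is handy to extract it from the server URL rather than copying it by hand. A minimal sketch follows; `port_from_url` is a hypothetical helper and does not handle bracketed IPv6 addresses.

```bash
# Extract the port from a URL such as https://127.0.0.1:39535.
# Prints nothing if the URL has no explicit port.
port_from_url() {
    hostport="${1#*://}"        # drop the scheme
    hostport="${hostport%%/*}"  # drop any path
    case "$hostport" in
        *:*) echo "${hostport##*:}" ;;
    esac
}

# Usage (from gilbert):
#   port_from_url "$(ssh indri 'kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"')"
```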

**Credential Management with 1Password:**

Rather than copying private keys between machines, credentials are stored in 1Password and fetched on-demand using kubectl's exec credential plugin. This mirrors the 1Password SSH agent pattern for biometric-protected key access.

1. **Store credentials in 1Password** (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`, item: `3jo4f2hnzvwfmamudfsbbbec7e`):
   - `client-cert` - Contents of `~/.minikube/profiles/minikube/client.crt` (text field)
   - `client-key` - Contents of `~/.minikube/profiles/minikube/client.key` (text field)
   - `ca-cert` - Contents of `~/.minikube/ca.crt` (text field, not secret but stored for convenience)

2. **Created credential helper script** at `bin/kubectl-credential-1password`:
   ```bash
   #!/bin/bash
   # Fetches client cert/key from 1Password, outputs ExecCredential JSON
   # Usage: kubectl-credential-1password <vault-id> <item-id> <cert-field> <key-field>
   ```
   Symlinked to `~/.local/bin/kubectl-credential-1password`

3. **Kubeconfig setup on gilbert:**
   ```bash
   # Store CA cert locally (not secret - public key for server verification)
   mkdir -p ~/.kube/minikube-indri
   op --vault <vault> item get <item> --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt

   # Configure cluster
   kubectl config set-cluster minikube-indri \
     --server=https://indri:<port> \
     --certificate-authority=/Users/eblume/.kube/minikube-indri/ca.crt

   # Configure credentials with exec plugin
   kubectl config set-credentials minikube-indri \
     --exec-api-version=client.authentication.k8s.io/v1beta1 \
     --exec-command=kubectl-credential-1password \
     --exec-arg=<vault-id> \
     --exec-arg=<item-id> \
     --exec-arg=client-cert \
     --exec-arg=client-key

   # Create context
   kubectl config set-context minikube-indri \
     --cluster=minikube-indri \
     --user=minikube-indri
   ```

4. **Usage:**
   ```bash
   kubectl --context=minikube-indri get nodes
   # or
   kubectl config use-context minikube-indri
   kubectl get nodes
   ```
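
The credential helper in item 2 is shown only as a header comment. The core of such a helper is turning two PEM blobs into the `ExecCredential` JSON that kubectl expects on stdout. The sketch below is an assumption about how that could be done, not the actual contents of `bin/kubectl-credential-1password`; the function names are hypothetical, and the commented `op read` calls assume 1Password secret references of the form `op://vault/item/field`.

```bash
# Escape a PEM blob for embedding in a JSON string. PEM text contains only
# base64 characters, dashes, spaces and newlines, so newlines are the only
# characters that need escaping here.
pem_to_json_string() {
    printf '%s\n' "$1" | awk '{ printf "%s\\n", $0 }'
}

# Print an ExecCredential document for a client certificate and key.
exec_credential() {
    cert="$(pem_to_json_string "$1")"
    key="$(pem_to_json_string "$2")"
    printf '{"apiVersion":"client.authentication.k8s.io/v1beta1","kind":"ExecCredential","status":{"clientCertificateData":"%s","clientKeyData":"%s"}}\n' "$cert" "$key"
}

# Usage inside the helper script (arguments as in the header comment above):
#   exec_credential "$(op read "op://$1/$2/$3")" "$(op read "op://$1/$2/$4")"
```

The `apiVersion` in the output matches the `--exec-api-version` passed to `kubectl config set-credentials` in item 3.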

**Security Notes:**
- Client private key is never written to disk on gilbert; it is fetched from 1Password on each kubectl command
- CA cert is stored on disk (not secret; it is a public certificate used for server verification)
- A 1Password biometric/password prompt is required for credential access
- The `sed 's/^"//; s/"$//'` filter strips the surrounding double quotes that `op` emits around text fields

**References:**
- [minikube start options](https://minikube.sigs.k8s.io/docs/commands/start/)
- [Using kubectl via SSH Tunnel](https://blog.scottlowe.org/2020/06/16/using-kubectl-via-an-ssh-tunnel/)
- [SOCKS5 Proxy Access to K8s API](https://kubernetes.ltd/docs/tasks/extend-kubernetes/socks5-proxy-access-api/)
- [kubectl-tokensshtunnel](https://github.com/jordiprats/kubectl-tokensshtunnel)
- [Securing kubectl config with 1Password](https://blog.mikael.green/post/1password-kubeconfig/)

---

## Step 0.11: Add Minikube to indri-services-check

**Files to modify:**
- `mise-tasks/indri-services-check`

**Changes:**
```bash
# Add new section for Kubernetes
echo ""
echo "Kubernetes cluster:"
check_service "minikube" "ssh indri 'minikube status --format={{.Host}} | grep -q Running'"
check_service "k8s-apiserver" "ssh indri 'kubectl get --raw /healthz'"
```

**Testing:**
```bash
mise run indri-services-check

# Expected output includes:
# Kubernetes cluster:
# minikube...          OK
# k8s-apiserver...     OK
```

**Implementation Notes:**
- Added a third check `k8s-apiserver (remote)` that verifies kubectl access from gilbert, not just via SSH to indri. This ensures the 1Password credential flow and remote API server access are working.
- The remote check uses both `--kubeconfig` and `--context` flags explicitly since the script runs in bash (not fish) and doesn't inherit the KUBECONFIG environment variable from fish config.

---

## Step 0.12: Create Zettelkasten Documentation

**New files:**
- `~/code/personal/zk/zot.md`
- `~/code/personal/zk/minikube.md`

**Files to update:**
- `~/code/personal/zk/1767747119-YCPO.md` (main blumeops card)

**Updates to main blumeops card:**

1. Add to **Device Tags** table:
   | `tag:registry` | indri | Container registry access |

2. Add to **Services** table:
   | **Registry** | https://registry.tail8d86e.ts.net | OCI container registry (Zot) | [[zot]] |
   | **Kubernetes** | https://indri:<port> | Minikube cluster | [[minikube]] |

3. Add to **Port Map (Indri)** table:
   | 5050 | Zot | HTTP | localhost | Container registry |
   | <dynamic> | K8s API | HTTPS | 0.0.0.0 | Minikube API server |

4. Add new section **Remote Kubernetes Access**:
   ```markdown
   ## Remote Kubernetes Access (from Gilbert)

   The minikube cluster on indri is accessible from gilbert via direct connection.
   Cluster was created with `--apiserver-names=indri --listen-address=0.0.0.0`.

   **Fish abbreviations** (in `~/.config/fish/config.fish`):
   - `ki` → `kubectl --context=minikube-indri`
   - `k9i` → `k9s --context=minikube-indri`
   - `k9` → `k9s`

   \`\`\`bash
   # Quick access via abbreviations
   ki get nodes
   k9i

   # Or explicitly set context
   kubectl config use-context minikube-indri
   kubectl get nodes
   \`\`\`
   ```

**Template for zot.md:**
```markdown
---
id: zot
aliases:
  - zot
  - container-registry
tags:
  - blumeops
---

# Zot Registry Management Log

Zot is an OCI-native container registry running on Indri, providing:
1. Pull-through cache for Docker Hub, GHCR, Quay (avoids rate limits)
2. Private image storage for custom-built containers

## Service Details

- URL: https://registry.tail8d86e.ts.net
- Local port: 5050
- Data directory: ~/zot
- Config: ~/.config/zot/config.json
- Managed via: mcquack LaunchAgent

## Namespace Convention

| Path | Source |
|------|--------|
| `registry.../docker.io/*` | Cached from Docker Hub |
| `registry.../ghcr.io/*` | Cached from GHCR |
| `registry.../quay.io/*` | Cached from Quay |
| `registry.../blumeops/*` | Private images (yours) |

## Useful Commands

\`\`\`bash
# List all images
curl -s http://localhost:5050/v2/_catalog | jq

# Pull via cache (from indri or k8s)
podman pull localhost:5050/docker.io/library/nginx:latest

# Build and push private image (from gilbert)
podman build -t registry.tail8d86e.ts.net/blumeops/myapp:v1 .
podman push registry.tail8d86e.ts.net/blumeops/myapp:v1

# Check service status
launchctl list | grep zot

# View logs
tail -f ~/Library/Logs/mcquack.zot.err.log
\`\`\`

## Log

### [DATE]
- Initial setup for k8s migration Phase 0
```

**Template for minikube.md:**
```markdown
---
id: minikube
aliases:
  - minikube
  - kubernetes
  - k8s
tags:
  - blumeops
---

# Minikube Management Log

Minikube provides a single-node Kubernetes cluster on Indri for running containerized services.

## Cluster Details

- Driver: podman (rootless)
- Container runtime: CRI-O
- Kubernetes version: v1.34.0
- Resources: 4 CPUs, 7800MB RAM, 200GB disk
- API server: https://indri:<port> (accessible from gilbert via Tailscale)

## Remote Access from Gilbert

Cluster was created with `--apiserver-names=indri --listen-address=0.0.0.0` to allow remote kubectl access.

\`\`\`bash
# Switch context
kubectl config use-context minikube-indri

# Verify
kubectl get nodes
kubectl get namespaces

# Use k9s
k9s --context minikube-indri
\`\`\`

## Useful Commands (on indri)

\`\`\`bash
# Cluster status
minikube status

# Start/stop cluster
minikube start
minikube stop

# Access dashboard
minikube dashboard

# SSH into node
minikube ssh

# View logs
minikube logs
\`\`\`

## Podman Machine (prerequisite)

Minikube uses podman as its driver (the container runtime inside the node is CRI-O). The podman machine must be running:

\`\`\`bash
# Check podman machine
podman machine list

# Start if needed
podman machine start
\`\`\`

## Log

### [DATE]
- Initial cluster setup for k8s migration Phase 0
- Configured for remote access with --apiserver-names=indri
```

**Implementation Notes:**
- Created zot.md and minikube.md in ~/code/personal/zk/
- Updated 1767747119-YCPO.md (main blumeops card) with all specified changes
- Added 1Password credential plugin reference to minikube docs
- K8s API port is 39535 (dynamically assigned by minikube, may change on cluster recreation)

---

## Step 0.13: Update Main Playbook

**Files to modify:**
- `ansible/playbooks/indri.yml`

**Changes:**
```yaml
# Add new roles to the roles list
- role: podman
  tags: podman
- role: zot
  tags: zot
- role: zot_metrics
  tags: zot_metrics
- role: minikube
  tags: minikube
```

**Implementation Notes:**
- Roles were added incrementally during Steps 0.3, 0.5, 0.8, and 0.9
- All four roles (zot, zot_metrics, podman, minikube) confirmed present in indri.yml

---

## Step 0.14: Expose K8s API as Tailscale Service (Added Post-Completion)

> **Note**: This step was added after Phase 0 was otherwise complete, to provide a stable, named endpoint for the Kubernetes API server.

**Goal**: Expose the minikube API server as `k8s.tail8d86e.ts.net` instead of using `indri:<dynamic-port>`.

**Current state:**
- Minikube API server on port 39535 (dynamic, could change on cluster recreation)
- Accessed via `https://indri:39535`
- Certificate SANs include "indri"

**Target state:**
- Stable Tailscale service at `k8s.tail8d86e.ts.net:443`
- Fixed API server port (6443, the k8s standard)
- Certificate SANs include both hostnames for compatibility

---

### Step 0.14.1: Update Pulumi ACLs

**Files to modify:**
- `pulumi/policy.hujson`
- `pulumi/__main__.py`

**Changes to policy.hujson:**

1. Add tag to `tagOwners`:
```hujson
"tag:k8s-api": ["autogroup:admin", "tag:blumeops"],
```

2. Update Erich's test case accept list to include k8s-api:
```hujson
"accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443", "tag:k8s-api:443"],
```

3. Update Allison's deny list:
```hujson
"deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443", "tag:k8s-api:443"],
```

**Changes to __main__.py:**
- Add `"tag:k8s-api"` to indri's DeviceTags

**Testing:**
```bash
mise run tailnet-preview   # Review changes
mise run tailnet-up        # Apply changes
```

---

### Step 0.14.2: Create Tailscale Service in Admin Console (MANUAL)

> **CRITICAL**: Do this BEFORE running ansible that calls `tailscale serve`

1. Go to https://login.tailscale.com/admin/services
2. Create service `k8s` with:
   - Port: 443 (TCP)
   - Host: indri

---

### Step 0.14.3: Recreate Minikube Cluster

The cluster needs to be recreated to:
1. Add `k8s.tail8d86e.ts.net` to the API server certificate SANs
2. Fix the API server port to 6443 (standard k8s port)

**On indri:**
```bash
# Stop and delete existing cluster
minikube stop
minikube delete

# Recreate with new settings
minikube start \
  --driver=podman \
  --container-runtime=cri-o \
  --cpus=4 --memory=7800 --disk-size=200g \
  --apiserver-names=k8s.tail8d86e.ts.net,indri \
  --apiserver-port=6443 \
  --listen-address=0.0.0.0

# Verify the API server URL uses the fixed port (this does not display the certificate SANs)
kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"
# Expected: https://127.0.0.1:6443 or similar

# Verify cluster is running
minikube status
kubectl get nodes
```
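
To confirm that the recreated cluster's serving certificate really carries both names, the certificate can be fetched and its Subject Alternative Names inspected. This is a sketch under the assumption that `openssl` is available on indri and that the API server is reachable on `localhost:6443`; both helper names are hypothetical.

```bash
# Fetch the serving certificate from host:port and dump it as text.
cert_text() {
    openssl s_client -connect "$1" </dev/null 2>/dev/null | openssl x509 -noout -text
}

# Succeed if the exact DNS name appears among the comma-separated SAN entries.
san_includes() {
    printf '%s\n' "$1" | tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -qxF "DNS:$2"
}

# Usage on indri:
#   text="$(cert_text localhost:6443)"
#   san_includes "$text" indri && san_includes "$text" k8s.tail8d86e.ts.net && echo "SANs OK"
```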

**Update ansible role defaults** (`ansible/roles/minikube/defaults/main.yml`):
```yaml
minikube_apiserver_names:
  - k8s.tail8d86e.ts.net
  - indri
minikube_apiserver_port: 6443
```

---

### Step 0.14.4: Add K8s Service to Tailscale Serve

**Files to modify:**
- `ansible/roles/tailscale_serve/defaults/main.yml`

**Add to services list:**
```yaml
- name: svc:k8s
  tcp:
    port: 443
    upstream: tcp://localhost:6443
```

**Note:** Using TCP passthrough (not HTTPS termination) because k8s uses mTLS authentication.

**Deploy:**
```bash
mise run provision-indri -- --tags tailscale-serve
```

---

### Step 0.14.5: Update 1Password Credentials

After cluster recreation, the client certificates have changed.

**On indri, get the new credentials:**
```bash
# Display new certificates (copy to 1Password)
cat ~/.minikube/profiles/minikube/client.crt
cat ~/.minikube/profiles/minikube/client.key
cat ~/.minikube/ca.crt
```

**In 1Password** (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`, item: `3jo4f2hnzvwfmamudfsbbbec7e`):
- Update `client-cert` field with new certificate
- Update `client-key` field with new key
- Update `ca-cert` field with new CA certificate

---

### Step 0.14.6: Update Kubeconfig on Gilbert

**Update CA certificate:**
```bash
# Fetch new CA cert from 1Password
op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 3jo4f2hnzvwfmamudfsbbbec7e --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt
```

**Update kubeconfig** (`~/.kube/minikube-indri/config.yml`):
```yaml
clusters:
- cluster:
    certificate-authority: /Users/eblume/.kube/minikube-indri/ca.crt
    server: https://k8s.tail8d86e.ts.net  # Changed from https://indri:39535
  name: minikube-indri
```

**Verification:**
```bash
# Test connection via new hostname
kubectl --context=minikube-indri get nodes

# Test via abbreviation
ki get nodes
```

---

### Step 0.14.7: Update Documentation

**Files to update:**
- `~/code/personal/zk/minikube.md` - Update API server URL and port info
- `~/code/personal/zk/1767747119-YCPO.md` - Update Services table and Port Map

**Changes to blumeops card:**

1. Update Services table:
   | **Kubernetes** | https://k8s.tail8d86e.ts.net | Minikube cluster | [[minikube]] |

2. Update Port Map:
   | 6443 | K8s API | HTTPS/TCP | 0.0.0.0 | Minikube API server (via Tailscale) |

3. Add `tag:k8s-api` to Device Tags table

---

### Step 0.14.8: Update indri-services-check

**Files to modify:**
- `mise-tasks/indri-services-check`

**Changes:**
```bash
# Update remote k8s check to use new URL
check_service "k8s-apiserver (remote)" "kubectl --kubeconfig=$HOME/.kube/minikube-indri/config.yml --context=minikube-indri get --raw /healthz"
# (No change needed - uses kubeconfig which now points to k8s.tail8d86e.ts.net)
```

---

### Step 0.14 Verification

```bash
# 1. Service health check
mise run indri-services-check
# All services should be OK

# 2. Test k8s access via Tailscale hostname
curl -k https://k8s.tail8d86e.ts.net/healthz
# Expected: ok (or a 401/403 response if anonymous access is disabled - that's fine)

# 3. kubectl via Tailscale
ki get nodes
ki get namespaces

# 4. k9s via Tailscale
k9i
```

---

## Phase 0 Verification Checklist

Run after completing all steps:

```bash
# 1. Full service health check
mise run indri-services-check
# All services should show OK, including new ones

# 2. Registry functionality - pull-through cache
ssh indri 'podman pull localhost:5050/docker.io/library/alpine:latest'  # Zot listens on 5050 (see Step 0.3)
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["docker.io/library/alpine"]}

# 3. Registry functionality - private image push (from gilbert)
podman pull alpine:latest
podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1
podman push registry.tail8d86e.ts.net/blumeops/test:v1
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]}

# 4. Kubernetes cluster
ssh indri 'minikube status'
ssh indri 'kubectl get nodes'
kubectl --context=minikube-indri get nodes  # from gilbert

# 5. Metrics in Prometheus
curl -s "http://indri:9090/api/v1/query?query=zot_up"
# Expected: value = 1

# 6. Logs in Loki
# In Grafana Explore: {service="zot"}
# Should see zot log entries

# 7. k9s from gilbert
k9s
# Should connect and show minikube cluster
```

---

## Phase 0 Rollback

If something goes wrong:

```bash
# Stop and remove minikube
ssh indri 'minikube stop && minikube delete'

# Stop and remove zot
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist'
ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.zot.plist'

# Remove podman machine
ssh indri 'podman machine stop && podman machine rm -f'

# Remove from tailscale serve
ssh indri 'tailscale serve --service svc:registry reset'
ssh indri 'tailscale serve --service svc:k8s reset'

# Remove tags from Pulumi (revert the policy.hujson and __main__.py changes first, then apply)
mise run tailnet-up

# Revert ansible playbook changes
git checkout ansible/playbooks/indri.yml
git checkout ansible/roles/tailscale_serve/defaults/main.yml
git checkout ansible/roles/alloy/defaults/main.yml

# Remove new roles
rm -rf ansible/roles/{zot,zot_metrics,podman,minikube}

# Remove zk cards
rm ~/code/personal/zk/{zot,minikube}.md
```

---

## Phase 0 Follow-up: Grafana Dashboards

After Phase 0 is running and stable, create monitoring dashboards:

**Zot Dashboard** (`ansible/roles/grafana/files/dashboards/zot.json`):
1. Check what metrics zot exposes: `ssh indri 'curl -s http://localhost:5050/metrics'`
2. Review community dashboards for inspiration (copy permitted if license allows)
3. Create dashboard with available metrics (at minimum: `zot_up`)

**Minikube Dashboard** (`ansible/roles/grafana/files/dashboards/minikube.json`):
1. Deploy kube-state-metrics if needed for additional cluster metrics
2. Review what Prometheus can scrape from the cluster
3. Review community dashboards for inspiration (copy permitted if license allows)
4. Create dashboard with relevant panels (node usage, pod counts, etc.)

---

## New Files Summary

| File | Purpose |
|------|---------|
| `ansible/roles/zot/` | Zot registry deployment |
| `ansible/roles/zot_metrics/` | Metrics collection for Zot |
| `ansible/roles/podman/` | Podman installation and setup |
| `ansible/roles/minikube/` | Minikube cluster setup |
| `~/code/personal/zk/zot.md` | Zot management documentation |
| `~/code/personal/zk/minikube.md` | Minikube management documentation |
| `bin/kubectl-credential-1password` | kubectl exec credential helper backed by 1Password |

## Modified Files Summary

| File | Changes |
|------|---------|
| `pulumi/policy.hujson` | Add tag:registry and tag:k8s-api |
| `pulumi/__main__.py` | Add both tags to indri's DeviceTags |
| `ansible/playbooks/indri.yml` | Add new roles |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Add svc:registry and svc:k8s |
| `ansible/roles/alloy/defaults/main.yml` | Add zot log collection |
| `mise-tasks/indri-services-check` | Add zot and k8s checks |