# Compensating Controls
#
# Documents controls that mitigate risks from suppressed or accepted security
# findings. Referenced by security tools (Prowler mutelist, Kingfisher config,
# etc.) via "CC: <id>" in finding descriptions or suppression notes.
#
# Used by `mise run review-compensating-controls` to surface stale controls.
#
# Fields:
#   id            - kebab-case unique identifier, referenced from tool configs
#   description   - what the control actually does to mitigate risk
#   created       - date (YYYY-MM-DD) the control was documented
#   last-reviewed - date (YYYY-MM-DD) or null
#   notes         - optional context

controls:
  - id: single-user-cluster
    description: >-
      Only the cluster operator (eblume) has kubectl access. No untrusted
      users can create pods, access cached images, or bind RBAC roles.
    created: 2026-03-30
    last-reviewed: 2026-04-01
    notes: >-
      Verify by checking kubeconfig distribution and Tailscale ACLs.
      If additional users gain cluster access, re-evaluate all findings
      muted under this control.

  - id: tailscale-network-isolation
    description: >-
      Cluster is not internet-exposed. All access requires Tailscale identity
      with ACL enforcement. Profiling endpoints, debug ports, and
      control-plane APIs are unreachable from the public internet.
    created: 2026-03-30
    last-reviewed: 2026-04-06
    notes: >-
      Verify with 'tailscale serve status --json' on indri and review
      Tailscale ACLs in pulumi/tailscale/. Only tag:flyio-target services
      are publicly routable.

  - id: local-registry
    description: >-
      Operator-built services use a private zot registry
      (registry.ops.eblu.me) for supply-chain control. Remaining images are
      pulled from public registries without stored credentials. No shared
      registry secrets are cached on cluster nodes.
    created: 2026-03-30
    last-reviewed: 2026-04-12
    notes: >-
      Verify by checking image prefixes in kustomization.yaml files.
      Known external-image categories:
      (1) upstream apps not yet mirrored: immich, ollama, frigate,
      frigate-notify, valkey;
      (2) infrastructure components: tailscale operator/proxy,
      external-secrets, 1password-connect, forgejo-runner, docker DinD,
      nvidia-device-plugin;
      (3) utility base images: busybox, alpine (grafana init containers).
      Track upstream versions in service-versions.yaml. Goal is to
      progressively mirror these into zot.

  - id: sso-gated-admin-tools
    description: >-
      ArgoCD requires SSO authentication via Authentik OIDC. Wildcard RBAC
      roles are mitigated by requiring authenticated identity before any
      API access.
    created: 2026-03-30
    last-reviewed: 2026-04-14
    notes: >-
      Verify Authentik OIDC provider config for ArgoCD and that anonymous
      access is disabled. Check ArgoCD --auth-token isn't leaked. The
      workflow-bot API key account is scoped to sync/get only.

  - id: operator-managed-pods
    description: >-
      Tailscale operator manages proxy pod specs (ts-*, ingress-*,
      operator-*, nameserver-*). Pod security settings are set by the
      operator, not user manifests. Operator is tracked in
      service-versions.yaml and regularly updated.
    created: 2026-03-30
    last-reviewed: 2026-04-21
    notes: >-
      Verify operator version is current via 'mise run service-review'.
      Check Tailscale changelog for security fixes. If operator adds seccomp
      support, remove these mutes.
      As of 2026-04-21: still no default seccomp on operator-generated pods
      (upstream issue #7359 open). A ProxyClass + generic device plugin can
      downgrade proxies from privileged to NET_ADMIN+NET_RAW and set
      seccompProfile; this is a potential future remediation to remove the
      seccomp mute without waiting for upstream defaults.

  - id: ephemeral-privileged-jobs
    description: >-
      Prowler CIS scanner runs as a CronJob with 7-day TTL auto-deletion,
      not as a persistent privileged workload. hostPID exposure is
      time-bounded to scan duration (~20s).
    created: 2026-03-30
    last-reviewed: 2026-04-29
    notes: >-
      Verify TTL is set in cronjob.yaml. Check that no persistent pods run
      with hostPID on the scanned cluster (indri). The alloy-tracing
      DaemonSet on ringtail also uses hostPID but is out of scope because
      Prowler only scans indri. Tracked in Todoist: "prowler scan against
      ringtail". Once that lands, the DaemonSet's hostPID+privileged posture
      will surface as a CIS finding and need its own CC or remediation.

  - id: trusted-ci-only
    description: >-
      Forgejo runner only executes workflows from repos on the private forge
      (forge.ops.eblu.me). No external or untrusted repos can trigger
      privileged CI jobs.
    created: 2026-03-30
    last-reviewed: 2026-05-01
    notes: >-
      Verification:
      (1) Runner config (argocd/manifests/forgejo-runner/config.yaml)
      connects only to https://forge.ops.eblu.me/.
      (2) Forge app.ini has DISABLE_REGISTRATION=true and
      ALLOW_ONLY_EXTERNAL_REGISTRATION=true
      (ansible/roles/forgejo/defaults/main.yml), so no untrusted users can
      sign up or create repos.
      The runner registers at instance scope (repo_id=0/owner_id=0 in
      action_runner table), but the instance itself is closed, so no
      per-repo allow-list is needed. Re-evaluate if the forge ever opens to
      additional users or if the runner is repointed to an external forge.

  - id: init-container-isolation
    description: >-
      Root privileges and added capabilities (CHOWN) are limited to init
      containers that run once at pod startup. All runtime containers run as
      non-root (UID 472) with all capabilities dropped.
    created: 2026-03-30
    last-reviewed: 2026-05-04
    notes: >-
      Verify by inspecting grafana deployment.yaml securityContext for both
      init and runtime containers. If fsGroup alone can handle PVC
      ownership, remove init-chown-data and this control.
      Retirement deferred until grafana lands on ringtail's k3s (see
      [[indri-k8s-migration]]): the storage backend will change, and
      removing init-chown-data right before that migration trades a real
      safety net for marginal cleanup. Revisit post-migration.

  - id: node-config-automated-verification
    description: >-
      Prowler reports certain node-level checks as MANUAL because it runs
      inside a pod and cannot evaluate kubelet file permissions, kubelet
      config arguments, etcd CA separation, or cluster-admin RBAC bindings.
      The review-compliance-reports script SSHes into the minikube node
      weekly and programmatically verifies each condition, failing loudly if
      any check deviates from expected values.
    created: 2026-04-14
    last-reviewed: 2026-04-14
    notes: >-
      Verification runs as part of 'mise run review-compliance-reports'.
      If minikube node is unreachable, all checks report as FAIL.
      If new MANUAL findings appear in Prowler, add corresponding
      verification logic to the script and update the mutelist.

  - id: operator-purpose-bound-rbac
    description: >-
      Operators whose entire function is to manage a sensitive resource
      legitimately need RBAC over that resource. external-secrets-operator
      manages Secret objects (its purpose) and the cert-controller mutates
      its own ValidatingWebhookConfigurations to inject rotating CA bundles.
      Risk is bounded by: (1) the operator code being upstream open-source
      and reviewed; (2) RBAC scoped to specific named webhooks where
      possible; (3) supply chain controls on the operator image (mirrored to
      local registry, version tracked in service-versions.yaml).
    created: 2026-04-27
    last-reviewed: 2026-04-27
    notes: >-
      Verify by checking that the operators in question still match their
      stated purpose (i.e. external-secrets is still the only consumer of
      these ClusterRoles) and that upstream hasn't published advisories for
      credential-handling bugs. Re-evaluate if a non-secrets-managing
      ClusterRole appears under this control.

  - id: kube-state-metrics-metadata-only
    description: >-
      kube-state-metrics holds list/watch on Secrets cluster-wide but only
      exposes Secret object *metadata* (name, namespace, type, creation
      timestamp, labels) via the kube_secret_info / kube_secret_labels
      metrics. Secret data fields are never read into KSM's exposed metrics
      by upstream design. Mitigation rests on KSM's metric schema, the
      version pin in service-versions.yaml, and the metrics endpoint being
      reachable only on the cluster network.
    created: 2026-04-27
    last-reviewed: 2026-04-27
    notes: >-
      Verify by inspecting the /metrics endpoint output for any series that
      include secret data (only *_info and *_labels metrics should reference
      secrets, and labels should be limited to user-applied labels, never
      the data:). Re-evaluate on KSM version bumps.

  - id: observability-stack-audit
    description: >-
      Alloy collects pod logs and ships them to Loki, providing an audit
      trail for cluster activity. Compensates for missing apiserver audit
      logging which minikube does not configure.
    created: 2026-03-30
    last-reviewed: 2026-03-30
    notes: >-
      Verify Alloy DaemonSet is running and Loki is receiving logs.
      Note this is weaker than native apiserver audit logs: it captures pod
      stdout/stderr, not API request-level auditing. Consider enabling
      minikube audit logging if supported.
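
# Template for a new entry. This is only a sketch of the field layout
# described in the header: every value below is a hypothetical placeholder,
# not a real control. Copy it under `controls:`, uncomment it, and replace
# the placeholders.
#
#   - id: example-control-id            # hypothetical; kebab-case, unique
#     description: >-
#       What the control actually does to mitigate the risk.
#     created: YYYY-MM-DD               # date the control was documented
#     last-reviewed: null               # or YYYY-MM-DD once reviewed
#     notes: >-
#       How to verify the control and when to re-evaluate it.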