# Compensating Controls
#
# Documents controls that mitigate risks from suppressed or accepted security
# findings. Entries are referenced from security tool configs (Prowler
# mutelist, Kingfisher config, etc.) via "CC: <id>" in finding descriptions
# or suppression notes.
#
# Used by `mise run review-compensating-controls` to surface stale controls.
#
# Fields:
#   id              - kebab-case unique identifier, referenced from tool configs
#   description     - what the control actually does to mitigate risk
#   created         - date (YYYY-MM-DD) the control was documented
#   last-reviewed   - date (YYYY-MM-DD) or null
#   notes           - optional context
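#
# A minimal template for a new entry, shown only to illustrate the fields
# above. The id and all values are hypothetical placeholders; copy the block
# under `controls:`, uncomment it, and edit. Tool configs would then cite it
# as "CC: example-control" in the relevant suppression note.
#
#   - id: example-control
#     description: >-
#       What the control does and which risk it mitigates.
#     created: 2026-01-01
#     last-reviewed: null
#     notes: >-
#       How to verify that the control still holds, and when to
#       re-evaluate the findings muted under it.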

controls:
  - id: single-user-cluster
    description: >-
      Only the cluster operator (eblume) has kubectl access. No untrusted
      users can create pods, access cached images, or bind RBAC roles.
    created: 2026-03-30
    last-reviewed: 2026-04-01
    notes: >-
      Verify by checking kubeconfig distribution and Tailscale ACLs.
      If additional users gain cluster access, re-evaluate all findings
      muted under this control.

  - id: tailscale-network-isolation
    description: >-
      Cluster is not internet-exposed. All access requires Tailscale
      identity with ACL enforcement. Profiling endpoints, debug ports,
      and control-plane APIs are unreachable from the public internet.
    created: 2026-03-30
    last-reviewed: 2026-04-06
    notes: >-
      Verify with 'tailscale serve status --json' on indri and review
      Tailscale ACLs in pulumi/tailscale/. Only tag:flyio-target services
      are publicly routable.

  - id: local-registry
    description: >-
      Operator-built services use a private zot registry
      (registry.ops.eblu.me) for supply-chain control. Remaining
      images are pulled from public registries without stored
      credentials. No shared registry secrets are cached on cluster
      nodes.
    created: 2026-03-30
    last-reviewed: 2026-04-12
    notes: >-
      Verify by checking image prefixes in kustomization.yaml files.
      Known external-image categories: (1) upstream apps not yet
      mirrored — immich, ollama, frigate, frigate-notify, valkey;
      (2) infrastructure components — tailscale operator/proxy,
      external-secrets, 1password-connect, forgejo-runner, docker
      DinD, nvidia-device-plugin; (3) utility base images — busybox,
      alpine (grafana init containers). Track upstream versions in
      service-versions.yaml. Goal is to progressively mirror these
      into zot.

  - id: sso-gated-admin-tools
    description: >-
      ArgoCD requires SSO authentication via Authentik OIDC. Wildcard
      RBAC roles are mitigated by requiring authenticated identity
      before any API access.
    created: 2026-03-30
    last-reviewed: 2026-04-14
    notes: >-
      Verify the Authentik OIDC provider config for ArgoCD and confirm that
      anonymous access is disabled. Check that the ArgoCD --auth-token has
      not leaked. The workflow-bot API key account is scoped to sync/get
      only.

  - id: operator-managed-pods
    description: >-
      The Tailscale operator manages proxy pod specs (ts-*, ingress-*,
      operator-*, nameserver-*). Pod security settings are set by the
      operator, not by user manifests. The operator is tracked in
      service-versions.yaml and regularly updated.
    created: 2026-03-30
    last-reviewed: 2026-04-21
    notes: >-
      Verify the operator version is current via 'mise run service-review'
      and check the Tailscale changelog for security fixes. If the operator
      adds seccomp support, remove these mutes. As of 2026-04-21,
      operator-generated pods still have no default seccomp profile
      (upstream issue #7359 is open). A ProxyClass plus a generic device
      plugin can downgrade proxies from privileged to NET_ADMIN+NET_RAW
      and set seccompProfile; this is a potential future remediation that
      would remove the seccomp mute without waiting for upstream defaults.

  - id: ephemeral-privileged-jobs
    description: >-
      The Prowler CIS scanner runs as a CronJob whose Jobs are auto-deleted
      after a 7-day TTL, not as a persistent privileged workload. hostPID
      exposure is time-bounded to the scan duration (~20s).
    created: 2026-03-30
    last-reviewed: 2026-04-29
    notes: >-
      Verify that the TTL is set in cronjob.yaml and that no persistent
      pods run with hostPID on the scanned cluster (indri). The
      alloy-tracing DaemonSet on ringtail also uses hostPID but is out of
      scope because Prowler only scans indri. This is tracked in Todoist as
      "prowler scan against ringtail"; once that lands, the DaemonSet's
      hostPID+privileged posture will surface as a CIS finding and will
      need its own CC or remediation.

  - id: trusted-ci-only
    description: >-
      Forgejo runner only executes workflows from repos on the private
      forge (forge.ops.eblu.me). No external or untrusted repos can
      trigger privileged CI jobs.
    created: 2026-03-30
    last-reviewed: 2026-05-01
    notes: >-
      Verification: (1) The runner config
      (argocd/manifests/forgejo-runner/config.yaml) connects only to
      https://forge.ops.eblu.me/. (2) The forge app.ini sets
      DISABLE_REGISTRATION=true and ALLOW_ONLY_EXTERNAL_REGISTRATION=true
      (ansible/roles/forgejo/defaults/main.yml), so no untrusted users can
      sign up or create repos. The runner registers at instance scope
      (repo_id=0 and owner_id=0 in the action_runner table), but the
      instance itself is closed, so no per-repo allow-list is needed.
      Re-evaluate if the forge ever opens to additional users or if the
      runner is repointed to an external forge.

  - id: init-container-isolation
    description: >-
      Root privileges and added capabilities (CHOWN) are limited to
      init containers that run once at pod startup. All runtime
      containers run as non-root (UID 472) with all capabilities
      dropped.
    created: 2026-03-30
    last-reviewed: 2026-05-04
    notes: >-
      Verify by inspecting the securityContext in grafana's deployment.yaml
      for both init and runtime containers. If fsGroup alone can handle
      PVC ownership, remove init-chown-data and this control. Retirement
      is deferred until grafana lands on ringtail's k3s (see
      [[indri-k8s-migration]]): the storage backend will change, and
      removing init-chown-data right before that migration trades a real
      safety net for marginal cleanup. Revisit post-migration.

  - id: node-config-automated-verification
    description: >-
      Prowler reports certain node-level checks as MANUAL because it runs
      inside a pod and cannot evaluate kubelet file permissions, kubelet
      config arguments, etcd CA separation, or cluster-admin RBAC bindings.
      The review-compliance-reports script SSHes into the minikube node
      weekly and programmatically verifies each condition, failing loudly
      if any check deviates from expected values.
    created: 2026-04-14
    last-reviewed: 2026-04-14
    notes: >-
      Verification runs as part of 'mise run review-compliance-reports'.
      If the minikube node is unreachable, all checks report as FAIL. If
      new MANUAL findings appear in Prowler, add corresponding
      verification logic to the script and update the mutelist.

  - id: operator-purpose-bound-rbac
    description: >-
      Operators whose entire function is to manage a sensitive resource
      legitimately need RBAC over that resource. external-secrets-operator
      manages Secret objects (its purpose) and the cert-controller mutates
      its own ValidatingWebhookConfigurations to inject rotating CA bundles.
      Risk is bounded by: (1) the operator code being upstream open-source
      and reviewed; (2) RBAC scoped to specific named webhooks where
      possible; (3) supply chain controls on the operator image (mirrored
      to local registry, version tracked in service-versions.yaml).
    created: 2026-04-27
    last-reviewed: 2026-04-27
    notes: >-
      Verify by checking that the operators in question still match their
      stated purpose (i.e. external-secrets is still the only consumer of
      these ClusterRoles) and that upstream hasn't published advisories
      for credential-handling bugs. Re-evaluate if a non-secrets-managing
      ClusterRole appears under this control.

  - id: kube-state-metrics-metadata-only
    description: >-
      kube-state-metrics holds list/watch permissions on Secrets
      cluster-wide but only exposes Secret object *metadata* (name,
      namespace, type, creation timestamp, labels) through the
      kube_secret_* metric family (e.g. kube_secret_info,
      kube_secret_labels). By upstream design, Secret data fields never
      appear in KSM's exposed metrics. Mitigation rests on KSM's metric
      schema, the version pin in service-versions.yaml, and the metrics
      endpoint being reachable only on the cluster network.
    created: 2026-04-27
    last-reviewed: 2026-04-27
    notes: >-
      Verify by inspecting the /metrics endpoint output for any series
      that include secret data: kube_secret_* series should carry only
      object metadata, and label values should be limited to
      user-applied labels, never values from the Secret's data field.
      Re-evaluate on KSM version bumps.

  - id: observability-stack-audit
    description: >-
      Alloy collects pod logs and ships them to Loki, providing an audit
      trail for cluster activity. This compensates for the missing
      apiserver audit logging, which minikube does not configure.
    created: 2026-03-30
    last-reviewed: 2026-03-30
    notes: >-
      Verify that the Alloy DaemonSet is running and that Loki is
      receiving logs. Note that this is weaker than native apiserver
      audit logs: it captures pod stdout/stderr, not API request-level
      auditing. Consider enabling minikube audit logging if supported.