Recurring compensating-control review.

Verified:
- alloy-k8s: Synced/Healthy on minikube-indri (DaemonSet 1/1 ready)
- alloy-ringtail: Synced/Healthy on k3s-ringtail
- loki (monitoring/loki-0): Running, receiving logs

The previous description named only minikube, but BlumeOps now runs two
clusters with the migration to ringtail in progress. Generalized the
description and notes to cover both, and added a follow-up note that
enabling native apiserver audit logging is much more tractable on k3s
than it was on minikube.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
210 lines
9.8 KiB
YAML
# Compensating Controls
#
# Documents controls that mitigate risks from suppressed or accepted security
# findings. Referenced by security tools (Prowler mutelist, Kingfisher config,
# etc.) via "CC: <id>" in finding descriptions or suppression notes.
#
# Used by `mise run review-compensating-controls` to surface stale controls.
#
# Fields:
#   id            - kebab-case unique identifier, referenced from tool configs
#   description   - what the control actually does to mitigate risk
#   created       - date (YYYY-MM-DD) the control was documented
#   last-reviewed - date (YYYY-MM-DD) or null
#   notes         - optional context

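# Example (a hedged sketch, not copied from this repo's tool configs): a
# Prowler mutelist entry could point back at a control through its
# Description field. The check name, resource pattern, and wildcard scopes
# below are placeholder assumptions; only the "CC: <id>" convention comes
# from this file.
#
#   Mutelist:
#     Accounts:
#       "*":
#         Checks:
#           "example_check_id":          # hypothetical check name
#             Regions:
#               - "*"
#             Resources:
#               - "ts-.*"                # hypothetical resource pattern
#             Description: "CC: operator-managed-pods"
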
controls:
  - id: single-user-cluster
    description: >-
      Only the cluster operator (eblume) has kubectl access. No untrusted
      users can create pods, access cached images, or bind RBAC roles.
    created: 2026-03-30
    last-reviewed: 2026-04-01
    notes: >-
      Verify by checking kubeconfig distribution and Tailscale ACLs.
      If additional users gain cluster access, re-evaluate all findings
      muted under this control.

  - id: tailscale-network-isolation
    description: >-
      Cluster is not internet-exposed. All access requires Tailscale
      identity with ACL enforcement. Profiling endpoints, debug ports,
      and control-plane APIs are unreachable from the public internet.
    created: 2026-03-30
    last-reviewed: 2026-04-06
    notes: >-
      Verify with 'tailscale serve status --json' on indri and review
      Tailscale ACLs in pulumi/tailscale/. Only tag:flyio-target services
      are publicly routable.

  - id: local-registry
    description: >-
      Operator-built services use a private zot registry
      (registry.ops.eblu.me) for supply-chain control. Remaining
      images are pulled from public registries without stored
      credentials. No shared registry secrets are cached on cluster
      nodes.
    created: 2026-03-30
    last-reviewed: 2026-04-12
    notes: >-
      Verify by checking image prefixes in kustomization.yaml files.
      Known external-image categories: (1) upstream apps not yet
      mirrored: immich, ollama, frigate, frigate-notify, valkey;
      (2) infrastructure components: tailscale operator/proxy,
      external-secrets, 1password-connect, forgejo-runner, docker
      DinD, nvidia-device-plugin; (3) utility base images: busybox,
      alpine (grafana init containers). Track upstream versions in
      service-versions.yaml. Goal is to progressively mirror these
      into zot.

  - id: sso-gated-admin-tools
    description: >-
      ArgoCD requires SSO authentication via Authentik OIDC. Wildcard
      RBAC roles are mitigated by requiring authenticated identity
      before any API access.
    created: 2026-03-30
    last-reviewed: 2026-04-14
    notes: >-
      Verify the Authentik OIDC provider config for ArgoCD and that
      anonymous access is disabled. Check that the ArgoCD --auth-token
      isn't leaked. The workflow-bot API key account is scoped to
      sync/get only.

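  # Sketch for sso-gated-admin-tools (assumptions: stock Argo CD ConfigMap
  # keys; the "*/*" project/application scope is illustrative and not taken
  # from this repo). It shows anonymous access turned off and a workflow-bot
  # API-key account limited to get/sync:
  #
  #   # argocd-cm
  #   data:
  #     users.anonymous.enabled: "false"
  #     accounts.workflow-bot: apiKey
  #
  #   # argocd-rbac-cm
  #   data:
  #     policy.csv: |
  #       p, workflow-bot, applications, get, */*, allow
  #       p, workflow-bot, applications, sync, */*, allow
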
  - id: operator-managed-pods
    description: >-
      Tailscale operator manages proxy pod specs (ts-*, ingress-*,
      operator-*, nameserver-*). Pod security settings are set by the
      operator, not user manifests. Operator is tracked in
      service-versions.yaml and regularly updated.
    created: 2026-03-30
    last-reviewed: 2026-04-21
    notes: >-
      Verify operator version is current via 'mise run service-review'.
      Check Tailscale changelog for security fixes. If operator adds
      seccomp support, remove these mutes. As of 2026-04-21: still no
      default seccomp on operator-generated pods (upstream issue #7359
      open). A ProxyClass + generic device plugin can downgrade proxies
      from privileged to NET_ADMIN+NET_RAW and set seccompProfile. This
      is a potential future remediation to remove the seccomp mute
      without waiting for upstream defaults.

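  # Sketch of the ProxyClass remediation mentioned in the notes above. This
  # is an assumption-laden outline, not a tested manifest: the ProxyClass
  # name and the device-plugin resource name (squat.ai/tun) are hypothetical
  # and depend on how the generic device plugin is configured.
  #
  #   apiVersion: tailscale.com/v1alpha1
  #   kind: ProxyClass
  #   metadata:
  #     name: unprivileged-proxy          # hypothetical name
  #   spec:
  #     statefulSet:
  #       pod:
  #         securityContext:
  #           seccompProfile:
  #             type: RuntimeDefault
  #         tailscaleContainer:
  #           resources:
  #             limits:
  #               squat.ai/tun: "1"       # hypothetical device resource
  #           securityContext:
  #             capabilities:
  #               drop: ["ALL"]
  #               add: ["NET_ADMIN", "NET_RAW"]
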
  - id: ephemeral-privileged-jobs
    description: >-
      Prowler CIS scanner runs as a CronJob with 7-day TTL
      auto-deletion, not as a persistent privileged workload. hostPID
      exposure is time-bounded to scan duration (~20s).
    created: 2026-03-30
    last-reviewed: 2026-04-29
    notes: >-
      Verify TTL is set in cronjob.yaml. Check that no persistent
      pods run with hostPID on the scanned cluster (indri). The
      alloy-tracing DaemonSet on ringtail also uses hostPID but is
      out of scope because Prowler only scans indri. Tracked in
      Todoist: "prowler scan against ringtail". Once that lands, the
      DaemonSet's hostPID+privileged posture will surface as a CIS
      finding and need its own CC or remediation.

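  # Sketch of the CronJob fields the notes above refer to (name, schedule,
  # and image are placeholders, not the contents of the real cronjob.yaml).
  # A 7-day TTL is 7 * 24 * 3600 = 604800 seconds:
  #
  #   apiVersion: batch/v1
  #   kind: CronJob
  #   metadata:
  #     name: prowler-scan                # hypothetical name
  #   spec:
  #     schedule: "0 3 * * 0"             # hypothetical weekly schedule
  #     jobTemplate:
  #       spec:
  #         ttlSecondsAfterFinished: 604800
  #         template:
  #           spec:
  #             hostPID: true
  #             restartPolicy: Never
  #             containers:
  #               - name: prowler
  #                 image: example.invalid/prowler:placeholder
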
  - id: trusted-ci-only
    description: >-
      Forgejo runner only executes workflows from repos on the private
      forge (forge.ops.eblu.me). No external or untrusted repos can
      trigger privileged CI jobs.
    created: 2026-03-30
    last-reviewed: 2026-05-01
    notes: >-
      Verification: (1) Runner config (argocd/manifests/forgejo-runner/
      config.yaml) connects only to https://forge.ops.eblu.me/. (2) Forge
      app.ini has DISABLE_REGISTRATION=true and
      ALLOW_ONLY_EXTERNAL_REGISTRATION=true
      (ansible/roles/forgejo/defaults/main.yml), so no untrusted users
      can sign up or create repos. The runner registers at instance scope
      (repo_id=0/owner_id=0 in action_runner table), but the instance itself
      is closed, so no per-repo allow-list is needed. Re-evaluate if the
      forge ever opens to additional users or if the runner is repointed
      to an external forge.

  - id: init-container-isolation
    description: >-
      Root privileges and added capabilities (CHOWN) are limited to
      init containers that run once at pod startup. All runtime
      containers run as non-root (UID 472) with all capabilities
      dropped.
    created: 2026-03-30
    last-reviewed: 2026-05-04
    notes: >-
      Verify by inspecting grafana deployment.yaml securityContext
      for both init and runtime containers. If fsGroup alone can
      handle PVC ownership, remove init-chown-data and this control.
      Retirement deferred until grafana lands on ringtail's k3s
      (see [[indri-k8s-migration]]): the storage backend will change,
      and removing init-chown-data right before that migration
      trades a real safety net for marginal cleanup. Revisit
      post-migration.

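  # Sketch of the init/runtime split described above. The runtime container
  # name, the drop-ALL on the init container, and allowPrivilegeEscalation
  # are assumptions; the authoritative values live in grafana's
  # deployment.yaml.
  #
  #   initContainers:
  #     - name: init-chown-data
  #       securityContext:
  #         runAsUser: 0
  #         capabilities:
  #           drop: ["ALL"]
  #           add: ["CHOWN"]
  #   containers:
  #     - name: grafana
  #       securityContext:
  #         runAsNonRoot: true
  #         runAsUser: 472
  #         allowPrivilegeEscalation: false
  #         capabilities:
  #           drop: ["ALL"]
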
  - id: node-config-automated-verification
    description: >-
      Prowler reports certain node-level checks as MANUAL because it runs
      inside a pod and cannot evaluate kubelet file permissions, kubelet
      config arguments, etcd CA separation, or cluster-admin RBAC bindings.
      The review-compliance-reports script SSHes into the minikube node
      weekly and programmatically verifies each condition, failing loudly
      if any check deviates from expected values.
    created: 2026-04-14
    last-reviewed: 2026-04-14
    notes: >-
      Verification runs as part of 'mise run review-compliance-reports'.
      If the minikube node is unreachable, all checks report as FAIL. If
      new MANUAL findings appear in Prowler, add corresponding
      verification logic to the script and update the mutelist.

  - id: operator-purpose-bound-rbac
    description: >-
      Operators whose entire function is to manage a sensitive resource
      legitimately need RBAC over that resource. external-secrets-operator
      manages Secret objects (its purpose) and the cert-controller mutates
      its own ValidatingWebhookConfigurations to inject rotating CA bundles.
      Risk is bounded by: (1) the operator code being upstream open-source
      and reviewed; (2) RBAC scoped to specific named webhooks where
      possible; (3) supply chain controls on the operator image (mirrored
      to local registry, version tracked in service-versions.yaml).
    created: 2026-04-27
    last-reviewed: 2026-04-27
    notes: >-
      Verify by checking that the operators in question still match their
      stated purpose (i.e. external-secrets is still the only consumer of
      these ClusterRoles) and that upstream hasn't published advisories
      for credential-handling bugs. Re-evaluate if a non-secrets-managing
      ClusterRole appears under this control.

  - id: kube-state-metrics-metadata-only
    description: >-
      kube-state-metrics holds list/watch on Secrets cluster-wide but only
      exposes Secret object *metadata* (name, namespace, type, creation
      timestamp, labels) via the kube_secret_info / kube_secret_labels
      metrics. Secret data fields are never read into KSM's exposed
      metrics by upstream design. Mitigation rests on KSM's metric
      schema, the version pin in service-versions.yaml, and the metrics
      endpoint being reachable only on the cluster network.
    created: 2026-04-27
    last-reviewed: 2026-04-27
    notes: >-
      Verify by inspecting the /metrics endpoint output for any series
      that include secret data (only *_info and *_labels metrics should
      reference secrets, and labels should be limited to user-applied
      labels, never the contents of the data: field). Re-evaluate on
      KSM version bumps.

  - id: observability-stack-audit
    description: >-
      Alloy collects pod logs and ships them to Loki, providing an
      audit trail for cluster activity. Compensates for missing
      apiserver audit logging, which neither minikube (indri) nor
      k3s (ringtail) configures by default.
    created: 2026-03-30
    last-reviewed: 2026-05-11
    notes: >-
      Verify the Alloy DaemonSet is running on each cluster (alloy-k8s
      on minikube, alloy-ringtail on k3s) and Loki is receiving logs.
      Note this is weaker than native apiserver audit logs: it
      captures pod stdout/stderr, not API request-level auditing.
      Consider enabling apiserver audit logging on k3s post-migration
      (`--audit-log-path` / `--audit-policy-file`); minikube made it
      hard, k3s makes it straightforward.
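
  # Sketch of the k3s follow-up mentioned in the notes above (file paths
  # and the retention value are assumptions; nothing here is deployed yet).
  # In /etc/rancher/k3s/config.yaml on the server node:
  #
  #   kube-apiserver-arg:
  #     - "audit-log-path=/var/lib/rancher/k3s/server/logs/audit.log"
  #     - "audit-policy-file=/var/lib/rancher/k3s/server/audit.yaml"
  #     - "audit-log-maxage=30"
  #
  # paired with a minimal policy file that records request metadata only:
  #
  #   apiVersion: audit.k8s.io/v1
  #   kind: Policy
  #   rules:
  #     - level: Metadata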