# Project Configuration (`kingfisher.yaml`)

Long CLI invocations are awkward in CI. Kingfisher can load a project-local
`kingfisher.yaml` that provides defaults for nearly every `kingfisher scan` flag,
plus alert webhooks and filter lists. Lists are **additive** (config and CLI
values are concatenated). Scalars are **default-only**: a config value applies
only when the user did not pass the matching `--flag`. This keeps CI overrides
predictable and makes the CLI authoritative.
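
A minimal Python sketch of the two merge rules follows. It is an illustration, not Kingfisher's code, and it makes three assumptions. The helper name `merge_settings` is hypothetical. Keys live in one flat namespace, whereas the real file is sectioned as in the Schema section. `cli_passed` contains only flags the user actually typed.

```python
from typing import Any


def merge_settings(config: dict[str, Any], cli_passed: dict[str, Any]) -> dict[str, Any]:
    """Merge kingfisher.yaml values with the flags the user explicitly passed.

    Hypothetical helper: `cli_passed` must hold only flags that were given
    on the command line, never parser defaults.
    """
    merged = dict(config)  # start from the config file's values
    for key, cli_value in cli_passed.items():
        if isinstance(cli_value, list):
            # Lists are additive: config entries first, then the CLI's entries.
            merged[key] = list(config.get(key, [])) + cli_value
        else:
            # Scalars are default-only: a passed flag replaces the config value.
            merged[key] = cli_value
    return merged


config = {"confidence": "high", "redact": True, "skip_words": ["EXAMPLE"]}
cli_passed = {"confidence": "low", "skip_words": ["PLACEHOLDER"]}
print(merge_settings(config, cli_passed))
# {'confidence': 'low', 'redact': True, 'skip_words': ['EXAMPLE', 'PLACEHOLDER']}
```

Keeping `cli_passed` limited to typed flags is what makes "default-only" work. An omitted boolean flag such as `--redact` still has a value (false), so "not passed" has to be tracked separately from "passed with the default value".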
|
## Loading a config

Kingfisher does **not** auto-discover `kingfisher.yaml`. The file is loaded
only when you pass `--config FILE` explicitly:

```bash
kingfisher scan . --config ./kingfisher.yaml
```

A missing or malformed file is a fatal error. There is no silent fallback,
so a typo in the path or a broken YAML block fails fast instead of running
with surprising defaults. Auto-discovery was rejected because it makes scan
results depend on the directory the binary was launched from, which is too
easy to get wrong in CI.
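
The loading rule reduces to two checks: read nothing without `--config`, and let every failure propagate. The Python sketch below is illustrative only. `load_project_config` is a hypothetical name. `json.loads` stands in for a YAML parser only so the example runs with the standard library.

```python
import json
from pathlib import Path
from typing import Any, Callable, Optional


def load_project_config(
    config_path: Optional[str],
    parse: Callable[[str], Any] = json.loads,  # stand-in for a YAML parser
) -> Optional[Any]:
    # No auto-discovery: without --config nothing is read, whatever the
    # current working directory happens to contain.
    if config_path is None:
        return None
    # A missing file raises FileNotFoundError here; there is no fallback
    # to built-in defaults.
    text = Path(config_path).read_text(encoding="utf-8")
    # Parse errors propagate unchanged, so a malformed file stops the run.
    return parse(text)
```

A caller would let either exception reach the top level and exit with an error, rather than catching it and continuing with defaults.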
|
## Precedence

```
CLI flag > environment variable > kingfisher.yaml > built-in default
```

For list-typed values, both sources are concatenated, so passing
`--skip-word EXAMPLE` and listing `EXAMPLE` again in `kingfisher.yaml` is safe
but redundant. One nuance: when you don't pass `--rule`, `rules.enabled`
*replaces* the synthetic `["all"]` default instead of being added to it, so a
config that lists `["custom"]` actually narrows the selection.
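
For a single value, the order above is a first-match search, and the `rules.enabled` nuance is a special case of list merging. The Python sketch below rests on three assumptions:

- `None` means "this source did not supply a value".
- Both function names are hypothetical.
- When `--rule` *is* passed, config and CLI rules are concatenated. This follows from the general additive-list rule and is not separately documented here.

```python
from typing import Any, Optional, Sequence


def resolve_scalar(cli: Optional[Any], env: Optional[Any],
                   config: Optional[Any], builtin: Any) -> Any:
    # First source that supplied a value wins:
    # CLI flag > environment variable > kingfisher.yaml > built-in default.
    for value in (cli, env, config):
        if value is not None:
            return value
    return builtin


def resolve_enabled_rules(cli_rules: Sequence[str],
                          config_rules: Optional[Sequence[str]]) -> list[str]:
    from_config = list(config_rules or [])
    if cli_rules:
        # --rule was passed: ordinary additive list merge (assumed).
        return from_config + list(cli_rules)
    # No --rule: the config list replaces the synthetic ["all"] default
    # rather than being appended to it, so ["custom"] narrows the selection.
    return from_config or ["all"]


print(resolve_scalar("low", None, "high", "medium"))  # low
print(resolve_enabled_rules([], ["custom"]))          # ['custom']
print(resolve_enabled_rules([], None))                # ['all']
```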
|
## End-to-end: create a config and scan with it

### Step 1: generate the config

You don't need to write the YAML by hand. Start from the **scan-default
flags** you already pass to `kingfisher scan`, meaning the policy-shaped ones:
confidence, redaction, filters, output, alerts, TLS, and self-hosted API roots.
Pass them to `kingfisher config init`:

```bash
# Print to stdout, redirect to file:
kingfisher config init \
  --confidence high \
  --redact \
  --exclude vendor/ \
  --skip-word EXAMPLE \
  --format sarif \
  --output ./kingfisher.sarif \
  --alert-min-confidence high \
  --alert-webhook https://hooks.slack.com/services/T0/B0/AAA \
  --tls-mode lax \
  --github-api-url https://ghe.corp.example.com/api/v3/ \
  --gitlab-api-url https://gitlab.corp.example.com/ \
  > kingfisher.yaml

# Or write the file directly (pass --force to overwrite):
kingfisher config init [...flags...] --out kingfisher.yaml
```

Only flags you actually supply appear in the output. clap defaults are
omitted to keep the file minimal. Scan-target inputs (paths, `--git-url`,
GitHub/GitLab/etc. user/org/group flags, S3/GCS buckets) are stripped because
they describe *what* this run scans and don't belong in shared project
policy.
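
Conceptually, `config init` walks the supplied flags, drops the scan-target ones, and nests the rest under their schema sections. The Python sketch below is hypothetical:

- The flag table is a hand-picked subset read off the comments in the Schema section.
- The target-flag set is a subset of the list under "What is *not* config-overridable".
- The result is a plain dict rather than emitted YAML.

```python
from typing import Any

# Illustrative subset: flag -> path in kingfisher.yaml, from the schema comments.
FLAG_PATHS: dict[str, tuple[str, ...]] = {
    "--confidence": ("scan", "confidence"),
    "--redact": ("scan", "redact"),
    "--exclude": ("filters", "exclude"),
    "--skip-word": ("filters", "skip_words"),
    "--format": ("output", "format"),
    "--output": ("output", "path"),
    "--alert-min-confidence": ("alerts", "defaults", "min_confidence"),
    "--tls-mode": ("global", "tls_mode"),
    "--github-api-url": ("git", "github_api_url"),
    "--gitlab-api-url": ("git", "gitlab_api_url"),
}

# Scan-target inputs describe one run, so they are never written out.
TARGET_FLAGS = {"--git-url", "--github-org", "--gitlab-group", "--s3-bucket", "--gcs-bucket"}


def build_config(supplied: dict[str, Any]) -> dict[str, Any]:
    """Nest only the flags the user supplied; unsupplied flags never appear."""
    config: dict[str, Any] = {}
    for flag, value in supplied.items():
        if flag in TARGET_FLAGS:
            continue  # stripped: not shared project policy
        if flag == "--alert-webhook":
            # Each --alert-webhook URL becomes one alerts.webhooks entry.
            hooks = config.setdefault("alerts", {}).setdefault("webhooks", [])
            hooks.extend({"url": url} for url in value)
            continue
        *parents, leaf = FLAG_PATHS[flag]  # KeyError for flags outside this subset
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


print(build_config({"--confidence": "high", "--git-url": "https://example.com/r.git"}))
# {'scan': {'confidence': 'high'}}
```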
|
> **Important:** `config init` does **not** accept the provider-subcommand
> form. `kingfisher scan gitlab --group my-group --api-url https://...`
> cannot be pasted verbatim: `config init` has no `gitlab` subcommand,
> and `--group` / the subcommand-scoped `--api-url` are not accepted at
> the top level. Use the top-level aliases instead: `--gitlab-api-url`
> for the GitLab API root and `--github-api-url` for GHE. Target
> selectors like `--group` / `--organization` are intentionally CLI-only
> and have no config-file equivalent.
|
### Step 2: run the scan, passing the config explicitly

```bash
kingfisher scan . --config ./kingfisher.yaml
```

`--config FILE` is required because there is no auto-discovery. CLI flags can
still override any individual value for a single run:

```bash
kingfisher scan . --config ./kingfisher.yaml --confidence low
# scan.confidence: high in YAML → CLI flag wins, runs at low confidence
```
|
## Webhook URL policy

`alerts.webhooks[].url` (and `--alert-webhook URL`) **must use `https://`**.
Webhook URLs typically embed a secret token in the path, and the alert
payload contains finding metadata, so cleartext transport is never the right
default. `http://` is allowed only when the host is a loopback address
(`localhost`, `127.0.0.0/8`, `::1`), which is useful for local development
against an on-host receiver. Loopback decisions are made on the literal
hostname or IP in the URL. We do not consult DNS, so a resolver cannot trick
the validator into permitting `http://` for a remote host.
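
The rule maps directly onto Python's standard `urllib.parse` and `ipaddress` modules. This is an illustrative sketch, not Kingfisher's validator, and the function names are hypothetical. It never resolves the hostname: `localhost` is matched by name, and everything else must parse as a literal IP.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_host(host: str) -> bool:
    # Decided from the literal text only: no DNS lookup happens here.
    if host == "localhost":
        return True
    try:
        # Covers all of 127.0.0.0/8 as well as ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as remote


def webhook_url_allowed(url: str) -> bool:
    parts = urlsplit(url)  # scheme and hostname come back lower-cased
    if parts.scheme == "https":
        return True
    if parts.scheme == "http":
        return is_loopback_host(parts.hostname or "")
    return False  # any other scheme is rejected


print(webhook_url_allowed("http://127.0.0.1:8080/hook"))     # True
print(webhook_url_allowed("http://hooks.example.com/hook"))  # False
```

Because the check goes through `ipaddress`, the whole `127.0.0.0/8` block counts as loopback, not just `127.0.0.1`. A name like `localhost.example.com` is rejected because it is neither the literal `localhost` nor an IP address.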
|
## Caveats

- **`scan.jobs` and the Tokio runtime.** The Tokio runtime is sized from the
  CLI value of `--jobs` *before* `kingfisher.yaml` is loaded. A config-only
  `scan.jobs` therefore resizes the scanner's job pool but not the underlying
  async worker pool. If you want both to match, pass `--jobs N` on the CLI (or
  set the same value in both places). This affects only parallelism, never
  correctness.
- **Subcommand scope.** Project config applies only to `kingfisher scan`.
  The `validate`, `revoke`, `access-map`, `view`, and `rules` commands ignore
  `kingfisher.yaml`; pass their flags on the CLI directly.
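
The `scan.jobs` split comes from ordering: one pool is sized before the config is read, the other after. The Python model below illustrates the two resulting sizes. The function name and the `builtin_jobs` parameter are placeholders, since this document does not state Kingfisher's built-in job count.

```python
from typing import Optional


def effective_parallelism(cli_jobs: Optional[int], config_jobs: Optional[int],
                          builtin_jobs: int) -> tuple[int, int]:
    """Return (async runtime workers, scanner job-pool size)."""
    # The runtime is built before kingfisher.yaml is read, so scan.jobs
    # cannot reach it; only --jobs or the built-in value can.
    runtime_workers = cli_jobs if cli_jobs is not None else builtin_jobs
    # The job pool is sized after the config is merged (the CLI still wins).
    if cli_jobs is not None:
        scan_jobs = cli_jobs
    elif config_jobs is not None:
        scan_jobs = config_jobs
    else:
        scan_jobs = builtin_jobs
    return runtime_workers, scan_jobs


print(effective_parallelism(cli_jobs=None, config_jobs=8, builtin_jobs=4))  # (4, 8)
print(effective_parallelism(cli_jobs=8, config_jobs=None, builtin_jobs=4))  # (8, 8)
```

The first call reproduces the mismatch from the caveat: a config-only `scan.jobs` sizes the job pool while the runtime keeps its default. The second shows the CLI fix.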
|
## What is *not* config-overridable

Scan-target inputs are intentionally CLI-only, because they describe *what*
this invocation is scanning rather than project policy:

- positional paths, `--git-url`
- `--github-user` / `--github-org`, `--gitlab-user` / `--gitlab-group` and
  the equivalent Gitea / Bitbucket / Azure / Hugging Face flags
- `--s3-bucket`, `--gcs-bucket`, `--docker-image`, Docker `--archive`
- `--jira-url`, `--confluence-url`, `--slack-query`, `--teams-query`,
  `--postman-*`

Auth tokens are also intentionally not in YAML. They continue to come from
env vars (`KINGFISHER_GITHUB_TOKEN`, etc.) so secrets stay out of
checked-in config files.
|
## Schema

```yaml
scan:
  confidence: medium          # low | medium | high (--confidence)
  min_entropy: 3.5            # float (--min-entropy)
  no_validate: false          # bool (--no-validate)
  only_valid: false           # bool (--only-valid)
  redact: false               # bool (--redact)
  no_dedup: false             # bool (--no-dedup)
  turbo: false                # bool (--turbo)
  no_base64: false            # bool (--no-base64)
  access_map: false           # bool (--access-map)
  rule_stats: false           # bool (--rule-stats)
  jobs: 8                     # int (--jobs)
  git_repo_timeout: 1800      # seconds (--git-repo-timeout)

rules:
  enabled: ["all"]            # list, additive (--rule)
  paths:                      # list, additive (--rules-path)
    - ./custom-rules/
  load_builtins: true         # bool (--load-builtins)

validation:
  timeout: 10                 # seconds, 1..=60 (--validation-timeout)
  retries: 1                  # int, 0..=5 (--validation-retries)
  rps: 5.0                    # float (--validation-rps)
  rps_per_rule:               # map, additive (--validation-rps-rule)
    kingfisher.aws: 1.0
  full_response: false        # bool (--full-validation-response)
  max_response_length: 2048   # bytes (--max-validation-response-length)

filters:
  skip_words:                 # list, additive (--skip-word)
    - EXAMPLE
    - PLACEHOLDER
  skip_regex:                 # list, additive (--skip-regex)
    - '^DUMMY_[A-Z]+$'
  exclude:                    # list, additive (--exclude)
    - vendor/
    - "**/node_modules/**"
  max_file_size_mb: 256.0     # float (--max-file-size)
  no_binary: false            # bool (--no-binary)
  no_extract_archives: false  # bool (--no-extract-archives)
  extraction_depth: 2         # int, 1..=25 (--extraction-depth)
  no_inline_ignore: false     # bool (--no-ignore)
  no_ignore_if_contains: false  # bool (--no-ignore-if-contains)
  extra_ignore_comments: []   # list, additive (--ignore-comment)
  skip_aws_accounts: []       # list, additive (--skip-aws-account)
  skip_aws_account_file: null # path (--skip-aws-account-file)

output:
  format: pretty              # pretty|json|jsonl|bson|toon|sarif|html (--format)
  path: ./kingfisher-report.json  # path (--output)

baseline:
  file: ./baseline.json       # path (--baseline-file)
  manage: false               # bool (--manage-baseline)

alerts:
  defaults:                   # global defaults; per-webhook overrides still win
    format: null              # null = auto-infer (--alert-format)
    on: findings              # findings | always (--alert-on)
    min_confidence: medium    # low | medium | high (--alert-min-confidence)
    include_secret: false     # bool (--alert-include-secret)
    report_url: null          # URL (--alert-report-url)
    detail: auto              # summary | detail | auto (--alert-detail)
  webhooks:
    - url: https://hooks.slack.com/services/T0/B0/AAA  # required
      format: slack           # slack | teams | generic | discord | mattermost | googlechat
      on: findings            # findings | always
      min_confidence: medium  # low | medium | high
      include_secret: false   # default false
      report_url: https://ci.example/run/42  # optional pivot link rendered in payload
      detail: auto            # summary | detail | auto (default auto)

global:
  tls_mode: strict            # strict | lax | off (--tls-mode)
  allow_internal_ips: false   # bool (--allow-internal-ips)
  no_update_check: false      # bool (--no-update-check)
  user_agent_suffix: null     # string (--user-agent-suffix)
  endpoints:                  # list, additive (--endpoint)
    - github=https://ghe.example.com/api/v3
  endpoint_config: null       # path (--endpoint-config)

git:
  clone_dir: null             # path (--git-clone-dir)
  keep_clones: false          # bool (--keep-clones)
  repo_clone_limit: null      # int (--repo-clone-limit)
  include_contributors: false # bool (--include-contributors)
  github_api_url: null        # URL GHE / self-hosted GH (--github-api-url)
  gitlab_api_url: null        # URL self-hosted GitLab (--gitlab-api-url)
```
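
The `alerts.defaults` comment implies a field-by-field lookup for every webhook: the webhook's own value first, then `alerts.defaults`, then a built-in value. The Python sketch below makes four assumptions:

- A `null` or absent value at one level falls through to the next.
- `builtins` stands for Kingfisher's built-in values.
- The values passed in the example are illustrative, not confirmed defaults.
- `resolve_webhook` is a hypothetical name.

```python
from typing import Any

ALERT_FIELDS = ("format", "on", "min_confidence", "include_secret", "report_url", "detail")


def resolve_webhook(webhook: dict[str, Any], defaults: dict[str, Any],
                    builtins: dict[str, Any]) -> dict[str, Any]:
    """Per-webhook value > alerts.defaults > built-in value, field by field."""
    if "url" not in webhook:
        raise ValueError("alerts.webhooks[].url is required")
    resolved: dict[str, Any] = {"url": webhook["url"]}
    for field in ALERT_FIELDS:
        for layer in (webhook, defaults, builtins):
            value = layer.get(field)
            if value is not None:  # null / absent falls through to the next layer
                resolved[field] = value
                break
        else:
            resolved[field] = None  # unset everywhere (e.g. format: null = auto-infer)
    return resolved


hook = {"url": "https://hooks.slack.com/services/T0/B0/AAA", "format": "slack"}
resolved = resolve_webhook(hook, {"min_confidence": "high"}, {"on": "findings"})
print(resolved["format"], resolved["min_confidence"], resolved["on"])  # slack high findings
```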
|
Unknown fields are rejected (typo protection). Empty sections and omitted
top-level sections are both fine. A `--config` path that does not exist is
still a fatal error.
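
A strict key check over the parsed document is enough to catch misspelled keys. The Python sketch below works on already-parsed data (plain dicts), so no YAML library is needed. The key sets are copied from the Schema section, and `unknown_fields` and `check_fields` are hypothetical names. Keys nested under `alerts.defaults` and `alerts.webhooks` are not checked in this sketch.

```python
from typing import Any

# Allowed keys per top-level section, copied from the Schema section.
SCHEMA: dict[str, set[str]] = {
    "scan": {"confidence", "min_entropy", "no_validate", "only_valid", "redact",
             "no_dedup", "turbo", "no_base64", "access_map", "rule_stats",
             "jobs", "git_repo_timeout"},
    "rules": {"enabled", "paths", "load_builtins"},
    "validation": {"timeout", "retries", "rps", "rps_per_rule",
                   "full_response", "max_response_length"},
    "filters": {"skip_words", "skip_regex", "exclude", "max_file_size_mb",
                "no_binary", "no_extract_archives", "extraction_depth",
                "no_inline_ignore", "no_ignore_if_contains",
                "extra_ignore_comments", "skip_aws_accounts",
                "skip_aws_account_file"},
    "output": {"format", "path"},
    "baseline": {"file", "manage"},
    "alerts": {"defaults", "webhooks"},
    "global": {"tls_mode", "allow_internal_ips", "no_update_check",
               "user_agent_suffix", "endpoints", "endpoint_config"},
    "git": {"clone_dir", "keep_clones", "repo_clone_limit",
            "include_contributors", "github_api_url", "gitlab_api_url"},
}


def unknown_fields(doc: Any) -> list[str]:
    """Return dotted paths of keys that the schema does not define."""
    if doc is None:
        return []  # no document at all: nothing to check in this sketch
    problems: list[str] = []
    for section, body in doc.items():
        if section not in SCHEMA:
            problems.append(section)
            continue
        # An empty section (`scan:` with nothing under it) parses as None.
        for key in body or {}:
            if key not in SCHEMA[section]:
                problems.append(f"{section}.{key}")
    return problems


def check_fields(doc: Any) -> None:
    problems = unknown_fields(doc)
    if problems:
        raise ValueError("unknown field(s): " + ", ".join(problems))


print(unknown_fields({"scan": {"confidnce": "high"}, "output": None}))  # ['scan.confidnce']
```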
|
## Example: CI workflow

A typical `kingfisher.yaml` for a CI repo, paired with a workflow step
that runs `kingfisher scan` against it:

```bash
# .github/workflows/secrets.yml (run step)
kingfisher scan . \
  --config ./kingfisher.yaml \
  --alert-webhook "$SLACK_SECURITY_WEBHOOK"
# `--alert-webhook` here is appended to any webhooks already in
# kingfisher.yaml (lists are additive). Everything else comes from the
# config file.
```
|
The committed `kingfisher.yaml`:

```yaml
scan:
  confidence: high
  redact: true
output:
  format: sarif
  path: ./kingfisher.sarif
filters:
  exclude:
    - vendor/
    - "**/node_modules/**"
    - "**/__snapshots__/**"
  skip_aws_accounts:
    - "111122223333"  # a test account whose creds we tolerate in test fixtures
alerts:
  defaults:
    min_confidence: high
  webhooks:
    - url: https://hooks.slack.com/services/T0/B0/AAA
      format: slack
```
|
Combined with [`docs/ALERTS.md`](ALERTS.md), this lets one repo own its
webhook configuration and CI policy without baking it into command-line strings.