# Kingfisher
<p align="center">
<img src="docs/kingfisher_logo.png" alt="Kingfisher Logo" width="126" height="173" style="vertical-align: right;" />
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
Kingfisher is a blazingly fast secret-scanning and validation tool built in Rust. It combines Intel's hardware-accelerated Hyperscan regex engine with language-aware parsing via Tree-sitter, and **ships with hundreds of built-in rules** to detect, validate, and triage secrets before they ever reach production.
</p>
Kingfisher originated as a fork of [Nosey Parker](https://github.com/praetorian-inc/noseyparker) by Praetorian Security, Inc., and is built atop their incredible work and the work contributed by the Nosey Parker community.
Kingfisher extends Nosey Parker by:
1. **Validating secrets** in real time via cloud-provider APIs
2. Enhancing regex-based detection with **source-code parsing** for improved accuracy
3. Adding **GitLab** repository scanning support
4. Adding support for scanning **Docker** images via `--docker-image`
5. Providing **Jira** scanning capabilities
6. Introducing a baseline feature that suppresses known secrets and reports only newly introduced ones
7. Offering native **Windows** support
**MongoDB Blog**: [Introducing Kingfisher: Real-Time Secret Detection and Validation](https://www.mongodb.com/blog/post/product-release-announcements/introducing-kingfisher-real-time-secret-detection-validation)
## Key Features
- **Performance**: Multithreaded, Hyperscan-powered scanning for massive codebases
- **Language-Aware Accuracy**: AST parsing in 20+ languages via Tree-sitter reduces context-less regex matches. See [docs/PARSING.md](/docs/PARSING.md)
- **Built-In Validation**: Hundreds of built-in detection rules, many with live-credential validators that call the relevant service APIs (AWS, Azure, GCP, Stripe, etc.) to confirm a secret is active. You can extend or override the library by adding YAML-defined rules on the command line—see [docs/RULES.md](/docs/RULES.md) for details
- **Git History Scanning**: Scan local repos, remote GitHub/GitLab orgs/users, or arbitrary GitHub/GitLab repos
- **Jira Scanning**: Scan issues returned from a JQL search using `--jira-url` and `--jql`
- **Docker Image Scanning**: Scan public or private docker images via `--docker-image`
- **Baseline Support:** Generate and manage baseline files to ignore known secrets and report only newly introduced ones. See ([docs/BASELINE.md](docs/BASELINE.md)) for details.
# Getting Started
## Installation
On macOS, install with Homebrew:
```bash
brew install kingfisher
```
Pre-built binaries are also available in the [Releases](https://github.com/mongodb/kingfisher/releases) section of the GitHub repository.
Alternatively, compile for your platform via `make`:
```bash
# NOTE: Requires Docker
make linux
```
```bash
# macOS
make darwin
```
```bash
# Windows x64 --- requires building from a Windows host with Visual Studio installed
./buildwin.bat -force
```
```bash
# Build all targets
make linux-all # builds both x64 and arm64
make darwin-all # builds both x64 and arm64
make all # builds for every OS and architecture supported
```
### Run Kingfisher in Docker
Run the dockerized Kingfisher container:
```bash
# GitHub Container Registry
docker run --rm ghcr.io/mongodb/kingfisher:latest --version
# Scan the current working directory
# (mounts your code at /src and scans it)
docker run --rm \
-v "$PWD":/src \
ghcr.io/mongodb/kingfisher:latest scan /src
# Scan while providing a GitHub token
# Mounts your working dir at /proj and passes in the token:
docker run --rm \
-e KF_GITHUB_TOKEN=ghp_… \
-v "$PWD":/proj \
ghcr.io/mongodb/kingfisher:latest \
scan --git-url https://github.com/org/private_repo.git
# Scan and write a JSON report locally
# Here we:
# 1. Mount $PWD → /proj
# 2. Tell Kingfisher to write findings.json inside /proj/reports
# 3. Ensure ./reports exists on your host so Docker can mount it
mkdir -p reports
# Run and write output into the host's ./reports directory
docker run --rm \
-v "$PWD":/proj \
ghcr.io/mongodb/kingfisher:latest \
scan /proj \
--format json \
--output /proj/reports/findings.json
# Tip: you can combine multiple mounts if you prefer separating source vs. output:
# Here /src is read-only, and /out holds your generated reports
docker run --rm \
-v "$PWD":/src:ro \
-v "$PWD/reports":/out \
ghcr.io/mongodb/kingfisher:latest \
scan /src \
--format json \
--output /out/findings.json
```
# 🔐 Detection Rules at a Glance
Kingfisher ships with hundreds of rules that cover everything from classic cloud keys to the latest LLM-API secrets. Below is an overview:
| Category | What we catch |
|----------|---------------|
| **AI / LLM APIs** | OpenAI, Anthropic, Google Gemini, Cohere, Mistral, Stability AI, Replicate, xAI (Grok), and more |
| **Cloud Providers** | AWS, Azure, GCP, Alibaba Cloud, DigitalOcean, IBM Cloud, Cloudflare, and more |
| **Dev & CI/CD** | GitHub/GitLab tokens, CircleCI, TravisCI, TeamCity, Docker Hub, npm & PyPI publish tokens, and more |
| **Messaging & Comms** | Slack, Discord, Microsoft Teams, Twilio, Mailgun/SendGrid/Mailchimp, and more |
| **Databases & Data Ops** | MongoDB Atlas, PlanetScale, Postgres DSNs, Grafana Cloud, Datadog, Dynatrace, and more |
| **Payments & Billing** | Stripe, PayPal, Square, GoCardless, and more |
| **Security & DevSecOps** | Snyk, Dependency-Track, CodeClimate, Codacy, OpsGenie, PagerDuty, and more |
| **Misc. SaaS & Tools** | 1Password, Adobe, Atlassian/Jira, Asana, Netlify, Baremetrics, and more |
## Write Custom Rules!
Kingfisher ships with hundreds of rules with HTTP and service-specific validation checks (AWS, Azure, GCP, etc.) to confirm whether a detected string is a live credential.
However, you may want to add your own custom rules, or modify a detection to better suit your needs or environment.
First, review [docs/RULES.md](/docs/RULES.md) to learn how to create custom Kingfisher rules.
Once you've done that, you can define your custom rules in a YAML file and pass it to Kingfisher at runtime, with no recompiling required!
# Usage
## Basic Examples
> **Note:** `kingfisher scan` detects whether the input is a Git repository or a plain directory; no extra flags are required.
### Scan with secret validation
```bash
kingfisher scan /path/to/code
## NOTE: This path can refer to:
# 1. a local git repo
# 2. a directory with many git repos
# 3. or just a folder with files and subdirectories
## To explicitly prevent scanning git commit history add:
# `--git-history=none`
```
### Scan a directory containing multiple Git repositories
```bash
kingfisher scan /projects/monorepodir
```
### Scan a Git repository without validation
```bash
kingfisher scan ~/src/myrepo --no-validate
```
### Display only secrets confirmed active by third-party APIs
```bash
kingfisher scan /path/to/repo --only-valid
```
### Output JSON and capture to a file
```bash
kingfisher scan . --format json | tee kingfisher.json
```
### Output SARIF directly to disk
```bash
kingfisher scan /path/to/repo --format sarif --output findings.sarif
```
### Pipe any text directly into Kingfisher by passing `-`
```bash
cat /path/to/file.py | kingfisher scan -
```
### Scan using a rule _family_ with one flag
_(prefix matching: `--rule kingfisher.aws` loads `kingfisher.aws.*`)_
```bash
# Only apply AWS-related rules (kingfisher.aws.1 + kingfisher.aws.2)
kingfisher scan /path/to/repo --rule kingfisher.aws
```
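To make the prefix behaviour concrete, here is a minimal shell sketch of how a family selector could be compared against rule IDs. It is an illustration only, not Kingfisher's implementation: the `rule_in_family` helper is hypothetical, and treating the match as bounded by dots (so that `kingfisher.aws` would not select an ID such as `kingfisher.awsx.1`) is an assumption.

```bash
# Hypothetical helper: succeeds when rule ID $2 belongs to family prefix $1.
# Assumption: the prefix must match whole dot-separated components.
rule_in_family() {
  case "$2" in
    "$1" | "$1".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the IDs from a sample list that the selector would pick up.
for id in kingfisher.aws.1 kingfisher.aws.2 kingfisher.azure.1; do
  if rule_in_family kingfisher.aws "$id"; then
    echo "selected: $id"
  fi
done
```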
### Display rule performance statistics
```bash
kingfisher scan /path/to/repo --rule-stats
```
### Scan while ignoring likely test files
`--exclude` skips any file or directory whose path matches this glob pattern (repeatable, uses gitignore-style syntax, case sensitive)
```bash
# Scan source but skip likely unit / integration tests
kingfisher scan ./my-project \
--exclude='[Tt]est' \
--exclude='spec' \
--exclude='[Ff]ixture' \
--exclude='example' \
--exclude='sample'
```
### Exclude specific paths
```bash
# Skip all Python files and any directory named tests
kingfisher scan ./my-project \
--exclude '*.py' \
--exclude '[Tt]ests'
```
To see which files are being skipped, enable verbose debugging (`-v`) when scanning. Kingfisher then reports any files skipped by the baseline file or via `--exclude`:
```bash
# Skip all Python files and any directory named tests, and report to stderr any skipped files
kingfisher scan ./my-project \
--exclude '*.py' \
--exclude tests \
-v
```
## Scanning Docker Images
Kingfisher will first try to use any locally available image, then fall back to pulling via OCI.
Authentication happens *in this order*:
1. **`KF_DOCKER_TOKEN`** env var
   - If it contains `user:pass`, it's used as Basic auth
   - Otherwise it's sent as a Bearer token
2. **Docker CLI credentials**
   - Checks `credHelpers` (per-registry) and `credsStore` in `~/.docker/config.json`.
   - Falls back to the legacy `auths` → `auth` (base64) entries.
3. **Anonymous** (no credentials)
```bash
# 1) Scan public or already-pulled image
kingfisher scan --docker-image ghcr.io/owasp/wrongsecrets/wrongsecrets-master:latest-master
# 2) For private registries, explicitly set KF_DOCKER_TOKEN:
# - Basic auth: "user:pass"
# - Bearer only: "TOKEN"
export KF_DOCKER_TOKEN="AWS:$(aws ecr get-login-password --region us-east-1)"
kingfisher scan --docker-image some-private-registry.dkr.ecr.us-east-1.amazonaws.com/base/amazonlinux2023:latest
# 3) Or rely on your Docker CLI login/keychain:
# (e.g. aws ecr get-login-password … | docker login …)
kingfisher scan --docker-image private.registry.example.com/my-image:tag
```
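The `KF_DOCKER_TOKEN` rule described above can be sketched as a small shell function. This is a hedged illustration rather than Kingfisher's actual code: `docker_auth_mode` is a hypothetical helper, and reading "contains `user:pass`" as "contains a colon" is an assumption.

```bash
# Hypothetical helper mirroring the documented KF_DOCKER_TOKEN rule:
# a value containing ":" is treated as user:pass (Basic auth), any other
# non-empty value as a Bearer token, and an empty value falls through to
# the next step (Docker CLI credentials, then anonymous access).
docker_auth_mode() {
  case "$1" in
    "")  echo "fallback" ;;
    *:*) echo "basic" ;;
    *)   echo "bearer" ;;
  esac
}

# Report how the current environment would be interpreted.
docker_auth_mode "${KF_DOCKER_TOKEN:-}"
```

A check like this can help when debugging why a private registry rejects a pull, for example when a bearer token happens to contain a colon.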
## Scanning GitHub
### Scan GitHub organization (requires `KF_GITHUB_TOKEN`)
```bash
kingfisher scan --github-organization my-org
```
### Scan remote GitHub repository
```bash
kingfisher scan --git-url https://github.com/org/repo.git
# Optionally provide a GitHub Token
KF_GITHUB_TOKEN="ghp_…" kingfisher scan --git-url https://github.com/org/private_repo.git
```
---
## Scanning GitLab
### Scan GitLab group (requires `KF_GITLAB_TOKEN`)
```bash
kingfisher scan --gitlab-group my-group
```
### Scan GitLab user
```bash
kingfisher scan --gitlab-user johndoe
```
### Scan remote GitLab repository by URL
```bash
kingfisher scan --git-url https://gitlab.com/group/project.git
```
### List GitLab repositories
```bash
kingfisher gitlab repos list --group my-group
```
## Scanning Jira
### Scan Jira issues matching a JQL query
```bash
KF_JIRA_TOKEN="token" kingfisher scan \
--jira-url https://jira.company.com \
--jql "project = TEST AND status = Open" \
--max-results 500
```
### Scan the last 1,000 Jira issues
```bash
KF_JIRA_TOKEN="token" kingfisher scan \
--jira-url https://jira.mongodb.org \
--jql 'ORDER BY created DESC' \
--max-results 1000
```
---
## Environment Variables for Tokens
| Variable | Purpose |
| ----------------- | ---------------------------- |
| `KF_GITHUB_TOKEN` | GitHub Personal Access Token |
| `KF_GITLAB_TOKEN` | GitLab Personal Access Token |
| `KF_JIRA_TOKEN` | Jira API token |
| `KF_DOCKER_TOKEN` | Docker registry token (`user:pass` or bearer token). If unset, credentials from the Docker keychain are used |
Set them temporarily per command:
```bash
KF_GITLAB_TOKEN="glpat-…" kingfisher scan --gitlab-group my-group
```
Or export for the session:
```bash
export KF_GITLAB_TOKEN="glpat-…"
```
To authenticate Jira requests:
```bash
export KF_JIRA_TOKEN="token"
```
_If no token is provided Kingfisher still works for public repositories._
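For private targets, a wrapper script may want to fail fast when a token is missing instead of starting a scan that cannot authenticate. The sketch below shows one way to do that. `require_token` is a hypothetical helper, not part of Kingfisher, and `my-org` is a placeholder.

```bash
# Hypothetical helper: succeed only when the named variable is set and
# non-empty. $1 is a variable NAME (for example KF_GITHUB_TOKEN).
require_token() {
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
}

# Only start the organization scan when the token is present.
if require_token KF_GITHUB_TOKEN; then
  kingfisher scan --github-organization my-org
fi
```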
---
## Exit Codes
| Code | Meaning |
| ---- | ----------------------------- |
| 0 | No findings |
| 200 | Findings discovered |
| 205 | Validated findings discovered |
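In CI, these exit codes can drive a gate. The sketch below maps a status to a label and decides whether to block the job. The helper names `describe_exit` and `should_block` are hypothetical, and treating any status other than 0, 200, or 205 as an error is an assumption rather than documented behaviour.

```bash
# Hypothetical helper: translate a Kingfisher exit status into a label.
# Codes 0, 200 and 205 come from the table above; everything else is
# treated here as an unexpected error (an assumption).
describe_exit() {
  case "$1" in
    0)   echo "clean" ;;
    200) echo "findings" ;;
    205) echo "validated-findings" ;;
    *)   echo "error" ;;
  esac
}

# Example policy: block on validated findings or errors, but let
# unvalidated findings through for manual review.
should_block() {
  case "$(describe_exit "$1")" in
    validated-findings | error) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage in a pipeline (requires kingfisher on PATH):
#   kingfisher scan . --no-update-check; status=$?
#   if should_block "$status"; then exit 1; fi
```

Whether unvalidated findings should pass is a policy choice. A stricter pipeline could block on status 200 as well.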
## Install a Pre-Commit Hook
Run the provided helper script to add a hook that scans staged files before each commit:
```bash
# local (current repo only ─ default)
./install-precommit-hook.sh
```
This creates `.git/hooks/pre-commit` that scans the files staged for commit with `kingfisher scan --no-update-check` and blocks the commit if any secrets are found.
```bash
# global (every repo on this machine)
./install-precommit-hook.sh --global
```
Installs a global pre-commit hook at `$HOME/.git/hooks/pre-commit`; for every Git repository you use, it runs `kingfisher scan --no-update-check` on the staged files and cancels the commit if any secrets are detected.
### Install a Pre-Receive Hook
To check incoming pushes on a server-side repository, install the pre-receive hook:
```bash
./install-prereceive-hook.sh
```
The resulting `.git/hooks/pre-receive` script scans the files in each pushed commit and rejects the push if any secrets are detected.
## Update Checks
Kingfisher automatically queries GitHub for a newer release when it starts and tells you whether an update is available.
- **Hands-free updates**: Add `--self-update` to any Kingfisher command
  * If a newer version exists, Kingfisher will download it, replace the running binary, and re-launch itself with the **exact same arguments**.
  * If the update fails or no newer release is found, the current run proceeds as normal
- **Disable version checks**: Pass `--no-update-check` to skip both the startup and shutdown checks entirely
# Advanced Options
## Build a Baseline / Detect New Secrets
There are situations where a repository already contains checked-in secrets, but you want to ensure no **new** secrets are introduced. A baseline file lets you document the known findings so that future scans report only findings that are not already in that list.
The easiest way to create a baseline is to run a normal scan with the `--manage-baseline` flag (typically at a low confidence level to capture all potential matches):
```bash
kingfisher scan /path/to/code \
--confidence low \
--manage-baseline \
--baseline-file ./baseline-file.yml
```
Use the same YAML file with the `--baseline-file` option on future scans to hide all recorded findings:
```bash
kingfisher scan /path/to/code \
--baseline-file ./baseline-file.yml
```
See ([docs/BASELINE.md](docs/BASELINE.md)) for full details.
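The two steps can be combined in one wrapper that creates the baseline on the first run and reuses it afterwards. This is a minimal sketch using only the flags shown in this section. `scan_with_baseline` and the `KINGFISHER` override variable are hypothetical conveniences, not part of the tool.

```bash
# KINGFISHER can be overridden (for example with "echo") for a dry run.
KINGFISHER="${KINGFISHER:-kingfisher}"

# Hypothetical wrapper: $1 is the path to scan, $2 the baseline file.
# If the baseline exists, reuse it so only new findings are reported;
# otherwise create it from a low-confidence scan.
scan_with_baseline() {
  if [ -f "$2" ]; then
    "$KINGFISHER" scan "$1" --baseline-file "$2"
  else
    "$KINGFISHER" scan "$1" --confidence low --manage-baseline --baseline-file "$2"
  fi
}

# Dry run in a subshell: print the command that would be executed.
(KINGFISHER=echo; scan_with_baseline /path/to/code ./baseline-file.yml)
```

Committing the generated baseline file alongside the code keeps later scans in CI consistent with local runs.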
## List Built-in Rules
```bash
kingfisher rules list
```
## To scan using **only** your own `my_rules.yaml` you could run:
```bash
kingfisher scan \
--load-builtins=false \
--rules-path path/to/my_rules.yaml \
./src/
```
## To add your rules alongside the builtins:
```bash
kingfisher scan \
--rules-path ./custom-rules/ \
--rules-path my_rules.yml \
~/path/to/project-dir/
```
## Other Examples
```bash
# Check custom rules - this ensures all regular expressions compile, and can match the rule's `examples` in the YML file
kingfisher rules check --rules-path ./my_rules.yml
# List GitHub repos
kingfisher github repos list --user my-user
kingfisher github repos list --organization my-org
```
## Notable Scan Options
- `--no-dedup`: Report every occurrence of a finding (disable the default de-duplication behavior)
- `--confidence <LEVEL>`: (low|medium|high)
- `--min-entropy <VAL>`: Override default threshold
- `--no-binary`: Skip binary files
- `--no-extract-archives`: Do not scan inside archives
- `--extraction-depth <N>`: Specifies how deep nested archives should be extracted and scanned (default: 2)
- `--redact`: Replaces discovered secrets with a one-way hash for secure output
- `--exclude <PATTERN>`: Skip any file or directory whose path matches this glob pattern (repeatable, uses gitignore-style syntax, case sensitive)
- `--baseline-file <FILE>`: Ignore matches listed in a baseline YAML file
- `--manage-baseline`: Create or update the baseline file with current findings
## Finding Fingerprint
The document below details the four-field formula (rule SHA-1, origin label, start and end offsets) hashed with XXH3-64 to create Kingfisher's 64-bit finding fingerprint. It also explains how this ID powers safe deduplication, and how `--no-dedup` can be used to show every raw match.
See ([docs/FINGERPRINT.md](docs/FINGERPRINT.md))
## Rule Performance Profiling
Use `--rule-stats` to collect timing information for every rule. After scanning, the summary prints a **Rule Performance Stats** section showing how many matches each rule produced along with its slowest and average match times. This is useful when creating or debugging rules.
## CLI Options
```bash
kingfisher scan --help
```
## Business Value
By integrating Kingfisher into your development lifecycle, you can:
- **Prevent Costly Breaches**
Early detection of embedded credentials avoids expensive incident response, legal fees, and reputation damage
- **Automate Compliance**
Enforce secret-scanning policies across GitOps, CI/CD, and pull requests to help satisfy SOC 2, PCI DSS, GDPR, and other standards
- **Reduce Noise, Focus on Real Threats**
Validation logic filters out false positives and highlights only active, valid secrets (`--only-valid`)
- **Accelerate Dev Workflows**
Run in parallel across dozens of languages, integrate with GitHub Actions or any pipeline, and shift security left to minimize delays
## The Risk of Leaked Secrets
Real breaches show how one exposed key can snowball into a full-scale incident:
- **Uber (2016):** GitHub-hosted AWS key let attackers access data on 57 M riders and 600 k drivers. [[BBC](https://www.bbc.com/news/technology-42075306)] [[Ars](https://arstechnica.com/tech-policy/2017/11/report-uber-paid-hackers-100000-to-keep-2016-data-breach-quiet/)]
- **AWS engineer (2020):** Pushed log files with root credentials to GitHub. [[Register](https://www.theregister.com/2020/01/23/aws_engineer_credentials_github/)] [[UpGuard](https://www.upguard.com/breaches/identity-and-access-misstep-how-an-amazon-engineer-exposed-credentials-and-more)]
- **Infosys (2023):** Full-admin AWS key left in a public PyPI package for a year. [[Stack](https://www.thestack.technology/infosys-leak-aws-key-exposed-on-pypi/)] [[Blog](https://tomforb.es/blog/infosys-leak/)]
- **Microsoft (2023):** Azure SAS token in an AI repo exposed 38 TB of internal data. [[Wiz](https://www.wiz.io/blog/38-terabytes-of-private-data-accidentally-exposed-by-microsoft-ai-researchers)] [[TechCrunch](https://techcrunch.com/2023/09/18/microsoft-ai-researchers-accidentally-exposed-terabytes-of-internal-sensitive-data/)]
- **GitHub (2023):** RSA SSH host key briefly went public; company rotated it. [[GitHub](https://github.blog/news-insights/company-news/we-updated-our-rsa-ssh-host-key/)]
Leaked secrets fuel unauthorized access, lateral movement, regulatory fines, and brand-damaging incident-response costs.
# Benchmark Results
See ([docs/COMPARISON.md](docs/COMPARISON.md))
<p align="center">
<img src="docs/runtime-comparison.png" alt="Kingfisher Runtime Comparison" style="vertical-align: center;" />
</p>
# Roadmap
- More rules
- Packages for Linux (deb, rpm)
- Please file a [feature request](https://github.com/mongodb/kingfisher/issues) if you have specific features you'd like added
# License
[Apache-2.0 License](LICENSE)