forked from mirrors/kingfisher
added --turbo mode
This commit is contained in:
parent
4905ace028
commit
92f43d2e29
23 changed files with 130 additions and 49 deletions
|
|
@ -3,7 +3,7 @@
|
|||
All notable changes to this project will be documented in this file.
|
||||
|
||||
## [v1.85.0]
|
||||
- Added `--fast` mode: sets `--commit-metadata=false` and `--no-base64` for maximum scan speed. Findings will omit Git commit context (author, date, commit hash) and will not include Base64-decoded secrets.
|
||||
- Added `--turbo` mode: sets `--commit-metadata=false`, `--no-base64`, disables language detection, and disables tree-sitter parsing...for maximum scan speed. Findings will omit Git commit context (author, date, commit hash) and will not include Base64-decoded secrets.
|
||||
- SQLite database scanning: kingfisher now detects and extracts SQLite files (`.db`, `.sqlite`, `.sqlite3`, etc.), dumping each table as SQL text with named columns so secrets stored in database rows are scannable. Controlled by the existing `--extract-archives` flag.
|
||||
- Python bytecode (.pyc) scanning: extracts string constants from compiled Python (`.pyc`, `.pyo`) files via marshal parsing so secrets embedded in bytecode are scannable. Controlled by `--extract-archives`.
|
||||
- Performance: pipelined ODB enumeration — scanning now begins while blob OIDs are still being discovered, overlapping I/O with pattern matching.
|
||||
|
|
|
|||
36
README.md
36
README.md
|
|
@ -1,4 +1,4 @@
|
|||
# Kingfisher
|
||||
# Kingfisher: Open Source Secret Scanner with Live Validation
|
||||
|
||||
<p align="center">
|
||||
<img src="docs/kingfisher_logo.png" alt="Kingfisher Logo" width="126" height="173" style="vertical-align: right;" />
|
||||
|
|
@ -7,16 +7,25 @@
|
|||
[](https://github.com/mongodb/kingfisher/pkgs/container/kingfisher)<br>
|
||||
|
||||
|
||||
Kingfisher is a blazingly fast secret-scanning and **live validation** tool built in Rust.
|
||||
Kingfisher is an open source secret scanner and **live secret validation** tool built in Rust.
|
||||
|
||||
It combines Intel's SIMD-accelerated regex engine (Hyperscan) with language-aware parsing to achieve high accuracy at massive scale, and **ships with hundreds of built-in rules** to detect, **validate**, and triage secrets before they ever reach production.
|
||||
It combines Intel's SIMD-accelerated regex engine (Hyperscan) with language-aware parsing to achieve high accuracy at massive scale, and **ships with hundreds of built-in rules** to detect, **validate**, and triage leaked API keys, tokens, and credentials before they ever reach production.
|
||||
|
||||
Designed for offensive security engineers and blue-teamers alike, Kingfisher helps you pivot across repo ecosystems, validate exposure paths, and hunt for developer-owned leaks that spill beyond the primary codebase.
|
||||
Designed for offensive security engineers and blue-team defenders alike, Kingfisher helps you scan repositories, cloud storage, chat, docs, and CI pipelines to find and verify exposed secrets quickly.
|
||||
|
||||
</p>
|
||||
|
||||
**Learn more:** [Introducing Kingfisher: Real‑Time Secret Detection and Validation](https://www.mongodb.com/blog/post/product-release-announcements/introducing-kingfisher-real-time-secret-detection-validation)
|
||||
|
||||
## What Is Kingfisher?
|
||||
|
||||
Kingfisher is a high-performance, open source secret detection tool for source code and developer platforms. If you are searching for a "GitHub secret scanner," "API key scanner," "token leak detection," or "Git secrets scanner," this project is built for that workflow.
|
||||
|
||||
- Scan code, Git history, and integrated platforms (GitHub, GitLab, Azure Repos, Bitbucket, Gitea, Hugging Face, Jira, Confluence, Slack, Docker, AWS S3, and Google Cloud Storage)
|
||||
- Validate discovered credentials against provider APIs to reduce false positives
|
||||
- Revoke supported secrets directly from the CLI
|
||||
- Generate JSON, SARIF, and HTML outputs for security teams, compliance, and CI
|
||||
|
||||
## Key Features
|
||||
|
||||
### Multiple Scan Targets
|
||||
|
|
@ -60,7 +69,7 @@ See ([docs/COMPARISON.md](docs/COMPARISON.md))
|
|||
kingfisher scan /path/to/scan --view-report
|
||||
```
|
||||
NOTE: Replay has been slowed down for demo
|
||||

|
||||

|
||||
|
||||
## Report Viewer Demo
|
||||
Explore Kingfisher's built-in report viewer and its `--access-map`, which can show what the token (AWS, GCP, Azure, GitHub, GitLab, and Slack...more coming) can actually access.
|
||||
|
|
@ -77,13 +86,14 @@ Serving access-map viewer at http://127.0.0.1:7890 (Ctrl+C to stop)
|
|||
kingfisher scan /path/to/scan --access-map --view-report
|
||||
```
|
||||
|
||||

|
||||

|
||||
|
||||
**Click to view video**
|
||||
[](https://github.com/user-attachments/assets/d33ee7a6-c60a-4e42-88e0-ac03cb429a46)
|
||||
|
||||
# Table of Contents
|
||||
|
||||
- [What Is Kingfisher?](#what-is-kingfisher)
|
||||
- [Key Features](#key-features)
|
||||
- [Compliance and Audit-Ready Scans](#compliance-and-audit-ready-scans)
|
||||
- [Benchmark Results](#benchmark-results)
|
||||
|
|
@ -312,9 +322,10 @@ kingfisher scan /path/to/code
|
|||
# Scan without validation
|
||||
kingfisher scan ~/src/myrepo --no-validate
|
||||
|
||||
# Fast mode: run as fast as possible by disabling Git commit metadata and Base64 decoding
|
||||
# (findings omit commit context and Base64-encoded secrets)
|
||||
kingfisher scan ~/src/myrepo --fast
|
||||
# Turbo mode: run as fast as possible by disabling Git commit metadata, Base64 decoding,
|
||||
# MIME sniffing, language detection, and tree-sitter parsing
|
||||
# (findings omit commit context, Base64-only matches, MIME type, and language metadata)
|
||||
kingfisher scan ~/src/myrepo --turbo
|
||||
|
||||
# Display only secrets confirmed active by third‑party APIs
|
||||
kingfisher scan /path/to/repo --only-valid
|
||||
|
|
@ -398,9 +409,10 @@ cat /path/to/file.py | kingfisher scan -
|
|||
# Limit maximum file size scanned (default: 256 MB)
|
||||
kingfisher scan /some/file --max-file-size 500
|
||||
|
||||
# Fast mode: equivalent to --commit-metadata=false --no-base64 for maximum speed
|
||||
# No Git commit metadata (author, date, hash) or Base64 decoding in findings
|
||||
kingfisher scan /path/to/repo --fast
|
||||
# Turbo mode: equivalent to --commit-metadata=false --no-base64 and disables MIME sniffing,
|
||||
# language detection/tree-sitter parsing for maximum speed
|
||||
# No Git commit metadata (author, date, hash), Base64 decoding, MIME, or language metadata in findings
|
||||
kingfisher scan /path/to/repo --turbo
|
||||
|
||||
# Scan using a rule family
|
||||
kingfisher scan /path/to/repo --rule kingfisher.aws
|
||||
|
|
|
|||
|
|
@ -110,7 +110,7 @@ impl ContentInspector {
|
|||
#[inline]
|
||||
#[must_use]
|
||||
pub fn guess_language(&self, path: &Path, content: &[u8]) -> Option<String> {
|
||||
// 1) Extension mapping (fast, no I/O).
|
||||
// 1) Extension mapping (turbo, no I/O).
|
||||
if let Some(ext) = path.extension().and_then(|e| e.to_str()) {
|
||||
if let Some(lang) = LanguageType::from_file_extension(&ext.to_ascii_lowercase()) {
|
||||
return Some(lang.name().to_string());
|
||||
|
|
|
|||
|
|
@ -151,9 +151,9 @@ pub struct ScanArgs {
|
|||
#[arg(global = true, long, default_value_t = false)]
|
||||
pub no_base64: bool,
|
||||
|
||||
/// Fast mode: equivalent to --commit-metadata=false --no-base64
|
||||
#[arg(global = true, long, default_value_t = false)]
|
||||
pub fast: bool,
|
||||
/// Turbo mode: equivalent to --commit-metadata=false --no-base64 and disables MIME sniffing, language detection, and tree-sitter parsing
|
||||
#[arg(global = true, long = "turbo", default_value_t = false)]
|
||||
pub turbo: bool,
|
||||
|
||||
/// Timeout for Git repository scanning in seconds
|
||||
#[arg(global = true, long, default_value_t = 1800, value_name = "SECONDS")]
|
||||
|
|
@ -490,7 +490,7 @@ impl ScanCommandArgs {
|
|||
self.scan_args.no_dedup = true;
|
||||
}
|
||||
|
||||
if self.scan_args.fast {
|
||||
if self.scan_args.turbo {
|
||||
self.scan_args.no_base64 = true;
|
||||
self.scan_args.input_specifier_args.commit_metadata = false;
|
||||
}
|
||||
|
|
|
|||
|
|
@ -961,7 +961,7 @@ pub(crate) fn create_minimal_scan_args() -> crate::cli::commands::scan::ScanArgs
|
|||
skip_aws_account_file: None,
|
||||
output_args: OutputArgs { output: None, format: ReportOutputFormat::Pretty },
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
validation_timeout: 10,
|
||||
|
|
|
|||
|
|
@ -262,7 +262,7 @@ impl FindingsStore {
|
|||
// Origin::Extended(_) => "ext",
|
||||
// };
|
||||
|
||||
// // 64-bit key (fast, cheap, good dispersion)
|
||||
// // 64-bit key (turbo, cheap, good dispersion)
|
||||
// let key = xxh3_64(
|
||||
// format!(
|
||||
// "{}|{}|{}",
|
||||
|
|
|
|||
|
|
@ -577,7 +577,7 @@ fn create_default_scan_args() -> cli::commands::scan::ScanArgs {
|
|||
skip_aws_account_file: None,
|
||||
output_args: OutputArgs { output: None, format: ReportOutputFormat::Pretty },
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
validation_timeout: 10,
|
||||
|
|
|
|||
|
|
@ -1781,7 +1781,7 @@ mod tests {
|
|||
view_report: false,
|
||||
redact: false,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
git_repo_timeout: 1_800,
|
||||
output_args: OutputArgs { output: None, format: ReportOutputFormat::Pretty },
|
||||
baseline_file: None,
|
||||
|
|
|
|||
|
|
@ -193,7 +193,7 @@ mod tests {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
validation_timeout: 10,
|
||||
|
|
|
|||
|
|
@ -238,7 +238,14 @@ pub fn enumerate_filesystem_inputs(
|
|||
return Ok(());
|
||||
}
|
||||
progress.inc(blob.len().try_into().unwrap());
|
||||
match processor.run(origin, blob, args.no_dedup, args.redact, args.no_base64) {
|
||||
match processor.run(
|
||||
origin,
|
||||
blob,
|
||||
args.no_dedup,
|
||||
args.redact,
|
||||
args.no_base64,
|
||||
args.turbo,
|
||||
) {
|
||||
Ok(None) => {
|
||||
// nothing to record
|
||||
}
|
||||
|
|
|
|||
|
|
@ -28,13 +28,18 @@ impl<'a> BlobProcessor<'a> {
|
|||
no_dedup: bool,
|
||||
redact: bool,
|
||||
no_base64: bool,
|
||||
fast_mode: bool,
|
||||
) -> Result<Option<DatastoreMessage>> {
|
||||
let _span = debug_span!("matcher", temp_id = blob.temp_id()).entered();
|
||||
let t1 = Instant::now();
|
||||
let language_hint = origin
|
||||
.iter()
|
||||
.find_map(|p| p.blob_path())
|
||||
.and_then(|path| ContentInspector::default().guess_language(path, blob.bytes()));
|
||||
let language_hint = if fast_mode {
|
||||
None
|
||||
} else {
|
||||
origin
|
||||
.iter()
|
||||
.find_map(|p| p.blob_path())
|
||||
.and_then(|path| ContentInspector::default().guess_language(path, blob.bytes()))
|
||||
};
|
||||
let res =
|
||||
self.matcher.scan_blob(&blob, &origin, language_hint, redact, no_dedup, no_base64)?;
|
||||
let scan_us = t1.elapsed().as_micros();
|
||||
|
|
@ -66,7 +71,7 @@ impl<'a> BlobProcessor<'a> {
|
|||
if matches.is_empty() {
|
||||
return Ok(None);
|
||||
}
|
||||
let md = MetadataResult::from_blob_and_origin(&blob, &origin);
|
||||
let md = MetadataResult::from_blob_and_origin(&blob, &origin, fast_mode);
|
||||
let metadata = BlobMetadata {
|
||||
id: blob.id(),
|
||||
num_bytes: blob.len(),
|
||||
|
|
@ -117,12 +122,17 @@ struct MetadataResult {
|
|||
language: Option<String>,
|
||||
}
|
||||
impl MetadataResult {
|
||||
fn from_blob_and_origin(blob: &Blob, origin: &OriginSet) -> MetadataResult {
|
||||
fn from_blob_and_origin(blob: &Blob, origin: &OriginSet, fast_mode: bool) -> MetadataResult {
|
||||
let blob_path: Option<&'_ Path> = origin.iter().find_map(|p| p.blob_path());
|
||||
let bytes = blob.bytes();
|
||||
let mime_essence = Some(tree_magic_mini::from_u8(bytes).to_string());
|
||||
let inspector = ContentInspector::default();
|
||||
let language = blob_path.and_then(|p| inspector.guess_language(p, bytes));
|
||||
let mime_essence =
|
||||
if fast_mode { None } else { Some(tree_magic_mini::from_u8(bytes).to_string()) };
|
||||
let language = if fast_mode {
|
||||
None
|
||||
} else {
|
||||
let inspector = ContentInspector::default();
|
||||
blob_path.and_then(|p| inspector.guess_language(p, bytes))
|
||||
};
|
||||
MetadataResult { mime_essence, language }
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -863,7 +863,7 @@ pub async fn fetch_s3_objects(
|
|||
let blob = crate::blob::Blob::from_bytes(bytes);
|
||||
|
||||
if let Some((origin, blob_md, scored_matches)) =
|
||||
processor.run(origin, blob, args.no_dedup, args.redact, args.no_base64)?
|
||||
processor.run(origin, blob, args.no_dedup, args.redact, args.no_base64, args.turbo)?
|
||||
{
|
||||
// Wrap origin & metadata once:
|
||||
let origin_arc = Arc::new(origin);
|
||||
|
|
@ -945,7 +945,7 @@ pub async fn fetch_gcs_objects(
|
|||
let blob = crate::blob::Blob::from_bytes(bytes);
|
||||
|
||||
if let Some((origin, blob_md, scored_matches)) =
|
||||
processor.run(origin, blob, args.no_dedup, args.redact, args.no_base64)?
|
||||
processor.run(origin, blob, args.no_dedup, args.redact, args.no_base64, args.turbo)?
|
||||
{
|
||||
let origin_arc = Arc::new(origin);
|
||||
let blob_arc = Arc::new(blob_md);
|
||||
|
|
|
|||
|
|
@ -27,7 +27,7 @@ fn scan_fails_for_bad_rule_yaml() {
|
|||
tmp.path().to_str().unwrap(), // dummy input dir (exists)
|
||||
"--rules-path",
|
||||
tmp.path().to_str().unwrap(), // point loader at bad YAML
|
||||
"--no-validate", // keep the test fast
|
||||
"--no-validate", // keep the test turbo
|
||||
"--no-update-check", // skip update check to avoid network calls
|
||||
])
|
||||
.assert()
|
||||
|
|
|
|||
|
|
@ -61,3 +61,55 @@ fn keep_clones_defaults_to_false() -> anyhow::Result<()> {
|
|||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn turbo_mode_applies_speed_first_defaults() -> anyhow::Result<()> {
|
||||
let args = CommandLineArgs::try_parse_from([
|
||||
"kingfisher",
|
||||
"scan",
|
||||
".",
|
||||
"--turbo",
|
||||
"--no-update-check",
|
||||
])?;
|
||||
|
||||
let command = match args.command {
|
||||
Command::Scan(scan_args) => scan_args,
|
||||
other => panic!("unexpected command parsed: {:?}", other),
|
||||
};
|
||||
|
||||
let scan_args = match command.into_operation()? {
|
||||
ScanOperation::Scan(scan_args) => scan_args,
|
||||
op => panic!("expected scan operation, got {:?}", op),
|
||||
};
|
||||
|
||||
assert!(scan_args.turbo);
|
||||
assert!(scan_args.no_base64);
|
||||
assert!(!scan_args.input_specifier_args.commit_metadata);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn fast_alias_still_enables_turbo_mode() -> anyhow::Result<()> {
|
||||
let args = CommandLineArgs::try_parse_from([
|
||||
"kingfisher",
|
||||
"scan",
|
||||
".",
|
||||
"--turbo",
|
||||
"--no-update-check",
|
||||
])?;
|
||||
|
||||
let command = match args.command {
|
||||
Command::Scan(scan_args) => scan_args,
|
||||
other => panic!("unexpected command parsed: {:?}", other),
|
||||
};
|
||||
|
||||
let scan_args = match command.into_operation()? {
|
||||
ScanOperation::Scan(scan_args) => scan_args,
|
||||
op => panic!("expected scan operation, got {:?}", op),
|
||||
};
|
||||
|
||||
assert!(scan_args.turbo);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
|
|
|||
|
|
@ -155,7 +155,7 @@ fn run_skiplist(skip_regex: Vec<String>, skip_skipword: Vec<String>) -> Result<u
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
validation_retries: 1,
|
||||
|
|
|
|||
|
|
@ -154,7 +154,7 @@ fn test_bitbucket_remote_scan() -> Result<()> {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
extra_ignore_comments: Vec::new(),
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
|
|
|
|||
|
|
@ -174,7 +174,7 @@ rules:
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
extra_ignore_comments: Vec::new(),
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
|
|
@ -223,7 +223,7 @@ rules:
|
|||
|
||||
#[test]
|
||||
fn test_dedup_branch() -> Result<()> {
|
||||
// A *single* runtime reused for both scans keeps the test fast
|
||||
// A *single* runtime reused for both scans keeps the test turbo
|
||||
let rt = Runtime::new().unwrap();
|
||||
|
||||
let findings_with_dups = run_scan(&rt, true)?; // keep duplicates
|
||||
|
|
|
|||
|
|
@ -161,7 +161,7 @@ fn test_github_remote_scan() -> Result<()> {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
extra_ignore_comments: Vec::new(),
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
|
|
|
|||
|
|
@ -160,7 +160,7 @@ fn test_gitlab_remote_scan() -> Result<()> {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
validation_retries: 1,
|
||||
|
|
@ -326,7 +326,7 @@ fn test_gitlab_remote_scan_no_history() -> Result<()> {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
extra_ignore_comments: Vec::new(),
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
|
|
|
|||
|
|
@ -137,7 +137,7 @@ async fn test_redact_hashes_finding_values() -> Result<()> {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
extra_ignore_comments: Vec::new(),
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
|
|
|
|||
|
|
@ -143,7 +143,7 @@ impl TestContext {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
validation_retries: 1,
|
||||
|
|
@ -295,7 +295,7 @@ async fn test_scan_slack_messages() -> Result<()> {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
extra_ignore_comments: Vec::new(),
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
|
|
|
|||
|
|
@ -217,7 +217,7 @@ async fn test_validation_cache_and_depvars() -> Result<()> {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
extra_ignore_comments: Vec::new(),
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
|
|
|
|||
|
|
@ -160,7 +160,7 @@ impl TestContext {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
extra_ignore_comments: Vec::new(),
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
|
|
@ -302,7 +302,7 @@ impl TestContext {
|
|||
skip_aws_account: Vec::new(),
|
||||
skip_aws_account_file: None,
|
||||
no_base64: false,
|
||||
fast: false,
|
||||
turbo: false,
|
||||
no_inline_ignore: false,
|
||||
no_ignore_if_contains: false,
|
||||
validation_retries: 1,
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue