diff --git a/CHANGELOG.md b/CHANGELOG.md
index 639e9d2..41373bd 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,8 +4,8 @@ All notable changes to this project will be documented in this file.
 ## [1.46.0]
 - Improved rules: AWS, pem
-- Added rule for Ollama, Weights and Biases, Cerebras, Friendli, Fireworks.ai, NVIDIA NIM, together.ai, zhipu,
-
+- Added rule for Ollama, Weights and Biases, Cerebras, Friendli, Fireworks.ai, NVIDIA NIM, together.ai, zhipu
+- Added `self-update` command to update the binary independently. Now supports updating over homebrew managed binary
 ## [1.45.0]
 - Added `--repo-artifacts` flag to scan repository issues, gists/snippets, and wikis when cloning via `--git-url`
diff --git a/README.md b/README.md
index 56e5f18..10fdbf7 100644
--- a/README.md
+++ b/README.md
@@ -8,21 +8,12 @@
 Kingfisher is a blazingly fast secret‑scanning and live validation tool built in Rust. It combines Intel’s hardware‑accelerated Hyperscan regex engine with language‑aware parsing via Tree‑Sitter, and **ships with hundreds of built‑in rules** to detect, validate, and triage secrets before they ever reach production

-Kingfisher originated as a fork of Praetorian's Nosey Parker, and is built atop their incredible work and the work contributed by the Nosey Parker community.
-
-## What Kingfisher Adds
-- **Live validation** via cloud-provider APIs
-- **Extra targets**: GitLab repos, S3 buckets, Docker images, Jira issues, Confluence pages, and Slack messages
-- **Compressed Files**: Supports extracting and scanning compressed files for secrets
-- **Baseline mode**: ignore known secrets, flag only new ones
-- **Allowlist support**: suppress false positives with custom regexes or words
-- **Language-aware detection** (source-code parsing) for ~20 languages
-- **Native Windows** binary
-
+Originally forked from Praetorian’s Nosey Parker, Kingfisher adds live cloud-API validation; many more targets (GitLab, S3, Docker, Jira, Confluence, Slack); compressed-file extraction and scanning; baseline and allowlist controls; language-aware detection (~20 languages); and a native Windows binary. See [Origins and Divergence](#origins-and-divergence) for details.
 ## Key Features
 - **Performance**: multithreaded, Hyperscan‑powered scanning built for huge codebases
 - **Extensible rules**: hundreds of built-in detectors plus YAML-defined custom rules ([docs/RULES.md](/docs/RULES.md))
+ - **Broad AI SaaS coverage**: finds and validates tokens for OpenAI, Anthropic, Google Gemini, Cohere, Mistral, Stability AI, Replicate, xAI (Grok), Ollama, Langchain, Perplexity, Weights & Biases, Cerebras, Friendli, Fireworks.ai, NVIDIA NIM, Together.ai, Zhipu, and many more
 - **Multiple targets**:
   - **Git history**: local repos or GitHub/GitLab orgs/users
   - **Repository artifacts**: with `--repo-artifacts`, scan GitHub/GitLab repository artifacts such as issues, pull/merge requests, wikis, snippets, and owner gists in addition to code
@@ -154,18 +145,18 @@ docker run --rm \
 # 🔐 Detection Rules at a Glance
-Kingfisher ships with hundreds of rules that cover everything from classic cloud keys to the latest LLM-API secrets. Below is an overview:
+Kingfisher ships with [hundreds of rules](/data/rules/) that cover everything from classic cloud keys to the latest AI SaaS tokens. Below is an overview:
 | Category | What we catch |
 |----------|---------------|
-| **AI / LLM APIs** | OpenAI, Anthropic, Google Gemini, Cohere, Mistral, Stability AI, Replicate, xAI (Grok), and more
-| **Cloud Providers** | AWS, Azure, GCP, Alibaba Cloud, DigitalOcean, IBM Cloud, Cloudflare, and more
-| **Dev & CI/CD** | GitHub/GitLab tokens, CircleCI, TravisCI, TeamCity, Docker Hub, npm, PyPI, and more
-| **Messaging & Comms** | Slack, Discord, Microsoft Teams, Twilio, Mailgun, SendGrid, Mailchimp, and more
-| **Databases & Data Ops** | MongoDB Atlas, PlanetScale, Postgres DSNs, Grafana Cloud, Datadog, Dynatrace, and more
-| **Payments & Billing** | Stripe, PayPal, Square, GoCardless, and more
-| **Security & DevSecOps** | Snyk, Dependency-Track, CodeClimate, Codacy, OpsGenie, PagerDuty, and more
-| **Misc. SaaS & Tools** | 1Password, Adobe, Atlassian/Jira, Asana, Netlify, Baremetrics, and more
+| **AI SaaS APIs** | OpenAI, Anthropic, Google Gemini, Cohere, Mistral, Stability AI, Replicate, xAI (Grok), Ollama, Langchain, Perplexity, Weights & Biases, Cerebras, Friendli, Fireworks.ai, NVIDIA NIM, together.ai, Zhipu, and more |
+| **Cloud Providers** | AWS, Azure, GCP, Alibaba Cloud, DigitalOcean, IBM Cloud, Cloudflare, and more |
+| **Dev & CI/CD** | GitHub/GitLab tokens, CircleCI, TravisCI, TeamCity, Docker Hub, npm, PyPI, and more |
+| **Messaging & Comms** | Slack, Discord, Microsoft Teams, Twilio, Mailgun, SendGrid, Mailchimp, and more |
+| **Databases & Data Ops** | MongoDB Atlas, PlanetScale, Postgres DSNs, Grafana Cloud, Datadog, Dynatrace, and more |
+| **Payments & Billing** | Stripe, PayPal, Square, GoCardless, and more |
+| **Security & DevSecOps** | Snyk, Dependency-Track, CodeClimate, Codacy, OpsGenie, PagerDuty, and more |
+| **Misc. SaaS & Tools** | 1Password, Adobe, Atlassian/Jira, Asana, Netlify, Baremetrics, and more |
 ## Write Custom Rules!
@@ -543,9 +534,11 @@ Kingfisher automatically queries GitHub for a newer release when it starts and t
 - **Hands-free updates** – Add `--self-update` to any Kingfisher command
-  * If a newer version exists, Kingfisher will download it, replace the running binary, and re-launch itself with the **exact same arguments**.
+  * If a newer version exists, Kingfisher will download it, replace the running binary, and re-launch itself with the **exact same arguments**.
   * If the update fails or no newer release is found, the current run proceeds as normal
+- **Manual update** – Run `kingfisher self-update` to update the binary without scanning
+
 - **Disable version checks** – Pass `--no-update-check` to skip both the startup and shutdown checks entirely
 # Advanced Options
@@ -661,6 +654,20 @@ Use `--rule-stats` to collect timing information for every rule. After scanning,
 kingfisher scan --help
 ```
+
+## Origins and Divergence
+
+Kingfisher began as a fork of Praetorian’s Nosey Parker, as our experiment with adding live validation support and embedding that validation directly inside each rule.
+
+Since that initial fork, it has diverged heavily from Nosey Parker:
+- Replaced the SQLite datastore with an in-memory store + Bloom filter
+- Collapsed the workflow into a single scan-and-report phase with direct JSON/BSON/SARIF outputs
+- Added Tree-Sitter parsing on top of Hyperscan for deeper language-aware detection
+- Removed datastore-driven reporting/annotations in favor of live validation, baselines, allowlists, and compressed-file extraction
+- Expanded support for new targets (GitLab, Jira, Confluence, Slack, S3, Docker, etc.)
+- Delivered cross-platform builds, including native Windows
+
+
 # Roadmap
 - More rules
diff --git a/data/rules/cerebras.yml b/data/rules/cerebras.yml
index bb5ad17..af0f49c 100644
--- a/data/rules/cerebras.yml
+++ b/data/rules/cerebras.yml
@@ -33,4 +33,4 @@ rules:
       - "csk-6nptf4w5cx36fw58t3hkx48jvm52wm693pex5tjm29kn55yt"
       - "csk-e2knhj8h3h4erp6crfx6rh52tvecj4xnwmtjf3mtrvtt54et"
       - "csk-rhw8npjrp6kpv9phm55n5nv5rkkm4492jepx3yh65dc9cwe9"
-      - "csk-w6p3nxk3`c5249mrpmv642fffert28rwdkepffrpn8rtfr9h"
+      - "csk-w6p3nxk3dc5249mrpmv642fffert28rwdkepffrpn8rtfr9h"
diff --git a/src/cli/global.rs b/src/cli/global.rs
index c87e61e..8f761de 100644
--- a/src/cli/global.rs
+++ b/src/cli/global.rs
@@ -7,8 +7,7 @@ use sysinfo::{MemoryRefreshKind, RefreshKind, System};
 use tracing::Level;
 use crate::cli::commands::{
-    github::GitHubArgs, gitlab::GitLabArgs, rules::RulesArgs,
-    scan::ScanArgs,
+    github::GitHubArgs, gitlab::GitLabArgs, rules::RulesArgs, scan::ScanArgs,
 };
 #[deny(missing_docs)]
@@ -63,6 +62,10 @@ pub enum Command {
     /// Manage rules
     #[command(alias = "rule")]
     Rules(RulesArgs),
+
+    /// Update the Kingfisher binary
+    #[command(name = "self-update")]
+    SelfUpdate,
 }
 pub static RAM_GB: Lazy> = Lazy::new(|| {
diff --git a/src/main.rs b/src/main.rs
index 38c0a88..15c1a8a 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -78,9 +78,10 @@ fn main() -> anyhow::Result<()> {
     // Determine the number of jobs, defaulting to the number of CPUs
     let num_jobs = match args.command {
         Command::Scan(ref scan_args) => scan_args.num_jobs,
+        Command::SelfUpdate => 1, // Self-update doesn't need a thread pool
         Command::GitHub(_) => num_cpus::get(), // Default for GitHub commands
         Command::GitLab(_) => num_cpus::get(), // Default for GitLab commands
-        Command::Rules(_) => num_cpus::get(), // Default for Rules commands
+        Command::Rules(_) => num_cpus::get(), // Default for Rules commands
     };
     // Set up the Tokio runtime with the specified number of threads
@@ -171,92 +172,97 @@ pub fn determine_exit_code(datastore: &Arc>
 }
 async fn async_main(args: CommandLineArgs) -> Result<()> {
-    // Create a temporary directory
-    let temp_dir = TempDir::new().context("Failed to create temporary directory")?;
-    let clone_dir = temp_dir.path().to_path_buf();
-
-    // Create the in-memory datastore
-    let datastore = Arc::new(Mutex::new(FindingsStore::new(clone_dir)));
     setup_logging(&args.global_args);
-    let update_msg = check_for_update(&args.global_args, None);
+    let global_args = args.global_args.clone();
+
     match args.command {
-        Command::Scan(mut scan_args) => {
-            // —————————————————————————————————————————
-            // If no paths or a single "-", slurp stdin into a temp file
-            // —————————————————————————————————————————
-            info!(
-                "Launching with {} concurrent scan jobs. Use --num-jobs to override.",
-                &scan_args.num_jobs
-            );
-            let paths = &scan_args.input_specifier_args.path_inputs;
-            let is_dash = paths.iter().any(|p| p.as_os_str() == "-");
-            if (paths.is_empty() || is_dash) && !atty::is(atty::Stream::Stdin) {
-                // read all stdin
-                let mut buf = Vec::new();
-                std::io::stdin().read_to_end(&mut buf)?;
-                // write into temp_dir
-                let stdin_file = temp_dir.path().join("stdin_input");
-                std::fs::write(&stdin_file, buf)?;
-                // replace inputs
-                scan_args.input_specifier_args.path_inputs = vec![stdin_file.into()];
-            }
-
-            // now proceed exactly as before
-            let rules_db = Arc::new(load_and_record_rules(&scan_args, &datastore)?);
-            run_scan(&args.global_args, &scan_args, &rules_db, Arc::clone(&datastore)).await?;
-            let exit_code = determine_exit_code(&datastore);
-
-            if let Err(e) = temp_dir.close() {
-                eprintln!("Failed to close temporary directory: {}", e);
-            }
-            std::process::exit(exit_code);
+        Command::SelfUpdate => {
+            let mut g = global_args;
+            g.self_update = true;
+            g.no_update_check = false;
+            check_for_update(&g, None);
+            Ok(())
         }
-        Command::Rules(ref rule_args) => match &rule_args.command {
-            RulesCommand::Check(check_args) => {
-                run_rules_check(&check_args)?;
-            }
-            RulesCommand::List(list_args) => {
-                run_rules_list(&list_args)?;
-            }
-        },
-        Command::GitHub(github_args) => match github_args.command {
-            GitHubCommand::Repos(repos_command) => match repos_command {
-                GitHubReposCommand::List(list_args) => {
-                    github::list_repositories(
-                        github_args.github_api_url,
-                        args.global_args.ignore_certs,
-                        args.global_args.use_progress(),
-                        &list_args.repo_specifiers.user,
-                        &list_args.repo_specifiers.organization,
-                        list_args.repo_specifiers.all_organizations,
-                        list_args.repo_specifiers.repo_type.into(),
-                    )
-                    .await?;
+        command => {
+            let temp_dir = TempDir::new().context("Failed to create temporary directory")?;
+            let clone_dir = temp_dir.path().to_path_buf();
+
+            let datastore = Arc::new(Mutex::new(FindingsStore::new(clone_dir)));
+            let update_msg = check_for_update(&global_args, None);
+            match command {
+                Command::Scan(mut scan_args) => {
+                    info!(
+                        "Launching with {} concurrent scan jobs. Use --num-jobs to override.",
+                        &scan_args.num_jobs
+                    );
+                    let paths = &scan_args.input_specifier_args.path_inputs;
+                    let is_dash = paths.iter().any(|p| p.as_os_str() == "-");
+                    if (paths.is_empty() || is_dash) && !atty::is(atty::Stream::Stdin) {
+                        let mut buf = Vec::new();
+                        std::io::stdin().read_to_end(&mut buf)?;
+                        let stdin_file = temp_dir.path().join("stdin_input");
+                        std::fs::write(&stdin_file, buf)?;
+                        scan_args.input_specifier_args.path_inputs = vec![stdin_file.into()];
+                    }
+
+                    let rules_db = Arc::new(load_and_record_rules(&scan_args, &datastore)?);
+                    run_scan(&global_args, &scan_args, &rules_db, Arc::clone(&datastore)).await?;
+                    let exit_code = determine_exit_code(&datastore);
+
+                    if let Err(e) = temp_dir.close() {
+                        eprintln!("Failed to close temporary directory: {}", e);
+                    }
+                    std::process::exit(exit_code);
                 }
-            },
-        },
-        Command::GitLab(gitlab_args) => match gitlab_args.command {
-            GitLabCommand::Repos(repos_command) => match repos_command {
-                GitLabReposCommand::List(list_args) => {
-                    kingfisher::gitlab::list_repositories(
-                        gitlab_args.gitlab_api_url,
-                        args.global_args.ignore_certs,
-                        args.global_args.use_progress(),
-                        &list_args.repo_specifiers.user,
-                        &list_args.repo_specifiers.group,
-                        list_args.repo_specifiers.all_groups,
-                        list_args.repo_specifiers.include_subgroups,
-                        list_args.repo_specifiers.repo_type.into(),
-                    )
-                    .await?;
-                }
-            },
-        },
+                Command::Rules(ref rule_args) => match &rule_args.command {
+                    RulesCommand::Check(check_args) => {
+                        run_rules_check(&check_args)?;
+                    }
+                    RulesCommand::List(list_args) => {
+                        run_rules_list(&list_args)?;
+                    }
+                },
+                Command::GitHub(github_args) => match github_args.command {
+                    GitHubCommand::Repos(repos_command) => match repos_command {
+                        GitHubReposCommand::List(list_args) => {
+                            github::list_repositories(
+                                github_args.github_api_url,
+                                global_args.ignore_certs,
+                                global_args.use_progress(),
+                                &list_args.repo_specifiers.user,
+                                &list_args.repo_specifiers.organization,
+                                list_args.repo_specifiers.all_organizations,
+                                list_args.repo_specifiers.repo_type.into(),
+                            )
+                            .await?;
+                        }
+                    },
+                },
+                Command::GitLab(gitlab_args) => match gitlab_args.command {
+                    GitLabCommand::Repos(repos_command) => match repos_command {
+                        GitLabReposCommand::List(list_args) => {
+                            kingfisher::gitlab::list_repositories(
+                                gitlab_args.gitlab_api_url,
+                                global_args.ignore_certs,
+                                global_args.use_progress(),
+                                &list_args.repo_specifiers.user,
+                                &list_args.repo_specifiers.group,
+                                list_args.repo_specifiers.all_groups,
+                                list_args.repo_specifiers.include_subgroups,
+                                list_args.repo_specifiers.repo_type.into(),
+                            )
+                            .await?;
+                        }
+                    },
+                },
+                Command::SelfUpdate => unreachable!(),
+            }
+            if let Some(msg) = update_msg {
+                info!("{msg}");
+            }
+            Ok(())
+        }
     }
-    if let Some(msg) = update_msg {
-        info!("{msg}");
-    }
-    Ok(())
 }
 /// Create a default ScanArgs instance for rule loading
diff --git a/src/update.rs b/src/update.rs
index 8f66c59..76629be 100644
--- a/src/update.rs
+++ b/src/update.rs
@@ -15,11 +15,7 @@
 // `style_finding_active_heading` style so that they stand out alongside normal
 // scan output.
-use std::{
-    fs,
-    io::{ErrorKind, IsTerminal},
-    path::PathBuf,
-};
+use std::io::{ErrorKind, IsTerminal};
 use self_update::{backends::github::Update, cargo_crate_version, errors::Error as UpdError};
 use semver::Version;
@@ -27,17 +23,6 @@ use tracing::{error, info, warn};
 use crate::{cli::global::GlobalArgs, reporter::styles::Styles};
-/// Return `true` when the canonical executable path lives inside a Homebrew Cellar.
-/// Works for Intel macOS (/usr/local/Cellar), Apple‑Silicon macOS (/opt/homebrew/Cellar)
-/// and Linuxbrew (~/.linuxbrew/Cellar).
-fn installed_via_homebrew() -> bool {
-    fn canonical_exe() -> Option {
-        std::env::current_exe().ok().and_then(|p| fs::canonicalize(p).ok())
-    }
-
-    canonical_exe().map(|p| p.components().any(|c| c.as_os_str() == "Cellar")).unwrap_or(false)
-}
-
 /// Check GitHub for a newer Kingfisher release and optionally self‑update.
 ///
 /// * `base_url` lets tests point at a mock server.
@@ -51,16 +36,6 @@ pub fn check_for_update(global_args: &GlobalArgs, base_url: Option<&str>) -> Opt
     let use_color = std::io::stderr().is_terminal() && !global_args.quiet;
     let styles = Styles::new(use_color);
-    let is_brew = installed_via_homebrew();
-    if is_brew {
-        info!(
-            "{}",
-            styles.style_finding_active_heading.apply_to(
-                "Homebrew install detected - will notify about updates but not self-update"
-            )
-        );
-    }
-
     info!("{}", "Checking for updates…");
     let mut builder = Update::configure();
@@ -145,7 +120,7 @@ pub fn check_for_update(global_args: &GlobalArgs, base_url: Option<&str>) -> Opt
         info!("{}", styles.style_finding_active_heading.apply_to(&plain));
         // Attempt self‑update when allowed and feasible.
-        if global_args.self_update && !is_brew {
+        if global_args.self_update {
             match updater.update() {
                 Ok(status) => info!(
                     "{}",
@@ -167,13 +142,6 @@ pub fn check_for_update(global_args: &GlobalArgs, base_url: Option<&str>) -> Opt
                     _ => error!("Failed to update: {e}"),
                 },
             }
-        } else if is_brew {
-            info!(
-                "{}",
-                styles
-                    .style_finding_active_heading
-                    .apply_to("Run `brew upgrade kingfisher` to install the new version.")
-            );
         }
         Some(plain)
diff --git a/src/validation/jwt.rs b/src/validation/jwt.rs
index 1f9b2e7..a3ee9c7 100644
--- a/src/validation/jwt.rs
+++ b/src/validation/jwt.rs
@@ -101,7 +101,6 @@ pub async fn validate_jwt_with(token: &str, opts: &ValidateOptions) -> Result<(b
     let header_val: serde_json::Value =
         serde_json::from_slice(&header_json).map_err(|e| anyhow!("invalid header json: {e}"))?;
     let alg_str = header_val.get("alg").and_then(|v| v.as_str()).unwrap_or("");
-
     // --- Policy: reject `alg: none` unless explicitly allowed ------------------
     if alg_str.eq_ignore_ascii_case("none") {
@@ -119,7 +118,7 @@ pub async fn validate_jwt_with(token: &str, opts: &ValidateOptions) -> Result<(b
             return Ok((false, "unsigned JWT (alg: none) not allowed".into()));
         }
     }
-
+
     // Safe to decode full header now that we know alg != none
     let header = decode_header(token).map_err(|e| anyhow!("decode header: {e}"))?;
     let alg = header.alg;