diff --git a/CHANGELOG.md b/CHANGELOG.md
index 49dfe11..03f7d61 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,10 @@
 All notable changes to this project will be documented in this file.

 ## [v1.85.0]
+- Report viewer: added `--view-report-port` and `--view-report-address` to `kingfisher scan --view-report`, and `--address` to `kingfisher view`, so the embedded report server can bind to `0.0.0.0` and be reached from the host when running in Docker. Use `--view-report-address 0.0.0.0` with `-p 7890:7890` (or `--view-report-port 7891` with `-p 7891:7891`) to view the HTML report at http://localhost:7890 from your host.
+- Updated `kingfisher scan` to accept Git repository URLs as positional targets (for example `kingfisher scan github.com/org/repo` or `kingfisher scan https://gitlab.com/group/project.git`) without requiring `--git-url`.
+- Deprecated `--git-url` while preserving backward compatibility; using the flag now emits a migration warning to prefer positional URL targets.
+- Updated README/integration/usage/install/demo examples and CLI tests to use positional Git URL scanning syntax.
 - Added `--turbo` mode: sets `--commit-metadata=false`, `--no-base64`, disables language detection, and disables tree-sitter parsing...for maximum scan speed. Findings will omit Git commit context (author, date, commit hash) and will not include Base64-decoded secrets.
 - SQLite database scanning: kingfisher now detects and extracts SQLite files (`.db`, `.sqlite`, `.sqlite3`, etc.), dumping each table as SQL text with named columns so secrets stored in database rows are scannable. Controlled by the existing `--extract-archives` flag.
 - Python bytecode (.pyc) scanning: extracts string constants from compiled Python (`.pyc`, `.pyo`) files via marshal parsing so secrets embedded in bytecode are scannable. Controlled by `--extract-archives`.
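Note on the report-viewer entry in this changelog: the new bind address is parsed as a literal IP before the port is probed, so a hostname such as `localhost` would presumably be rejected rather than resolved. Below is a minimal, standard-library-only sketch of that pre-flight check. The names `parse_bind_address` and `check_bind` are hypothetical and are not Kingfisher's API, and the error messages are illustrative only.

```rust
use std::io;
use std::net::{IpAddr, SocketAddr, TcpListener};

/// Parse a bind address as a literal IP (IPv4 or IPv6).
/// Hostnames such as "localhost" are rejected by `IpAddr`'s parser.
fn parse_bind_address(address: &str) -> Result<IpAddr, String> {
    address
        .parse::<IpAddr>()
        .map_err(|err| format!("invalid bind address {address:?}: {err}"))
}

/// Hypothetical pre-flight check: try to bind `address:port` and return a
/// readable message when the port is already taken. The listener is dropped
/// when this function returns, which releases the port again.
fn check_bind(address: &str, port: u16) -> Result<SocketAddr, String> {
    let ip = parse_bind_address(address)?;
    let listener = TcpListener::bind((ip, port)).map_err(|err| match err.kind() {
        io::ErrorKind::AddrInUse => format!("port {port} is already in use"),
        _ => err.to_string(),
    })?;
    listener.local_addr().map_err(|err| err.to_string())
}
```

Calling `check_bind("0.0.0.0", 7890)` before starting a long scan would surface a port conflict early instead of at report time, which appears to be the intent of the pre-scan check in this change.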
diff --git a/README.md b/README.md
index 3d65bbd..d4be8f5 100644
--- a/README.md
+++ b/README.md
@@ -75,7 +75,7 @@ NOTE: Replay has been slowed down for demo
 ## Report Viewer Demo
 Explore Kingfisher's built-in report viewer and its `--access-map`, which can show what the token (AWS, GCP, Azure, GitHub, GitLab, and Slack...more coming) can actually access.

-Note: when you pass `--view-report`, Kingfisher starts a **localhost-only** web server on port `7890` and opens it in your default browser. You'll see this near the end of the scan output, and **Kingfisher will keep running** until you stop it.
+Note: when you pass `--view-report`, Kingfisher starts a web server on port `7890` (default) and opens it in your default browser. By default it binds to `127.0.0.1` for security. You'll see this near the end of the scan output, and **Kingfisher will keep running** until you stop it.

 ```bash
 INFO kingfisher::cli::commands::view: Starting access-map viewer address=127.0.0.1:7890
@@ -242,13 +242,44 @@ KF_SLACK_TOKEN="xoxp-..." kingfisher scan slack "api_key OR password"
 docker run --rm -v "$PWD":/src ghcr.io/mongodb/kingfisher:latest scan /src
 ```

-### 19: Output JSON results
+### 19: Run with Docker and view report in browser
+
+To run a scan in Docker and view the HTML report on your host machine, use `--view-report-address 0.0.0.0` so the server is reachable from outside the container, and map the port with `-p`:
+
+```bash
+docker run --rm \
+  -v "$PWD":/src \
+  -p 7890:7890 \
+  ghcr.io/mongodb/kingfisher:latest \
+  scan https://github.com/leaktk/fake-leaks \
+  --access-map \
+  --view-report \
+  --view-report-address 0.0.0.0
+```
+
+Then open **http://localhost:7890** in your browser. If port 7890 is already in use, use `--view-report-port` and map accordingly:
+
+```bash
+docker run --rm \
+  -v "$PWD":/src \
+  -p 7891:7891 \
+  ghcr.io/mongodb/kingfisher:latest \
+  scan https://github.com/leaktk/fake-leaks \
+  --access-map \
+  --view-report \
+  --view-report-port 7891 \
+  --view-report-address 0.0.0.0
+```
+
+Then open **http://localhost:7891**.
+
+### 20: Output JSON results

 ```bash
 kingfisher scan /path/to/code --format json --output findings.json
 ```

-### 20: Map blast radius of discovered credentials
+### 21: Map blast radius of discovered credentials

 ```bash
 kingfisher scan /path/to/code --access-map --view-report
diff --git a/docs/ADVANCED.md b/docs/ADVANCED.md
index 8346010..cd19397 100644
--- a/docs/ADVANCED.md
+++ b/docs/ADVANCED.md
@@ -240,11 +240,10 @@ kingfisher scan /path/to/local/repo --branch
 kingfisher scan C:\\src\\repo --branch
 ```

-The same diff-focused workflow works when cloning repositories on the fly with `--git-url`. Kingfisher automatically tries remote-tracking names like `origin/main` and `origin/feature-1`, so you can target the branches involved in a pull request without performing a local checkout first.
+The same diff-focused workflow works when cloning repositories on the fly by passing a Git URL directly to `scan`. Kingfisher automatically tries remote-tracking names like `origin/main` and `origin/feature-1`, so you can target the branches involved in a pull request without performing a local checkout first.

 ```bash
-kingfisher scan \
-  --git-url https://github.com/org/repo.git \
+kingfisher scan https://github.com/org/repo.git \
   --since-commit main \
   --branch development
 ```
@@ -256,16 +255,14 @@ When `--since-commit` is omitted, specifying `--branch` scans the requested ref
 kingfisher scan ~/tmp/repo --branch feature-123

 # Or scan a branch when cloning on the fly
-kingfisher scan \
-  --git-url https://github.com/org/repo.git \
+kingfisher scan https://github.com/org/repo.git \
   --branch origin/feature-123
 ```

-In CI systems that expose the base and head commits explicitly, you can pass those SHAs directly while still using `--git-url`:
+In CI systems that expose the base and head commits explicitly, you can pass those SHAs directly while scanning a Git URL:

 ```bash
-kingfisher scan \
-  --git-url git@github.com:org/repo.git \
+kingfisher scan https://github.com/org/repo.git \
   --since-commit "$BASE_COMMIT" \
   --branch "$PR_HEAD_COMMIT"
 ```
@@ -341,8 +338,8 @@ kingfisher scan /path/to/repo --rule-stats
 - `--no-base64`: By default, Kingfisher finds and decodes base64 blobs and scans them for secrets. This adds a slight performance overhead; use this flag to disable
 - `--confidence `: (low|medium|high)
 - `--min-entropy `: Override default threshold
-- `--include-contributors`: When using `--git-url` for GitHub or GitLab, include contributor-owned repos in the scan
-- `--git-clone-dir `: Choose the parent directory for cloned repos and scan artifacts (use with `--git-url`)
+- `--include-contributors`: When scanning GitHub or GitLab URLs, include contributor-owned repos in the scan
+- `--git-clone-dir `: Choose the parent directory for cloned repos and scan artifacts (use with Git URL scans)
 - `--keep-clones`: Preserve cloned repositories on disk after a scan completes
 - `--repo-clone-limit `: Cap the number of GitHub/GitLab repositories cloned when enumerating orgs/groups or contributor repos
 - `--no-binary`: Skip binary files
diff --git a/docs/INSTALLATION.md b/docs/INSTALLATION.md
index a7e1d32..d562092 100644
--- a/docs/INSTALLATION.md
+++ b/docs/INSTALLATION.md
@@ -341,7 +341,7 @@ docker run --rm \
   -e KF_GITHUB_TOKEN=ghp_… \
   -v "$PWD":/proj \
   ghcr.io/mongodb/kingfisher:latest \
-  scan --git-url https://github.com/org/private_repo.git
+  scan https://github.com/org/private_repo.git

 # Scan an S3 bucket
 # Credentials can come from KF_AWS_KEY/KF_AWS_SECRET, --role-arn, or --profile
@@ -377,6 +377,15 @@ docker run --rm \
   scan /src \
   --format json \
   --output /out/findings.json
+
+# Scan and view the HTML report in your browser (Docker)
+# Use --view-report-address 0.0.0.0 and -p to expose the report server to the host
+docker run --rm \
+  -v "$PWD":/src \
+  -p 7890:7890 \
+  ghcr.io/mongodb/kingfisher:latest \
+  scan /src --access-map --view-report --view-report-address 0.0.0.0
+# Then open http://localhost:7890 in your browser
 ```

 ## PyPI Wheels
diff --git a/docs/INTEGRATIONS.md b/docs/INTEGRATIONS.md
index cc8f6aa..81b4d1b 100644
--- a/docs/INTEGRATIONS.md
+++ b/docs/INTEGRATIONS.md
@@ -157,7 +157,8 @@ kingfisher scan github --organization my-org \

 ### Scan remote GitHub repository

-`--git-url` clones the repository and scans its files and history. When the URL
+Pass a repository URL as a positional scan target to clone and scan its files and history.
+(The legacy `--git-url` flag still works but is deprecated.) When the URL
 targets GitHub and you pass `--include-contributors`, Kingfisher enumerates
 repository contributors and attempts to clone **all public repos owned by those
 contributors**—a common offensive and blue-team pivot when developers leak
@@ -176,9 +177,9 @@ extras counts against API rate limits and private artifacts require a
 Use `--git-clone-dir` to choose where cloned repositories land and
 `--keep-clones` to preserve them for follow-on analysis.

-> **Why does `--git-url` sometimes report fewer findings than scanning a local checkout?**.
+> **Why can scanning a remote URL report fewer findings than scanning a local checkout?**.
 >
-> Remote clones created via `--git-url` default to `--mirror`/bare mode so Kingfisher only
+> Remote clones default to `--mirror`/bare mode so Kingfisher only
 > reads the Git history. When you point Kingfisher at an existing working tree (for example
 > `kingfisher scan ./repo`), it enumerates both the filesystem contents *and* the Git
 > history. Any secrets that are present in the checked-out files therefore appear twice:
@@ -188,23 +189,23 @@ Use `--git-clone-dir` to choose where cloned repositories land and

 ```bash
 # Scan the repository only
-kingfisher scan --git-url https://github.com/org/repo.git
+kingfisher scan github.com/org/repo

 # Scan the repository plus contributor repos, but cap the crawl
-kingfisher scan --git-url https://github.com/org/repo.git \
+kingfisher scan https://github.com/org/repo.git \
   --include-contributors \
   --repo-clone-limit 250

 # Keep clones for later manual inspection
-kingfisher scan --git-url https://github.com/org/repo.git \
+kingfisher scan https://github.com/org/repo.git \
   --git-clone-dir ./kingfisher-clones \
   --keep-clones

 # Include issues, wiki, and owner gists
-kingfisher scan --git-url https://github.com/org/repo.git --repo-artifacts
+kingfisher scan https://github.com/org/repo.git --repo-artifacts

 # Private repositories or artifacts
-KF_GITHUB_TOKEN="ghp_…" kingfisher scan --git-url https://github.com/org/private_repo.git --repo-artifacts
+KF_GITHUB_TOKEN="ghp_…" kingfisher scan https://github.com/org/private_repo.git --repo-artifacts
 ```

 ## GitLab
@@ -239,7 +240,7 @@ kingfisher scan gitlab --group my-group \

 ### Scan remote GitLab repository by URL

-`--git-url` by itself clones the project repository. When the URL targets
+A Git URL target by itself clones the project repository. When the URL targets
 GitLab and you pass `--include-contributors`, Kingfisher enumerates contributors
 and tries to clone **their other public projects** to catch secrets that escape
 the main repo. Apply `--repo-clone-limit` to cap the total repos cloned during
@@ -258,23 +259,23 @@ to preserve them for later review.

 ```bash
 # Scan the repository only
-kingfisher scan --git-url https://gitlab.com/group/project.git
+kingfisher scan gitlab.com/group/project.git

 # Scan the repository plus contributor projects, but cap the crawl
-kingfisher scan --git-url https://gitlab.com/group/project.git \
+kingfisher scan https://gitlab.com/group/project.git \
   --include-contributors \
   --repo-clone-limit 250

 # Keep clones for later manual inspection
-kingfisher scan --git-url https://gitlab.com/group/project.git \
+kingfisher scan https://gitlab.com/group/project.git \
   --git-clone-dir ./kingfisher-clones \
   --keep-clones

 # Include issues, wiki, and snippets
-kingfisher scan --git-url https://gitlab.com/group/project.git --repo-artifacts
+kingfisher scan https://gitlab.com/group/project.git --repo-artifacts

 # Private projects or artifacts
-KF_GITLAB_TOKEN="glpat-…" kingfisher scan --git-url https://gitlab.com/group/private_project.git --repo-artifacts
+KF_GITLAB_TOKEN="glpat-…" kingfisher scan https://gitlab.com/group/private_project.git --repo-artifacts
 ```

 ### List GitLab repositories
@@ -360,17 +361,17 @@ kingfisher scan gitea --organization my-org \

 ### Scan remote Gitea repository by URL

-`--git-url` clones the repository and scans its history. Adding `--repo-artifacts`
+A Git URL target clones the repository and scans its history. Adding `--repo-artifacts`
 also clones the repository wiki if one exists. Private repositories and wikis
 require `KF_GITEA_TOKEN` (and `KF_GITEA_USERNAME` when cloning via HTTPS).

 ```bash
 # Scan the repository only
-kingfisher scan --git-url https://gitea.com/org/repo.git
+kingfisher scan https://gitea.com/org/repo.git

 # Include the repository wiki (if present)
 KF_GITEA_TOKEN="gtoken" KF_GITEA_USERNAME="org" \
-  kingfisher scan --git-url https://gitea.com/org/repo.git --repo-artifacts
+  kingfisher scan https://gitea.com/org/repo.git --repo-artifacts
 ```

 ### List Gitea repositories
@@ -414,17 +415,17 @@ kingfisher scan bitbucket --workspace my-team \

 ### Scan remote Bitbucket repository by URL

-`--git-url` clones the repository and scans its files and history. To inspect
+A Git URL target clones the repository and scans its files and history. To inspect
 Bitbucket artifacts such as issues, add `--repo-artifacts`. Private artifacts
 require credentials (see [Authenticate to Bitbucket](#authenticate-to-bitbucket)).

 ```bash
 # Scan the repository only
-kingfisher scan --git-url https://bitbucket.org/hashashash/secretstest.git
+kingfisher scan https://bitbucket.org/hashashash/secretstest.git

 # Include repository issues
 KF_BITBUCKET_TOKEN="$BITBUCKET_TOKEN" \
-  kingfisher scan --git-url https://bitbucket.org/workspace/project.git --repo-artifacts
+  kingfisher scan https://bitbucket.org/workspace/project.git --repo-artifacts
 ```

 ### List Bitbucket repositories
diff --git a/docs/USAGE.md b/docs/USAGE.md
index 7916539..50215f7 100644
--- a/docs/USAGE.md
+++ b/docs/USAGE.md
@@ -105,7 +105,7 @@ Add `--access-map` to enrich JSON, JSONL, BSON, pretty, and SARIF reports with a
 kingfisher view kingfisher.json
 ```

-The `view` subcommand starts a local-only server (default port `7890`) that bundles the HTML, CSS, and JavaScript for the access-map viewer directly into the Kingfisher binary. Provide a JSON or JSONL report to load it automatically and Kingfisher will open your browser, or open the page and upload a report in the browser. If port 7890 is already in use, Kingfisher will exit and tell you to re-run with `--port `.
+The `view` subcommand starts a server (default port `7890`, bind address `127.0.0.1`) that bundles the HTML, CSS, and JavaScript for the access-map viewer directly into the Kingfisher binary. Provide a JSON or JSONL report to load it automatically and Kingfisher will open your browser, or open the page and upload a report in the browser. If port 7890 is already in use, re-run with `--port `. To allow access from Docker or other hosts, use `--address 0.0.0.0`.

 ### Pipe any text directly into Kingfisher by passing `-`

@@ -348,11 +348,10 @@ kingfisher scan /path/to/local/repo --branch
 kingfisher scan C:\\src\\repo --branch
 ```

-The same diff-focused workflow works when cloning repositories on the fly with `--git-url`. Kingfisher automatically tries remote-tracking names like `origin/main` and `origin/feature-1`, so you can target the branches involved in a pull request without performing a local checkout first.
+The same diff-focused workflow works when cloning repositories on the fly by passing a Git URL directly to `scan`. Kingfisher automatically tries remote-tracking names like `origin/main` and `origin/feature-1`, so you can target the branches involved in a pull request without performing a local checkout first.

 ```bash
-kingfisher scan \
-  --git-url https://github.com/org/repo.git \
+kingfisher scan https://github.com/org/repo.git \
   --since-commit main \
   --branch development
 ```
@@ -364,16 +363,14 @@ When `--since-commit` is omitted, specifying `--branch` scans the requested ref
 kingfisher scan ~/tmp/repo --branch feature-123

 # Or scan a branch when cloning on the fly
-kingfisher scan \
-  --git-url https://github.com/org/repo.git \
+kingfisher scan https://github.com/org/repo.git \
   --branch origin/feature-123
 ```

-In CI systems that expose the base and head commits explicitly, you can pass those SHAs directly while still using `--git-url`:
+In CI systems that expose the base and head commits explicitly, you can pass those SHAs directly while scanning a Git URL:

 ```bash
-kingfisher scan \
-  --git-url git@github.com:org/repo.git \
+kingfisher scan https://github.com/org/repo.git \
   --since-commit "$BASE_COMMIT" \
   --branch "$PR_HEAD_COMMIT"
 ```
@@ -530,7 +527,7 @@ kingfisher scan github --organization my-org \

 ### Scan remote GitHub repository

-`--git-url` clones the repository and scans its files and history. When the URL targets GitHub and you pass `--include-contributors`, Kingfisher enumerates repository contributors and attempts to clone **all public repos owned by those contributors**—a common offensive and blue-team pivot when developers leak secrets in personal or side projects. Use `--repo-clone-limit` to cap how many repositories are cloned during this enumeration.
+Pass a repository URL as a positional scan target to clone and scan its files and history. (The legacy `--git-url` flag still works but is deprecated.) When the URL targets GitHub and you pass `--include-contributors`, Kingfisher enumerates repository contributors and attempts to clone **all public repos owned by those contributors**—a common offensive and blue-team pivot when developers leak secrets in personal or side projects. Use `--repo-clone-limit` to cap how many repositories are cloned during this enumeration.

 **NOTE**: This may cause you to be temporarily rate-limited by GitHub. Providing a token (`KF_GITHUB_TOKEN`) will provide a higher rate limit.

@@ -538,29 +535,29 @@ To inspect related server-side data, supply `--repo-artifacts`. This flag pulls

 Use `--git-clone-dir` to choose where cloned repositories land and `--keep-clones` to preserve them for follow-on analysis.

-> **Why does `--git-url` sometimes report fewer findings than scanning a local checkout?**.
+> **Why can scanning a remote URL report fewer findings than scanning a local checkout?**.
 >
-> Remote clones created via `--git-url` default to `--mirror`/bare mode so Kingfisher only reads the Git history. When you point Kingfisher at an existing working tree (for example `kingfisher scan ./repo`), it enumerates both the filesystem contents *and* the Git history. Any secrets that are present in the checked-out files therefore appear twice: once from the working tree path and once from the commit where the secret entered the history. To replicate the remote behavior locally, either scan a bare clone or disable history scanning with `--git-history none` when targeting a working tree.
+> Remote clones default to `--mirror`/bare mode so Kingfisher only reads the Git history. When you point Kingfisher at an existing working tree (for example `kingfisher scan ./repo`), it enumerates both the filesystem contents *and* the Git history. Any secrets that are present in the checked-out files therefore appear twice: once from the working tree path and once from the commit where the secret entered the history. To replicate the remote behavior locally, either scan a bare clone or disable history scanning with `--git-history none` when targeting a working tree.

 ```bash
 # Scan the repository only
-kingfisher scan --git-url https://github.com/org/repo.git
+kingfisher scan github.com/org/repo

 # Scan the repository plus contributor repos, but cap the crawl
-kingfisher scan --git-url https://github.com/org/repo.git \
+kingfisher scan https://github.com/org/repo.git \
   --include-contributors \
   --repo-clone-limit 250

 # Keep clones for later manual inspection
-kingfisher scan --git-url https://github.com/org/repo.git \
+kingfisher scan https://github.com/org/repo.git \
   --git-clone-dir ./kingfisher-clones \
   --keep-clones

 # Include issues, wiki, and owner gists
-kingfisher scan --git-url https://github.com/org/repo.git --repo-artifacts
+kingfisher scan https://github.com/org/repo.git --repo-artifacts

 # Private repositories or artifacts
-KF_GITHUB_TOKEN="ghp_…" kingfisher scan --git-url https://github.com/org/private_repo.git --repo-artifacts
+KF_GITHUB_TOKEN="ghp_…" kingfisher scan https://github.com/org/private_repo.git --repo-artifacts
 ```

 ---
@@ -594,7 +591,7 @@ kingfisher scan gitlab --group my-group \

 ### Scan remote GitLab repository by URL

-`--git-url` by itself clones the project repository. When the URL targets GitLab and you pass `--include-contributors`, Kingfisher enumerates contributors and tries to clone **their other public projects** to catch secrets that escape the main repo. Apply `--repo-clone-limit` to cap the total repos cloned during this pivot.
+A Git URL target by itself clones the project repository. When the URL targets GitLab and you pass `--include-contributors`, Kingfisher enumerates contributors and tries to clone **their other public projects** to catch secrets that escape the main repo. Apply `--repo-clone-limit` to cap the total repos cloned during this pivot.

 **NOTE**: This may cause you to be temporarily rate-limited by GitLab. Providing a token (`KF_GITLAB_TOKEN`) will provide a higher rate limit.
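Note on the scheme-less targets used in these examples (`kingfisher scan github.com/org/repo`): they depend on deciding whether a positional argument is a local path or a Git host. The following is a simplified sketch of that decision, modeled loosely on the `parse_git_url_target` function added in this change. It is not the project's code: `infer_https_git_url` is a hypothetical name, it returns a plain `String` instead of the project's `GitUrl` type, and it leaves inputs that already carry a scheme to a real URL parser.

```rust
use std::net::IpAddr;

/// Hypothetical helper: return a normalized `https://` URL when `raw` looks
/// like a scheme-less Git host target such as `github.com/org/repo`.
fn infer_https_git_url(raw: &str) -> Option<String> {
    let raw = raw.trim();

    // Stdin marker, empty input, and Windows-style paths are never URLs.
    if raw.is_empty() || raw == "-" || raw.contains('\\') {
        return None;
    }

    // Inputs with a scheme, or anything that is clearly a filesystem path,
    // are out of scope for this sketch.
    if raw.contains("://")
        || raw.starts_with('/')
        || raw.starts_with("./")
        || raw.starts_with("../")
        || raw.starts_with('~')
    {
        return None;
    }

    // Require `host/owner/repo`: a host plus at least two path segments.
    let (host, suffix) = raw.split_once('/')?;
    if host.is_empty() || suffix.is_empty() {
        return None;
    }
    let path_segments = suffix.split('/').filter(|segment| !segment.is_empty()).count();
    if path_segments < 2 {
        return None;
    }

    // The host must look like a domain name, `localhost`, or an IP literal.
    let host_looks_valid =
        host.contains('.') || host == "localhost" || host.parse::<IpAddr>().is_ok();
    if !host_looks_valid {
        return None;
    }

    Some(format!("https://{raw}"))
}
```

A relative directory such as `my.dir/a/b` would also pass this heuristic, which is presumably why the change checks `path.exists()` first and only treats arguments that do not exist on disk as URL candidates.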
@@ -604,23 +601,23 @@ Use `--git-clone-dir` to choose where cloned projects land and `--keep-clones` t

 ```bash
 # Scan the repository only
-kingfisher scan --git-url https://gitlab.com/group/project.git
+kingfisher scan gitlab.com/group/project.git

 # Scan the repository plus contributor projects, but cap the crawl
-kingfisher scan --git-url https://gitlab.com/group/project.git \
+kingfisher scan https://gitlab.com/group/project.git \
   --include-contributors \
   --repo-clone-limit 250

 # Keep clones for later manual inspection
-kingfisher scan --git-url https://gitlab.com/group/project.git \
+kingfisher scan https://gitlab.com/group/project.git \
   --git-clone-dir ./kingfisher-clones \
   --keep-clones

 # Include issues, wiki, and snippets
-kingfisher scan --git-url https://gitlab.com/group/project.git --repo-artifacts
+kingfisher scan https://gitlab.com/group/project.git --repo-artifacts

 # Private projects or artifacts
-KF_GITLAB_TOKEN="glpat-…" kingfisher scan --git-url https://gitlab.com/group/private_project.git --repo-artifacts
+KF_GITLAB_TOKEN="glpat-…" kingfisher scan https://gitlab.com/group/private_project.git --repo-artifacts
 ```

 ### List GitLab repositories
@@ -705,15 +702,15 @@ kingfisher scan gitea --organization my-org \

 ### Scan remote Gitea repository by URL

-`--git-url` clones the repository and scans its history. Adding `--repo-artifacts` also clones the repository wiki if one exists. Private repositories and wikis require `KF_GITEA_TOKEN` (and `KF_GITEA_USERNAME` when cloning via HTTPS).
+A Git URL target clones the repository and scans its history. Adding `--repo-artifacts` also clones the repository wiki if one exists. Private repositories and wikis require `KF_GITEA_TOKEN` (and `KF_GITEA_USERNAME` when cloning via HTTPS).

 ```bash
 # Scan the repository only
-kingfisher scan --git-url https://gitea.com/org/repo.git
+kingfisher scan https://gitea.com/org/repo.git

 # Include the repository wiki (if present)
 KF_GITEA_TOKEN="gtoken" KF_GITEA_USERNAME="org" \
-  kingfisher scan --git-url https://gitea.com/org/repo.git --repo-artifacts
+  kingfisher scan https://gitea.com/org/repo.git --repo-artifacts
 ```

 ### List Gitea repositories
@@ -757,15 +754,15 @@ kingfisher scan bitbucket --workspace my-team \

 ### Scan remote Bitbucket repository by URL

-`--git-url` clones the repository and scans its files and history. To inspect Bitbucket artifacts such as issues, add `--repo-artifacts`. Private artifacts require credentials (see [Authenticate to Bitbucket](#authenticate-to-bitbucket)).
+A Git URL target clones the repository and scans its files and history. To inspect Bitbucket artifacts such as issues, add `--repo-artifacts`. Private artifacts require credentials (see [Authenticate to Bitbucket](#authenticate-to-bitbucket)).

 ```bash
 # Scan the repository only
-kingfisher scan --git-url https://bitbucket.org/hashashash/secretstest.git
+kingfisher scan https://bitbucket.org/hashashash/secretstest.git

 # Include repository issues
 KF_BITBUCKET_TOKEN="$BITBUCKET_TOKEN" \
-  kingfisher scan --git-url https://bitbucket.org/workspace/project.git --repo-artifacts
+  kingfisher scan https://bitbucket.org/workspace/project.git --repo-artifacts
 ```

 ### List Bitbucket repositories
diff --git a/docs/demos/kingfisher-usage-access-map-01.tape b/docs/demos/kingfisher-usage-access-map-01.tape
index 61e38a0..005dfcd 100644
--- a/docs/demos/kingfisher-usage-access-map-01.tape
+++ b/docs/demos/kingfisher-usage-access-map-01.tape
@@ -6,7 +6,7 @@ Set TypingSpeed 60ms
 Set Framerate 60
 Set PlaybackSpeed 1.3

-Type "kingfisher scan --git-url https://github.com/leaktk/fake-leaks.git --access-map --view-report"
+Type "kingfisher scan https://github.com/leaktk/fake-leaks.git --access-map --view-report"
 Enter
 Wait+Screen@30s /(report|findings|summary|kingfisher)/

diff --git a/src/cli/commands/inputs.rs b/src/cli/commands/inputs.rs
index e1beef7..abaa4fe 100644
--- a/src/cli/commands/inputs.rs
+++ b/src/cli/commands/inputs.rs
@@ -31,8 +31,8 @@ pub struct InputSpecifierArgs {
     #[arg(num_args = 0.., value_hint = ValueHint::AnyPath)]
     pub path_inputs: Vec<PathBuf>,

-    /// Clone and scan the Git repository at the given URL
-    #[arg(long, value_hint = ValueHint::Url)]
+    /// Deprecated: clone and scan a Git repository URL. Prefer positional targets: `kingfisher scan github.com/org/repo`
+    #[arg(long = "git-url", value_hint = ValueHint::Url)]
     pub git_url: Vec<GitUrl>,

     /// Parent directory for cloned Git repositories and scan artifacts
@@ -421,7 +421,14 @@ impl InputSpecifierArgs {
     }

     /// Emit deprecation warnings for legacy top-level provider flags.
-    pub fn emit_deprecated_warnings(&self) {
+    pub fn emit_deprecated_warnings(&self, used_legacy_git_url_flag: bool) {
+        if used_legacy_git_url_flag {
+            warn_deprecated_provider(
+                "Git URL",
+                "Passing repository URLs with `--git-url` is deprecated. Pass the URL as a positional scan target instead, e.g. `kingfisher scan github.com/org/repo`.",
+            );
+        }
+
         if self.using_legacy_github_flags() {
             warn_deprecated_provider(
                 "GitHub",
diff --git a/src/cli/commands/scan.rs b/src/cli/commands/scan.rs
index b378d1f..74cc48d 100644
--- a/src/cli/commands/scan.rs
+++ b/src/cli/commands/scan.rs
@@ -1,6 +1,10 @@
 use anyhow::bail;
 use clap::{Args, Subcommand, ValueEnum, ValueHint};
-use std::path::{Path, PathBuf};
+use std::{
+    net::IpAddr,
+    path::{Path, PathBuf},
+    str::FromStr,
+};
 use strum::Display;
 use tracing::debug;
 use url::Url;
@@ -17,6 +21,7 @@ use crate::{
             inputs::{ContentFilteringArgs, InputSpecifierArgs},
             output::{OutputArgs, ReportOutputFormat},
             rules::RuleSpecifierArgs,
+            view,
         },
         global::RAM_GB,
     },
@@ -202,6 +207,11 @@ pub struct ScanArgs {
     /// Disable rule-level `ignore_if_contains` filtering for pattern requirements
     #[arg(global = true, long = "no-ignore-if-contains", default_value_t = false)]
     pub no_ignore_if_contains: bool,
+
+    #[arg(skip)]
+    pub view_report_port: u16,
+    #[arg(skip)]
+    pub view_report_address: String,
 }

 /// Confidence levels for findings
@@ -232,6 +242,24 @@ pub struct ScanCommandArgs {
     #[arg(global = true, long = "view-report", default_value_t = false)]
     pub view_report: bool,

+    /// Port for the report viewer when using --view-report (default 7890)
+    #[arg(
+        global = true,
+        long = "view-report-port",
+        default_value_t = view::DEFAULT_PORT,
+        value_name = "PORT"
+    )]
+    pub view_report_port: u16,
+
+    /// Bind address for the report viewer when using --view-report (default 127.0.0.1). Use 0.0.0.0 to allow access from Docker or other hosts.
+    #[arg(
+        global = true,
+        long = "view-report-address",
+        default_value = view::DEFAULT_ADDRESS,
+        value_name = "ADDRESS"
+    )]
+    pub view_report_address: String,
+
     #[command(subcommand)]
     pub provider: Option,
 }
@@ -253,11 +281,34 @@ pub enum ListRepositoriesCommand {
 }

 impl ScanCommandArgs {
+    fn infer_positional_git_urls(&mut self) {
+        let mut inferred_git_urls = Vec::new();
+        let mut retained_paths = Vec::new();
+
+        for path in self.scan_args.input_specifier_args.path_inputs.drain(..) {
+            if path.as_path() == Path::new("-") || path.exists() {
+                retained_paths.push(path);
+                continue;
+            }
+
+            if let Some(git_url) = parse_git_url_target(&path) {
+                inferred_git_urls.push(git_url);
+            } else {
+                retained_paths.push(path);
+            }
+        }
+
+        self.scan_args.input_specifier_args.path_inputs = retained_paths;
+        self.scan_args.input_specifier_args.git_url.extend(inferred_git_urls);
+    }
+
     /// Convert CLI arguments into a scan or repository-listing operation.
     pub fn into_operation(mut self) -> anyhow::Result<ScanOperation> {
         let mut used_provider_subcommand = false;

         self.scan_args.view_report = self.view_report;
+        self.scan_args.view_report_port = self.view_report_port;
+        self.scan_args.view_report_address = self.view_report_address.clone();

         if let Some(provider) = self.provider.take() {
             used_provider_subcommand = true;
@@ -466,9 +517,12 @@ impl ScanCommandArgs {
             }
         }

+        let used_legacy_git_url_flag = !self.scan_args.input_specifier_args.git_url.is_empty();
+        self.infer_positional_git_urls();
+
         if !self.scan_args.input_specifier_args.has_any_input() {
             bail!(
-                "Specify a path, --git-url, or use a provider subcommand such as 'kingfisher scan github'"
+                "Specify a path or Git URL (for example: 'kingfisher scan github.com/org/repo'), or use a provider subcommand such as 'kingfisher scan github'"
             );
         }

@@ -483,7 +537,7 @@ impl ScanCommandArgs {
         }

         if !used_provider_subcommand {
-            self.scan_args.input_specifier_args.emit_deprecated_warnings();
+            self.scan_args.input_specifier_args.emit_deprecated_warnings(used_legacy_git_url_flag);
         }

         if self.scan_args.manage_baseline {
@@ -503,6 +557,44 @@ impl ScanCommandArgs {
     }
 }

+fn parse_git_url_target(path: &Path) -> Option<GitUrl> {
+    let raw = path.to_str()?.trim();
+    if raw.is_empty() || raw == "-" || raw.contains('\\') {
+        return None;
+    }
+
+    if let Ok(url) = GitUrl::from_str(raw) {
+        return Some(url);
+    }
+
+    if raw.contains("://")
+        || raw.starts_with('/')
+        || raw.starts_with("./")
+        || raw.starts_with("../")
+        || raw.starts_with('~')
+    {
+        return None;
+    }
+
+    let (host, suffix) = raw.split_once('/')?;
+    if host.is_empty() || suffix.is_empty() {
+        return None;
+    }
+
+    let path_segments = suffix.split('/').filter(|segment| !segment.is_empty()).count();
+    if path_segments < 2 {
+        return None;
+    }
+
+    let host_looks_valid =
+        host.contains('.') || host == "localhost" || host.parse::<IpAddr>().is_ok();
+    if !host_looks_valid {
+        return None;
+    }
+
+    GitUrl::from_str(&format!("https://{raw}")).ok()
+}
+
 #[derive(Subcommand, Debug, Clone)]
 pub enum ScanInputCommand {
     /// Scan local files, directories, or Git repositories
@@ -552,7 +644,7 @@ pub struct FilesystemScanArgs {
     #[arg(value_name = "PATH", value_hint = ValueHint::AnyPath)]
     pub paths: Vec<PathBuf>,

-    /// Git repository URLs to clone and scan
+    /// Deprecated: git repository URLs to clone and scan. Prefer positional targets.
     #[arg(long = "git-url", value_hint = ValueHint::Url)]
     pub git_url: Vec<GitUrl>,
 }
diff --git a/src/cli/commands/view.rs b/src/cli/commands/view.rs
index dd1040b..49c8b50 100644
--- a/src/cli/commands/view.rs
+++ b/src/cli/commands/view.rs
@@ -22,6 +22,9 @@ pub const DEFAULT_PORT: u16 = 7890;
 // Embedded viewer assets - force rebuild
 static VIEWER_ASSETS: Dir<'_> = include_dir!("$CARGO_MANIFEST_DIR/docs/access-map-viewer");

+/// Default bind address for the report viewer (localhost only for security).
+pub const DEFAULT_ADDRESS: &str = "127.0.0.1";
+
 /// View a Kingfisher access-map report locally.
 #[derive(clap::Args, Debug)]
 pub struct ViewArgs {
@@ -33,6 +36,10 @@ pub struct ViewArgs {
     #[arg(long, default_value_t = DEFAULT_PORT)]
     pub port: u16,

+    /// Bind address for the report viewer (default 127.0.0.1). Use 0.0.0.0 to allow access from Docker or other hosts.
+    #[arg(long, default_value = DEFAULT_ADDRESS, value_name = "ADDRESS")]
+    pub address: String,
+
     #[arg(skip)]
     pub open_browser: bool,

@@ -45,8 +52,10 @@ struct AppState {
     report: Option>,
 }

-pub fn ensure_port_available(port: u16) -> Result<()> {
-    StdTcpListener::bind(("127.0.0.1", port)).map_err(|err| match err.kind() {
+pub fn ensure_port_available(port: u16, address: &str) -> Result<()> {
+    let addr: std::net::IpAddr =
+        address.parse().context("Invalid bind address for report viewer")?;
+    StdTcpListener::bind((addr, port)).map_err(|err| match err.kind() {
         std::io::ErrorKind::AddrInUse => anyhow!(
             "Port {} is already in use. Re-run with --port to choose a different port.",
             port
@@ -81,14 +90,15 @@ pub async fn run(args: ViewArgs) -> Result<()> {
         None
     };

-    let listener =
-        TcpListener::bind(("127.0.0.1", args.port)).await.map_err(|err| match err.kind() {
-            std::io::ErrorKind::AddrInUse => anyhow!(
-                "Port {} is already in use. Re-run with --port to choose a different port.",
-                args.port
-            ),
-            _ => err.into(),
-        })?;
+    let addr: std::net::IpAddr =
+        args.address.parse().context("Invalid bind address for report viewer")?;
+    let listener = TcpListener::bind((addr, args.port)).await.map_err(|err| match err.kind() {
+        std::io::ErrorKind::AddrInUse => anyhow!(
+            "Port {} is already in use. Re-run with --port to choose a different port.",
+            args.port
+        ),
+        _ => err.into(),
+    })?;

     let address: SocketAddr =
         listener.local_addr().context("Failed to read local listener address")?;
diff --git a/src/direct_validate.rs b/src/direct_validate.rs
index 9c00929..0183965 100644
--- a/src/direct_validate.rs
+++ b/src/direct_validate.rs
@@ -964,6 +964,8 @@ pub(crate) fn create_minimal_scan_args() -> crate::cli::commands::scan::ScanArgs
         turbo: false,
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_timeout: 10,
         validation_retries: 1,
         validation_rps: None,
diff --git a/src/main.rs b/src/main.rs
index 22146c0..4057c2a 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -239,7 +239,10 @@ async fn async_main(args: CommandLineArgs) -> Result<()> {
         Command::Scan(scan_command) => match scan_command.into_operation()? {
             ScanOperation::Scan(mut scan_args) => {
                 if scan_args.view_report {
-                    view::ensure_port_available(view::DEFAULT_PORT)?;
+                    view::ensure_port_available(
+                        scan_args.view_report_port,
+                        &scan_args.view_report_address,
+                    )?;
                 }
                 let view_scan_started_at = chrono::Local::now();
                 let view_scan_start_time = Instant::now();
@@ -320,7 +323,8 @@ async fn async_main(args: CommandLineArgs) -> Result<()> {
                     let report_bytes = serde_json::to_vec_pretty(&envelope)?;
                     let view_args = view::ViewArgs {
                         report: None,
-                        port: view::DEFAULT_PORT,
+                        port: scan_args.view_report_port,
+                        address: scan_args.view_report_address.clone(),
                         open_browser: true,
                         report_bytes: Some(report_bytes),
                     };
@@ -580,6 +584,8 @@ fn create_default_scan_args() -> cli::commands::scan::ScanArgs {
         turbo: false,
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: view::DEFAULT_PORT,
+        view_report_address: view::DEFAULT_ADDRESS.to_string(),
         validation_timeout: 10,
         validation_retries: 1,
         validation_rps: None,
diff --git a/src/reporter.rs b/src/reporter.rs
index e5b6366..becb37c 100644
--- a/src/reporter.rs
+++ b/src/reporter.rs
@@ -1792,6 +1792,8 @@ mod tests {
             skip_aws_account_file: None,
             no_inline_ignore: false,
             no_ignore_if_contains: false,
+            view_report_port: 7890,
+            view_report_address: "127.0.0.1".to_string(),
             validation_timeout: 10,
             validation_retries: 1,
             validation_rps: None,
diff --git a/src/reporter/json_format.rs b/src/reporter/json_format.rs
index 62e636a..39a8c98 100644
--- a/src/reporter/json_format.rs
+++ b/src/reporter/json_format.rs
@@ -196,6 +196,8 @@ mod tests {
             turbo: false,
             no_inline_ignore: false,
             no_ignore_if_contains: false,
+            view_report_port: 7890,
+            view_report_address: "127.0.0.1".to_string(),
             validation_timeout: 10,
             validation_retries: 1,
             validation_rps: None,
diff --git a/tests/cli_git_clone_flags.rs b/tests/cli_git_clone_flags.rs
index 3b67883..c18b9f4 100644
--- a/tests/cli_git_clone_flags.rs
+++ b/tests/cli_git_clone_flags.rs
@@ -12,7 +12,6 @@ fn parse_git_clone_dir_and_keep_clones() -> anyhow::Result<()> {
     let args = CommandLineArgs::try_parse_from([
         "kingfisher",
         "scan",
-        "--git-url",
         "https://github.com/octocat/Hello-World.git",
         "--git-clone-dir",
         dir.path().to_str().unwrap(),
@@ -41,8 +40,7 @@ fn keep_clones_defaults_to_false() -> anyhow::Result<()> {
     let args = CommandLineArgs::try_parse_from([
         "kingfisher",
         "scan",
-        "--git-url",
-        "https://github.com/octocat/Hello-World.git",
+        "github.com/octocat/Hello-World",
         "--no-update-check",
     ])?;

@@ -62,6 +60,70 @@ fn keep_clones_defaults_to_false() -> anyhow::Result<()> {
     Ok(())
 }

+#[test]
+fn deprecated_git_url_flag_still_parses() -> anyhow::Result<()> {
+    let args = CommandLineArgs::try_parse_from([
+        "kingfisher",
+        "scan",
+        "--git-url",
+        "https://github.com/octocat/Hello-World.git",
+        "--no-update-check",
+    ])?;
+
+    let command = match args.command {
+        Command::Scan(scan_args) => scan_args,
+        other => panic!("unexpected command parsed: {:?}", other),
+    };
+
+    let scan_args = match command.into_operation()? {
+        ScanOperation::Scan(scan_args) => scan_args,
+        op => panic!("expected scan operation, got {:?}", op),
+    };
+
+    assert_eq!(scan_args.input_specifier_args.git_url.len(), 1);
+    assert_eq!(
+        scan_args.input_specifier_args.git_url[0].as_str(),
+        "https://github.com/octocat/Hello-World.git"
+    );
+    assert!(scan_args.input_specifier_args.path_inputs.is_empty());
+
+    Ok(())
+}
+
+#[test]
+fn positional_git_url_examples_parse() -> anyhow::Result<()> {
+    let examples = [
+        ("github.com/kubernetes/kubernetes", "https://github.com/kubernetes/kubernetes"),
+        ("https://github.com/org/repo", "https://github.com/org/repo"),
+        ("gitlab.com/gitlab-org/gitlab", "https://gitlab.com/gitlab-org/gitlab"),
+        (
+            "https://gitlab.com/namespace/project.git",
+            "https://gitlab.com/namespace/project.git",
+        ),
+    ];
+
+    for (input, expected) in examples {
+        let args =
+            CommandLineArgs::try_parse_from(["kingfisher", "scan", input, "--no-update-check"])?;
+
+        let command = match args.command {
+            Command::Scan(scan_args) => scan_args,
+            other => panic!("unexpected command parsed: {:?}", other),
+        };
+
+        let scan_args = match command.into_operation()? {
+            ScanOperation::Scan(scan_args) => scan_args,
+            op => panic!("expected scan operation, got {:?}", op),
+        };
+
+        assert_eq!(scan_args.input_specifier_args.git_url.len(), 1);
+        assert_eq!(scan_args.input_specifier_args.git_url[0].as_str(), expected);
+        assert!(scan_args.input_specifier_args.path_inputs.is_empty());
+    }
+
+    Ok(())
+}
+
 #[test]
 fn turbo_mode_applies_speed_first_defaults() -> anyhow::Result<()> {
     let args = CommandLineArgs::try_parse_from([
diff --git a/tests/int_allowlist.rs b/tests/int_allowlist.rs
index 75bd34b..df964b6 100644
--- a/tests/int_allowlist.rs
+++ b/tests/int_allowlist.rs
@@ -158,6 +158,8 @@ fn run_skiplist(skip_regex: Vec, skip_skipword: Vec) -> Result Result<()> {
         extra_ignore_comments: Vec::new(),
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
diff --git a/tests/int_dedup.rs b/tests/int_dedup.rs
index 8dcd771..6c0d0c8 100644
--- a/tests/int_dedup.rs
+++ b/tests/int_dedup.rs
@@ -178,6 +178,8 @@ rules:
         extra_ignore_comments: Vec::new(),
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
diff --git a/tests/int_github.rs b/tests/int_github.rs
index 55299fc..de6793e 100644
--- a/tests/int_github.rs
+++ b/tests/int_github.rs
@@ -165,6 +165,8 @@ fn test_github_remote_scan() -> Result<()> {
         extra_ignore_comments: Vec::new(),
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
diff --git a/tests/int_gitlab.rs b/tests/int_gitlab.rs
index f560e0c..dd6f1f1 100644
--- a/tests/int_gitlab.rs
+++ b/tests/int_gitlab.rs
@@ -163,6 +163,8 @@ fn test_gitlab_remote_scan() -> Result<()> {
         turbo: false,
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
@@ -331,6 +333,8 @@ fn test_gitlab_remote_scan_no_history() -> Result<()> {
         no_inline_ignore: false,
         no_ignore_if_contains: false,
         view_report: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
diff --git a/tests/int_redact.rs b/tests/int_redact.rs
index 08e494e..858d1b1 100644
--- a/tests/int_redact.rs
+++ b/tests/int_redact.rs
@@ -141,6 +141,8 @@ async fn test_redact_hashes_finding_values() -> Result<()> {
         extra_ignore_comments: Vec::new(),
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
diff --git a/tests/int_slack.rs b/tests/int_slack.rs
index bd65ff9..1d4540d 100644
--- a/tests/int_slack.rs
+++ b/tests/int_slack.rs
@@ -146,6 +146,8 @@ impl TestContext {
         turbo: false,
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
@@ -300,6 +302,8 @@ async fn test_scan_slack_messages() -> Result<()> {
         no_inline_ignore: false,
         no_ignore_if_contains: false,
         view_report: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
diff --git a/tests/int_validation_cache.rs b/tests/int_validation_cache.rs
index 0339a72..d6cb97c 100644
--- a/tests/int_validation_cache.rs
+++ b/tests/int_validation_cache.rs
@@ -221,6 +221,8 @@ async fn test_validation_cache_and_depvars() -> Result<()> {
         extra_ignore_comments: Vec::new(),
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
diff --git a/tests/int_vulnerable_files.rs b/tests/int_vulnerable_files.rs
index b4b9fbe..59f43ab 100644
--- a/tests/int_vulnerable_files.rs
+++ b/tests/int_vulnerable_files.rs
@@ -164,6 +164,8 @@ impl TestContext {
         extra_ignore_comments: Vec::new(),
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
@@ -305,6 +307,8 @@ impl TestContext {
         turbo: false,
         no_inline_ignore: false,
         no_ignore_if_contains: false,
+        view_report_port: 7890,
+        view_report_address: "127.0.0.1".to_string(),
         validation_retries: 1,
         validation_rps: None,
         validation_rps_rule: Vec::new(),
diff --git a/tests/smoke_github_homebrew.rs b/tests/smoke_github_homebrew.rs
index 65b5527..462eb93 100644
--- a/tests/smoke_github_homebrew.rs
+++ b/tests/smoke_github_homebrew.rs
@@ -4,7 +4,7 @@ use predicates::str::contains;
 #[test]
 fn scan_homebrew_github_no_findings() -> anyhow::Result<()> {
     Command::new(assert_cmd::cargo::cargo_bin!("kingfisher"))
-        .args(["scan", "--git-url", "https://github.com/homebrew/.github", "--no-update-check"])
+        .args(["scan", "https://github.com/homebrew/.github", "--no-update-check"])
         .assert()
         .success()
         .stdout(contains("|Findings....................: 0"))
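
Reviewer note: the positional-target heuristic in `parse_git_url_target` can be tried in isolation with the rough sketch below. It is not the code from this change. The function name `normalize_git_target` is hypothetical, it returns a plain `String` instead of the project's `GitUrl`, it assumes an explicit `http(s)://` URL is accepted unchanged (what `GitUrl::from_str` actually accepts is not visible in this diff), and it assumes the host check parses an `IpAddr`.

```rust
use std::net::IpAddr;

/// Hypothetical stand-in for `parse_git_url_target`: returns a normalized
/// URL string rather than the project's `GitUrl` type.
fn normalize_git_target(raw: &str) -> Option<String> {
    let raw = raw.trim();

    // Empty input, the stdin marker "-", and Windows-style paths are not URLs.
    if raw.is_empty() || raw == "-" || raw.contains('\\') {
        return None;
    }

    // Assumption: an explicit http(s) URL is passed through unchanged.
    if raw.starts_with("https://") || raw.starts_with("http://") {
        return Some(raw.to_string());
    }

    // Any other scheme, or anything shaped like a filesystem path, is rejected.
    if raw.contains("://")
        || raw.starts_with('/')
        || raw.starts_with("./")
        || raw.starts_with("../")
        || raw.starts_with('~')
    {
        return None;
    }

    let (host, suffix) = raw.split_once('/')?;
    if host.is_empty() || suffix.is_empty() {
        return None;
    }

    // Require at least two non-empty segments after the host (owner/repo).
    let path_segments = suffix.split('/').filter(|s| !s.is_empty()).count();
    if path_segments < 2 {
        return None;
    }

    // Assumption: the host must contain a dot, be "localhost", or parse as an IP.
    let host_looks_valid =
        host.contains('.') || host == "localhost" || host.parse::<IpAddr>().is_ok();
    if !host_looks_valid {
        return None;
    }

    Some(format!("https://{raw}"))
}
```

Under these assumptions, a relative path such as `src/cli/commands` stays a filesystem target because `src` does not look like a host, while `github.com/org/repo` is promoted to an `https://` URL. A local directory whose first component contains a dot (for example `my.dir/a/b`) would also be promoted, which may be worth a test case.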