forked from mirrors/kingfisher
fixing bugs
This commit is contained in:
parent
7d4719c7c5
commit
7fc01e5aca
9 changed files with 549 additions and 192 deletions
213
AGENTS.md
213
AGENTS.md
|
|
@ -2,151 +2,110 @@
|
||||||
|
|
||||||
Guidance for coding agents working in this repository.
|
Guidance for coding agents working in this repository.
|
||||||
|
|
||||||
## Project Overview
|
|
||||||
|
|
||||||
Kingfisher is an open-source secret scanner and live secret validator written in Rust by MongoDB. It detects, validates, and helps remediate leaked API keys, tokens, and credentials across code repositories, git history, and integrated platforms.
|
|
||||||
|
|
||||||
Key capabilities:
|
|
||||||
- Secret detection with 950 built-in rules (826 standalone detectors + 124 dependent rules; 485 standalone detectors include live validation as of 2026-05-04)
|
|
||||||
- Live credential validation against provider APIs
|
|
||||||
- Direct secret revocation from CLI
|
|
||||||
- Blast radius mapping (AWS, GCP, Azure, GitHub, GitLab, Slack)
|
|
||||||
- Output formats: TOON, JSON, SARIF, interactive HTML
|
|
||||||
- Platform integrations: GitHub, GitLab, Azure Repos, Bitbucket, Gitea, Hugging Face, S3, GCS, Docker, Jira, Confluence, Slack, Microsoft Teams, Postman
|
|
||||||
|
|
||||||
## Scope
|
## Scope
|
||||||
|
|
||||||
- Applies to the entire repository rooted at this file.
|
- Applies to the entire repository rooted at this file.
|
||||||
- If a deeper `AGENTS.md` exists in a subdirectory, that file takes precedence for its subtree.
|
- If a deeper `AGENTS.md` exists, that file takes precedence for its subtree.
|
||||||
|
|
||||||
## Repository Structure
|
## Project
|
||||||
- `src/`: main binary source
|
|
||||||
- `src/cli/commands/`: CLI command implementations
|
|
||||||
- `src/matcher/`: pattern matching engine
|
|
||||||
- `src/scanner/`: core scanning logic
|
|
||||||
- `src/parser/`: language-aware context verification (lightweight lexers, `tl` for HTML, `cssparser` for CSS)
|
|
||||||
- `src/reporter/`: TOON/JSON/SARIF/HTML report generation
|
|
||||||
- `src/access_map/`: access mapping analysis
|
|
||||||
- `crates/kingfisher-core/`: shared types and core logic
|
|
||||||
- `crates/kingfisher-rules/`: rule loading and rule data
|
|
||||||
- `crates/kingfisher-rules/data/rules/`: YAML detection rules
|
|
||||||
- `crates/kingfisher-scanner/`: embeddable high-level scanning API
|
|
||||||
- `crates/kingfisher-scanner/src/validation/`: shared typed and raw credential validators
|
|
||||||
- `tests/`: integration/e2e tests
|
|
||||||
- `testdata/`: test fixtures
|
|
||||||
- `docs/`: user and developer docs
|
|
||||||
- `docs/viewer/`: static hosted/local report viewer assets
|
|
||||||
- `docs-site/`: MkDocs documentation sources, overrides, and generated site output
|
|
||||||
- `vendor/vectorscan-rs/`: vendored vectorscan bindings
|
|
||||||
|
|
||||||
## Toolchain and Environment
|
Kingfisher is a Rust secret scanner, live credential validator, revocation helper, and access-map tool. It scans repositories, git history, local files, archives, cloud storage, source-host artifacts, Docker images, and collaboration-platform exports.
|
||||||
- Shell assumptions in build scripts: `bash` with `set -eu -o pipefail`
|
|
||||||
- Workspace minimum Rust version: `1.94` (`Cargo.toml`)
|
|
||||||
- `make check-rust` enforces `>= 1.94.1` for build targets
|
|
||||||
- Windows Makefile targets (`windows-x64`, `windows-arm64`) expect an MSYS2 environment with `pacman` available.
|
|
||||||
- Prefer `rg` and `rg --files` for fast code/file search.
|
|
||||||
|
|
||||||
## Quick Commands
|
## Key Paths
|
||||||
- Development build: `cargo build`
|
|
||||||
|
- `src/`: main CLI binary and application code.
|
||||||
|
- `src/cli/commands/`: CLI command definitions and wiring.
|
||||||
|
- `src/scanner/`: scan orchestration, input enumeration, repository/artifact fetching, validation phase.
|
||||||
|
- `src/matcher/`: pattern matching, captures, filtering, deduplication.
|
||||||
|
- `src/reporter/`: TOON, JSON, JSONL, SARIF, BSON, HTML, and pretty output.
|
||||||
|
- `src/access_map/`: blast-radius and permission mapping.
|
||||||
|
- `crates/kingfisher-core/`: shared core types.
|
||||||
|
- `crates/kingfisher-rules/`: rule schema, rule loading, and bundled rule data.
|
||||||
|
- `crates/kingfisher-rules/data/rules/`: YAML detection rules.
|
||||||
|
- `crates/kingfisher-scanner/`: embeddable scanning API and shared validators.
|
||||||
|
- `tests/` and `testdata/`: integration tests and fixtures.
|
||||||
|
- `docs/`, `docs/viewer/`, `docs-site/`: docs, report viewer assets, and generated MkDocs site.
|
||||||
|
- `vendor/vectorscan-rs/`: vendored Vectorscan bindings.
|
||||||
|
|
||||||
|
## Toolchain
|
||||||
|
|
||||||
|
- Workspace minimum Rust version is `1.94` in `Cargo.toml`; `make check-rust` enforces `>= 1.94.1` for build targets.
|
||||||
|
- Rust formatting is defined by `rustfmt.toml` (`max_width = 100`, 4 spaces, Unix newlines, reordered imports).
|
||||||
|
- Build scripts assume `bash` with `set -eu -o pipefail`.
|
||||||
|
- Windows Makefile targets expect MSYS2 with `pacman`.
|
||||||
|
|
||||||
|
## Common Commands
|
||||||
|
|
||||||
|
- Build: `cargo build`
|
||||||
- Release build: `cargo build --release`
|
- Release build: `cargo build --release`
|
||||||
- Tests (preferred wrapper): `make tests`
|
- Preferred test wrapper: `make tests`
|
||||||
- Tests (direct): `cargo test --workspace --all-targets`
|
- Direct tests: `cargo test --workspace --all-targets`
|
||||||
- Nextest (if installed): `cargo nextest run --workspace --all-targets`
|
- Nextest: `cargo nextest run --workspace --all-targets`
|
||||||
- Format: `cargo fmt --all`
|
- Format: `cargo fmt --all`
|
||||||
- Lint: `cargo clippy --workspace --all-targets -- -D warnings`
|
- Lint: `cargo clippy --workspace --all-targets -- -D warnings`
|
||||||
- Clean: `make clean`
|
- Clean: `make clean`
|
||||||
|
|
||||||
## Build Targets (Makefile)
|
## Workflow Expectations
|
||||||
- Host convenience:
|
|
||||||
- `make linux`
|
|
||||||
- `make darwin`
|
|
||||||
- `make windows` (Windows host; builds `windows-x64` and `windows-arm64`)
|
|
||||||
- Explicit platform archives:
|
|
||||||
- `make linux-x64`
|
|
||||||
- `make linux-arm64`
|
|
||||||
- `make darwin-x64`
|
|
||||||
- `make darwin-arm64`
|
|
||||||
- `make windows-x64` (Windows host only; MSYS2/MinGW flow)
|
|
||||||
- `make windows-arm64` (Windows host only; MSYS2 clangarm64 flow)
|
|
||||||
- Ubuntu bare-metal (Zig/cargo-zigbuild flow):
|
|
||||||
- `make ubuntu-x64`
|
|
||||||
- `make ubuntu-arm64`
|
|
||||||
|
|
||||||
## Code Style
|
- Keep edits minimal, targeted, and consistent with touched code.
|
||||||
- Rust formatting is defined in `rustfmt.toml` (`max_width = 100`, 4-space indentation, Unix newlines, reordered imports).
|
- Do not revert user-authored or unrelated in-progress changes.
|
||||||
- Keep edits minimal and scoped; preserve existing conventions in touched files.
|
- Prefer clear fixes over broad refactors unless requested.
|
||||||
- Prefer clear, maintainable fixes over broad refactors unless requested.
|
- Run the narrowest relevant tests first; run broader checks when practical.
|
||||||
|
- If a validation/build command cannot be run, state exactly what was skipped and why.
|
||||||
|
- Prefer `kingfisher scan --format toon` for agent/LLM workflows; use `pretty` only when human-interactive output is explicitly desired.
|
||||||
|
- After markdown/doc changes, verify local documentation links when practical.
|
||||||
|
- After `docs-site/` source changes, rebuild with `docs-site/.venv/bin/mkdocs build -f docs-site/mkdocs.yml` when practical so generated output stays in sync.
|
||||||
|
|
||||||
## Architecture Notes
|
## Architecture Notes
|
||||||
- Rules are YAML-driven and loaded from `crates/kingfisher-rules/data/rules/`.
|
|
||||||
- Allocator feature flags are in root `Cargo.toml`:
|
|
||||||
- `use-mimalloc` (default)
|
|
||||||
- `use-jemalloc`
|
|
||||||
- `system-alloc`
|
|
||||||
- Validation modules live in `crates/kingfisher-scanner/src/validation/`; optional validation feature sets are defined in `crates/kingfisher-scanner/Cargo.toml` (e.g., `validation-raw`, `validation-aws`, `validation-gcp`, `validation-database`, `validation-all`).
|
|
||||||
|
|
||||||
## Validation and Revocation Policy
|
- Detection rules are YAML-driven and loaded from `crates/kingfisher-rules/data/rules/`.
|
||||||
- Default rule: define validation logic in rule YAML (`validation:` block), especially `Http` or `Grpc`, not Rust code.
|
- Allocator feature flags live in root `Cargo.toml`: `use-mimalloc` default, `use-jemalloc`, and `system-alloc`.
|
||||||
- Typed validators are first-class schema variants (`AWS`, `AzureStorage`, `Coinbase`, `GCP`, `MongoDB`, `MySQL`, `Postgres`, `Jdbc`, `JWT`) for stable, reusable validation families.
|
- Optional validator feature sets live in `crates/kingfisher-scanner/Cargo.toml`.
|
||||||
- Raw validators use `validation: { type: Raw, content: <name> }` and are the ad-hoc exception path for provider-specific or protocol-specific validation that cannot be expressed reliably in YAML alone. Implement them in `crates/kingfisher-scanner/src/validation/raw.rs`.
|
- Validation modules live primarily in `crates/kingfisher-scanner/src/validation/` and `src/validation.rs`.
|
||||||
- Treat Rust validation additions as rare; prefer extending YAML-based validation first.
|
|
||||||
- If a Rust exception path is required, prefer adding a raw validator before introducing a new typed validator. Add a new typed validator only when it represents a reusable schema-level validation family.
|
|
||||||
- Do not convert existing typed validators to `Raw` just for consistency.
|
|
||||||
- For rules that include validation, add a `revocation:` section whenever the third-party API safely supports revocation.
|
|
||||||
|
|
||||||
## Common Development Tasks
|
## Validation And Revocation Policy
|
||||||
- Add a detection rule: follow the workflow below and validate with relevant tests.
|
|
||||||
- Add a CLI command: implement under `src/cli/commands/` and register in the CLI command wiring.
|
- Default to YAML validation (`validation:`), especially `Http` or `Grpc`; do not add Rust validation unless YAML cannot express the flow reliably.
|
||||||
- Add a validator (rare exception path): implement it in `crates/kingfisher-scanner/src/validation/`, prefer `raw.rs` for one-off provider flows, and wire the narrowest feature/dependencies in `crates/kingfisher-scanner/Cargo.toml` only when YAML validation cannot express the required logic.
|
- Typed validators are schema-level reusable families: `AWS`, `AzureStorage`, `Coinbase`, `GCP`, `MongoDB`, `MySQL`, `Postgres`, `Jdbc`, and `JWT`.
|
||||||
- Update docs-site rule counts: use `uv run '/Users/mickg/src/kingfisher/data/default/rule_cleanup/count_rules.py'` and update `docs-site/overrides/` plus `docs-site/mkdocs.yml` to match the reported totals before rebuilding the docs site.
|
- Raw validators use `validation: { type: Raw, content: <name> }` and are implemented in `crates/kingfisher-scanner/src/validation/raw.rs` for provider-specific exceptions.
|
||||||
|
- If Rust validation is unavoidable, prefer a raw validator before introducing a new typed validator.
|
||||||
|
- Do not convert existing typed validators to `Raw` for consistency alone.
|
||||||
|
- For rules with validation, add `revocation:` when the third-party API safely supports revocation.
|
||||||
|
|
||||||
|
## Rule Authoring
|
||||||
|
|
||||||
## Rule Authoring Workflow
|
|
||||||
Use this when creating or updating rules in `crates/kingfisher-rules/data/rules/`.
|
Use this when creating or updating rules in `crates/kingfisher-rules/data/rules/`.
|
||||||
|
|
||||||
1. Pick a nearby reference rule file in the same provider family and copy its structure.
|
1. Read `docs/RULES.md` before non-trivial rule/schema work.
|
||||||
2. Define a stable rule id (`id`, prefixed with `kingfisher.`) and detection regex (`pattern`) under `rules:`.
|
2. Start from a nearby provider-family rule and preserve the existing YAML style.
|
||||||
3. Include `examples` that must match. These can be tested with `cargo test check_rules` or `kingfisher rules check --rules-path crates/kingfisher-rules/data/rules/slack.yml --load-builtins=false --no-update-check`
|
3. Use a stable `kingfisher.<provider>...` rule id and set `confidence: medium`.
|
||||||
4. Set guardrails:
|
4. Write a valid Hyperscan/Vectorscan regex. Lookahead and lookbehind are not supported.
|
||||||
- `min_entropy` for high-entropy tokens.
|
5. Start `pattern` with `(?x)`, use one unnamed capture around the secret for `{{ TOKEN }}`, and use non-capturing groups for structure.
|
||||||
- `pattern_requirements` (e.g., `min_digits`, `min_uppercase`, `min_lowercase`, `min_special_chars`, `ignore_if_contains`) when format constraints are known.
|
6. Prefer specific token formats and provider context; avoid broad generic patterns.
|
||||||
- `pattern_requirements.checksum` when provider formats include check digits/signatures.
|
7. Use `min_entropy`, `pattern_requirements`, `ignore_if_contains`, and checksum requirements when format constraints are known.
|
||||||
5. Add `validation` only when a reliable provider/API check exists.
|
8. Include `examples` that must match.
|
||||||
6. Put validation in YAML by default. If YAML cannot express the check, use an existing typed validator or `type: Raw` exception path; add new Rust validator logic only for rare, justified cases.
|
9. Use `depends_on_rule` for multi-part credentials; consider `visible: false` for helper rules.
|
||||||
7. Add `revocation` when the provider API supports safe revocation and the flow is well understood.
|
10. Add YAML validation and revocation only when reliable and safe.
|
||||||
8. If a rule needs context from another match (for example ID + secret pair), use `depends_on_rule` and consider `visible: false` on the helper rule.
|
|
||||||
9. Verify locally:
|
|
||||||
- `cargo test -p kingfisher-rules`
|
|
||||||
- `cargo test --workspace --all-targets`
|
|
||||||
- `kingfisher scan ./testdata --rule <rule-family-or-id> --rule-stats`
|
|
||||||
- If validation is implemented: `kingfisher validate --rule <rule-id> <token-or-secret>`
|
|
||||||
10. Confidence for rules should be set at `confidence: medium`
|
|
||||||
11. The `pattern` field must contain a valid Hyperscan/Vectorscan regular expression. Lookahead and lookbehind assertions aren’t supported. Because inefficient or overly broad regex can degrade performance, patterns should be as specific as possible and written to minimize false positives.
|
|
||||||
1. **Writing `pattern`**: Start with `(?x)` (free-spacing). Use one unnamed capture `( ... )` around the secret—it becomes `{{ TOKEN }}`. Use `\b` word boundaries and `(?: ... )` for non-capturing structure. For flexible context between keywords and token, use `(?:.|[\n\r]){0,N}?`. Hyperscan doesn't support `(?=...)`; use `pattern_requirements` (e.g. `min_digits`) instead.
|
|
||||||
|
|
||||||
## Rule Docs (Read Before Editing)
|
## Rule Verification
|
||||||
- `docs/RULES.md`:
|
|
||||||
- `Rule Schema` for required/optional fields
|
|
||||||
- `Character Requirements` for `pattern_requirements`, `ignore_if_contains`, and `checksum`
|
|
||||||
- `Templating with Liquid` for request signing/transforms
|
|
||||||
- `Multi-Step Revocation` for complex revoke flows
|
|
||||||
- `Writing Custom Rules` and examples for best practices
|
|
||||||
|
|
||||||
## Workflow Expectations for Agents
|
- Rule crate: `cargo test -p kingfisher-rules`
|
||||||
- Do not revert user-authored or unrelated in-progress changes.
|
- Rule syntax/check path: `kingfisher rules check --rules-path crates/kingfisher-rules/data/rules/<file>.yml --load-builtins=false --no-update-check`
|
||||||
- Prefer targeted patches.
|
- Scan fixture/corpus: `kingfisher scan ./testdata --rule <rule-family-or-id> --rule-stats`
|
||||||
- After changes, run the narrowest relevant tests first, then broader checks when practical.
|
- Validator check: `kingfisher validate --rule <rule-id> <token-or-secret>`
|
||||||
- If validation commands cannot be run, report exactly what was skipped and why.
|
- Broad regression when practical: `cargo test --workspace --all-targets`
|
||||||
- Prefer `kingfisher scan --format toon` when invoking Kingfisher from an LLM or agent workflow; keep `pretty` for interactive human CLI use unless the task explicitly calls for a different format.
|
|
||||||
- After markdown/doc changes, verify local documentation links when practical.
|
|
||||||
- After `docs-site/` source changes, rebuild with `docs-site/.venv/bin/mkdocs build -f docs-site/mkdocs.yml` when practical so checked-in generated output stays in sync.
|
|
||||||
|
|
||||||
## Documentation Pointers
|
## Common Tasks
|
||||||
- `docs/USAGE.md`
|
|
||||||
- `docs/ADVANCED.md`
|
- Add a detection rule: follow the rule authoring and verification sections above.
|
||||||
- `docs/ARCHITECTURE.md`
|
- Add a CLI command: implement under `src/cli/commands/` and register it in CLI wiring.
|
||||||
- `docs/ACCESS_MAP.md`
|
- Add a validator: prefer YAML first; if Rust is required, use `raw.rs` and the narrowest feature/dependency wiring.
|
||||||
- `docs/DEPLOYMENT.md`
|
- Update docs-site rule counts: run `uv run '/Users/mickg/src/kingfisher/data/default/rule_cleanup/count_rules.py'`, update `docs-site/overrides/` and `docs-site/mkdocs.yml`, then rebuild the docs site when practical.
|
||||||
- `docs/RULES.md`
|
|
||||||
- `docs/INSTALLATION.md`
|
## Docs Pointers
|
||||||
- `docs/INTEGRATIONS.md`
|
|
||||||
- `docs/LIBRARY.md`
|
- Usage: `docs/USAGE.md`, `docs/ADVANCED.md`, `docs/INTEGRATIONS.md`
|
||||||
- `docs/PYPI.md`
|
- Rules: `docs/RULES.md`
|
||||||
|
- Architecture: `docs/ARCHITECTURE.md`, `docs/ACCESS_MAP.md`
|
||||||
|
- Deployment/install: `docs/INSTALLATION.md`, `docs/DEPLOYMENT.md`, `docs/PYPI.md`
|
||||||
|
- Library API: `docs/LIBRARY.md`
|
||||||
|
|
|
||||||
|
|
@ -4,6 +4,9 @@ All notable changes to this project will be documented in this file.
|
||||||
|
|
||||||
## [v1.102.0]
|
## [v1.102.0]
|
||||||
- Security: hardened ASAR and in-memory archive extraction to skip traversal or absolute entry paths before writing to the temp extraction directory.
|
- Security: hardened ASAR and in-memory archive extraction to skip traversal or absolute entry paths before writing to the temp extraction directory.
|
||||||
|
- Security: git clone provider tokens (`KF_GITHUB_TOKEN`, `KF_GITLAB_TOKEN`, `KF_GITEA_TOKEN`, `KF_AZURE_TOKEN`, `KF_HUGGINGFACE_TOKEN`) are now installed as host-scoped, HTTPS-only credential helpers (`credential.https://<host>.helper`) instead of unscoped global ones, so a malicious clone target can no longer capture them via an auth challenge. Trusted hosts derive from each provider's SaaS default plus any configured `--<provider>-api-url`/`--azure-base-url`/`--endpoint`, preserving GitHub Enterprise and other self-hosted flows.
|
||||||
|
- Security: `--output` report files are opened with `O_NOFOLLOW` (with a symlink pre-check on non-Unix) so a symlink planted at the report path inside a scanned repository can no longer redirect the write to truncate or overwrite an arbitrary file.
|
||||||
|
- Security: single-stream gzip/bzip2/xz/zlib decompression is now bounded by a 512 MB decompressed-byte cap, preventing a small compression bomb from exhausting disk during a scan.
|
||||||
- Added 3 detection and validation rules for Cognition Devin API credentials: `kingfisher.devin.1` (legacy personal keys, `apk_user_` prefix), `kingfisher.devin.2` (legacy service keys, `apk_` prefix), and `kingfisher.devin.3` (v3 service-user tokens, `cog_` prefix / RFC 4648 base32). Live validation uses `GET /v1/sessions` for `apk_*` keys and `GET /v3/self` for `cog_` tokens.
|
- Added 3 detection and validation rules for Cognition Devin API credentials: `kingfisher.devin.1` (legacy personal keys, `apk_user_` prefix), `kingfisher.devin.2` (legacy service keys, `apk_` prefix), and `kingfisher.devin.3` (v3 service-user tokens, `cog_` prefix / RFC 4648 base32). Live validation uses `GET /v1/sessions` for `apk_*` keys and `GET /v3/self` for `cog_` tokens.
|
||||||
- Added `kingfisher scan docker --archive <image.tar>` for scanning saved Docker/OCI image archives directly, including OCI-layout `docker save` output and compressed tar archives.
|
- Added `kingfisher scan docker --archive <image.tar>` for scanning saved Docker/OCI image archives directly, including OCI-layout `docker save` output and compressed tar archives.
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -736,8 +736,6 @@ rules:
|
||||||
- sample
|
- sample
|
||||||
examples:
|
examples:
|
||||||
- token: "REALVALUE1234"
|
- token: "REALVALUE1234"
|
||||||
negative_examples:
|
|
||||||
- token = "SAMPLETOKEN9999" # dropped by ignore_if_contains
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### Example: Custom Special Characters
|
### Example: Custom Special Characters
|
||||||
|
|
|
||||||
|
|
@ -731,8 +731,6 @@ rules:
|
||||||
- sample
|
- sample
|
||||||
examples:
|
examples:
|
||||||
- token: "REALVALUE1234"
|
- token: "REALVALUE1234"
|
||||||
negative_examples:
|
|
||||||
- token = "SAMPLETOKEN9999" # dropped by ignore_if_contains
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### Example: Custom Special Characters
|
### Example: Custom Special Characters
|
||||||
|
|
|
||||||
|
|
@ -806,6 +806,13 @@ kingfisher revoke --rule github \
|
||||||
> flags because some deployments front-load auth (an SSO portal for repo
|
> flags because some deployments front-load auth (an SSO portal for repo
|
||||||
> access vs. a direct API endpoint for token validation).
|
> access vs. a direct API endpoint for token validation).
|
||||||
|
|
||||||
|
> **Token scoping.** `KF_GITHUB_TOKEN` is installed as a host-scoped,
|
||||||
|
> HTTPS-only git credential helper, so it is sent only to `github.com` and
|
||||||
|
> to the GHE host(s) you name in `--github-api-url` / `--endpoint github=…`.
|
||||||
|
> An untrusted clone target — or a plaintext `http://` remote — never
|
||||||
|
> receives the token. The same scoping applies to the GitLab, Gitea, Azure,
|
||||||
|
> and Hugging Face clone tokens and their corresponding API-URL flags.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## GitLab
|
## GitLab
|
||||||
|
|
|
||||||
|
|
@ -1,6 +1,6 @@
|
||||||
use std::{
|
use std::{
|
||||||
fs,
|
fs,
|
||||||
io::{BufReader, Read},
|
io::{BufReader, Read, Write},
|
||||||
path::{Component, Path, PathBuf},
|
path::{Component, Path, PathBuf},
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
@ -465,17 +465,94 @@ fn safe_create_for_write(path: &Path) -> Result<fs::File> {
|
||||||
Ok(fs::File::create(path)?)
|
Ok(fs::File::create(path)?)
|
||||||
}
|
}
|
||||||
|
|
||||||
fn stream_to_file<R: Read>(mut decoder: R, out_path: &Path) -> Result<CompressedContent> {
|
/// Hard cap on the number of bytes a single-stream decompressor
|
||||||
let mut out_file = safe_create_for_write(out_path)?;
|
/// (gzip/bzip2/xz/zlib) will write to disk for one input. Mirrors the per-entry
|
||||||
std::io::copy(&mut decoder, &mut out_file)?;
|
/// decompressed cap enforced by the ZIP extractors so a small "compression
|
||||||
|
/// bomb" cannot expand without limit and exhaust the scanner's temporary
|
||||||
|
/// filesystem. Output past the cap is dropped and a truncation warning logged.
|
||||||
|
// nosemgrep: this is the defensive cap — do not flag for missing-limit rules.
|
||||||
|
pub const MAX_SINGLE_STREAM_DECOMPRESSED_BYTES: u64 = 512 * 1024 * 1024;
|
||||||
|
|
||||||
|
/// `Write` adaptor that drops everything past `remaining` bytes.
|
||||||
|
///
|
||||||
|
/// Bounds single-stream decompressors so a high-ratio compression bomb can't
|
||||||
|
/// fill the disk. Writes past the cap are reported as fully consumed so the
|
||||||
|
/// underlying decoder runs to completion instead of erroring or looping; the
|
||||||
|
/// truncation is surfaced via [`CappedWriter::truncated`], matching the
|
||||||
|
/// truncate-and-warn behavior of the ZIP entry extractor.
|
||||||
|
struct CappedWriter<W: Write> {
|
||||||
|
inner: W,
|
||||||
|
remaining: u64,
|
||||||
|
truncated: bool,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<W: Write> CappedWriter<W> {
|
||||||
|
fn new(inner: W, cap: u64) -> Self {
|
||||||
|
Self { inner, remaining: cap, truncated: false }
|
||||||
|
}
|
||||||
|
|
||||||
|
fn truncated(&self) -> bool {
|
||||||
|
self.truncated
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<W: Write> Write for CappedWriter<W> {
|
||||||
|
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
|
||||||
|
let allowed = (buf.len() as u64).min(self.remaining) as usize;
|
||||||
|
if allowed > 0 {
|
||||||
|
self.inner.write_all(&buf[..allowed])?;
|
||||||
|
self.remaining -= allowed as u64;
|
||||||
|
}
|
||||||
|
if allowed < buf.len() {
|
||||||
|
self.truncated = true;
|
||||||
|
}
|
||||||
|
// Report the whole buffer as consumed even when the tail was dropped so
|
||||||
|
// `io::copy`/`xz_decompress` don't treat the cap as a write error.
|
||||||
|
Ok(buf.len())
|
||||||
|
}
|
||||||
|
|
||||||
|
fn flush(&mut self) -> std::io::Result<()> {
|
||||||
|
self.inner.flush()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn stream_to_file<R: Read>(decoder: R, out_path: &Path) -> Result<CompressedContent> {
|
||||||
|
stream_to_file_capped(decoder, out_path, MAX_SINGLE_STREAM_DECOMPRESSED_BYTES)
|
||||||
|
}
|
||||||
|
|
||||||
|
fn stream_to_file_capped<R: Read>(
|
||||||
|
mut decoder: R,
|
||||||
|
out_path: &Path,
|
||||||
|
cap: u64,
|
||||||
|
) -> Result<CompressedContent> {
|
||||||
|
let out_file = safe_create_for_write(out_path)?;
|
||||||
|
let mut capped = CappedWriter::new(out_file, cap);
|
||||||
|
std::io::copy(&mut decoder, &mut capped)?;
|
||||||
|
if capped.truncated() {
|
||||||
|
tracing::warn!(
|
||||||
|
"decompressed stream written to {} exceeded {cap} byte cap; truncating",
|
||||||
|
out_path.display()
|
||||||
|
);
|
||||||
|
}
|
||||||
Ok(CompressedContent::RawFile(out_path.to_owned()))
|
Ok(CompressedContent::RawFile(out_path.to_owned()))
|
||||||
}
|
}
|
||||||
|
|
||||||
fn stream_xz_to_file(path: &Path, out_path: &Path) -> Result<CompressedContent> {
|
fn stream_xz_to_file(path: &Path, out_path: &Path) -> Result<CompressedContent> {
|
||||||
|
stream_xz_to_file_capped(path, out_path, MAX_SINGLE_STREAM_DECOMPRESSED_BYTES)
|
||||||
|
}
|
||||||
|
|
||||||
|
fn stream_xz_to_file_capped(path: &Path, out_path: &Path, cap: u64) -> Result<CompressedContent> {
|
||||||
let input = safe_open_for_read(path)?;
|
let input = safe_open_for_read(path)?;
|
||||||
let mut reader = BufReader::new(input);
|
let mut reader = BufReader::new(input);
|
||||||
let mut out_file = safe_create_for_write(out_path)?;
|
let out_file = safe_create_for_write(out_path)?;
|
||||||
xz_decompress(&mut reader, &mut out_file)?;
|
let mut capped = CappedWriter::new(out_file, cap);
|
||||||
|
xz_decompress(&mut reader, &mut capped)?;
|
||||||
|
if capped.truncated() {
|
||||||
|
tracing::warn!(
|
||||||
|
"decompressed xz stream written to {} exceeded {cap} byte cap; truncating",
|
||||||
|
out_path.display()
|
||||||
|
);
|
||||||
|
}
|
||||||
Ok(CompressedContent::RawFile(out_path.to_owned()))
|
Ok(CompressedContent::RawFile(out_path.to_owned()))
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -1127,4 +1204,49 @@ mod tests {
|
||||||
|
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn capped_writer_drops_bytes_past_cap() {
|
||||||
|
use std::io::Write;
|
||||||
|
|
||||||
|
use super::CappedWriter;
|
||||||
|
|
||||||
|
let mut sink = Vec::new();
|
||||||
|
let mut capped = CappedWriter::new(&mut sink, 40);
|
||||||
|
// Report full consumption even though the tail is dropped.
|
||||||
|
assert_eq!(capped.write(&[0u8; 100]).unwrap(), 100);
|
||||||
|
assert!(capped.truncated());
|
||||||
|
capped.flush().unwrap();
|
||||||
|
assert_eq!(sink.len(), 40);
|
||||||
|
|
||||||
|
let mut sink = Vec::new();
|
||||||
|
let mut capped = CappedWriter::new(&mut sink, 40);
|
||||||
|
assert_eq!(capped.write(&[0u8; 10]).unwrap(), 10);
|
||||||
|
assert!(!capped.truncated());
|
||||||
|
assert_eq!(sink.len(), 10);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn stream_to_file_capped_truncates_oversized_stream() -> anyhow::Result<()> {
|
||||||
|
use std::io::Cursor;
|
||||||
|
|
||||||
|
use super::{CompressedContent, stream_to_file_capped};
|
||||||
|
|
||||||
|
let dir = tempdir()?;
|
||||||
|
let out_path = dir.path().join("out.bin");
|
||||||
|
|
||||||
|
// A "decompressed" stream far larger than the cap: only `cap` bytes
|
||||||
|
// should ever reach disk, mirroring a small compression bomb.
|
||||||
|
let payload = vec![b'A'; 8192];
|
||||||
|
let content = stream_to_file_capped(Cursor::new(payload), &out_path, 128)?;
|
||||||
|
|
||||||
|
match content {
|
||||||
|
CompressedContent::RawFile(p) => {
|
||||||
|
let written = std::fs::metadata(&p)?.len();
|
||||||
|
assert_eq!(written, 128, "output must be capped at the byte budget");
|
||||||
|
}
|
||||||
|
other => panic!("expected RawFile, got {other:?}"),
|
||||||
|
}
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -8,7 +8,7 @@ use url::Url;
|
||||||
|
|
||||||
use crate::{bitbucket::is_bitbucket_access_token, git_url::GitUrl};
|
use crate::{bitbucket::is_bitbucket_access_token, git_url::GitUrl};
|
||||||
|
|
||||||
const BITBUCKET_CREDENTIAL_HELPER: &str = r#"credential.helper=!_bbcreds() {
|
const BITBUCKET_CREDENTIAL_HELPER: &str = r#"!_bbcreds() {
|
||||||
if [ -n "$KF_BITBUCKET_OAUTH_TOKEN" ]; then
|
if [ -n "$KF_BITBUCKET_OAUTH_TOKEN" ]; then
|
||||||
echo username="x-token-auth";
|
echo username="x-token-auth";
|
||||||
echo password="$KF_BITBUCKET_OAUTH_TOKEN";
|
echo password="$KF_BITBUCKET_OAUTH_TOKEN";
|
||||||
|
|
@ -29,7 +29,7 @@ const BITBUCKET_CREDENTIAL_HELPER: &str = r#"credential.helper=!_bbcreds() {
|
||||||
fi
|
fi
|
||||||
}; _bbcreds"#;
|
}; _bbcreds"#;
|
||||||
|
|
||||||
const GITEA_CREDENTIAL_HELPER: &str = r#"credential.helper=!_gteacreds() {
|
const GITEA_CREDENTIAL_HELPER: &str = r#"!_gteacreds() {
|
||||||
if [ -n "$KF_GITEA_TOKEN" ]; then
|
if [ -n "$KF_GITEA_TOKEN" ]; then
|
||||||
user="${KF_GITEA_USERNAME:-gitea}";
|
user="${KF_GITEA_USERNAME:-gitea}";
|
||||||
echo username="$user";
|
echo username="$user";
|
||||||
|
|
@ -37,7 +37,7 @@ const GITEA_CREDENTIAL_HELPER: &str = r#"credential.helper=!_gteacreds() {
|
||||||
fi
|
fi
|
||||||
}; _gteacreds"#;
|
}; _gteacreds"#;
|
||||||
|
|
||||||
const AZURE_CREDENTIAL_HELPER: &str = r#"credential.helper=!_azcreds() {
|
const AZURE_CREDENTIAL_HELPER: &str = r#"!_azcreds() {
|
||||||
token="${KF_AZURE_TOKEN:-${KF_AZURE_PAT:-}}";
|
token="${KF_AZURE_TOKEN:-${KF_AZURE_PAT:-}}";
|
||||||
if [ -n "$token" ]; then
|
if [ -n "$token" ]; then
|
||||||
user="${KF_AZURE_USERNAME:-pat}";
|
user="${KF_AZURE_USERNAME:-pat}";
|
||||||
|
|
@ -46,7 +46,7 @@ const AZURE_CREDENTIAL_HELPER: &str = r#"credential.helper=!_azcreds() {
|
||||||
fi
|
fi
|
||||||
}; _azcreds"#;
|
}; _azcreds"#;
|
||||||
|
|
||||||
const HUGGINGFACE_CREDENTIAL_HELPER: &str = r#"credential.helper=!_hfcreds() {
|
const HUGGINGFACE_CREDENTIAL_HELPER: &str = r#"!_hfcreds() {
|
||||||
token="$KF_HUGGINGFACE_TOKEN";
|
token="$KF_HUGGINGFACE_TOKEN";
|
||||||
if [ -n "$token" ]; then
|
if [ -n "$token" ]; then
|
||||||
user="${KF_HUGGINGFACE_USERNAME:-hf_user}";
|
user="${KF_HUGGINGFACE_USERNAME:-hf_user}";
|
||||||
|
|
@ -55,6 +55,52 @@ const HUGGINGFACE_CREDENTIAL_HELPER: &str = r#"credential.helper=!_hfcreds() {
|
||||||
fi
|
fi
|
||||||
}; _hfcreds"#;
|
}; _hfcreds"#;
|
||||||
|
|
||||||
|
const GITHUB_CREDENTIAL_HELPER: &str =
|
||||||
|
r#"!_ghcreds() { echo username="kingfisher"; echo password="$KF_GITHUB_TOKEN"; }; _ghcreds"#;
|
||||||
|
|
||||||
|
const GITLAB_CREDENTIAL_HELPER: &str =
|
||||||
|
r#"!_glcreds() { echo username="oauth2"; echo password="$KF_GITLAB_TOKEN"; }; _glcreds"#;
|
||||||
|
|
||||||
|
/// HTTPS hosts that each provider's credential helper is allowed to target.
|
||||||
|
///
|
||||||
|
/// Credential helpers echo provider tokens to whatever remote `git` is talking
|
||||||
|
/// to. Installing them as unscoped `credential.helper` entries leaks those
|
||||||
|
/// tokens to any HTTP(S) remote that issues an auth challenge — including an
|
||||||
|
/// attacker-controlled scan target. Each helper is therefore bound to a
|
||||||
|
/// `credential.https://<host>.helper` key so `git` only invokes it for the
|
||||||
|
/// provider's own host(s).
|
||||||
|
#[derive(Debug, Clone, Default)]
|
||||||
|
pub struct ProviderHosts {
|
||||||
|
pub github: Vec<String>,
|
||||||
|
pub gitlab: Vec<String>,
|
||||||
|
pub gitea: Vec<String>,
|
||||||
|
pub bitbucket: Vec<String>,
|
||||||
|
pub azure: Vec<String>,
|
||||||
|
pub huggingface: Vec<String>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl ProviderHosts {
|
||||||
|
/// Well-known public SaaS clone hosts for each supported provider.
|
||||||
|
pub fn saas_defaults() -> Self {
|
||||||
|
Self {
|
||||||
|
github: vec!["github.com".to_string()],
|
||||||
|
gitlab: vec!["gitlab.com".to_string()],
|
||||||
|
gitea: vec!["gitea.com".to_string()],
|
||||||
|
bitbucket: vec!["bitbucket.org".to_string()],
|
||||||
|
azure: vec!["dev.azure.com".to_string()],
|
||||||
|
huggingface: vec!["huggingface.co".to_string()],
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Add a trusted host to `list`, normalizing case and de-duplicating.
|
||||||
|
pub fn add(list: &mut Vec<String>, host: &str) {
|
||||||
|
let host = host.trim().to_ascii_lowercase();
|
||||||
|
if !host.is_empty() && !list.iter().any(|existing| existing == &host) {
|
||||||
|
list.push(host);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/// Represents errors that can occur when interacting with the `git` CLI.
|
/// Represents errors that can occur when interacting with the `git` CLI.
|
||||||
#[derive(Debug, thiserror::Error)]
|
#[derive(Debug, thiserror::Error)]
|
||||||
pub enum GitError {
|
pub enum GitError {
|
||||||
|
|
@ -103,10 +149,19 @@ pub struct Git {
|
||||||
}
|
}
|
||||||
|
|
||||||
impl Git {
|
impl Git {
|
||||||
/// Create a new `Git` instance.
|
/// Create a new `Git` instance that trusts only the public SaaS hosts.
|
||||||
///
|
///
|
||||||
/// * `ignore_certs`: If `true`, disables SSL certificate verification for `git` operations.
|
/// * `ignore_certs`: If `true`, disables SSL certificate verification for `git` operations.
|
||||||
pub fn new(ignore_certs: bool) -> Self {
|
pub fn new(ignore_certs: bool) -> Self {
|
||||||
|
Self::with_provider_hosts(ignore_certs, &ProviderHosts::saas_defaults())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Create a new `Git` instance whose credential helpers are scoped to the
|
||||||
|
/// hosts in `provider_hosts`. Each provider token is offered only to that
|
||||||
|
/// provider's configured HTTPS host(s), never to an arbitrary scan target.
|
||||||
|
///
|
||||||
|
/// * `ignore_certs`: If `true`, disables SSL certificate verification for `git` operations.
|
||||||
|
pub fn with_provider_hosts(ignore_certs: bool, provider_hosts: &ProviderHosts) -> Self {
|
||||||
let mut credentials = Vec::new();
|
let mut credentials = Vec::new();
|
||||||
|
|
||||||
fn normalized_env_var(name: &str) -> Option<String> {
|
fn normalized_env_var(name: &str) -> Option<String> {
|
||||||
|
|
@ -188,42 +243,34 @@ impl Git {
|
||||||
credentials.push(r#"credential.helper="#.into());
|
credentials.push(r#"credential.helper="#.into());
|
||||||
}
|
}
|
||||||
|
|
||||||
// Inject GitHub token helper
|
// Install each provider's helper scoped to that provider's HTTPS
|
||||||
|
// host(s). `git` consults a `credential.https://<host>.helper` entry
|
||||||
|
// only for remotes matching that host, so a provider token is never
|
||||||
|
// echoed to an unrelated (possibly attacker-controlled) clone target.
|
||||||
|
let mut push_scoped = |hosts: &[String], snippet: &str| {
|
||||||
|
for host in hosts {
|
||||||
|
credentials.push("-c".into());
|
||||||
|
credentials.push(format!("credential.https://{host}.helper={snippet}"));
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
if has_github_token {
|
if has_github_token {
|
||||||
credentials.push("-c".into());
|
push_scoped(&provider_hosts.github, GITHUB_CREDENTIAL_HELPER);
|
||||||
credentials.push(
|
|
||||||
r#"credential.helper=!_ghcreds() { echo username="kingfisher"; echo password="$KF_GITHUB_TOKEN"; }; _ghcreds"#.into(),
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// Inject GitLab token helper
|
|
||||||
if has_gitlab_token {
|
if has_gitlab_token {
|
||||||
credentials.push("-c".into());
|
push_scoped(&provider_hosts.gitlab, GITLAB_CREDENTIAL_HELPER);
|
||||||
credentials.push(
|
|
||||||
r#"credential.helper=!_glcreds() { echo username="oauth2"; echo password="$KF_GITLAB_TOKEN"; }; _glcreds"#.into(),
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// Inject Gitea token helper
|
|
||||||
if has_gitea_token {
|
if has_gitea_token {
|
||||||
credentials.push("-c".into());
|
push_scoped(&provider_hosts.gitea, GITEA_CREDENTIAL_HELPER);
|
||||||
credentials.push(GITEA_CREDENTIAL_HELPER.into());
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// Inject Bitbucket credential helper for OAuth tokens or basic auth.
|
|
||||||
if has_bitbucket_credentials {
|
if has_bitbucket_credentials {
|
||||||
credentials.push("-c".into());
|
push_scoped(&provider_hosts.bitbucket, BITBUCKET_CREDENTIAL_HELPER);
|
||||||
credentials.push(BITBUCKET_CREDENTIAL_HELPER.into());
|
|
||||||
}
|
}
|
||||||
|
|
||||||
if has_azure_token {
|
if has_azure_token {
|
||||||
credentials.push("-c".into());
|
push_scoped(&provider_hosts.azure, AZURE_CREDENTIAL_HELPER);
|
||||||
credentials.push(AZURE_CREDENTIAL_HELPER.into());
|
|
||||||
}
|
}
|
||||||
|
|
||||||
if has_huggingface_token {
|
if has_huggingface_token {
|
||||||
credentials.push("-c".into());
|
push_scoped(&provider_hosts.huggingface, HUGGINGFACE_CREDENTIAL_HELPER);
|
||||||
credentials.push(HUGGINGFACE_CREDENTIAL_HELPER.into());
|
|
||||||
}
|
}
|
||||||
|
|
||||||
Self {
|
Self {
|
||||||
|
|
@ -326,16 +373,20 @@ impl Git {
|
||||||
fn repo_arg_for_clone(&self, repo_url: &GitUrl) -> String {
|
fn repo_arg_for_clone(&self, repo_url: &GitUrl) -> String {
|
||||||
if let Some((username, password)) = &self.bitbucket_basic_auth {
|
if let Some((username, password)) = &self.bitbucket_basic_auth {
|
||||||
if let Ok(mut url) = Url::parse(repo_url.as_str()) {
|
if let Ok(mut url) = Url::parse(repo_url.as_str()) {
|
||||||
if url
|
let is_bitbucket = url
|
||||||
.host_str()
|
.host_str()
|
||||||
.map(|host| host.eq_ignore_ascii_case("bitbucket.org"))
|
.map(|host| host.eq_ignore_ascii_case("bitbucket.org"))
|
||||||
.unwrap_or(false)
|
.unwrap_or(false);
|
||||||
|
// Embed credentials only on HTTPS bitbucket.org remotes. The
|
||||||
|
// scoped credential helper is HTTPS-only for the same reason:
|
||||||
|
// putting a token in a plaintext http:// URL would send it over
|
||||||
|
// the wire in the clear.
|
||||||
|
if url.scheme() == "https"
|
||||||
|
&& is_bitbucket
|
||||||
|
&& url.set_username(username).is_ok()
|
||||||
|
&& url.set_password(Some(password)).is_ok()
|
||||||
{
|
{
|
||||||
if url.set_username(username).is_ok()
|
return url.into();
|
||||||
&& url.set_password(Some(password)).is_ok()
|
|
||||||
{
|
|
||||||
return url.into();
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
@ -408,7 +459,10 @@ mod tests {
|
||||||
temp_env::with_var("KF_BITBUCKET_OAUTH_TOKEN", Some("oauth"), || {
|
temp_env::with_var("KF_BITBUCKET_OAUTH_TOKEN", Some("oauth"), || {
|
||||||
let git = Git::new(false);
|
let git = Git::new(false);
|
||||||
assert_eq!(git.credentials.len(), 4);
|
assert_eq!(git.credentials.len(), 4);
|
||||||
assert!(git.credentials.iter().any(|value| value == BITBUCKET_CREDENTIAL_HELPER));
|
assert!(git.credentials.iter().any(|value| value
|
||||||
|
== &format!(
|
||||||
|
"credential.https://bitbucket.org.helper={BITBUCKET_CREDENTIAL_HELPER}"
|
||||||
|
)));
|
||||||
assert!(git.bitbucket_access_token.is_none());
|
assert!(git.bitbucket_access_token.is_none());
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
@ -423,7 +477,10 @@ mod tests {
|
||||||
|| {
|
|| {
|
||||||
let git = Git::new(false);
|
let git = Git::new(false);
|
||||||
assert_eq!(git.credentials.len(), 4);
|
assert_eq!(git.credentials.len(), 4);
|
||||||
assert!(git.credentials.iter().any(|value| value == BITBUCKET_CREDENTIAL_HELPER));
|
assert!(git.credentials.iter().any(|value| value
|
||||||
|
== &format!(
|
||||||
|
"credential.https://bitbucket.org.helper={BITBUCKET_CREDENTIAL_HELPER}"
|
||||||
|
)));
|
||||||
assert!(git.bitbucket_access_token.is_none());
|
assert!(git.bitbucket_access_token.is_none());
|
||||||
},
|
},
|
||||||
);
|
);
|
||||||
|
|
@ -480,6 +537,21 @@ mod tests {
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_repo_arg_for_clone_skips_plaintext_http_bitbucket() {
|
||||||
|
// A plaintext http:// bitbucket.org remote must NOT receive embedded
|
||||||
|
// credentials — that would leak the token over the wire and bypass the
|
||||||
|
// HTTPS-only credential-helper scoping.
|
||||||
|
let url =
|
||||||
|
GitUrl::try_from(url::Url::parse("http://bitbucket.org/workspace/demo.git").unwrap())
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
temp_env::with_vars(&[("KF_BITBUCKET_OAUTH_TOKEN", Some("token123"))], || {
|
||||||
|
let git = Git::new(false);
|
||||||
|
assert_eq!(git.repo_arg_for_clone(&url), url.as_str());
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_repo_arg_for_clone_leaves_non_bitbucket_urls_untouched() {
|
fn test_repo_arg_for_clone_leaves_non_bitbucket_urls_untouched() {
|
||||||
let url = GitUrl::try_from(
|
let url = GitUrl::try_from(
|
||||||
|
|
@ -505,7 +577,10 @@ mod tests {
|
||||||
temp_env::with_var("KF_BITBUCKET_TOKEN", Some(token), || {
|
temp_env::with_var("KF_BITBUCKET_TOKEN", Some(token), || {
|
||||||
let git = Git::new(false);
|
let git = Git::new(false);
|
||||||
assert_eq!(git.credentials.len(), 4);
|
assert_eq!(git.credentials.len(), 4);
|
||||||
assert!(git.credentials.iter().any(|value| value == BITBUCKET_CREDENTIAL_HELPER));
|
assert!(git.credentials.iter().any(|value| value
|
||||||
|
== &format!(
|
||||||
|
"credential.https://bitbucket.org.helper={BITBUCKET_CREDENTIAL_HELPER}"
|
||||||
|
)));
|
||||||
assert_eq!(git.bitbucket_access_token.as_deref(), Some(token));
|
assert_eq!(git.bitbucket_access_token.as_deref(), Some(token));
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
@ -515,7 +590,10 @@ mod tests {
|
||||||
temp_env::with_var("KF_BITBUCKET_TOKEN", Some("token123"), || {
|
temp_env::with_var("KF_BITBUCKET_TOKEN", Some("token123"), || {
|
||||||
let git = Git::new(false);
|
let git = Git::new(false);
|
||||||
assert_eq!(git.credentials.len(), 4);
|
assert_eq!(git.credentials.len(), 4);
|
||||||
assert!(git.credentials.iter().any(|value| value == BITBUCKET_CREDENTIAL_HELPER));
|
assert!(git.credentials.iter().any(|value| value
|
||||||
|
== &format!(
|
||||||
|
"credential.https://bitbucket.org.helper={BITBUCKET_CREDENTIAL_HELPER}"
|
||||||
|
)));
|
||||||
assert_eq!(git.bitbucket_access_token.as_deref(), None);
|
assert_eq!(git.bitbucket_access_token.as_deref(), None);
|
||||||
assert_eq!(
|
assert_eq!(
|
||||||
git.bitbucket_basic_auth,
|
git.bitbucket_basic_auth,
|
||||||
|
|
@ -542,7 +620,10 @@ mod tests {
|
||||||
],
|
],
|
||||||
);
|
);
|
||||||
assert_eq!(git.credentials.len(), 4);
|
assert_eq!(git.credentials.len(), 4);
|
||||||
assert!(git.credentials.iter().any(|value| value == BITBUCKET_CREDENTIAL_HELPER));
|
assert!(git.credentials.iter().any(|value| value
|
||||||
|
== &format!(
|
||||||
|
"credential.https://bitbucket.org.helper={BITBUCKET_CREDENTIAL_HELPER}"
|
||||||
|
)));
|
||||||
assert_eq!(git.bitbucket_access_token.as_deref(), Some(trimmed_token));
|
assert_eq!(git.bitbucket_access_token.as_deref(), Some(trimmed_token));
|
||||||
},
|
},
|
||||||
);
|
);
|
||||||
|
|
@ -591,4 +672,57 @@ mod tests {
|
||||||
git.create_fresh_clone(&invalid_url, temp_dir.path(), CloneMode::Bare).unwrap_err();
|
git.create_fresh_clone(&invalid_url, temp_dir.path(), CloneMode::Bare).unwrap_err();
|
||||||
assert!(matches!(err, GitError::GitError { .. }));
|
assert!(matches!(err, GitError::GitError { .. }));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn github_helper_is_scoped_to_provider_host_only() {
|
||||||
|
temp_env::with_var("KF_GITHUB_TOKEN", Some("test_token"), || {
|
||||||
|
let git = Git::new(false);
|
||||||
|
|
||||||
|
// The only bare `credential.helper=` entry is the empty reset that
|
||||||
|
// clears inherited helpers. The token-bearing helper must never be
|
||||||
|
// installed unscoped — an unscoped helper is what leaked provider
|
||||||
|
// tokens to any remote that issued an auth challenge.
|
||||||
|
let unscoped: Vec<&String> = git
|
||||||
|
.credentials
|
||||||
|
.iter()
|
||||||
|
.filter(|value| value.starts_with("credential.helper="))
|
||||||
|
.collect();
|
||||||
|
assert_eq!(unscoped, vec![&"credential.helper=".to_string()]);
|
||||||
|
|
||||||
|
// The GitHub helper is bound to https://github.com.
|
||||||
|
assert!(git.credentials.iter().any(|value| value
|
||||||
|
== &format!("credential.https://github.com.helper={GITHUB_CREDENTIAL_HELPER}")));
|
||||||
|
|
||||||
|
// No credential entry targets an unrelated/attacker host.
|
||||||
|
assert!(!git.credentials.iter().any(|value| value.contains("127.0.0.1")));
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn provider_helper_scoped_to_each_configured_host() {
|
||||||
|
let hosts = ProviderHosts {
|
||||||
|
github: vec!["github.com".to_string(), "ghe.corp.example.com".to_string()],
|
||||||
|
..ProviderHosts::default()
|
||||||
|
};
|
||||||
|
temp_env::with_var("KF_GITHUB_TOKEN", Some("test_token"), || {
|
||||||
|
let git = Git::with_provider_hosts(false, &hosts);
|
||||||
|
assert!(git.credentials.iter().any(|value| value
|
||||||
|
== &format!("credential.https://github.com.helper={GITHUB_CREDENTIAL_HELPER}")));
|
||||||
|
assert!(git.credentials.iter().any(|value| value
|
||||||
|
== &format!(
|
||||||
|
"credential.https://ghe.corp.example.com.helper={GITHUB_CREDENTIAL_HELPER}"
|
||||||
|
)));
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn no_helper_installed_for_provider_without_trusted_host() {
|
||||||
|
// An empty host list means the token has nowhere safe to go: no helper
|
||||||
|
// is installed, so the token cannot leak even to its own SaaS host.
|
||||||
|
let hosts = ProviderHosts { github: Vec::new(), ..ProviderHosts::default() };
|
||||||
|
temp_env::with_var("KF_GITHUB_TOKEN", Some("test_token"), || {
|
||||||
|
let git = Git::with_provider_hosts(false, &hosts);
|
||||||
|
assert!(!git.credentials.iter().any(|value| value.contains("_ghcreds")));
|
||||||
|
});
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -20,7 +20,7 @@ use crate::{
|
||||||
global,
|
global,
|
||||||
},
|
},
|
||||||
confluence, findings_store, gcs,
|
confluence, findings_store, gcs,
|
||||||
git_binary::{CloneMode, Git},
|
git_binary::{CloneMode, Git, ProviderHosts},
|
||||||
git_url::GitUrl,
|
git_url::GitUrl,
|
||||||
gitea, github, gitlab, huggingface, jira,
|
gitea, github, gitlab, huggingface, jira,
|
||||||
matcher::{Match, Matcher, MatcherStats},
|
matcher::{Match, Matcher, MatcherStats},
|
||||||
|
|
@ -43,6 +43,70 @@ fn repo_host_contains(repo_url: &GitUrl, needle: &str) -> bool {
|
||||||
.unwrap_or(false)
|
.unwrap_or(false)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Map a configured API/base URL to the host `git clone` actually contacts,
|
||||||
|
/// including a non-default port. The SaaS API subdomains are folded back to
|
||||||
|
/// their public clone hosts; every other host (i.e. self-hosted/enterprise) is
|
||||||
|
/// already the clone host.
|
||||||
|
fn clone_host(url: &Url) -> Option<String> {
|
||||||
|
let host = match url.host_str()?.to_ascii_lowercase().as_str() {
|
||||||
|
"api.github.com" => "github.com".to_string(),
|
||||||
|
"api.bitbucket.org" => "bitbucket.org".to_string(),
|
||||||
|
other => other.to_string(),
|
||||||
|
};
|
||||||
|
Some(match url.port() {
|
||||||
|
Some(port) => format!("{host}:{port}"),
|
||||||
|
None => host,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Parse an `--endpoint` URL (which may omit the scheme) and return its clone host.
|
||||||
|
fn endpoint_clone_host(raw: &str) -> Option<String> {
|
||||||
|
let trimmed = raw.trim();
|
||||||
|
let url = Url::parse(trimmed).or_else(|_| Url::parse(&format!("https://{trimmed}"))).ok()?;
|
||||||
|
clone_host(&url)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Derive the per-provider HTTPS hosts that credential helpers may target.
|
||||||
|
///
|
||||||
|
/// Starts from the public SaaS hosts and adds any host the user explicitly
|
||||||
|
/// configured for a provider via `--<provider>-api-url`, `--azure-base-url`, or
|
||||||
|
/// `--endpoint <provider>=URL`. Tokens are then offered only to these hosts,
|
||||||
|
/// never to an arbitrary (possibly attacker-controlled) scan target.
|
||||||
|
fn provider_hosts(args: &scan::ScanArgs, global_args: &global::GlobalArgs) -> ProviderHosts {
|
||||||
|
let mut hosts = ProviderHosts::saas_defaults();
|
||||||
|
let isa = &args.input_specifier_args;
|
||||||
|
|
||||||
|
for (list, url) in [
|
||||||
|
(&mut hosts.github, &isa.github_api_url),
|
||||||
|
(&mut hosts.gitlab, &isa.gitlab_api_url),
|
||||||
|
(&mut hosts.gitea, &isa.gitea_api_url),
|
||||||
|
(&mut hosts.bitbucket, &isa.bitbucket_api_url),
|
||||||
|
(&mut hosts.azure, &isa.azure_base_url),
|
||||||
|
] {
|
||||||
|
if let Some(host) = clone_host(url) {
|
||||||
|
ProviderHosts::add(list, &host);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// `--endpoint PROVIDER=URL` only supports github/gitlab/gitea for git hosts.
|
||||||
|
for entry in &global_args.endpoint {
|
||||||
|
let Some((provider, url)) = entry.split_once('=') else {
|
||||||
|
continue;
|
||||||
|
};
|
||||||
|
let Some(host) = endpoint_clone_host(url) else {
|
||||||
|
continue;
|
||||||
|
};
|
||||||
|
match provider.trim().to_ascii_lowercase().replace('_', "-").as_str() {
|
||||||
|
"github" => ProviderHosts::add(&mut hosts.github, &host),
|
||||||
|
"gitlab" => ProviderHosts::add(&mut hosts.gitlab, &host),
|
||||||
|
"gitea" => ProviderHosts::add(&mut hosts.gitea, &host),
|
||||||
|
_ => {}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
hosts
|
||||||
|
}
|
||||||
|
|
||||||
fn apply_repo_clone_limit(
|
fn apply_repo_clone_limit(
|
||||||
repo_urls: &mut Vec<GitUrl>,
|
repo_urls: &mut Vec<GitUrl>,
|
||||||
limit: Option<usize>,
|
limit: Option<usize>,
|
||||||
|
|
@ -121,6 +185,7 @@ where
|
||||||
// clones, which is exactly the disk-overflow bug we're trying to fix.
|
// clones, which is exactly the disk-overflow bug we're trying to fix.
|
||||||
let (ready_tx, ready_rx) = crossbeam_channel::bounded(std::cmp::max(2, clone_concurrency * 2));
|
let (ready_tx, ready_rx) = crossbeam_channel::bounded(std::cmp::max(2, clone_concurrency * 2));
|
||||||
let ignore_certs = global_args.ignore_certs;
|
let ignore_certs = global_args.ignore_certs;
|
||||||
|
let provider_hosts = provider_hosts(args, global_args);
|
||||||
|
|
||||||
ThreadPoolBuilder::new()
|
ThreadPoolBuilder::new()
|
||||||
.num_threads(clone_concurrency)
|
.num_threads(clone_concurrency)
|
||||||
|
|
@ -132,8 +197,9 @@ where
|
||||||
let datastore = Arc::clone(datastore);
|
let datastore = Arc::clone(datastore);
|
||||||
let repo_url = repo_url.clone();
|
let repo_url = repo_url.clone();
|
||||||
let progress = progress.clone();
|
let progress = progress.clone();
|
||||||
|
let provider_hosts = provider_hosts.clone();
|
||||||
scope.spawn(move |_| {
|
scope.spawn(move |_| {
|
||||||
let git = Git::new(ignore_certs);
|
let git = Git::with_provider_hosts(ignore_certs, &provider_hosts);
|
||||||
let output_dir = {
|
let output_dir = {
|
||||||
let datastore = datastore.lock().unwrap();
|
let datastore = datastore.lock().unwrap();
|
||||||
datastore.clone_destination(&repo_url)
|
datastore.clone_destination(&repo_url)
|
||||||
|
|
|
||||||
74
src/util.rs
74
src/util.rs
|
|
@ -1,6 +1,6 @@
|
||||||
use std::{
|
use std::{
|
||||||
borrow::Cow,
|
borrow::Cow,
|
||||||
fs::File,
|
fs::{File, OpenOptions},
|
||||||
io::{BufReader, BufWriter, stdin, stdout},
|
io::{BufReader, BufWriter, stdin, stdout},
|
||||||
path::Path,
|
path::Path,
|
||||||
sync::atomic::{AtomicBool, Ordering},
|
sync::atomic::{AtomicBool, Ordering},
|
||||||
|
|
@ -121,7 +121,49 @@ pub fn get_writer_for_file_or_stdout<P: AsRef<Path>>(
|
||||||
) -> std::io::Result<Box<dyn std::io::Write>> {
|
) -> std::io::Result<Box<dyn std::io::Write>> {
|
||||||
match path {
|
match path {
|
||||||
None => Ok(Box::new(BufWriter::new(stdout()))),
|
None => Ok(Box::new(BufWriter::new(stdout()))),
|
||||||
Some(p) => Ok(Box::new(BufWriter::new(File::create(p)?))),
|
Some(p) => Ok(Box::new(BufWriter::new(create_no_follow(p.as_ref())?))),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Create (truncating) a file for writing, refusing to follow a symlink at the
|
||||||
|
/// final path component.
|
||||||
|
///
|
||||||
|
/// A scanned repository can contain a symlink at the report output path (e.g.
|
||||||
|
/// `report.json` in the workspace). Plain `File::create` follows it and
|
||||||
|
/// truncates whatever the link targets, letting a malicious repo clobber files
|
||||||
|
/// outside the workspace as the scanner user. `O_NOFOLLOW` makes the open fail
|
||||||
|
/// atomically when the final component is a symlink, closing the TOCTOU window.
|
||||||
|
fn create_no_follow(path: &Path) -> std::io::Result<File> {
|
||||||
|
let mut opts = OpenOptions::new();
|
||||||
|
opts.write(true).create(true).truncate(true);
|
||||||
|
|
||||||
|
#[cfg(unix)]
|
||||||
|
{
|
||||||
|
use std::os::unix::fs::OpenOptionsExt;
|
||||||
|
opts.custom_flags(libc::O_NOFOLLOW);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Without O_NOFOLLOW, fall back to a pre-open symlink check. This is racy
|
||||||
|
// (TOCTOU) but still rejects the common case of a committed symlink sitting
|
||||||
|
// at the report path.
|
||||||
|
#[cfg(not(unix))]
|
||||||
|
if std::fs::symlink_metadata(path).is_ok_and(|meta| meta.file_type().is_symlink()) {
|
||||||
|
return Err(std::io::Error::new(
|
||||||
|
std::io::ErrorKind::AlreadyExists,
|
||||||
|
format!("refusing to write output through symlink: {}", path.display()),
|
||||||
|
));
|
||||||
|
}
|
||||||
|
|
||||||
|
match opts.open(path) {
|
||||||
|
Ok(file) => Ok(file),
|
||||||
|
// O_NOFOLLOW surfaces a symlinked final component as ELOOP; report it as
|
||||||
|
// a clear refusal rather than the opaque OS message.
|
||||||
|
#[cfg(unix)]
|
||||||
|
Err(e) if e.raw_os_error() == Some(libc::ELOOP) => Err(std::io::Error::new(
|
||||||
|
std::io::ErrorKind::AlreadyExists,
|
||||||
|
format!("refusing to write output through symlink: {}", path.display()),
|
||||||
|
)),
|
||||||
|
Err(e) => Err(e),
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
/// Returns a buffered reader for a specified file path or stdin if none is
|
/// Returns a buffered reader for a specified file path or stdin if none is
|
||||||
|
|
@ -256,6 +298,34 @@ mod tests {
|
||||||
std::fs::File::open(&path).unwrap().read_to_string(&mut file_content).unwrap();
|
std::fs::File::open(&path).unwrap().read_to_string(&mut file_content).unwrap();
|
||||||
assert_eq!(file_content, "Test content");
|
assert_eq!(file_content, "Test content");
|
||||||
}
|
}
|
||||||
|
#[cfg(unix)]
|
||||||
|
#[test]
|
||||||
|
fn test_get_writer_for_file_refuses_symlink() {
|
||||||
|
use std::io::Write;
|
||||||
|
|
||||||
|
let dir = tempfile::tempdir().unwrap();
|
||||||
|
let target = dir.path().join("target.txt");
|
||||||
|
std::fs::write(&target, b"ORIGINAL_CONTENT").unwrap();
|
||||||
|
|
||||||
|
// Simulate a malicious symlink planted at the report output path.
|
||||||
|
let link = dir.path().join("report.json");
|
||||||
|
std::os::unix::fs::symlink(&target, &link).unwrap();
|
||||||
|
|
||||||
|
let err = get_writer_for_file_or_stdout(Some(&link)).err();
|
||||||
|
assert!(err.is_some(), "writer must refuse a symlinked output path");
|
||||||
|
|
||||||
|
// The symlink target must be left untouched (not truncated).
|
||||||
|
assert_eq!(std::fs::read(&target).unwrap(), b"ORIGINAL_CONTENT");
|
||||||
|
|
||||||
|
// A regular (non-symlink) path still works and truncates as before.
|
||||||
|
let regular = dir.path().join("plain.json");
|
||||||
|
std::fs::write(®ular, b"stale").unwrap();
|
||||||
|
let mut writer = get_writer_for_file_or_stdout(Some(®ular)).unwrap();
|
||||||
|
writer.write_all(b"fresh").unwrap();
|
||||||
|
writer.flush().unwrap();
|
||||||
|
drop(writer);
|
||||||
|
assert_eq!(std::fs::read(®ular).unwrap(), b"fresh");
|
||||||
|
}
|
||||||
#[test]
|
#[test]
|
||||||
fn test_get_reader_for_file_or_stdin_stdin() {
|
fn test_get_reader_for_file_or_stdin_stdin() {
|
||||||
// Test reading from stdin (mocked)
|
// Test reading from stdin (mocked)
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue