Guidance for coding agents working in this repository.
## Project Overview
Kingfisher is an open-source secret scanner and live secret validator written in Rust by MongoDB. It detects, validates, and helps remediate leaked API keys, tokens, and credentials across code repositories, git history, and integrated platforms.
Key capabilities:
- Secret detection with 500+ built-in rules (YAML-based, SIMD-accelerated via Hyperscan/vectorscan)
- Live credential validation against provider APIs
- Rust formatting is defined in `rustfmt.toml` (`max_width = 100`, 4-space indentation, Unix newlines, reordered imports).
- Keep edits minimal and scoped; preserve existing conventions in touched files.
- Prefer clear, maintainable fixes over broad refactors unless requested.
## Architecture Notes
- Rules are YAML-driven and loaded from `crates/kingfisher-rules/data/rules/`.
- Allocator feature flags are in root `Cargo.toml`:
-`use-mimalloc` (default)
-`use-jemalloc`
-`system-alloc`
- Validation modules live in `crates/kingfisher-scanner/src/validation/`; optional validation feature sets are defined in `crates/kingfisher-scanner/Cargo.toml` (e.g., `validation-aws`, `validation-gcp`, `validation-database`, `validation-all`).
## Validation and Revocation Policy
- Default rule: define validation logic in rule YAML (`validation:` block), not Rust code.
- Code-based validation in `crates/kingfisher-scanner/src/validation/` is an exception path for cases that cannot be expressed reliably in YAML alone (for example AWS, GCP, Coinbase, MongoDB, and similar complex/provider-specific flows).
- Add a validator (rare exception path): implement in `crates/kingfisher-scanner/src/validation/` and wire feature flags/dependencies in `crates/kingfisher-scanner/Cargo.toml` only when YAML validation cannot express the required logic.
3. Include `examples` that must match. These can be tested with `cargo test check_rules` or `kingfisher rules check --rules-path crates/kingfisher-rules/data/rules/slack.yml --load-builtins=false --no-update-check`
-`pattern_requirements` (e.g., `min_digits`, `min_uppercase`, `min_lowercase`, `min_special_chars`, `ignore_if_contains`) when format constraints are known.
-`pattern_requirements.checksum` when provider formats include check digits/signatures.
5. Add `validation` only when a reliable provider/API check exists.
6. Put validation in YAML by default; only use Rust validator logic for rare, justified exceptions.
7. Add `revocation` when the provider API supports safe revocation and the flow is well understood.
8. If a rule needs context from another match (for example ID + secret pair), use `depends_on_rule` and consider `visible: false` on the helper rule.
10. Confidence for rules should be set at `confidence: medium`
11. The `pattern` field must contain a valid Hyperscan/Vectorscan regular expression. Lookahead and lookbehind assertions aren’t supported. Because inefficient or overly broad regex can degrade performance, patterns should be as specific as possible and written to minimize false positives.
1.**Writing `pattern`**: Start with `(?x)` (free-spacing). Use one unnamed capture `( ... )` around the secret—it becomes `{{ TOKEN }}`. Use `\b` word boundaries and `(?: ... )` for non-capturing structure. For flexible context between keywords and token, use `(?:.|[\n\r]){0,N}?`. Hyperscan doesn't support `(?=...)`; use `pattern_requirements` (e.g. `min_digits`) instead.
- Prefer `kingfisher scan --format toon` when invoking Kingfisher from an LLM or agent workflow; keep `pretty` for interactive human CLI use unless the task explicitly calls for a different format.