A _rule_ in Kingfisher is a YAML document that describes how to detect and (optionally) validate or revoke secrets in your codebase. With custom rules you can:
This document explains how to write custom rules for Kingfisher using a YAML-based rule system. The rules define regular expressions to detect secrets in source code and other textual data, and they can include validation or revocation steps to confirm or invalidate the secret. By using a rules-based system, Kingfisher is highly extensible—new rules can be added or existing ones modified without changing the core code.
response_is_html: true # By default, validation responses containing HTML are considered invalid. Set to `true` if you expect HTML in a validation response
response_matcher:
- report_response: true # always include raw payload
1. `Http` and `Grpc`: YAML-native validation flows. Prefer these first.
2. Typed validators: schema-level validation families already modeled in the rule schema, such as `AWS`, `AzureStorage`, `Coinbase`, `GCP`, `MongoDB`, `MySQL`, `Postgres`, `Jdbc`, and `JWT`.
3. Raw validators: provider-specific or protocol-specific exception paths dispatched through `validation: type: Raw`.
Raw validation looks like this:
```yaml
validation:
  type: Raw
  content: kraken
```
Use `Raw` only when the provider check cannot be expressed reliably with `Http` or `Grpc` and does not justify a new reusable validator family. Raw validator implementations live in `crates/kingfisher-scanner/src/validation/raw.rs`.
Typed validators are safer and more reusable because the validator kind is part of the schema. `Raw` validators are string-dispatched and fail at runtime if the `content` name is unknown. If you need a Rust-backed exception path for one provider, prefer `Raw`; reserve new typed validators for stable validation families that can be reused across rules.
1. **Step 1 (Lookup)**: Query the API to retrieve an internal ID, token identifier, or other metadata.
2. **Step 2 (Delete)**: Use the extracted value(s) to perform the actual revocation/deletion.
Kingfisher supports up to 2 sequential steps in a revocation workflow. Each step can extract values from its response, making them available as variables in subsequent steps.
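
The lookup-then-delete flow can be sketched in a few lines of Python. This is only an illustration of how a value extracted in step 1 becomes a template variable in step 2; it is not Kingfisher's implementation. The `render` helper is a stand-in for Liquid, and `fake_send`, the URLs, and the `TOKEN_ID` variable are hypothetical.

```python
import re

def render(template, variables):
    """Fill `{{ NAME }}` placeholders (a simplified stand-in for Liquid)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: variables[m.group(1)], template)

def fake_send(method, url):
    """Hypothetical transport: returns (status_code, parsed_json_body)."""
    if method == "GET":
        return 200, {"data": {"id": "tok_123"}}
    return 204, {}

def run_revocation(token, send=fake_send):
    variables = {"TOKEN": token}

    # Step 1 (Lookup): query the API and extract an internal ID.
    status, body = send("GET", render("https://api.example.com/tokens/current?t={{ TOKEN }}", variables))
    if status != 200:
        return False
    variables["TOKEN_ID"] = body["data"]["id"]  # extracted value, visible to step 2

    # Step 2 (Delete): use the extracted value to perform the revocation.
    status, _ = send("DELETE", render("https://api.example.com/tokens/{{ TOKEN_ID }}", variables))
    return status == 204
```

Only the final step decides whether the revocation succeeded, which mirrors the schema's note that `response_matcher` is required for the final step only.
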
### Response Extractors
Values can be extracted from HTTP responses using the following methods:
| Extractor Type | Description | Example |
|----------------|-------------|---------|
| **JsonPath** | Extract from JSON response using JSONPath syntax | `$.data.id`, `$.items[0].token_id` |
| **Regex** | Extract using regex with a capture group | `"token_id":\s*"([^"]+)"` |
| **Header** | Extract an HTTP response header value | `X-Token-ID` |
| **Body** | Use the entire response body as-is | - |
| **StatusCode** | Extract the HTTP status code as a string | - |
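
As a rough mental model, the `Regex`, `Header`, `Body`, and `StatusCode` extractors behave like the Python sketch below. The `extract` function and its parameters are hypothetical names chosen for illustration, and the case-insensitive header lookup is an assumption based on general HTTP semantics rather than on Kingfisher's code.

```python
import re

def extract(kind, status, headers, body, pattern=None, name=None):
    """Return the extracted value as a string, or None when nothing matches."""
    if kind == "Regex":
        match = re.search(pattern, body)
        return match.group(1) if match else None  # first capture group
    if kind == "Header":
        # Assumption: header names are compared case-insensitively.
        lowered = {key.lower(): value for key, value in headers.items()}
        return lowered.get(name.lower())
    if kind == "Body":
        return body  # entire response body, unchanged
    if kind == "StatusCode":
        return str(status)  # status code as a string
    raise ValueError(f"unsupported extractor type: {kind}")
```
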
### Multi-Step Revocation Schema
```yaml
revocation:
  type: HttpMultiStep
  content:
    steps:
      - name: <step_name> # Optional: human-readable step name
        request: # Standard HTTP request configuration
          method: GET|POST|DELETE|...
          url: https://api.example.com/...
          headers:
            Header-Name: "value"
          body: "optional request body"
          response_matcher: # Required for final step only
            - type: StatusMatch
              status: [200]
        extract: # Optional: extract variables from response
          VARIABLE_NAME: # Variable name (uppercase recommended)
            type: JsonPath|Regex|Header|Body|StatusCode
            path: "$.path.to.value" # For JsonPath
            pattern: "regex pattern" # For Regex (use first capture group)
            name: "header-name" # For Header
      - name: <next_step> # Subsequent steps can use extracted variables
```
JSONPath supports nested objects and array indexing:
```yaml
extract:
  # Extract from nested object
  USER_ID:
    type: JsonPath
    path: "$.data.user.id"
  # Extract from array (first element)
  FIRST_TOKEN_ID:
    type: JsonPath
    path: "$.tokens[0].id"
  # Extract from nested array
  SESSION_ID:
    type: JsonPath
    path: "$.data.sessions[0].session_id"
```
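
To see what these three paths select, the Python sketch below resolves a small JSONPath subset (`$`, `.key`, and `[index]` only) against a made-up response. The `json_path` helper and the sample payload are hypothetical; Kingfisher's own JSONPath support may cover more syntax than this.

```python
import json
import re

def json_path(document, path):
    """Resolve a tiny JSONPath subset: `$`, `.key`, and `[index]`."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = document
    # Each token is either a `.key` step or an `[index]` step.
    for key, index in re.findall(r"\.([A-Za-z_][\w-]*)|\[(\d+)\]", path[1:]):
        current = current[key] if key else current[int(index)]
    return current

# Hypothetical API response used only for illustration.
response = json.loads("""
{
  "data": {"user": {"id": "u-1"}, "sessions": [{"session_id": "s-9"}]},
  "tokens": [{"id": "t-7"}, {"id": "t-8"}]
}
""")

user_id = json_path(response, "$.data.user.id")
first_token_id = json_path(response, "$.tokens[0].id")
session_id = json_path(response, "$.data.sessions[0].session_id")
```
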
### Example 4: Single-Step Migration Path
Existing single-step revocations remain unchanged and continue to work:
```yaml
# This continues to work as before
revocation:
  type: Http
  content:
    request:
      method: DELETE
      url: https://api.service.com/tokens/revoke
      headers:
        Authorization: "Bearer {{ TOKEN }}"
      response_matcher:
        - type: StatusMatch
          status: [204]
```
### When to Use Multi-Step Revocation
Use multi-step revocation when:
- **The API requires looking up an ID first**: Some services don't accept the token directly for revocation
- **You need metadata from the token**: The revocation endpoint requires additional information only available via a separate API call
- **The service uses indirect revocation**: The token must be associated with another resource (session, key, credential) that needs to be identified first
Do NOT use multi-step revocation when:
- **The API accepts the token directly**: Use the simpler single-step `Http` revocation
- **You need more than 2 steps**: Kingfisher supports a maximum of 2 steps
- **The service provides a native revocation method**: Use `AWS` or `GCP` types when applicable
Kingfisher leverages the Liquid template engine for dynamic parts of HTTP request bodies, headers, query parameters, and multipart payloads. The engine supports both built-in and custom filters to manipulate the captured secret (`{{ TOKEN }}`) or other named captures (`{{ NAME }}`).
- **Capture Injection**: The unnamed capture from your regex becomes `{{ TOKEN }}`. Named captures are made available as uppercase variables (e.g. `{{ RDMVAL }}`).
- **Filter Pipeline**: You can chain filters using the pipe (`|`) syntax:
```liquid
{{ TOKEN | b64enc | url_encode }}
```
- **Arguments**: Some filters accept parameters, provided after a colon:
| Filter | Arguments | Description | Example |
|--------|-----------|-------------|---------|
| `crc32` | – | Computes the CRC32 checksum of the input and returns a decimal value. | `{{ TOKEN \| crc32 }}` |
| `crc32_dec` | `digits` (integer, optional) | Computes the CRC32 checksum and returns the last `digits` decimal characters (zero-padded). Defaults to the full value when omitted. | `{{ TOKEN \| crc32_dec: 6 }}` |
| `crc32_hex` | `digits` (integer, optional) | Computes the CRC32 checksum and returns the last `digits` hexadecimal characters (zero-padded). Defaults to the full value when omitted. | `{{ TOKEN \| crc32_hex: 8 }}` |
| `crc32_le_b64` | `len` (integer, optional) | Computes the CRC32 checksum, encodes the little-endian bytes using Base64, and optionally truncates to the first `len` characters. | `{{ TOKEN \| crc32_le_b64: 6 }}` |
| `hmac_sha256_b64key` | `key` (string, base64-encoded) | Decodes the key from Base64 to raw bytes, then computes HMAC-SHA256. Returns Base64. Use for Azure SAS and other protocols where the signing key is base64-encoded. | `{{ to_sign \| hmac_sha256_b64key: TOKEN }}` |
| `newline` | – | Returns a single newline character (`\n`). Useful inside YAML block scalars where a literal newline would break indentation. | `{{ "" \| newline }}` |
| `base36` | `width` (integer, optional) | Encodes the input number as Base36, left-padding with zeros as needed. | `{{ TOKEN \| crc32 \| base36: 6 }}` |
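
When debugging a template, it can help to reproduce a filter outside Kingfisher. The Python sketch below mirrors the descriptions in the table using only the standard library. It is an approximation under stated assumptions, not the filters' source: the function names simply reuse the filter names, `b64enc` is assumed to be standard Base64, `url_encode` is assumed to percent-encode every reserved character, the full `crc32_hex` value is assumed to be eight lowercase hex digits, and `base36` is assumed to use a lowercase alphabet.

```python
import base64
import hashlib
import hmac
import urllib.parse
import zlib

def crc32(value):
    """CRC32 checksum of the UTF-8 input, as an integer."""
    return zlib.crc32(value.encode())

def crc32_dec(value, digits=None):
    """Last `digits` decimal characters of the checksum, zero-padded."""
    text = str(crc32(value))
    return text if digits is None else text[-digits:].zfill(digits)

def crc32_hex(value, digits=None):
    """Last `digits` hexadecimal characters of the checksum, zero-padded."""
    text = format(crc32(value), "08x")  # assumption: full value is 8 hex digits
    return text if digits is None else text[-digits:].zfill(digits)

def crc32_le_b64(value, length=None):
    """Base64 of the little-endian checksum bytes, optionally truncated."""
    encoded = base64.b64encode(crc32(value).to_bytes(4, "little")).decode()
    return encoded if length is None else encoded[:length]

def base36(number, width=0):
    """Encode a non-negative integer as Base36, left-padded with zeros."""
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase
    digits = ""
    while number:
        number, remainder = divmod(number, 36)
        digits = alphabet[remainder] + digits
    return (digits or "0").zfill(width)

def hmac_sha256_b64key(message, key_b64):
    """Decode the Base64 key to raw bytes, then return the HMAC-SHA256 as Base64."""
    key = base64.b64decode(key_b64)
    return base64.b64encode(hmac.new(key, message.encode(), hashlib.sha256).digest()).decode()

def b64enc_then_url_encode(token):
    """Rough equivalent of `{{ TOKEN | b64enc | url_encode }}`."""
    return urllib.parse.quote(base64.b64encode(token.encode()).decode(), safe="")
```

A pipeline such as `{{ TOKEN | crc32 | base36: 6 }}` then corresponds to `base36(crc32(token), 6)` in this sketch.
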
**Stable Request Values:** HTTP and gRPC validation requests also expose stable per-request template variables. Use these when the same generated value must appear in multiple places within one request. Currently:
In your YAML rule definition, you add a `depends_on_rule` section. Here you specify:
- **rule_id:** The identifier of the rule whose output is required.
- **variable:** The name (typically in uppercase) that will be used to reference the captured value from the dependency rule.
- **Chaining Captures:**
When Kingfisher scans a file, it processes rules in a specific order. If a rule has a dependency, the engine first checks whether the rule it depends on has already matched on the same input (or blob). If it did, the captured value (for example, an access key ID) is made available to the dependent rule.
- **Using the Captured Value:**
This captured value can then be used during the validation phase. For instance, if you have a rule for an Algolia Admin API Key that depends on an Algolia Application ID (captured as `APPID`), the validation logic can incorporate the `APPID` value to confirm that the secret matches the expected pattern or format for that specific account.
`depends_on_rule` is for capture chaining and validation context. It does not automatically hide the main secret finding, and it does not by itself mean the rule must be parser-verified before it can be reported from raw text.
- **Primary secret rule:** leave the secret rule visible unless it is also only a helper; helper rules should usually be the ones marked `visible: false`
Consider this example rule for an Algolia Application ID and Admin Key combination. To validate that this is an active credential, both must be matched:
* Algolia Application ID Rule (`kingfisher.algolia.2`):
This rule scans for an Algolia Application ID, a 10-character alphanumeric string. It is marked with `visible: false` so that even if it matches, the finding is not directly reported. Its primary role is to provide a supporting value for other rules rather than to be flagged as a secret by itself.
* Algolia Admin API Key Rule (`kingfisher.algolia.1`):
This rule detects the Algolia Admin API Key using a regex pattern. It includes a `depends_on_rule` property that specifies a dependency on the Algolia Application ID rule.
* The dependency declares that the rule requires the output of the Algolia Application ID rule, and the captured value is assigned to the variable `APPID`.
* In the validation section, this captured `APPID` is used dynamically in the HTTP request (for example, in the header `X-Algolia-Application-Id` and in the URL).
The dependency mechanism (`depends_on_rule`) ensures that:
* Non-secret data (like an application ID) is captured without cluttering the scan report (thanks to `visible: false`).
* The secret (the API key) is validated in context, with the necessary supporting information automatically injected.
* Rules remain modular and extensible; you can update the rule that is depended on, or its pattern, independently, and the change will automatically be reflected where the value is used.
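
The chaining behaviour described above can be modelled with a short Python sketch. It is a simplification for intuition only: the regex patterns are invented (they are not the shipped Algolia rules), `depends_on_rule` is assumed to be a list of `rule_id`/`variable` pairs, and the `scan_blob` function is hypothetical.

```python
import re

# Hypothetical, simplified rule definitions keyed by the fields discussed above.
RULES = [
    {"id": "kingfisher.algolia.2", "pattern": r"app_id=([A-Z0-9]{10})\b", "visible": False},
    {
        "id": "kingfisher.algolia.1",
        "pattern": r"admin_key=([a-f0-9]{32})\b",
        "visible": True,
        "depends_on_rule": [{"rule_id": "kingfisher.algolia.2", "variable": "APPID"}],
    },
]

def scan_blob(blob):
    """Match every rule on one blob, then inject dependency captures as variables."""
    captures = {}
    for rule in RULES:
        match = re.search(rule["pattern"], blob)
        if match:
            captures[rule["id"]] = match.group(1)

    findings = []
    for rule in RULES:
        if rule["id"] not in captures:
            continue
        variables = {"TOKEN": captures[rule["id"]]}
        for dependency in rule.get("depends_on_rule", []):
            # Only captures from the same blob are made available.
            if dependency["rule_id"] in captures:
                variables[dependency["variable"]] = captures[dependency["rule_id"]]
        findings.append({"rule_id": rule["id"], "visible": rule["visible"], "variables": variables})
    return findings
```

In this model the admin key finding carries both `TOKEN` and `APPID`, which is what a validation template needs in order to fill the `X-Algolia-Application-Id` header.
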
## The `visible: false` Property
The `visible: false` property tells Kingfisher to hide the finding from the final scan report. This is particularly useful for rules that capture data not meant to be reported as a secret, but rather to serve as supporting context for another rule.
For example, a rule might match a username, an email address, an AWS Access Key ID, or an Application ID. While these pieces of information are captured during scanning, they are not secrets on their own. Instead, they are used by other rules—via the `depends_on_rule` mechanism—to validate an associated secret. By marking such rules as `visible: false`, you prevent these non-secret findings from cluttering your report, yet their values remain available for dependent rules.
`visible: false` helps keep the scan output focused on actual secrets while still capturing important contextual data needed for comprehensive validation.
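
Conceptually, the report step behaves like the filter in this Python sketch: hidden findings are dropped from the output, although their captures were still available earlier in the scan. The finding dictionaries and the `visible_findings` helper are hypothetical, and treating a missing `visible` key as `true` is an assumption.

```python
def visible_findings(findings):
    """Keep only findings whose rule is not marked `visible: false`."""
    return [finding for finding in findings if finding.get("visible", True)]

findings = [
    {"rule_id": "kingfisher.algolia.2", "capture": "ABCDE12345", "visible": False},
    {"rule_id": "kingfisher.algolia.1", "capture": "0" * 32, "visible": True},
]

for finding in visible_findings(findings):
    print(finding["rule_id"])  # prints kingfisher.algolia.1
```
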
The `pattern_requirements` field allows you to specify data type requirements for matched secrets. This is particularly useful when:
- Your regex pattern must be permissive (due to Hyperscan limitations)
- You want to enforce password complexity requirements
- You need to filter out low-quality matches that lack certain character types
Kingfisher's regex engine (Hyperscan) does not support lookahead assertions like `(?=.*\d)` to require specific character types. Instead, use the `pattern_requirements` field to filter matches post-detection.
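
The post-detection check amounts to counting character classes in the matched text, as in the Python sketch below. The keyword names reuse the `pattern_requirements` field names, but the `meets_requirements` function is hypothetical, and the default set of special characters (`string.punctuation`) is an assumption, not Kingfisher's documented default.

```python
import string

def meets_requirements(match, min_digits=0, min_uppercase=0, min_lowercase=0,
                       min_special_chars=0, special_chars=string.punctuation):
    """Return True when the match contains enough characters of each class."""
    digits = sum(ch in string.digits for ch in match)
    uppercase = sum(ch in string.ascii_uppercase for ch in match)
    lowercase = sum(ch in string.ascii_lowercase for ch in match)
    special = sum(ch in special_chars for ch in match)
    return (digits >= min_digits and uppercase >= min_uppercase
            and lowercase >= min_lowercase and special >= min_special_chars)
```

This does the work that a lookahead such as `(?=.*\d)` would do in a backtracking regex engine, but as a separate pass over each match.
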
### Available Requirements
```yaml
pattern_requirements:
  min_digits: 1 # Require at least 1 digit (0-9)
  min_uppercase: 1 # Require at least 1 uppercase letter (A-Z)
  min_lowercase: 1 # Require at least 1 lowercase letter (a-z)
  min_special_chars: 1 # Require at least 1 special character
  special_chars: "!@#$%^&*" # Optional: define which characters are "special"
```
`ignore_if_contains` performs a case-insensitive substring check. If any entry (after trimming whitespace) appears within the match, the match is discarded. This is helpful for dropping known dummy tokens such as "test" or "demo" that otherwise satisfy the regex.
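
In Python terms, the check described above is roughly the following. The `should_ignore` helper is a hypothetical name, and skipping entries that are empty after trimming is an assumption.

```python
def should_ignore(match, ignore_if_contains):
    """Case-insensitive substring check against trimmed entries."""
    haystack = match.lower()
    for entry in ignore_if_contains:
        needle = entry.strip().lower()
        if needle and needle in haystack:  # assumption: empty entries never match
            return True
    return False
```
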
The optional `checksum` block renders Liquid templates against the match to determine whether the captured checksum matches your expectation. Both templates gain access to `{{ MATCH }}`, `{{ FULL_MATCH }}`, and every named capture in two forms: the original capture name and its uppercase alias (e.g. `{{ body }}` and `{{ BODY }}`). Use helper filters like `suffix`, `crc32`, and `base62` to mirror provider-specific checksum pipelines. If a required capture is missing or the rendered values differ, Kingfisher skips the finding—logging the reason, including checksum lengths, at the `DEBUG` level. Set `skip_if_missing` to `true` to treat absent captures as legacy matches.
When any of these filters removes a match, it is logged at the `DEBUG` level so you can see exactly why the skip occurred. If you need to keep every match even when one of these substrings appears, pass `--no-ignore-if-contains` to `kingfisher scan`. The flag disables this post-processing step without changing the rule definitions.
### Are `requires_capture` and `skip_if_missing` equivalent?
`requires_capture`
- Optional field that names a specific regex capture that must be present before the checksum templates are evaluated.
- In the engine, Kingfisher checks whether that capture exists in the match context. If it’s missing, the behavior falls back to whatever `skip_if_missing` dictates (fail or treat as a legacy match).
`skip_if_missing`
- Boolean switch that controls what happens when Kingfisher can’t render the checksum—because there’s no match context or a required capture is absent.
  - `true`: silently skip (pass) the match so legacy, non-checksum tokens are still accepted.
  - `false`: treat the situation as a validation failure.
In short, `requires_capture` identifies which capture must exist, while `skip_if_missing` determines whether missing data is a hard failure or an allowed legacy case.
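
The interaction between the two fields can be written as a small decision function. The Python sketch below assumes a hypothetical token format with `body` and `checksum` captures, where the checksum is the lowercase hex CRC32 of the body; the `checksum_allows` function and that format are illustrative, not taken from a real rule.

```python
import zlib

def checksum_allows(captures, requires_capture="checksum", skip_if_missing=False):
    """Return True when the finding should be kept."""
    if requires_capture not in captures:
        # Missing data: `skip_if_missing` decides between legacy pass and failure.
        return skip_if_missing
    actual = captures[requires_capture]
    expected = format(zlib.crc32(captures["body"].encode()), "08x")
    return actual == expected  # differing values cause the finding to be skipped
```
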
When writing custom rules, consider the following best practices:
1. **Multi-line Regex:** Write your regex patterns over multiple lines for clarity. Use the `(?x)` flag to enable free-spacing mode.
2. **Optimize for Performance:** Structure your regex to minimize backtracking. Use non-capturing groups where possible and keep the pattern as concise as possible.
3. **Validation Integration:** Define a `validation` section if you want to verify the detected secret. Prefer `Http` or `Grpc`; use an existing typed validator when the rule matches a supported validator family; use `Raw` only for rare provider-specific exception paths. You can use Liquid templating to insert dynamic values where supported. Use the unnamed capture as `TOKEN` and any named captures in uppercase.
4. **Revocation Integration:** Define a `revocation` section if you want to revoke a detected secret. It uses the same HTTP request format and template variables as `validation`.
5. **Test with Examples:** Always include examples that should match and, optionally, negative examples to ensure your rule behaves as expected.
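
Several of these practices can be tried out quickly in Python before a pattern goes into a rule. The sketch below uses a free-spacing pattern with one named capture, one non-capturing group, and one unnamed capture, then maps the captures to template variable names the way the text describes. Python's `re` module is not the engine Kingfisher uses, so treat this as a prototyping aid only; the `acme` token format and the `template_variables` helper are hypothetical.

```python
import re

PATTERN = re.compile(r"""(?x)
    \b
    (?P<prefix> acme )       # named capture: hypothetical provider prefix
    (?: _ | - )              # non-capturing separator
    ( [A-Za-z0-9]{16} )      # unnamed capture: the secret itself
    \b
""")

def template_variables(match):
    """Map captures to variable names: named -> UPPERCASE, unnamed -> TOKEN."""
    variables = {name.upper(): match.group(name) for name in match.re.groupindex}
    named_indexes = set(match.re.groupindex.values())
    unnamed = [match.group(i) for i in range(1, match.re.groups + 1) if i not in named_indexes]
    if unnamed:
        variables["TOKEN"] = unnamed[0]
    return variables

positive_examples = ["api_key = acme_0123456789abcdef"]
negative_examples = ["api_key = acme_short", "api_key = other_0123456789abcdef"]
```

Keeping a list of positive and negative examples next to the pattern makes it easy to re-check the rule whenever the regex changes.
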
This advanced example uses the liquid-rs filters included with Kingfisher to sign requests that validate Alibaba Cloud long-lived and STS temporary credential pairs: