When scanning a self-hosted Gitea/Forgejo instance, the clone URLs
returned by the API may point at a different host than the one used to
reach the API (e.g., an internal API address vs. a public clone URL
behind a reverse proxy). The --clone-url-base flag rewrites the scheme,
host, and port of those clone URLs, preserving the path.
Example:
kingfisher scan gitea \
--api-url https://forge.internal.example.com/api/v1/ \
--clone-url-base https://forge.internal.example.com/ \
--user eblume
This avoids routing clone traffic through an external proxy when the
API and git endpoints share the same internal host but the instance's
ROOT_URL points to the public endpoint.
Includes unit tests for the URL rewriting function and an integration
test using wiremock to verify the full enumeration path.
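A minimal sketch of the rewrite described above, for illustration only.
`rewrite_clone_url` is a hypothetical name, not the function added in
this change, and it uses plain string handling to stay dependency-free;
a real implementation would more likely parse with a URL library. It
assumes both inputs look like `scheme://authority[/path]`.

```rust
/// Hypothetical helper: replace the scheme, host, and port of `clone_url`
/// with those of `base`, keeping the original path.
fn rewrite_clone_url(clone_url: &str, base: &str) -> Option<String> {
    // Split "scheme://authority/path" into ("scheme://authority", "/path").
    fn split_origin(url: &str) -> Option<(&str, &str)> {
        let scheme_end = url.find("://")? + 3;
        let rest = &url[scheme_end..];
        if rest.is_empty() {
            return None;
        }
        // The path starts at the first '/' after the authority, if any.
        let path_start = rest
            .find('/')
            .map(|i| scheme_end + i)
            .unwrap_or(url.len());
        Some((&url[..path_start], &url[path_start..]))
    }

    let (base_origin, _base_path) = split_origin(base)?;
    let (_old_origin, path) = split_origin(clone_url)?;
    Some(format!("{base_origin}{path}"))
}
```

With the base from the example above, a clone URL such as
https://forge.example.com/eblume/repo.git (a made-up public URL) would
come back on the forge.internal.example.com host with /eblume/repo.git
unchanged. Input without a `://` separator yields `None`.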
Addresses review feedback on the validator panic-containment change:
- Keep raw panic payloads out of the cached and user-visible
`validation_response_body`, since a panic message can embed secret
material (e.g. a token captured in a debug string). The visible body
now reports only the stable rule id, and the detailed payload is
emitted via truncated structured logging.
- Replace the nested `Result<Result<(), String>, Elapsed>` with a
self-describing `ValidationOutcome` enum (`Completed` / `Panicked` /
`TimedOut`) so call sites and signatures read clearly.
- Document why the `AssertUnwindSafe` panic boundary is sound: the
recovery path deterministically resets the match's validation fields,
and the shared counters/cache are only mutated after the boundary
returns, so an unwind cannot leave them inconsistent.
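
The shape of the three points above can be sketched as follows. This is
an illustration under stated assumptions, not the code in this change:
it uses a worker thread and a channel so it needs only the standard
library (the `Elapsed` type suggests the real code is async), and
`run_contained`, `visible_body`, `truncated_payload`, the message
wording, and the 200-character cap are all hypothetical. Only the
`ValidationOutcome` variant names come from the commit.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Outcome of one contained validation attempt.
#[derive(Debug, PartialEq, Eq)]
enum ValidationOutcome {
    Completed,
    Panicked,
    TimedOut,
}

/// Hypothetical cap on how much of a panic payload reaches the logs.
const MAX_LOGGED_PAYLOAD: usize = 200;

/// Best-effort extraction of a panic message, truncated by character count.
fn truncated_payload(payload: &(dyn Any + Send)) -> String {
    let msg = payload
        .downcast_ref::<&str>()
        .map(|s| s.to_string())
        .or_else(|| payload.downcast_ref::<String>().cloned())
        .unwrap_or_else(|| "<non-string panic payload>".to_string());
    msg.chars().take(MAX_LOGGED_PAYLOAD).collect()
}

/// Run `validate` on a worker thread, containing panics and enforcing a timeout.
/// On timeout the worker is left detached; this sketch does not cancel it.
fn run_contained<F>(rule_id: &str, timeout: Duration, validate: F) -> ValidationOutcome
where
    F: FnOnce() + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let rule = rule_id.to_string();
    thread::spawn(move || {
        // The closure owns its state and nothing shared is mutated inside
        // the boundary, which is the argument for AssertUnwindSafe here.
        let outcome = match panic::catch_unwind(AssertUnwindSafe(validate)) {
            Ok(()) => ValidationOutcome::Completed,
            Err(payload) => {
                // Stand-in for structured logging; the payload stays in this arm.
                let detail = truncated_payload(&*payload);
                eprintln!("validator panicked rule={rule} payload={detail:?}");
                ValidationOutcome::Panicked
            }
        };
        let _ = tx.send(outcome);
    });
    // A timeout (or a dropped sender) is reported as TimedOut.
    rx.recv_timeout(timeout).unwrap_or(ValidationOutcome::TimedOut)
}

/// User-visible body: only the stable rule id, never the panic payload.
fn visible_body(rule_id: &str, outcome: &ValidationOutcome) -> Option<String> {
    match outcome {
        ValidationOutcome::Completed => None,
        ValidationOutcome::Panicked => Some(format!("validator panicked for rule {rule_id}")),
        ValidationOutcome::TimedOut => Some(format!("validator timed out for rule {rule_id}")),
    }
}
```

Because callers match on a single enum instead of unpacking a nested
`Result`, the panic and timeout paths cannot be confused, and the
payload has no route into the visible body.
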
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace the inline RS256 token and committed public key with a
throwaway RSA keypair generated at runtime; the token is signed from
readable claims so no key material or opaque blobs live in the repo
- Add `rsa` as a dev-only dependency (getrandom feature) for in-test
key generation; the release binary is unaffected
Addresses review feedback on #386.
- Git repository scans now extract archive blobs encountered in the object database, not just on the filesystem. Previously a .zip/.jar/.apk/.tar.gz committed to a repo was scanned as raw compressed bytes, so secrets inside it were invisible. The git enumerator fans each archive entry out as a synthetic blob with the original commit metadata. Honors --no-extract-archives for opt-out.
- Performance: ZIP-based git blobs ≤ 64 MB extract entirely in memory (no temp-file round trip), beating the v1.99.0 baseline by ~15% on an 80 GiB monorepo despite scanning ~300K additional archive-content blobs. Larger archives automatically fall back to a disk-streaming extractor.
- Memory safety: archive extraction is hard-capped at 64 MB compressed (pre-flight), 256 MB aggregate decompressed per archive (in-memory and disk paths), and 512 MB per entry, with a PK\x03\x04 magic-byte gate. Worst-case footprint is bounded at roughly num_jobs * 320 MB.
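
The caps above can be sketched roughly as follows. This is an illustration, not the shipped code: every name here (`plan_extraction`, `ExtractPlan`, `DecompressBudget`, `worst_case_bytes`) is hypothetical, how non-ZIP blobs are routed is not stated above (the sketch just reports `NotZip`), and reading 320 MB as 64 MB compressed plus 256 MB decompressed per job is an inference from the listed numbers.

```rust
const MB: u64 = 1024 * 1024;
const MAX_COMPRESSED_IN_MEMORY: u64 = 64 * MB; // pre-flight cap
const MAX_DECOMPRESSED_PER_ARCHIVE: u64 = 256 * MB; // aggregate cap
const MAX_DECOMPRESSED_PER_ENTRY: u64 = 512 * MB; // per-entry cap
const ZIP_MAGIC: &[u8] = b"PK\x03\x04"; // ZIP local file header signature

#[derive(Debug, PartialEq, Eq)]
enum ExtractPlan {
    InMemoryZip,
    DiskStreaming,
    NotZip,
}

/// Decide how to extract a blob from its first bytes and compressed size.
fn plan_extraction(head: &[u8], compressed_len: u64) -> ExtractPlan {
    if !head.starts_with(ZIP_MAGIC) {
        return ExtractPlan::NotZip; // magic-byte gate
    }
    if compressed_len <= MAX_COMPRESSED_IN_MEMORY {
        ExtractPlan::InMemoryZip
    } else {
        ExtractPlan::DiskStreaming
    }
}

/// Tracks decompressed bytes emitted so far for one archive.
struct DecompressBudget {
    used: u64,
}

impl DecompressBudget {
    fn new() -> Self {
        Self { used: 0 }
    }

    /// Admit an entry only if it fits both the per-entry and aggregate caps.
    /// A rejected entry leaves the running total unchanged.
    fn admit_entry(&mut self, entry_size: u64) -> bool {
        if entry_size > MAX_DECOMPRESSED_PER_ENTRY {
            return false;
        }
        match self.used.checked_add(entry_size) {
            Some(total) if total <= MAX_DECOMPRESSED_PER_ARCHIVE => {
                self.used = total;
                true
            }
            _ => false,
        }
    }
}

/// Inferred bound: each job holds one compressed blob plus its decompressed output.
fn worst_case_bytes(num_jobs: u64) -> u64 {
    num_jobs * (MAX_COMPRESSED_IN_MEMORY + MAX_DECOMPRESSED_PER_ARCHIVE)
}
```

A real extractor would need to count bytes as they are decompressed instead of trusting declared entry sizes, since archive headers can understate them.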