diff --git a/CHANGELOG.md b/CHANGELOG.md index d67e87f..488a7c2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,9 @@ All notable changes to this project will be documented in this file. +## [v1.54.0] +- Added first-class Gitea support, including CLI commands, environment-based authentication, documentation, and integration with scans and repository enumeration. + ## [v1.53.0] - Added first-class Bitbucket support, including CLI commands, authentication helpers, documentation, and integration testing. diff --git a/Cargo.toml b/Cargo.toml index 6b75952..b743646 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -10,7 +10,7 @@ publish = false [package] name = "kingfisher" -version = "1.53.0" +version = "1.54.0" description = "MongoDB's blazingly fast secret scanning and validation tool" edition.workspace = true rust-version.workspace = true diff --git a/README.md b/README.md index 2d13eaa..596dc4c 100644 --- a/README.md +++ b/README.md @@ -8,15 +8,15 @@ Kingfisher is a blazingly fast secret‑scanning and live validation tool built in Rust. It combines Intel’s hardware‑accelerated Hyperscan regex engine with language‑aware parsing via Tree‑Sitter, and **ships with hundreds of built‑in rules** to detect, validate, and triage secrets before they ever reach production

-Originally forked from Praetorian’s Nosey Parker, Kingfisher adds live cloud-API validation; many more targets (GitLab, S3, Docker, Jira, Confluence, Slack); compressed-file extraction and scanning; baseline and allowlist controls; language-aware detection (~20 languages); and a native Windows binary. See [Origins and Divergence](#origins-and-divergence) for details. +Originally forked from Praetorian’s Nosey Parker, Kingfisher **adds** live cloud-API validation; many more targets (GitLab, Bitbucket, Gitea, S3, Docker, Jira, Confluence, Slack); compressed-file extraction and scanning; baseline and allowlist controls; language-aware detection (~20 languages); and a native Windows binary. See [Origins and Divergence](#origins-and-divergence) for details. ## Key Features - **Performance**: multithreaded, Hyperscan‑powered scanning built for huge codebases - **Extensible rules**: hundreds of built-in detectors plus YAML-defined custom rules ([docs/RULES.md](/docs/RULES.md)) - **Broad AI SaaS coverage**: finds and validates tokens for OpenAI, Anthropic, Google Gemini, Cohere, Mistral, Stability AI, Replicate, xAI (Grok), Ollama, Langchain, Perplexity, Weights & Biases, Cerebras, Friendli, Fireworks.ai, NVIDIA NIM, Together.ai, Zhipu, and many more - **Multiple targets**: - - **Git history**: local repos or GitHub/GitLab/Bitbucket orgs, users, and workspaces - - **Repository artifacts**: with `--repo-artifacts`, scan GitHub/GitLab/Bitbucket repository artifacts such as issues, pull/merge requests, wikis, snippets, and owner gists in addition to code + - **Git history**: local repos or GitHub/GitLab/Gitea/Bitbucket orgs, users, and workspaces + - **Repository artifacts**: with `--repo-artifacts`, scan GitHub/GitLab/Bitbucket repository artifacts such as issues, pull/merge requests, wikis, snippets, and owner gists in addition to code (Gitea wikis are also cloned when available) - **Docker images**: public or private via `--docker-image` - **Jira issues**: JQL‑driven 
scans with `--jira-url` and `--jql` - **Confluence pages**: CQL‑driven scans with `--confluence-url` and `--cql` @@ -71,6 +71,12 @@ See ([docs/COMPARISON.md](docs/COMPARISON.md)) - [Skip specific GitLab projects during enumeration](#skip-specific-gitlab-projects-during-enumeration) - [Scan remote GitLab repository by URL](#scan-remote-gitlab-repository-by-url) - [List GitLab repositories](#list-gitlab-repositories) + - [Scanning Gitea](#scanning-gitea) + - [Scan Gitea organization (requires `KF_GITEA_TOKEN`)](#scan-gitea-organization-requires-kf_gitea_token) + - [Scan Gitea user](#scan-gitea-user) + - [Skip specific Gitea repositories during enumeration](#skip-specific-gitea-repositories-during-enumeration) + - [Scan remote Gitea repository by URL](#scan-remote-gitea-repository-by-url) + - [List Gitea repositories](#list-gitea-repositories) - [Scanning Bitbucket](#scanning-bitbucket) - [Scan Bitbucket workspace](#scan-bitbucket-workspace) - [Scan Bitbucket user](#scan-bitbucket-user) @@ -560,6 +566,59 @@ kingfisher gitlab repos list --group my-group --include-subgroups kingfisher gitlab repos list --group my-group --gitlab-exclude my-group/**/legacy-* ``` +## Scanning Gitea + +### Scan Gitea organization (requires `KF_GITEA_TOKEN`) + +```bash +kingfisher scan --gitea-organization my-org +# self-hosted example +KF_GITEA_TOKEN="gtoken" kingfisher scan --gitea-organization platform --gitea-api-url https://gitea.internal.example/api/v1/ +``` + +### Scan Gitea user + +```bash +kingfisher scan --gitea-user johndoe +``` + +### Skip specific Gitea repositories during enumeration + +Repeat `--gitea-exclude` for each repository you want to ignore when scanning users +or organizations. Accepts `owner/repo` identifiers or gitignore-style glob patterns +like `team/**/archive-*`. 
+ +```bash +kingfisher scan --gitea-organization my-org \ + --gitea-exclude my-org/legacy-repo \ + --gitea-exclude my-org/**/archive-* +``` + +### Scan remote Gitea repository by URL + +`--git-url` clones the repository and scans its history. Adding `--repo-artifacts` +also clones the repository wiki if one exists. Private repositories and wikis +require `KF_GITEA_TOKEN` (and `KF_GITEA_USERNAME` when cloning via HTTPS). + +```bash +# Scan the repository only +kingfisher scan --git-url https://gitea.com/org/repo.git + +# Include the repository wiki (if present) +KF_GITEA_TOKEN="gtoken" KF_GITEA_USERNAME="org" \ + kingfisher scan --git-url https://gitea.com/org/repo.git --repo-artifacts +``` + +### List Gitea repositories + +```bash +kingfisher gitea repos list --gitea-organization my-org +# enumerate every organization visible to the authenticated user +KF_GITEA_TOKEN="gtoken" kingfisher gitea repos list --all-gitea-organizations +# self-hosted example +KF_GITEA_TOKEN="gtoken" kingfisher gitea repos list --user johndoe --gitea-api-url https://gitea.internal.example/api/v1/ +``` + ## Scanning Bitbucket ### Scan Bitbucket workspace @@ -700,6 +759,8 @@ KF_SLACK_TOKEN="xoxp-1234..." 
kingfisher scan \ | ----------------- | ---------------------------- | | `KF_GITHUB_TOKEN` | GitHub Personal Access Token | | `KF_GITLAB_TOKEN` | GitLab Personal Access Token | +| `KF_GITEA_TOKEN` | Gitea Personal Access Token | +| `KF_GITEA_USERNAME` | Username for private Gitea clones (used with `KF_GITEA_TOKEN`) | | `KF_BITBUCKET_USERNAME` | Bitbucket username for basic authentication | | `KF_BITBUCKET_APP_PASSWORD` / `KF_BITBUCKET_TOKEN` | Bitbucket app password or server token | | `KF_BITBUCKET_OAUTH_TOKEN` | Bitbucket OAuth or PAT token | diff --git a/src/cli/commands/gitea.rs b/src/cli/commands/gitea.rs new file mode 100644 index 0000000..6bdb393 --- /dev/null +++ b/src/cli/commands/gitea.rs @@ -0,0 +1,96 @@ +use clap::{Args, Subcommand, ValueEnum, ValueHint}; +use strum_macros::Display; +use url::Url; + +use crate::cli::commands::output::OutputArgs; + +use super::github::GitHubOutputFormat; + +/// Top-level Gitea command group +#[derive(Args, Debug)] +pub struct GiteaArgs { + #[command(subcommand)] + pub command: GiteaCommand, + + /// Override Gitea API URL (e.g. 
self-hosted) + #[arg(global = true, long, default_value = "https://gitea.com/api/v1/", value_hint = ValueHint::Url)] + pub gitea_api_url: Url, +} + +#[derive(Subcommand, Debug)] +pub enum GiteaCommand { + /// Interact with Gitea repositories + #[command(subcommand)] + Repos(GiteaReposCommand), +} + +#[derive(Subcommand, Debug)] +pub enum GiteaReposCommand { + /// List repositories for a user or organization + List(GiteaReposListArgs), +} + +/// `kingfisher gitea repos` +#[derive(Args, Debug, Clone)] +pub struct GiteaReposListArgs { + #[command(flatten)] + pub repo_specifiers: GiteaRepoSpecifiers, + + #[command(flatten)] + pub output_args: OutputArgs, +} + +/// Options for selecting Gitea repos +#[derive(Args, Debug, Clone)] +pub struct GiteaRepoSpecifiers { + /// Repositories belonging to these users + #[arg(long, alias = "gitea-user")] + pub user: Vec<String>, + + /// Repositories belonging to these organizations + #[arg(long, alias = "org", alias = "gitea-organization", alias = "gitea-org")] + pub organization: Vec<String>, + + /// Skip repositories when enumerating Gitea users or organizations (format: owner/repo) + #[arg(long = "gitea-exclude", alias = "gitea-exclude-repo", value_name = "OWNER/REPO")] + pub exclude_repos: Vec<String>, + + /// Repositories for all organizations accessible to the authenticated user + #[arg(long, alias = "all-gitea-organizations", alias = "all-gitea-orgs")] + pub all_organizations: bool, + + /// Filter by repository type + #[arg(long, default_value_t = GiteaRepoType::Source, alias = "gitea-repo-type")] + pub repo_type: GiteaRepoType, +} + +impl GiteaRepoSpecifiers { + pub fn is_empty(&self) -> bool { + self.user.is_empty() && self.organization.is_empty() && !self.all_organizations + } +} + +/// Gitea repository type filter +#[derive(Copy, Clone, Debug, Display, PartialEq, Eq, PartialOrd, Ord, ValueEnum)] +#[strum(serialize_all = "kebab-case")] +pub enum GiteaRepoType { + /// Only source repositories (not forks) + Source, + /// Only fork repositories + 
#[value(alias = "forks")] + Fork, + /// Include all repositories + All, +} + +pub type GiteaOutputFormat = GitHubOutputFormat; + +impl From<GiteaRepoType> for crate::gitea::RepoType { + fn from(val: GiteaRepoType) -> Self { + match val { + GiteaRepoType::Source => crate::gitea::RepoType::Source, + GiteaRepoType::Fork => crate::gitea::RepoType::Fork, + GiteaRepoType::All => crate::gitea::RepoType::All, + } + } +} diff --git a/src/cli/commands/inputs.rs b/src/cli/commands/inputs.rs index a3fcac6..6c6f81b 100644 --- a/src/cli/commands/inputs.rs +++ b/src/cli/commands/inputs.rs @@ -6,6 +6,7 @@ use url::Url; use crate::{ cli::commands::{ bitbucket::{BitbucketAuthArgs, BitbucketRepoType}, + gitea::GiteaRepoType, github::{GitCloneMode, GitHistoryMode, GitHubRepoType}, gitlab::GitLabRepoType, }, @@ -24,12 +25,15 @@ pub struct InputSpecifierArgs { "github_organization", "gitlab_user", "gitlab_group", + "gitea_user", + "gitea_organization", "bitbucket_user", "bitbucket_workspace", "bitbucket_project", "git_url", "all_github_organizations", "all_gitlab_groups", + "all_gitea_organizations", "all_bitbucket_workspaces", "jira_url", "confluence_url", @@ -112,6 +116,35 @@ pub struct InputSpecifierArgs { #[arg(long, alias = "include-subgroups")] pub gitlab_include_subgroups: bool, + // Gitea Options + /// Scan repositories belonging to the specified Gitea user + #[arg(long)] + pub gitea_user: Vec<String>, + + /// Scan repositories belonging to the specified Gitea organization + #[arg(long, alias = "gitea-org")] + pub gitea_organization: Vec<String>, + + /// Skip repositories when enumerating Gitea users or organizations (format: owner/repo) + #[arg(long = "gitea-exclude", alias = "gitea-exclude-repo", value_name = "OWNER/REPO")] + pub gitea_exclude: Vec<String>, + + /// Scan repositories from all accessible Gitea organizations (requires KF_GITEA_TOKEN) + #[arg(long, alias = "all-gitea-orgs")] + pub all_gitea_organizations: bool, + + /// Use the specified URL for Gitea API access (e.g. 
for self-hosted instances) + #[arg( + long, + alias="gitea-api-url", + default_value = "https://gitea.com/api/v1/", + value_hint = ValueHint::Url + )] + pub gitea_api_url: Url, + + #[arg(long, default_value_t = GiteaRepoType::Source)] + pub gitea_repo_type: GiteaRepoType, + // Bitbucket Options /// Scan repositories belonging to the specified Bitbucket users #[arg(long)] diff --git a/src/cli/commands/mod.rs b/src/cli/commands/mod.rs index 243ab1b..b7717bd 100644 --- a/src/cli/commands/mod.rs +++ b/src/cli/commands/mod.rs @@ -1,4 +1,5 @@ pub mod bitbucket; +pub mod gitea; pub mod github; pub mod gitlab; pub mod inputs; diff --git a/src/cli/global.rs b/src/cli/global.rs index c19d10d..edd79dc 100644 --- a/src/cli/global.rs +++ b/src/cli/global.rs @@ -7,8 +7,8 @@ use sysinfo::{MemoryRefreshKind, RefreshKind, System}; use tracing::Level; use crate::cli::commands::{ - bitbucket::BitbucketArgs, github::GitHubArgs, gitlab::GitLabArgs, rules::RulesArgs, - scan::ScanArgs, + bitbucket::BitbucketArgs, gitea::GiteaArgs, github::GitHubArgs, gitlab::GitLabArgs, + rules::RulesArgs, scan::ScanArgs, }; #[deny(missing_docs)] @@ -69,6 +69,10 @@ pub enum Command { #[command(name = "gitlab")] GitLab(GitLabArgs), + /// Interact with the Gitea API + #[command(name = "gitea")] + Gitea(GiteaArgs), + /// Interact with the Bitbucket API #[command(name = "bitbucket")] Bitbucket(BitbucketArgs), diff --git a/src/git_binary.rs b/src/git_binary.rs index 4f62564..09f6658 100644 --- a/src/git_binary.rs +++ b/src/git_binary.rs @@ -23,6 +23,14 @@ const BITBUCKET_CREDENTIAL_HELPER: &str = r#"credential.helper=!_bbcreds() { fi }; _bbcreds"#; +const GITEA_CREDENTIAL_HELPER: &str = r#"credential.helper=!_gteacreds() { + if [ -n "$KF_GITEA_TOKEN" ]; then + user="${KF_GITEA_USERNAME:-gitea}"; + echo username="$user"; + echo password="$KF_GITEA_TOKEN"; + fi +}; _gteacreds"#; + /// Represents errors that can occur when interacting with the `git` CLI. 
#[derive(Debug, thiserror::Error)] pub enum GitError { @@ -40,7 +48,7 @@ pub enum GitError { /// A helper struct for running `git` commands. /// -/// It supports optional GitHub, GitLab, and Bitbucket credentials passed via +/// It supports optional GitHub, GitLab, Gitea, and Bitbucket credentials passed via /// environment variables and optionally ignores TLS certificate validation if /// requested. pub struct Git { @@ -59,6 +67,8 @@ impl Git { matches!(std::env::var("KF_GITHUB_TOKEN"), Ok(token) if !token.is_empty()); let has_gitlab_token = matches!(std::env::var("KF_GITLAB_TOKEN"), Ok(token) if !token.is_empty()); + let has_gitea_token = + matches!(std::env::var("KF_GITEA_TOKEN"), Ok(token) if !token.is_empty()); let has_bitbucket_username = matches!(std::env::var("KF_BITBUCKET_USERNAME"), Ok(value) if !value.is_empty()); let has_bitbucket_password = @@ -71,7 +81,7 @@ impl Git { has_bitbucket_oauth_token || (has_bitbucket_username && has_bitbucket_password); // If credentials are provided via environment variables, clear existing helpers first. - if has_github_token || has_gitlab_token || has_bitbucket_credentials { + if has_github_token || has_gitlab_token || has_gitea_token || has_bitbucket_credentials { credentials.push("-c".into()); credentials.push(r#"credential.helper="#.into()); } @@ -92,6 +102,12 @@ impl Git { ); } + // Inject Gitea token helper + if has_gitea_token { + credentials.push("-c".into()); + credentials.push(GITEA_CREDENTIAL_HELPER.into()); + } + // Inject Bitbucket credential helper for OAuth tokens or basic auth. 
if has_bitbucket_credentials { credentials.push("-c".into()); diff --git a/src/git_url.rs b/src/git_url.rs index 1cc9827..7458bcc 100644 --- a/src/git_url.rs +++ b/src/git_url.rs @@ -64,8 +64,8 @@ impl TryFrom<Url> for GitUrl { type Error = &'static str; fn try_from(url: Url) -> Result<Self, Self::Error> { - if url.scheme() != "https" - || url.host().is_none() + // if url.scheme() != "https" + if url.host().is_none() || !url.username().is_empty() || url.password().is_some() || url.query().is_some() diff --git a/src/gitea.rs b/src/gitea.rs new file mode 100644 index 0000000..a5a5def --- /dev/null +++ b/src/gitea.rs @@ -0,0 +1,440 @@ +use std::{collections::HashSet, env, str::FromStr, time::Duration}; + +use anyhow::{anyhow, Result}; +use globset::{Glob, GlobSet, GlobSetBuilder}; +use indicatif::{ProgressBar, ProgressStyle}; +use reqwest::StatusCode; +use serde::Deserialize; +use tracing::warn; +use url::Url; + +use crate::{git_url::GitUrl, validation::GLOBAL_USER_AGENT}; + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum RepoType { + All, + Source, + Fork, +} + +impl RepoType { + fn allows(self, is_fork: bool) -> bool { + match self { + RepoType::All => true, + RepoType::Source => !is_fork, + RepoType::Fork => is_fork, + } + } +} + +#[derive(Debug, Clone)] +pub struct RepoSpecifiers { + pub user: Vec<String>, + pub organization: Vec<String>, + pub all_organizations: bool, + pub repo_filter: RepoType, + pub exclude_repos: Vec<String>, +} + +impl RepoSpecifiers { + pub fn is_empty(&self) -> bool { + self.user.is_empty() && self.organization.is_empty() && !self.all_organizations + } +} + +#[derive(Debug, Deserialize)] +struct GiteaRepository { + full_name: String, + clone_url: String, + #[serde(default)] + fork: bool, +} + +#[derive(Debug, Deserialize)] +struct GiteaOrganization { + username: String, +} + +struct ExcludeMatcher { + exact: HashSet<String>, + globs: Option<GlobSet>, +} + +impl ExcludeMatcher { + fn matches(&self, name: &str) -> bool { + if self.exact.contains(name) { + return true; + } + if let Some(globs) = 
&self.globs { + return globs.is_match(name); + } + false + } + + fn is_empty(&self) -> bool { + self.exact.is_empty() && self.globs.is_none() + } +} + +fn looks_like_glob(pattern: &str) -> bool { + pattern.contains('*') || pattern.contains('?') || pattern.contains('[') +} + +fn normalize_repo_identifier(raw: &str) -> Option<String> { + let trimmed = raw.trim().trim_matches('/'); + if trimmed.is_empty() { + return None; + } + let without_git = trimmed.strip_suffix(".git").unwrap_or(trimmed); + let mut parts = without_git.split('/').filter(|segment| !segment.is_empty()); + let owner = parts.next()?; + let repo = parts.next()?; + Some(format!("{}/{}", owner.to_lowercase(), repo.to_lowercase())) +} + +fn parse_excluded_repo(raw: &str) -> Option<String> { + let trimmed = raw.trim(); + if trimmed.is_empty() { + return None; + } + + if let Ok(url) = Url::parse(trimmed) { + if let Some(name) = normalize_repo_identifier(url.path()) { + return Some(name); + } + } + + if let Some(idx) = trimmed.rfind(':') { + if let Some(name) = normalize_repo_identifier(&trimmed[idx + 1..]) { + return Some(name); + } + } + + normalize_repo_identifier(trimmed) +} + +fn build_exclude_matcher(excludes: &[String]) -> ExcludeMatcher { + let mut exact = HashSet::new(); + let mut glob_builder = GlobSetBuilder::new(); + let mut has_glob = false; + + for raw in excludes { + match parse_excluded_repo(raw) { + Some(name) => { + if looks_like_glob(&name) { + match Glob::new(&name) { + Ok(glob) => { + glob_builder.add(glob); + has_glob = true; + } + Err(err) => { + warn!("Ignoring invalid Gitea exclusion pattern '{raw}': {err}"); + exact.insert(name); + } + } + } else { + exact.insert(name); + } + } + None => { + warn!("Ignoring invalid Gitea exclusion '{raw}' (expected owner/repo)"); + } + } + } + + let globs = if has_glob { + match glob_builder.build() { + Ok(set) => Some(set), + Err(err) => { + warn!("Failed to build Gitea exclusion patterns: {err}"); + None + } + } + } else { + None + }; + + ExcludeMatcher { exact, 
globs } +} + +fn should_exclude_repo(repo: &GiteaRepository, excludes: &ExcludeMatcher) -> bool { + if excludes.is_empty() { + return false; + } + excludes.matches(&repo.full_name.to_lowercase()) +} + +async fn fetch_paginated_repos( + client: &reqwest::Client, + token: Option<&str>, + mut url: Url, + repo_filter: RepoType, + excludes: &ExcludeMatcher, + progress: Option<&ProgressBar>, +) -> Result<Vec<String>> { + let mut page = 1u32; + let mut repos = Vec::new(); + loop { + url.query_pairs_mut() + .clear() + .append_pair("page", &page.to_string()) + .append_pair("limit", "50"); + if let Some(pb) = progress { + pb.set_message(format!("Fetching Gitea repositories (page {page})")); + } + let mut req = client.get(url.clone()).header("User-Agent", GLOBAL_USER_AGENT.as_str()); + if let Some(token) = token { + req = req.header("Authorization", format!("token {token}")); + } + let resp = req.send().await?; + match resp.status() { + StatusCode::OK => {} + StatusCode::NOT_FOUND => { + warn!("Gitea endpoint {} returned 404", url); + break; + } + status => { + return Err(anyhow!("Failed to fetch repositories from {} (status {status})", url)); + } + } + let page_repos: Vec<GiteaRepository> = resp.json().await?; + if page_repos.is_empty() { + break; + } + for repo in page_repos { + if !repo_filter.allows(repo.fork) { + continue; + } + if should_exclude_repo(&repo, excludes) { + continue; + } + repos.push(repo.clone_url); + } + page += 1; + } + Ok(repos) +} + +async fn fetch_user_repos( + client: &reqwest::Client, + token: Option<&str>, + api_url: &Url, + username: &str, + repo_filter: RepoType, + excludes: &ExcludeMatcher, + progress: Option<&ProgressBar>, +) -> Result<Vec<String>> { + let endpoint = format!("users/{}/repos", username); + let url = api_url.join(&endpoint)?; + fetch_paginated_repos(client, token, url, repo_filter, excludes, progress).await +} + +async fn fetch_org_repos( + client: &reqwest::Client, + token: Option<&str>, + api_url: &Url, + org: &str, + repo_filter: RepoType, + excludes: 
&ExcludeMatcher, + progress: Option<&ProgressBar>, +) -> Result<Vec<String>> { + let endpoint = format!("orgs/{}/repos", org); + let url = api_url.join(&endpoint)?; + fetch_paginated_repos(client, token, url, repo_filter, excludes, progress).await +} + +async fn fetch_authenticated_orgs( + client: &reqwest::Client, + token: Option<&str>, + api_url: &Url, +) -> Result<Vec<String>> { + let Some(token) = token else { + return Err(anyhow!("KF_GITEA_TOKEN must be set to enumerate all organizations")); + }; + let url = api_url.join("user/orgs")?; + let resp = client + .get(url.clone()) + .header("User-Agent", GLOBAL_USER_AGENT.as_str()) + .header("Authorization", format!("token {token}")) + .send() + .await?; + match resp.status() { + StatusCode::OK => {} + StatusCode::NOT_FOUND => { + warn!("Gitea endpoint {} returned 404", url); + return Ok(Vec::new()); + } + status => { + return Err(anyhow!( + "Failed to enumerate organizations from {} (status {status})", + url + )); + } + } + let orgs: Vec<GiteaOrganization> = resp.json().await?; + Ok(orgs.into_iter().map(|org| org.username).collect()) +} + +pub async fn enumerate_repo_urls( + specifiers: &RepoSpecifiers, + api_url: Url, + ignore_certs: bool, + mut progress: Option<&mut ProgressBar>, +) -> Result<Vec<String>> { + let excludes = build_exclude_matcher(&specifiers.exclude_repos); + let client = reqwest::Client::builder() + .timeout(Duration::from_secs(30)) + .danger_accept_invalid_certs(ignore_certs) + .build()?; + let token = env::var("KF_GITEA_TOKEN").ok().filter(|t| !t.is_empty()); + + let mut repos = Vec::new(); + let mut seen = HashSet::new(); + + for user in &specifiers.user { + if let Some(pb) = progress.as_mut() { + pb.set_message(format!("Enumerating Gitea user {user}")); + } + match fetch_user_repos( + &client, + token.as_deref(), + &api_url, + user, + specifiers.repo_filter, + &excludes, + progress.as_deref(), + ) + .await + { + Ok(mut urls) => { + for url in urls.drain(..) 
{ + if seen.insert(url.clone()) { + repos.push(url); + } + } + } + Err(err) => { + warn!("Failed to enumerate Gitea repositories for user {user}: {err}"); + } + } + } + + let mut organizations = specifiers.organization.clone(); + if specifiers.all_organizations { + match fetch_authenticated_orgs(&client, token.as_deref(), &api_url).await { + Ok(mut orgs) => organizations.append(&mut orgs), + Err(err) => warn!("Failed to enumerate Gitea organizations: {err}"), + } + } + organizations.sort(); + organizations.dedup(); + + for org in organizations { + if let Some(pb) = progress.as_mut() { + pb.set_message(format!("Enumerating Gitea organization {org}")); + } + match fetch_org_repos( + &client, + token.as_deref(), + &api_url, + &org, + specifiers.repo_filter, + &excludes, + progress.as_deref(), + ) + .await + { + Ok(mut urls) => { + for url in urls.drain(..) { + if seen.insert(url.clone()) { + repos.push(url); + } + } + } + Err(err) => { + warn!("Failed to enumerate Gitea repositories for organization {org}: {err}"); + } + } + } + + repos.sort(); + repos.dedup(); + Ok(repos) +} + +pub async fn list_repositories( + api_url: Url, + ignore_certs: bool, + progress_enabled: bool, + users: &[String], + orgs: &[String], + all_orgs: bool, + exclude_repos: &[String], + repo_filter: RepoType, +) -> Result<()> { + let mut progress = if progress_enabled { + let style = ProgressStyle::with_template("{spinner} {msg} [{elapsed_precise}]") + .expect("progress bar style template should compile"); + let pb = ProgressBar::new_spinner().with_style(style).with_message("Fetching repositories"); + pb.enable_steady_tick(Duration::from_millis(500)); + pb + } else { + ProgressBar::hidden() + }; + + let specifiers = RepoSpecifiers { + user: users.to_vec(), + organization: orgs.to_vec(), + all_organizations: all_orgs, + repo_filter, + exclude_repos: exclude_repos.to_vec(), + }; + + let urls = enumerate_repo_urls(&specifiers, api_url, ignore_certs, Some(&mut progress)).await?; + for url in urls { + 
println!("{}", url); + } + progress.finish_and_clear(); + Ok(()) +} + +fn parse_repo(repo_url: &GitUrl) -> Option<(String, String, String)> { + let url = Url::parse(repo_url.as_str()).ok()?; + let host = url.host_str()?.to_string(); + let mut segments = url.path_segments()?; + let owner = segments.next()?.to_string(); + let mut repo = segments.next()?.to_string(); + if let Some(stripped) = repo.strip_suffix(".git") { + repo = stripped.to_string(); + } + Some((host, owner, repo)) +} + +pub fn wiki_url(repo_url: &GitUrl) -> Option<GitUrl> { + let (host, owner, repo) = parse_repo(repo_url)?; + let url = format!("https://{host}/{owner}/{repo}.wiki.git"); + GitUrl::from_str(&url).ok() +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn parse_excluded_repo_variants() { + assert_eq!(parse_excluded_repo("Owner/Repo").as_deref(), Some("owner/repo")); + assert_eq!( + parse_excluded_repo("https://gitea.example.com/Owner/Repo.git").as_deref(), + Some("owner/repo") + ); + assert_eq!( + parse_excluded_repo("ssh://git@example.com:3000/Owner/Repo.git").as_deref(), + Some("owner/repo") + ); + } + + #[test] + fn normalize_repo_identifier_handles_git_suffix() { + assert_eq!(normalize_repo_identifier("owner/repo.git"), Some("owner/repo".into())); + } +} diff --git a/src/lib.rs b/src/lib.rs index 920ae3a..598c278 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -17,6 +17,7 @@ pub mod git_commit_metadata; pub mod git_metadata_graph; mod git_repo_enumerator; pub mod git_url; +pub mod gitea; pub mod github; pub mod gitlab; pub mod jira; diff --git a/src/main.rs b/src/main.rs index 670b5c6..d73bcc1 100644 --- a/src/main.rs +++ b/src/main.rs @@ -52,7 +52,7 @@ use kingfisher::{ }, findings_store, findings_store::FindingsStore, - github, + gitea, github, rule_loader::RuleLoader, rules_database::RulesDatabase, scanner::{load_and_record_rules, run_scan}, @@ -72,6 +72,7 @@ use url::Url; use crate::cli::commands::{ bitbucket::{BitbucketAuthArgs, BitbucketCommand, BitbucketRepoType, 
BitbucketReposCommand}, + gitea::{GiteaCommand, GiteaRepoType, GiteaReposCommand}, gitlab::{GitLabCommand, GitLabRepoType, GitLabReposCommand}, }; @@ -89,6 +90,7 @@ fn main() -> anyhow::Result<()> { Command::GitHub(_) => num_cpus::get(), // Default for GitHub commands Command::GitLab(_) => num_cpus::get(), // Default for GitLab commands Command::Bitbucket(_) => num_cpus::get(), // Default for Bitbucket commands + Command::Gitea(_) => num_cpus::get(), // Default for Gitea commands Command::Rules(_) => num_cpus::get(), // Default for Rules commands }; @@ -265,6 +267,23 @@ async fn async_main(args: CommandLineArgs) -> Result<()> { } }, }, + Command::Gitea(gitea_args) => match gitea_args.command { + GiteaCommand::Repos(repos_command) => match repos_command { + GiteaReposCommand::List(list_args) => { + gitea::list_repositories( + gitea_args.gitea_api_url, + global_args.ignore_certs, + global_args.use_progress(), + &list_args.repo_specifiers.user, + &list_args.repo_specifiers.organization, + list_args.repo_specifiers.all_organizations, + &list_args.repo_specifiers.exclude_repos, + list_args.repo_specifiers.repo_type.into(), + ) + .await?; + } + }, + }, Command::Bitbucket(bitbucket_args) => match bitbucket_args.command { BitbucketCommand::Repos(repos_command) => match repos_command { BitbucketReposCommand::List(list_args) => { @@ -329,6 +348,13 @@ fn create_default_scan_args() -> cli::commands::scan::ScanArgs { gitlab_repo_type: GitLabRepoType::All, gitlab_include_subgroups: false, + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/").unwrap(), + gitea_repo_type: GiteaRepoType::Source, + bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), diff --git a/src/reporter/json_format.rs b/src/reporter/json_format.rs index b369c62..4149469 100644 --- a/src/reporter/json_format.rs +++ b/src/reporter/json_format.rs @@ 
-40,6 +40,7 @@ mod tests { use crate::{ blob::BlobId, cli::commands::bitbucket::{BitbucketAuthArgs, BitbucketRepoType}, + cli::commands::gitea::GiteaRepoType, cli::commands::github::GitHubRepoType, cli::commands::inputs::ContentFilteringArgs, cli::commands::inputs::InputSpecifierArgs, @@ -90,6 +91,15 @@ mod tests { gitlab_api_url: Url::parse("https://gitlab.com/").unwrap(), gitlab_repo_type: GitLabRepoType::All, gitlab_include_subgroups: false, + + // Gitea + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/").unwrap(), + gitea_repo_type: GiteaRepoType::Source, + // Bitbucket bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), diff --git a/src/scanner/repos.rs b/src/scanner/repos.rs index 833d6f8..95144a7 100644 --- a/src/scanner/repos.rs +++ b/src/scanner/repos.rs @@ -20,7 +20,7 @@ use crate::{ confluence, findings_store, git_binary::{CloneMode, Git}, git_url::GitUrl, - github, gitlab, jira, + gitea, github, gitlab, jira, matcher::{Match, Matcher, MatcherStats}, origin::{Origin, OriginSet}, rules_database::RulesDatabase, @@ -243,6 +243,68 @@ pub async fn enumerate_gitlab_repos( Ok(repo_urls) } +pub async fn enumerate_gitea_repos( + args: &scan::ScanArgs, + global_args: &global::GlobalArgs, +) -> Result<Vec<GitUrl>> { + let repo_specifiers = gitea::RepoSpecifiers { + user: args.input_specifier_args.gitea_user.clone(), + organization: args.input_specifier_args.gitea_organization.clone(), + all_organizations: args.input_specifier_args.all_gitea_organizations, + repo_filter: args.input_specifier_args.gitea_repo_type.into(), + exclude_repos: args.input_specifier_args.gitea_exclude.clone(), + }; + + let mut repo_urls = args.input_specifier_args.git_url.clone(); + if !repo_specifiers.is_empty() { + let mut progress = if global_args.use_progress() { + let style = + ProgressStyle::with_template("{spinner} {msg} {human_len} [{elapsed_precise}]") + 
.expect("progress bar style template should compile"); + let pb = ProgressBar::new_spinner() + .with_style(style) + .with_message("Enumerating Gitea repositories..."); + pb.enable_steady_tick(Duration::from_millis(500)); + pb + } else { + ProgressBar::hidden() + }; + + let mut num_found: u64 = 0; + let api_url = args.input_specifier_args.gitea_api_url.clone(); + let repo_strings = gitea::enumerate_repo_urls( + &repo_specifiers, + api_url, + global_args.ignore_certs, + Some(&mut progress), + ) + .await + .context("Failed to enumerate Gitea repositories")?; + + for repo_string in repo_strings { + match GitUrl::from_str(&repo_string) { + Ok(repo_url) => { + repo_urls.push(repo_url); + num_found += 1; + } + Err(e) => { + progress.suspend(|| { + error!("Failed to parse repo URL from {repo_string}: {e}"); + }); + } + } + } + + progress.finish_with_message(format!( + "Found {} repositories from Gitea", + HumanCount(num_found) + )); + } + repo_urls.sort(); + repo_urls.dedup(); + Ok(repo_urls) +} + pub async fn enumerate_bitbucket_repos( args: &scan::ScanArgs, global_args: &global::GlobalArgs, diff --git a/src/scanner/runner.rs b/src/scanner/runner.rs index a4a35b4..9d394dc 100644 --- a/src/scanner/runner.rs +++ b/src/scanner/runner.rs @@ -11,7 +11,7 @@ use crate::{ cli::{commands::scan, global}, findings_store, findings_store::{FindingsStore, FindingsStoreMessage}, - github, gitlab, + gitea, github, gitlab, liquid_filters::register_all, matcher::MatcherStats, reporter::styles::Styles, @@ -23,8 +23,8 @@ use crate::{ clone_or_update_git_repos, enumerate_bitbucket_repos, enumerate_filesystem_inputs, enumerate_github_repos, repos::{ - enumerate_gitlab_repos, fetch_confluence_pages, fetch_git_host_artifacts, - fetch_jira_issues, fetch_s3_objects, fetch_slack_messages, + enumerate_gitea_repos, enumerate_gitlab_repos, fetch_confluence_pages, + fetch_git_host_artifacts, fetch_jira_issues, fetch_s3_objects, fetch_slack_messages, }, run_secret_validation, save_docker_images, 
summary::print_scan_summary, @@ -73,10 +73,12 @@ pub async fn run_async_scan( let mut repo_urls = enumerate_github_repos(args, global_args).await?; let gitlab_repo_urls = enumerate_gitlab_repos(args, global_args).await?; + let gitea_repo_urls = enumerate_gitea_repos(args, global_args).await?; let bitbucket_repo_urls = enumerate_bitbucket_repos(args, global_args).await?; // Combine repository URLs repo_urls.extend(gitlab_repo_urls); + repo_urls.extend(gitea_repo_urls); repo_urls.extend(bitbucket_repo_urls); repo_urls.sort(); repo_urls.dedup(); @@ -91,6 +93,9 @@ pub async fn run_async_scan( if let Some(w) = gitlab::wiki_url(url) { wiki_urls.push(w); } + if let Some(w) = gitea::wiki_url(url) { + wiki_urls.push(w); + } if let Some(w) = bitbucket::wiki_url(url) { wiki_urls.push(w); } diff --git a/tests/int_allowlist.rs b/tests/int_allowlist.rs index e775766..5e119f3 100644 --- a/tests/int_allowlist.rs +++ b/tests/int_allowlist.rs @@ -8,6 +8,7 @@ use kingfisher::{ cli::{ commands::{ bitbucket::{BitbucketAuthArgs, BitbucketRepoType}, + gitea::GiteaRepoType, github::{GitCloneMode, GitHistoryMode, GitHubRepoType}, gitlab::GitLabRepoType, inputs::{ContentFilteringArgs, InputSpecifierArgs}, @@ -70,6 +71,12 @@ fn run_skiplist(skip_regex: Vec, skip_skipword: Vec) -> Result Result<()> { gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/")?, + gitea_repo_type: GiteaRepoType::Source, + bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), diff --git a/tests/int_dedup.rs b/tests/int_dedup.rs index b7719c6..0e243f8 100644 --- a/tests/int_dedup.rs +++ b/tests/int_dedup.rs @@ -12,6 +12,7 @@ use kingfisher::{ cli::{ commands::{ bitbucket::{BitbucketAuthArgs, BitbucketRepoType}, + gitea::GiteaRepoType, github::{GitCloneMode, GitHistoryMode, 
GitHubRepoType}, gitlab::GitLabRepoType, inputs::{ContentFilteringArgs, InputSpecifierArgs}, @@ -83,6 +84,13 @@ rules: gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/").unwrap(), + gitea_repo_type: GiteaRepoType::Source, + bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), diff --git a/tests/int_github.rs b/tests/int_github.rs index d5eb0ce..180a441 100644 --- a/tests/int_github.rs +++ b/tests/int_github.rs @@ -9,6 +9,7 @@ use kingfisher::{ cli::{ commands::{ bitbucket::{BitbucketAuthArgs, BitbucketRepoType}, + gitea::GiteaRepoType, github::{GitCloneMode, GitHistoryMode, GitHubRepoType}, gitlab::GitLabRepoType, inputs::{ContentFilteringArgs, InputSpecifierArgs}, @@ -70,6 +71,13 @@ fn test_github_remote_scan() -> Result<()> { gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/").unwrap(), + gitea_repo_type: GiteaRepoType::Source, + bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), diff --git a/tests/int_gitlab.rs b/tests/int_gitlab.rs index cecdb60..d295660 100644 --- a/tests/int_gitlab.rs +++ b/tests/int_gitlab.rs @@ -9,6 +9,7 @@ use kingfisher::{ cli::{ commands::{ bitbucket::{BitbucketAuthArgs, BitbucketRepoType}, + gitea::GiteaRepoType, github::{GitCloneMode, GitHistoryMode, GitHubRepoType}, gitlab::GitLabRepoType, inputs::{ContentFilteringArgs, InputSpecifierArgs}, @@ -69,6 +70,13 @@ fn test_gitlab_remote_scan() -> Result<()> { gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: 
Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/")?, + gitea_repo_type: GiteaRepoType::Source, + bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), @@ -192,6 +200,13 @@ fn test_gitlab_remote_scan_no_history() -> Result<()> { gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/")?, + gitea_repo_type: GiteaRepoType::Source, + bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), diff --git a/tests/int_redact.rs b/tests/int_redact.rs index 86dc0db..1e7f9b5 100644 --- a/tests/int_redact.rs +++ b/tests/int_redact.rs @@ -9,6 +9,7 @@ use kingfisher::{ cli::{ commands::{ bitbucket::{BitbucketAuthArgs, BitbucketRepoType}, + gitea::GiteaRepoType, github::{GitCloneMode, GitHistoryMode, GitHubRepoType}, gitlab::GitLabRepoType, inputs::{ContentFilteringArgs, InputSpecifierArgs}, @@ -53,6 +54,12 @@ async fn test_redact_hashes_finding_values() -> Result<()> { gitlab_api_url: Url::parse("https://gitlab.com/").unwrap(), gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/").unwrap(), + gitea_repo_type: GiteaRepoType::Source, bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), diff --git a/tests/int_slack.rs b/tests/int_slack.rs index e9e3b74..d7b3118 100644 --- a/tests/int_slack.rs +++ b/tests/int_slack.rs @@ -8,6 +8,7 @@ use kingfisher::{ cli::{ commands::{ bitbucket::{BitbucketAuthArgs, BitbucketRepoType}, + gitea::GiteaRepoType, github::{GitCloneMode, GitHistoryMode, GitHubRepoType}, gitlab::GitLabRepoType, 
inputs::{ContentFilteringArgs, InputSpecifierArgs}, @@ -59,6 +60,13 @@ impl TestContext { gitlab_api_url: Url::parse("https://gitlab.com/").unwrap(), gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/").unwrap(), + gitea_repo_type: GiteaRepoType::Source, bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), @@ -168,6 +176,13 @@ async fn test_scan_slack_messages() -> Result<()> { gitlab_api_url: Url::parse("https://gitlab.com/").unwrap(), gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/").unwrap(), + gitea_repo_type: GiteaRepoType::Source, bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), diff --git a/tests/int_validation_cache.rs b/tests/int_validation_cache.rs index 3ff5ec1..28c7bda 100644 --- a/tests/int_validation_cache.rs +++ b/tests/int_validation_cache.rs @@ -12,6 +12,7 @@ use kingfisher::{ cli::{ commands::{ bitbucket::{BitbucketAuthArgs, BitbucketRepoType}, + gitea::GiteaRepoType, github::{GitCloneMode, GitHistoryMode, GitHubRepoType}, gitlab::GitLabRepoType, inputs::{ContentFilteringArgs, InputSpecifierArgs}, @@ -125,6 +126,14 @@ async fn test_validation_cache_and_depvars() -> Result<()> { gitlab_api_url: Url::parse("https://gitlab.com/").unwrap(), gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/").unwrap(), + gitea_repo_type: GiteaRepoType::Source, + bitbucket_user: 
Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), diff --git a/tests/int_vulnerable_files.rs b/tests/int_vulnerable_files.rs index 3fe9aff..6141037 100644 --- a/tests/int_vulnerable_files.rs +++ b/tests/int_vulnerable_files.rs @@ -10,6 +10,7 @@ use kingfisher::{ cli::{ commands::{ bitbucket::{BitbucketAuthArgs, BitbucketRepoType}, + gitea::GiteaRepoType, github::{GitCloneMode, GitHistoryMode, GitHubRepoType}, gitlab::GitLabRepoType, inputs::{ContentFilteringArgs, InputSpecifierArgs}, @@ -69,6 +70,13 @@ impl TestContext { gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/").unwrap(), + gitea_repo_type: GiteaRepoType::Source, + bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(), @@ -165,6 +173,13 @@ impl TestContext { gitlab_repo_type: GitLabRepoType::Owner, gitlab_include_subgroups: false, + gitea_user: Vec::new(), + gitea_organization: Vec::new(), + gitea_exclude: Vec::new(), + all_gitea_organizations: false, + gitea_api_url: Url::parse("https://gitea.com/api/v1/").unwrap(), + gitea_repo_type: GiteaRepoType::Source, + bitbucket_user: Vec::new(), bitbucket_workspace: Vec::new(), bitbucket_project: Vec::new(),