Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scan GitHub and GitLab refs that aren't cloned by default #1918

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rgmz
Copy link
Contributor

@rgmz rgmz commented Oct 18, 2023

Description:

This fixes #1588.

In my experience, this find significantly more secrets with a negligible performance impact.

The only issue is that these secrets are technically not a part of the repository, so refactoring may be necessary to indicate that a result comes from a historical PR/MR branch. It now outputs the source pull/merge request (based on git log --source), in case the commit only exists in the PR history and not the actual repo history, which can happen when PRs are squashed.

✅ Found verified result 🐷🔑
Detector Type: AWS
Decoder Type: PLAIN
Raw result: AKIAXYZDQCEN4B6JSJQI
Resource_type: Access key
Account: 534261010715
Is_canary: true
Message: This is an AWS canary token generated at canarytokens.org, and was not set off; learn more here: https://trufflesecurity.com/canaries
Arn: arn:aws:iam::534261010715:user/canarytokens.com@@88uc3ciwodujg5f18inco2yvu
Pull Request: 2387
Commit: e47780e2e4d2dbaa3d1e63bdfe1cf00eb2c5681b
Email: Ahrav Dutta <ahrav.dutta@trufflesec.com>
File: pkg/gitparse/gitparse_test.go
Line: 715
Link: https://github.com/trufflesecurity/trufflehog/blob/e47780e2e4d2dbaa3d1e63bdfe1cf00eb2c5681b/pkg/gitparse/gitparse_test.go#L715
Repository: https://github.com/trufflesecurity/trufflehog.git
Timestamp: 2024-02-05 17:37:58 +0000

image

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@rgmz rgmz requested a review from a team as a code owner October 18, 2023 21:27
@rgmz rgmz force-pushed the feat/additional-refspecs branch 4 times, most recently from f64e0a3 to e2fb273 Compare October 23, 2023 21:59
@rgmz rgmz force-pushed the feat/additional-refspecs branch 2 times, most recently from b2e724c to ec2de50 Compare October 30, 2023 00:06
@rgmz rgmz force-pushed the feat/additional-refspecs branch from ec2de50 to 438418c Compare January 27, 2024 17:05
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 438418c to 7cb8af2 Compare April 9, 2024 15:39
@rgmz rgmz marked this pull request as draft April 12, 2024 11:25
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 7cb8af2 to 92c0f83 Compare April 12, 2024 12:53
@CLAassistant
Copy link

CLAassistant commented Apr 12, 2024

CLA assistant check
All committers have signed the CLA.

@rgmz rgmz force-pushed the feat/additional-refspecs branch 10 times, most recently from 956b38d to c9a7acd Compare April 13, 2024 22:44
@rgmz rgmz marked this pull request as ready for review April 13, 2024 22:44
@rgmz rgmz requested a review from a team as a code owner April 13, 2024 22:44
pkg/output/plain.go Outdated Show resolved Hide resolved
@rgmz rgmz force-pushed the feat/additional-refspecs branch from c9a7acd to 9ac8dbe Compare April 18, 2024 02:32
Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Just to make sure I understand: This PR has two discrete changes, right? (Pulling down all the refs and printing the source ref of found secrets.)

pkg/sources/git/git.go Outdated Show resolved Hide resolved
pkg/output/plain.go Outdated Show resolved Hide resolved
@rgmz rgmz mentioned this pull request Apr 23, 2024
2 tasks
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 9ac8dbe to d526837 Compare May 2, 2024 12:55
@zricethezav
Copy link
Collaborator

@rgmz @bplaxco alright after banging my head against this git source I think it's better if we use "remote.origin.fetch=+refs/:refs/remotes/origin/". Introducing --mirror has a large blast radius. We should just be able to append -c remote.origin.fetch=+refs/*:refs/remotes/origin/* to the clone command and call it a day.

@zricethezav zricethezav mentioned this pull request Jun 18, 2024
2 tasks
@rgmz rgmz force-pushed the feat/additional-refspecs branch 4 times, most recently from 983d41e to f057784 Compare June 20, 2024 15:06
@rgmz
Copy link
Contributor Author

rgmz commented Jun 20, 2024

@rgmz is it theoretically possible to split this work up into two separate changes: One that clones using --mirror and one that reports the provenance of detected secrets? I'm trying to think of ways to minimize the change risk.

I've rebased this onto #2988. This now only has changes related to reporting ref provenance.

@rgmz rgmz force-pushed the feat/additional-refspecs branch from f057784 to 87956c7 Compare June 21, 2024 02:52
@rgmz rgmz force-pushed the feat/additional-refspecs branch 2 times, most recently from 005d5e1 to 805f5dc Compare July 1, 2024 18:37
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 805f5dc to 9d902a0 Compare November 3, 2024 14:44
@rgmz rgmz requested a review from a team as a code owner November 3, 2024 14:44
@rgmz rgmz force-pushed the feat/additional-refspecs branch 3 times, most recently from 8d64793 to 139ebe2 Compare November 11, 2024 19:21
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 139ebe2 to 83eee14 Compare December 2, 2024 14:03
@rgmz rgmz requested a review from a team as a code owner December 2, 2024 14:03
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 83eee14 to 74653a7 Compare December 15, 2024 16:48
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 74653a7 to bef8f77 Compare December 31, 2024 15:19
Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty neat - I'm only requesting changes because of the fact that the PR adds the expanded refs functionality unilaterally, no matter whether it's enabled. (I suspect this is a rebase artifact.)

proto/source_metadata.proto Show resolved Hide resolved
Comment on lines +1446 to +1448
Email: sanitizer.UTF8(comment.GetUser().GetEmail()),
Timestamp: sanitizer.UTF8(comment.GetCreatedAt().String()),
Link: sanitizer.UTF8(comment.GetHTMLURL()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask why you reordered these fields everywhere, or, alternatively, how the new order was picked? This isn't blocking or anything but it does add review noise so I'm curious about whether there's a reason for it I'm not picking up on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while; IIRC, I ran into issues caused by SourceMetadataFunc being a long list of strings with an arbitrary order. I tried to order the fields semantically (based on git log) so that it was easier to grok at a glance and compare uses of SourceMetadataFunc like-for-like.

commit 8d2c7e6760b9bdc4fe36fee11cb3f28d7e469203 (HEAD -> feat/additional-refspecs)
Author: Richard Gomez <32133502+rgmz@users.noreply.github.com>
Date:   Fri Apr 12 08:53:18 2024 -0400

    feat(gitparse): track ref sources

diff --git a/hack/snifftest/main.go b/hack/snifftest/main.go
...

pkg/sources/git/git.go Outdated Show resolved Hide resolved
pkg/sources/git/git.go Outdated Show resolved Hide resolved
@@ -231,19 +233,23 @@ func (c *Parser) RepoPath(
) (chan *Diff, error) {
args := []string{
"-C", source,
"--no-replace-objects",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from this hidden comment thread. GitHub's PR UI is a nightmare.

pkg/gitparse/gitparse.go Outdated Show resolved Hide resolved
return "Merge request #" + string(mrNumber)
}

return fmt.Sprintf("%s (hidden ref)", string(ref))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "hidden" just mean "not visible via the respective web UI?" My small concern here is that this also happens if we simply fail to parse the relevant ref because we made a mistake somewhere. What do you think of just returning the ref as-is in that case - would it be confusing?

Copy link
Contributor Author

@rgmz rgmz Jan 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "hidden" just mean "not visible via the respective web UI?"

It means "not visible with default Git behaviour"; sometimes these refs are only discoverable via the web UI or API. The term is inspired by this blog post.

The Git ref namespace (refs/*) can store almost any path, however, Git and Git-adjacent tools only look at refs/heads/* and refs/tags/*. Other refs (if they exist) aren't fetched unless you explicitly request them from a remote.

Two examples of this are:

  1. Deleted GitHub/GitLab pull request branches being stored under refs/pull/* and refs/merge-requests/*
  2. Deleted commits that are manually fetched

My small concern here is that this also happens if we simply fail to parse the relevant ref because we made a mistake somewhere. What do you think of just returning the ref as-is in that case - would it be confusing?

I think it's worth signaling that these refs are special and don't technically exist in the repository. Returning the ref as-is is already a source of confusion (#3493).

@rgmz rgmz force-pushed the feat/additional-refspecs branch 3 times, most recently from 33d70b7 to d7953e9 Compare January 1, 2025 17:13
@rgmz rgmz force-pushed the feat/additional-refspecs branch from d7953e9 to 4176d48 Compare January 11, 2025 18:57
@rgmz rgmz force-pushed the feat/additional-refspecs branch 2 times, most recently from d8a905d to 00340e1 Compare January 27, 2025 14:11
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 00340e1 to 5e5db88 Compare January 29, 2025 02:23
rgmz added 2 commits February 2, 2025 12:31
'Hidden' refs, such as 'refs/pull/1004/head' may cause confusion if reported upon. GitHub, for example, will display a banner saying that the commit doesn't belong to the repository.
This parse the output of 'git log --source' and converts it to a human-readable format, IF the ref is 'hidden'.
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 5e5db88 to 098735c Compare February 2, 2025 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Scan GitHub and GitLab refs that aren't pulled by default
6 participants