The purpose of src-fingerprint is to provide an easy way to extract git-related information (namely the SHAs of all files in a repository) from your hosted source version control system.
Its main command is collect, which collects source code fingerprints from a version control system or a local repository. It supports three main VCS providers:
- GitHub and GitHub Enterprise
- GitLab CE and EE
- Bitbucket
If you're using Homebrew you can add GitGuardian's tap and then install src-fingerprint. Just run the following commands:
brew tap gitguardian/tap
brew install src-fingerprint
Deb and RPM packages are available on Cloudsmith.
Setup instructions:
On Windows, open a PowerShell prompt and run this command:
iwr -useb https://raw.githubusercontent.com/GitGuardian/src-fingerprint/main/scripts/windows-installer.ps1 | iex
The script asks for the installation directory. To install silently, use these commands instead:
iwr -useb https://raw.githubusercontent.com/GitGuardian/src-fingerprint/main/scripts/windows-installer.ps1 -Outfile install.ps1
.\install.ps1 C:\Destination\Dir
rm install.ps1
Note that src-fingerprint requires Unix commands such as bash to be available, so it runs better from a "Git Bash" prompt.
You can also download the archives directly from the releases page.
To install from source, you need go installed and GOBIN in your PATH. Once that is done, run the command:
go get -u github.com/gitguardian/src-fingerprint/cmd/src-fingerprint
To create a GitHub personal access token:
- Click on your profile picture at the top right of the screen. A dropdown menu will appear, and you can access your personal settings by clicking on Settings.
- On your profile, go to Developer Settings.
- Select Personal Access Tokens.
- Click on Generate a new token.
- Click the repo box. This is the only scope we need.
- Click on Generate token. The token will only be available at this time, so make sure you keep it in a safe place.
To create a GitLab personal access token:
- Click on your profile picture at the top right of the screen. A dropdown menu will appear, and you can access your personal settings by clicking on Preferences.
- In the left sidebar, click on Access Tokens.
- Click the read_api box. This is the only scope we need. You can set an end date for the token validity if you want more security.
- Click on Create personal token. The token will only be available at this time, so make sure you keep it in a safe place.
The output format can be chosen between jsonl, json, gzip-jsonl and gzip-json with the option --export-format.
The default format is gzip-jsonl, to minimize the size of the output file.
The default output filepath is ./fingerprints.jsonl.gz. Use --output to override this behavior.
Also, note that when downloading fingerprints for the repositories of a big organization, src-fingerprint processes no more than 100 repositories by default. You can override this limit with the option --limit; a limit of 0 will process all repos of the organization.
Note that if multiple organizations are passed, the limit is applied to each one independently.
There is no default timeout; one can be set with the option --timeout. Like the limit, it is applied to each source independently.
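The output and limit options described above can be combined in one call. The following is a minimal sketch, not part of the tool: collect_org_jsonl is a hypothetical helper name, and SRC_FINGERPRINT_BIN is an assumed override variable added here only so the command line can be inspected without running the real binary. It assumes VCS_TOKEN is already exported in the environment.

```shell
# Hypothetical helper (not shipped with src-fingerprint): export every
# repository of one GitHub organization as uncompressed JSON Lines.
# Usage: collect_org_jsonl ORG_NAME OUTPUT_PATH
# SRC_FINGERPRINT_BIN is an assumed override hook, not a tool feature.
collect_org_jsonl() {
  org="$1"
  out="$2"
  # --limit 0 lifts the default cap of 100 repositories.
  "${SRC_FINGERPRINT_BIN:-src-fingerprint}" collect \
    --provider github \
    --object "$org" \
    --limit 0 \
    --export-format jsonl \
    --output "$out"
}
```

For example, collect_org_jsonl ORG_1_NAME ./org1.jsonl would write the fingerprints of every repository in that organization to ./org1.jsonl.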
Here is an example of some lines of output in the .jsonl format:
{"repository_name":"src-fingerprint","private":false,"sha":"a0c16efce5e767f04ba0c6988d121147099a17df","type":"blob","filepath":".env.example","size":"31"}
{"repository_name":"src-fingerprint","private":false,"sha":"d425eb0f8af66203dbeef50c921ea5bff0f2acba","type":"blob","filepath":".github/workflows/tag.yml","size":"882"}
{"repository_name":"src-fingerprint","private":false,"sha":"c7f341033d78474b125dd56d8adaa3f0fc47faf2","type":"blob","filepath":".github/workflows/test.yml","size":"899"}
{"repository_name":"src-fingerprint","private":false,"sha":"f4409d88950abd4585d8938571864726533a7fa5","type":"blob","filepath":".gitignore","size":"356"}
{"repository_name":"src-fingerprint","private":false,"sha":"f733f951ace2e032c270d2f3cf79c2efb8187b5b","type":"blob","filepath":".gitlab-ci.yml","size":"85"}
{"repository_name":"src-fingerprint","private":false,"sha":"d17ae66a017477bc65a2f433bf23d551ffc6bd75","type":"blob","filepath":".golangci.yml","size":"1196"}
{"repository_name":"src-fingerprint","private":false,"sha":"ee08a617cfb1c63c1c55fa4cb15e8bac0095346f","type":"blob","filepath":".goreleaser.yml","size":"2127"}
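To check an export quickly, the default gzip-jsonl file can be read with standard Unix tools. This is a rough sketch under stated assumptions: count_fingerprints and list_shas are hypothetical helper names, gzip, grep and sed are assumed to be installed, and every record is assumed to sit on one line with a "sha" field, as in the sample above.

```shell
# Hypothetical helpers for inspecting a gzip-jsonl export.

# count_fingerprints FILE: print the number of records that carry a "sha" field.
count_fingerprints() {
  gzip -dc "$1" | grep -c '"sha":'
}

# list_shas FILE: print the SHA of each record, one per line.
list_shas() {
  gzip -dc "$1" | sed -n 's/.*"sha":"\([0-9a-f]*\)".*/\1/p'
}
```

For example, count_fingerprints ./fingerprints.jsonl.gz prints how many fingerprints were collected, and list_shas ./fingerprints.jsonl.gz | sort -u prints the distinct SHAs.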
Note that by default, src-fingerprint excludes forked repositories from the fingerprint computation. For the GitHub provider, archived repositories and public repositories are also excluded by default. Use the flags --include-forked-repos, --include-archived-repos or --include-public-repos to change this behavior.
For all the following examples, we assume that the user is able to clone repositories using an HTTP URL with basic authentication. If for any reason this is not possible in the user's organization, src-fingerprint supports SSH cloning through the dedicated option --ssh-cloning. Note, though, that this option is not the standard configuration of the tool but rather a workaround for this type of edge case. In particular, it may cause issues when permissions differ between the token provided for API-based repository listing and the SSH keys used to clone these repositories.
- Export all fingerprints from private repositories of GitHub organizations to the default path ./fingerprints.jsonl.gz with logs:
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider github --object ORG_1_NAME --object ORG_2_NAME
- Export all fingerprints of every repository the user can access to the default path ./fingerprints.jsonl.gz with logs:
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider github --include-public-repos --include-forked-repos --include-archived-repos
- Export all fingerprints from private repositories of a GitLab group to the default path ./fingerprints.jsonl.gz with logs:
Note: If you are targeting a self-hosted GitLab instance, use the --provider-url option to specify its URL. Don't forget to include the scheme.
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider gitlab --object "GitGuardian-dev-group"
- Export all fingerprints of every project the user can access to the default path ./fingerprints.jsonl.gz with logs:
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider gitlab --include-forked-repos
- Export all fingerprints from a Bitbucket project with private repositories to the default path ./fingerprints.jsonl.gz with logs:
Note: If you are targeting a self-hosted Bitbucket instance, use the --provider-url option to specify its URL. Don't forget to include the scheme.
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider bitbucket --object "GitGuardian Project"
- Export all fingerprints of every repository the user can access to the default path ./fingerprints.jsonl.gz with logs:
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider bitbucket
The repository provider processes a single repository given a git clone URL or a local path:
- ssh cloning
src-fingerprint collect -p repository -u 'git@github.com:GitGuardian/gg-shield.git'
- http cloning with basic authentication
src-fingerprint collect -p repository -u 'https://user:<password>@github.com/GitGuardian/gg-shield.git'
- http cloning without basic authentication
src-fingerprint collect -p repository -u 'https://github.com/GitGuardian/gg-shield.git'
- repositories in multiple local directories
src-fingerprint collect -p repository -u /projects/gitlab/src-fingerprint -u /projects/gitlab/internal-api
- repository in current directory
src-fingerprint collect -p repository -u .
By default, src-fingerprint processes each object (--object/-u) one by one. When an object (e.g. a GitHub organization) contains multiple repositories, they are processed in parallel by multiple cloners. The number of cloners is configurable with --cloners. Adding more cloners will increase the memory usage of src-fingerprint. When extracting fingerprints from multiple sources (e.g. with multiple --object values), you can use the option --pool to configure the number of workers that will process the objects in parallel. Each worker will have --cloners cloners. Be cautious when increasing both --cloners and --pool, as memory usage may increase drastically.
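Since each worker gets its own set of cloners, the upper bound on simultaneous cloners is presumably the product of the two options. The sketch below illustrates that arithmetic; total_cloners is a hypothetical helper, and the values in the commented command are illustrative rather than recommended settings.

```shell
# Hypothetical helper: estimate the maximum number of cloners running at once.
# Usage: total_cloners POOL CLONERS
total_cloners() {
  pool="$1"
  cloners="$2"
  # Each of the POOL workers runs its own CLONERS cloners.
  echo $((pool * cloners))
}

# Illustrative invocation: two organizations processed in parallel,
# each worker using four cloners, so up to 2 * 4 = 8 cloners in total.
# env VCS_TOKEN="<token>" src-fingerprint -v collect --provider github \
#   --object ORG_1_NAME --object ORG_2_NAME --pool 2 --cloners 4
```

Estimating this product before raising either option gives a rough idea of how much memory pressure to expect.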
GitGuardian src-fingerprint is MIT licensed.