Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dependency to go-git #100

Closed
nscuro opened this issue Nov 21, 2021 · 9 comments
Closed

Remove dependency to go-git #100

nscuro opened this issue Nov 21, 2021 · 9 comments

Comments

@nscuro
Copy link
Member

nscuro commented Nov 21, 2021

Just use raw git commands instead. go-git introduces a lot of dependencies we don't otherwise use. Our usage of Git functions is also very shallow, nothing that would justify import such a powerful lib.

@developer-guy
Copy link
Contributor

hello @nscuro, I can take care of this one, would you mind assigning it to me? I'm willing to work on it 🙋🏻‍♂️

@nscuro
Copy link
Member Author

nscuro commented Jan 6, 2022

@developer-guy Sure, thank you!

@imjasonh
Copy link

Is this necessary to support go mod download in cases where the license should be detected? Or is it needed for anything else?

If not, it might be worth moving license detection code entirely into a separate module in this repo, so that users who want to depend on the library don't have to take a dependency on go-git, or any license detection code.

I'd be happy to help work out how that package should be arranged, I'm working on consuming this in ko in ko-build/ko#573, and it ballooned our binary size quite a lot 🎈

Opting out of these dependencies could also be toggled behind a build tag, which would reduce binary size. But Go modules would still download the code, and vulnerability scanners etc would be needlessly triggered for unused deps.

@nscuro
Copy link
Member Author

nscuro commented Jan 27, 2022

Is this necessary to support go mod download in cases where the license should be detected? Or is it needed for anything else?

go-git is currently only used for version detection.

I'm working on consuming this in ko in ko-build/ko#573, and it ballooned our binary size quite a lot 🎈

Yeah... While go-git definitely plays a role in this, the majority of the size comes from go-license-detector, which includes more than 4mb of binary license data. The easiest way to drop this would be to find a lighter substitute for go-license-detector, but I'm open to other tweaks as well.

@imjasonh
Copy link

One idea that might be worth investigating: instead of having WithLicenses(bool), take WithLicenses(Licenser), with a minimal Licenser interface and the default being some no-op licenser. Then, you can provider another Licenser implementation in a separate Go module in the cyclonedx-gomod repo that actually does license detection, go-git, etc.

Users who don't want to detect licenses don't have to import your heavyweight licenser implementation in its separate module, and users who do, can. It also opens up the possibility of other licenser implementations, inside or outside your repos.

@nscuro
Copy link
Member Author

nscuro commented Jan 28, 2022

@imjasonh Interesting idea for sure! I like the modularity aspect of it, but managing multiple modules could become a little cumbersome. Especially since license detection an essential feature for the app and mod modes of cdx-gomod.

Maybe I can combine this suggestion with one of your earlier ideas:

Opting out of these dependencies could also be toggled behind a build tag

We can still introduce WithLicenses(Licenser), but instead of having Licenser implementations in separate modules, we could have them in a separate package. As long as that package is not importet by you, it won't end up in your binary. Build tags are kinda hard to keep track of, especially in libraries.

That being said, this will still hold true:

Go modules would still download the code

However, this not so much:

and vulnerability scanners etc would be needlessly triggered for unused deps.

If you do the vuln scanning based on what your app actually uses (e.g. eventually go audit, or if you do the scanning based on SBOMs 😎), this is a non-issue.

Sounds acceptable to you? I'll try to do some experiments this weekend.
I think a solution to this should definitely be included in the upcoming release.

@imjasonh
Copy link

Yeah I completely understand the headaches of managing multiple modules inside the same repo. We jump through quite a lot of hoops in go-containerregistry to isolate our main packages and CLI from heavy dependencies like k8s.io/client-go, cred helpers, etc., and there's always more to do. It's a pain.

If the build tag approach bears fruit that'd be great to see. I've been hacking a bit this morning on the multiple module approach just to get an idea for it, and it's pretty gnarly. You'd either have to have:

  1. three modules, library, cli, licenses, where cli -> licenses, or
  2. two modules, library and cli+licenses

In either case it means a pretty drastic overhaul of the package layout, and I don't know exactly which packages you have users for today that might be broken by changes. Of the two, option 2 sounds easier since the library is already new code since the last release, so there shouldn't be any breakage-sensitive users to worry about.

(btw, thank you so much for using internal packages for stuff, that makes everything at least a little easier 🙏 )

However, this not so much:
...
If you do the vuln scanning based on what your app actually uses...

Yeah, ideally scanners would be smart enough, but I've seen some pretty noisy false positives coming from dependabot especially, which is mostly what I'm worried about. 😕

@nscuro nscuro added this to the v1.2.0 milestone Jan 29, 2022
@nscuro
Copy link
Member Author

nscuro commented Feb 5, 2022

Another thing to consider here is that github.com/go-enry/go-license-detector/v4 depends on go-git as well. This means that even if we remove the direct dependency to go-git, it will still be pulled in by all projects that use our local license detector.

I've also been reflecting on @samj1912's comment (#112 (comment)) about limiting external dependencies being a good thing (or even requirement) in some cases. His argument also feeds into anchore/syft#761 (comment).

In addition, there are still interesting CDX uses cases like pedigree that I'd like to experiment with. For example, when using replace with forks of modules, it may be possible to compare and record the differences between original and fork using git. Having a library like go-git for this instead of having to call the git command for everything would be desirable here.

In short, I'm thinking about removing this from the v1.2.0 milestone, and either including it in a later version or dropping this altogether. However, as always, I'm open to y'all's input.

@nscuro nscuro modified the milestones: v1.2.0, v1.3.0 Feb 11, 2022
@nscuro
Copy link
Member Author

nscuro commented May 10, 2022

Closing. We're going to keep using the library over shelling out to git for now. Thanks @developer-guy though for offering help!

@nscuro nscuro closed this as completed May 10, 2022
@nscuro nscuro removed this from the v1.3.0 milestone May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants