proposal: cmd/go: automatic and partial vendoring in module mode #30240

Closed
bcmills opened this issue Feb 14, 2019 · 77 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge modules Proposal

@bcmills
Contributor

bcmills commented Feb 14, 2019

This proposal overlaps with (and hopefully unifies) several existing issues, linked in the text below.

I'd like to implement it soon, in the 1.14 cycle (originally planned for 1.13), so if you have feedback, please respond quickly. 🙂

Problem summary

Users want a durable, local view of their source code that works with existing diff tools and does not require per-user configuration in cloned repositories.

  • Relying on module proxies does not necessarily satisfy delivery contracts.
  • Saved module caches do not interoperate well with version-control and code-review tools.
  • -mod=vendor requires configuration per user (GOFLAGS) or per invocation, and makes it too easy to ship code that produces a different build in vendored mode than in the normal module mode.

Proposal

Under this proposal, the source code for the packages listed in vendor/modules.txt — and the go.mod files for the modules listed in vendor/modules.txt, if any — will be drawn from the vendor directory automatically (#27227).
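A minimal Go sketch of the lookup this paragraph implies, under stated assumptions: it parses vendor/modules.txt in the Go 1.11-style layout (a "# <module> <version>" header followed by one package path per line) and reports whether an import can be served from vendor/. The names vendoredModule, parseModulesTxt, lookupVendored, and sampleModulesTxt are hypothetical and do not reflect cmd/go's internal code; the sample versions are illustrative only.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// vendoredModule is a hypothetical in-memory view of one module block
// in vendor/modules.txt.
type vendoredModule struct {
	Path     string
	Version  string
	Packages []string
}

// parseModulesTxt assumes a "# <module> <version>" header followed by one
// vendored package path per line. Other comment lines are skipped.
func parseModulesTxt(text string) []*vendoredModule {
	var mods []*vendoredModule
	var cur *vendoredModule
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "":
			// Blank lines carry no information.
		case strings.HasPrefix(line, "# "):
			// A header with fewer than two fields (for example a proposed
			// "# <pattern>" line) does not start a module block.
			cur = nil
			if f := strings.Fields(line[2:]); len(f) >= 2 {
				cur = &vendoredModule{Path: f[0], Version: f[1]}
				mods = append(mods, cur)
			}
		case strings.HasPrefix(line, "#"):
			// Other annotations are ignored in this sketch.
		case cur != nil:
			cur.Packages = append(cur.Packages, line)
		}
	}
	return mods
}

// lookupVendored reports the vendor directory for importPath if the
// package is listed in modules.txt.
func lookupVendored(mods []*vendoredModule, importPath string) (string, bool) {
	for _, m := range mods {
		for _, p := range m.Packages {
			if p == importPath {
				return path.Join("vendor", importPath), true
			}
		}
	}
	return "", false
}

const sampleModulesTxt = `# golang.org/x/text v0.3.0
golang.org/x/text/language
golang.org/x/text/internal/tag
# rsc.io/quote v1.5.2
rsc.io/quote
`

func main() {
	mods := parseModulesTxt(sampleModulesTxt)
	for _, m := range mods {
		fmt.Println(m.Path, m.Version, len(m.Packages))
	}
	if dir, ok := lookupVendored(mods, "rsc.io/quote"); ok {
		fmt.Println("rsc.io/quote ->", dir)
	}
}
```

An import that is not listed (say, rsc.io/sampler) falls through to normal module resolution under this sketch.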

If a replace directive in the main module specifies a module path, the module source code will be vendored under the path that provides the replacement, not the path being replaced. That preserves the 1:1 correspondence between import paths and filesystem directories, while allowing replacement targets to alias other modules (#26904). If a replace directive specifies a file path, then either that path must be outside the vendor directory or the vendor/modules.txt file must not exist (#29169).
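The replace rules can be sketched in Go as two small checks. This is an illustration only: replacement, isFilePath, vendorDirFor, and checkFileReplacement are hypothetical names. isFilePath only approximates how go.mod distinguishes file paths (a leading "./", "../", or "/"). Because the proposal does not say where file-path replacements are vendored, the sketch simply leaves them at the original module path.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// replacement is a hypothetical model of the right-hand side of a
// replace directive in the main module's go.mod.
type replacement struct {
	Path    string // a module path, or a file path such as "./fork"
	Version string // empty for file-path replacements
}

// isFilePath approximates the go.mod convention for file-path replacements.
func isFilePath(p string) bool {
	return strings.HasPrefix(p, "./") || strings.HasPrefix(p, "../") || strings.HasPrefix(p, "/")
}

// vendorDirFor returns the directory that would hold the vendored source
// for modPath. A module-path replacement is vendored under the path that
// provides the replacement, as the proposal describes. File-path
// replacements fall back to the original path (an assumption).
func vendorDirFor(modPath string, replaces map[string]replacement) string {
	if r, ok := replaces[modPath]; ok && !isFilePath(r.Path) {
		return path.Join("vendor", r.Path)
	}
	return path.Join("vendor", modPath)
}

// checkFileReplacement rejects a file-path replacement that points inside
// vendor/ when vendor/modules.txt exists. Absolute paths are not checked
// here, since that would require knowing the module root.
func checkFileReplacement(r replacement, haveModulesTxt bool) error {
	if !isFilePath(r.Path) || !haveModulesTxt {
		return nil
	}
	clean := path.Clean(r.Path)
	if clean == "vendor" || strings.HasPrefix(clean, "vendor/") {
		return fmt.Errorf("replacement %q is inside the vendor directory", r.Path)
	}
	return nil
}

func main() {
	replaces := map[string]replacement{
		"example.com/old":   {Path: "example.com/fork", Version: "v1.2.0"},
		"example.com/local": {Path: "./vendor/local"},
	}
	fmt.Println(vendorDirFor("example.com/old", replaces))
	fmt.Println(checkFileReplacement(replaces["example.com/local"], true))
}
```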

Package patterns such as all and example.com/... will match only the packages that are present in the vendor directory, not unvendored packages from the same module. During the build, if additional packages from the vendored modules are needed in order to satisfy an import, the source for those packages will be fetched (from the module cache, if available) and added to the vendor directory. (Packages from outside the already-vendored modules will not be vendored automatically.)
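One way to picture the pattern rule is to expand a pattern against the list of vendored packages rather than against whole modules. The matcher below approximates cmd/go's wildcard semantics ("..." matches any string, and a trailing "/..." also matches the bare prefix) and treats all as "every vendored package". matchPattern and matchVendored are hypothetical helpers, not the go command's own code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchPattern approximates cmd/go's "..." wildcard. In this sketch the
// keyword "all" matches every candidate.
func matchPattern(pattern, pkg string) bool {
	if pattern == "all" {
		return true
	}
	re := regexp.QuoteMeta(pattern)
	re = strings.ReplaceAll(re, `\.\.\.`, `.*`)
	// "x/..." should also match "x" itself.
	if strings.HasSuffix(re, `/.*`) {
		re = strings.TrimSuffix(re, `/.*`) + `(/.*)?`
	}
	return regexp.MustCompile(`^` + re + `$`).MatchString(pkg)
}

// matchVendored expands pattern against vendored packages only, so
// unvendored packages from the same module never match.
func matchVendored(pattern string, vendored []string) []string {
	var out []string
	for _, p := range vendored {
		if matchPattern(pattern, p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	vendored := []string{"example.com/a", "example.com/a/b", "example.org/c"}
	fmt.Println(matchVendored("example.com/...", vendored))
	fmt.Println(matchVendored("all", vendored))
}
```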

Any time the go.mod file is written, if a module path found in vendor/modules.txt has a different version than that found in the build list, the already-vendored packages and go.mod file from the previous version will be deleted, and updated versions of those packages will be written in their place (#29058). Transitive imports of those packages will be resolved, and may populate additional packages in other already-vendored modules.
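The skew check amounts to comparing two module-to-version maps. Here is a minimal sketch, assuming that both vendor/modules.txt and the build list have already been loaded into maps. staleModules is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// staleModules returns, sorted, every module whose version in
// vendor/modules.txt differs from its version in the build list.
// Those are the modules whose vendored packages would be rewritten.
func staleModules(vendored, buildList map[string]string) []string {
	var stale []string
	for mod, vv := range vendored {
		if bv, ok := buildList[mod]; ok && bv != vv {
			stale = append(stale, mod)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	vendored := map[string]string{
		"golang.org/x/text": "v0.3.0",
		"rsc.io/quote":      "v1.5.1",
		"rsc.io/sampler":    "none",
	}
	buildList := map[string]string{
		"golang.org/x/text": "v0.3.0",
		"rsc.io/quote":      "v1.5.2",
		"rsc.io/sampler":    "v1.3.0",
	}
	for _, m := range staleModules(vendored, buildList) {
		fmt.Printf("re-vendor %s at %s\n", m, buildList[m])
	}
}
```

A module that was removed and kept with version none (see the following paragraph) shows up here as a mismatch if it is later re-added, so under this sketch it would be vendored again.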

If go get removes a module from the build list entirely, its package source and go.mod file will be removed, but an entry for the module (with version none) will remain in vendor/modules.txt. That way, if a future operation (such as a go get or go build) adds the module to the build list again, it will remain vendored as before.
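Here is a sketch of the removal rule, assuming a hypothetical vendorEntry record per module. The package list is cleared (a real tool would also delete the files), but the entry itself survives with version none.

```go
package main

import (
	"fmt"
	"sort"
)

// vendorEntry is a hypothetical record for one module in vendor/modules.txt.
type vendorEntry struct {
	Version  string
	Packages []string
}

// markRemoved handles modules that have left the build list: their
// packages are dropped, but the entry stays with version "none" so the
// module is vendored again if it ever returns. It returns the affected
// module paths in sorted order.
func markRemoved(vendored map[string]*vendorEntry, buildList map[string]string) []string {
	var removed []string
	for mod, e := range vendored {
		if _, ok := buildList[mod]; ok || e.Version == "none" {
			continue
		}
		e.Version = "none"
		e.Packages = nil // a real implementation would also delete the files
		removed = append(removed, mod)
	}
	sort.Strings(removed)
	return removed
}

func main() {
	vendored := map[string]*vendorEntry{
		"golang.org/x/text": {Version: "v0.3.0", Packages: []string{"golang.org/x/text/language"}},
		"rsc.io/quote":      {Version: "v1.5.2", Packages: []string{"rsc.io/quote"}},
	}
	buildList := map[string]string{"golang.org/x/text": "v0.3.0"}
	fmt.Println(markRemoved(vendored, buildList))
	fmt.Println(vendored["rsc.io/quote"].Version)
}
```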

When go mod tidy is run, it will add or remove packages from the vendor directory so that it continues to contain only the subset of packages found in the transitive import graph. It will also remove go.mod files and entries in vendor/modules.txt for modules that are no longer present in the build list.
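The vendor maintenance performed by go mod tidy can be modeled as a reachability problem over the import graph. In this hypothetical sketch, imports maps each package to the packages it imports, and roots are the main module's packages. Only packages inside already-vendored modules are ever added. Removing modules that have left the build list is not handled here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// moduleOf returns the longest module path in mods that contains pkg,
// or "" if pkg belongs to none of them.
func moduleOf(pkg string, mods []string) string {
	best := ""
	for _, m := range mods {
		if (pkg == m || strings.HasPrefix(pkg, m+"/")) && len(m) > len(best) {
			best = m
		}
	}
	return best
}

// reachable walks the import graph (package -> imported packages) from roots.
func reachable(roots []string, imports map[string][]string) map[string]bool {
	seen := make(map[string]bool)
	stack := append([]string(nil), roots...)
	for len(stack) > 0 {
		p := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[p] {
			continue
		}
		seen[p] = true
		stack = append(stack, imports[p]...)
	}
	return seen
}

// tidyVendor reports which packages to add (reachable, in an
// already-vendored module, but not yet copied) and which to drop
// (vendored but no longer reachable).
func tidyVendor(vendoredPkgs, vendoredMods, roots []string, imports map[string][]string) (add, drop []string) {
	live := reachable(roots, imports)
	have := make(map[string]bool)
	for _, p := range vendoredPkgs {
		have[p] = true
		if !live[p] {
			drop = append(drop, p)
		}
	}
	for p := range live {
		if !have[p] && moduleOf(p, vendoredMods) != "" {
			add = append(add, p)
		}
	}
	sort.Strings(add)
	sort.Strings(drop)
	return add, drop
}

func main() {
	imports := map[string][]string{
		"example.com/app":            {"golang.org/x/text/language", "rsc.io/quote"},
		"golang.org/x/text/language": {"golang.org/x/text/internal/tag"},
	}
	add, drop := tidyVendor(
		[]string{"golang.org/x/text/language", "golang.org/x/text/unicode/norm"},
		[]string{"golang.org/x/text"},
		[]string{"example.com/app"},
		imports,
	)
	fmt.Println("add:", add)
	fmt.Println("drop:", drop)
}
```

Note that rsc.io/quote is reachable but is not added, because its module was never vendored. This matches the rule that packages from outside already-vendored modules are not vendored automatically.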

To encourage the minimal use of vendor directories, the go mod vendor subcommand will accept an optional list of packages or modules. go mod vendor <module> will update the vendor directory to contain the go.mod file for <module> and source code for its packages that appear in the transitive import graph of the main module. (Note that, since the criterion for inclusion of a package is its existence in the import graph, vendoring in an additional module should not affect the contents of any previously-vendored modules.)

go mod vendor <pattern> for an arbitrary module pattern will add # <pattern> to vendor/modules.txt, and vendor in the go.mod files (and any packages found in the import graph) for modules matching <pattern>, adding individual comments to vendor/modules.txt for those modules.

Note in particular that go mod vendor all will copy in go.mod files for all of the module dependencies in the module graph (and add entries in vendor/modules.txt for those modules). That ensures that after go mod vendor all, go list can produce accurate results without making any further network requests (see also #19234 and #29772).

The go mod vendor subcommand will accept a new flag, -d. go mod vendor -d <pattern> will remove all previously-vendored modules matching <pattern> from the vendor directory (and from vendor/modules.txt), as well as any previously-stored patterns matching those modules (including <pattern> itself, if present).

go mod vendor, without further arguments, is equivalent to go mod vendor all. go mod vendor -d is equivalent to go mod vendor -d all. If go mod vendor -d causes vendor/modules.txt to become empty, it will also remove the entire vendor directory.
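The sticky-pattern bookkeeping for go mod vendor <pattern> and go mod vendor -d <pattern> can be sketched as follows. The modulesTxt type, its two methods, and the in-memory representation are assumptions for illustration. Patterns are matched against module paths with an approximation of the "..." wildcard, and all matches every module. The build list is assumed to exclude the main module.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// modulesTxt is a hypothetical model of the proposed vendor/modules.txt:
// sticky "# <pattern>" lines plus vendored module versions.
type modulesTxt struct {
	Patterns []string
	Modules  map[string]string // module path -> version
}

// matchPattern approximates cmd/go's "..." wildcard; "all" matches everything.
func matchPattern(pattern, mod string) bool {
	if pattern == "all" {
		return true
	}
	re := strings.ReplaceAll(regexp.QuoteMeta(pattern), `\.\.\.`, `.*`)
	if strings.HasSuffix(re, `/.*`) {
		re = strings.TrimSuffix(re, `/.*`) + `(/.*)?`
	}
	return regexp.MustCompile(`^` + re + `$`).MatchString(mod)
}

// vendorAdd models "go mod vendor <pattern>": record the pattern once,
// then vendor every build-list module that matches it.
func (m *modulesTxt) vendorAdd(pattern string, buildList map[string]string) {
	found := false
	for _, p := range m.Patterns {
		if p == pattern {
			found = true
			break
		}
	}
	if !found {
		m.Patterns = append(m.Patterns, pattern)
	}
	for mod, v := range buildList {
		if matchPattern(pattern, mod) {
			m.Modules[mod] = v
		}
	}
}

// vendorDelete models "go mod vendor -d <pattern>": remove matching
// modules, the pattern itself, and any stored pattern that matched a
// removed module. It reports whether modules.txt is now empty, in which
// case the whole vendor directory would be removed.
func (m *modulesTxt) vendorDelete(pattern string) bool {
	var removed []string
	for mod := range m.Modules {
		if matchPattern(pattern, mod) {
			removed = append(removed, mod)
			delete(m.Modules, mod)
		}
	}
	kept := m.Patterns[:0]
	for _, p := range m.Patterns {
		drop := p == pattern
		for _, mod := range removed {
			if matchPattern(p, mod) {
				drop = true
			}
		}
		if !drop {
			kept = append(kept, p)
		}
	}
	m.Patterns = kept
	return len(m.Patterns) == 0 && len(m.Modules) == 0
}

func main() {
	buildList := map[string]string{
		"golang.org/x/net":  "v0.1.0",
		"golang.org/x/text": "v0.3.0",
		"rsc.io/quote":      "v1.5.2",
	}
	m := &modulesTxt{Modules: map[string]string{}}
	m.vendorAdd("golang.org/x/...", buildList)
	m.vendorAdd("rsc.io/quote", buildList)
	fmt.Println(m.Patterns, len(m.Modules))
	empty := m.vendorDelete("golang.org/x/text")
	fmt.Println(empty, m.Patterns)
	fmt.Println(m.vendorDelete("all"))
}
```

Deleting golang.org/x/text also drops the stored golang.org/x/... pattern. Without that step, a later build would quietly vendor the module again.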



@bcmills bcmills added Proposal NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. modules labels Feb 14, 2019
@bcmills bcmills added this to the Go1.13 milestone Feb 14, 2019
@bcmills
Contributor Author

bcmills commented Feb 14, 2019

(CC @jayconrod @rasky @JeremyLoy @theckman)

@FiloSottile
Contributor

To encourage the minimal use of vendor directories, ...

Why are partial vendor folders something we want to encourage? Most use cases listed here and in the linked issues would require all dependencies vendored at all times.

Also, can you clarify if go get or go mod tidy would ever add new modules to the vendor folder, or if running go mod vendor would still be required after every new dependency is added in order to avoid a partial vendor folder?

@bcmills
Contributor Author

bcmills commented Feb 14, 2019

Why are partial vendor folders something we want to encourage? Most use cases listed here and in the linked issues would require all dependencies vendored at all times.

Some dependencies are more robust than others. For example, you might trust github.com to be generally available, but want to vendor in dependencies that happen to be hosted using bzr or svn so that you don't have to install those tools on every machine that will build your module.

@bcmills
Contributor Author

bcmills commented Feb 14, 2019

Also, can you clarify if go get or go mod tidy would ever add new modules to the vendor folder, or if running go mod vendor would still be required after every new dependency is added in order to avoid a partial vendor folder?

go get and go mod tidy would not add dependencies to vendor automatically.

We could perhaps make go mod vendor (without arguments) set some flag in modules.txt to indicate that all additional modules should be vendored.

@bcmills
Contributor Author

bcmills commented Feb 14, 2019

More generally, though, the main goal of automatic vendor updates is to prevent version skew. Copying in newly-added modules does not further that goal, since there are no out-of-date contents in the first place.

@thepudds
Contributor

thepudds commented Feb 14, 2019

I would suspect the most common use case might be vendoring 100% of dependencies?

If so, and if vendoring in 1.13 will be able to track updates via go get and go mod tidy in some cases, it seems that once you have signaled that you want automated tracking, the default should be for that tracking to be 100% complete rather than partial? (For example, track all updates after a go mod vendor with no args, as you suggested two comments back.)

@FiloSottile
Contributor

It definitely makes sense to support partial ones. I just suspect (and might be wrong!) that 90% of users opting into vendoring really mean to vendor everything, and a reasonable chunk of that 90% would be surprised by it behaving otherwise.

@ianthehat

In the presence of a reliable proxy, I can't think of any reasonable use case for a partial vendor directory, but I can think of lots of possible confusion.
I would personally argue we go in the other direction: if you try to build in vendor mode, it should not be allowed to see anything outside the current module (except the stdlib).

@bcmills
Contributor Author

bcmills commented Feb 14, 2019

@ianthehat, one use-case for vendoring, given proxies, is to vendor in private code to which the proxies do not have access.

For example, a contract-based startup might want to vendor in their proprietary utility modules before delivering the code to their customers.

@thepudds
Contributor

@bcmills Could you comment on the interplay with -mod=readonly, and/or options to disable automatic downloads for people who would prefer to fail if vendor is missing something?

@ianthehat

@bcmills you can achieve the same effect by copying the code into an internal package and rewriting the import paths, which would be more honest and would also allow local modifications (something else those kinds of contractors often need to do as well). If you don't want to rewrite the import paths, you could check it in as a sub-module and use a replace directive (you probably have full control of the main go.mod for that kind of work).
Or you could add a directory with the zip and mod files and use it as a file proxy (which is something it might be worth looking into as a better version of vendoring).
I think making the normal use case much worse for such extreme edge cases would be the wrong choice.

@lopezator

I like removing complexity, flags, and per-user (or per-project) configuration when using vendored mode. I think automatically detecting the vendor folder (and assuming vendored mode) when one is present is a great idea.

I sometimes mix vendored and non-vendored projects, and it would be great for switching between them to be as transparent as possible.

I do agree with some of the folks above, though: IMHO, supporting partial vendoring would be confusing and would add complexity.

For example, in our use case, we use non-vendored mode for our main projects, with a GOPROXY for public libraries, but we don't want to cache our private libraries there (for security, and because the cache server and the source server are on the same local network, so it wouldn't add any benefit for us). #26334 would be enough for this.

Vendored mode, on the other hand, is great for distributing self-contained/small apps/tools.

@bcmills
Contributor Author

bcmills commented Feb 15, 2019

@ianthehat

I would personally argue we go in the other direction: if you try to build in vendor mode, it should not be allowed to see anything outside the current module (except the stdlib).

Part of the point of this proposal is to avoid the need for a distinct “vendor mode”. Modules are integrated into the normal go workflow, and if we're serious about supporting vendoring, then I would argue that vendoring should be integrated too.

@bcmills
Contributor Author

bcmills commented Feb 15, 2019

Or you could add a directory with the zip and mod files and use it as a file proxy (which is something it might be worth looking into as a better version of vendoring)

We've considered that, but it really doesn't work well with version control systems: the diffs are incomprehensible and the blobs can end up consuming a lot more space than they ought to (depending on the encoding).

@bcmills
Contributor Author

bcmills commented Feb 15, 2019

Re partial vendoring: given module proxies, the major use-case for vendoring is for modules that are not available via the public proxies. (Recall that the word “vendor” literally means “one who sells”.)

Module mode substantially reduces the need to duplicate code: you no longer have to copy all of your dependencies into your own repository, and you especially don't need to do that for stable, publicly-available, open-source dependencies. It is important to me that we make it easy to duplicate the minimum amount of code necessary for each use-case: minimal duplication shouldn't be an “extreme edge [case]”, it should be the default mode of operation.

It's not realistic to expect folks to manually apply replace directives for partial vendoring, or to rewrite import paths. It's certainly possible, but it's extremely tedious (see #30241 and #27542). It isn't, and shouldn't be, a default mode of operation. If that were the only alternative to vendoring the full tree, folks wouldn't do it: instead, they'll fall back to duplicating all of the dependencies all over again.

The point of vendoring in module mode is not to provide an alternative to using modules. It is to provide a complementary feature set for the cases that modules cannot address well: namely, the distribution of proprietary code.

@bcmills
Contributor Author

bcmills commented Feb 15, 2019

That said, let's think about that sticky-pattern problem. I don't buy the “full vendoring as a default” argument, but there is a more general case that really ought to work.

Suppose that I run go mod vendor golang.org/x/.... I should reasonably expect any further dependencies matching golang.org/x/... to be vendored.

If we support that, then we can view go mod vendor without arguments as equivalent to go mod vendor all, and that will provide sticky full-vendoring.
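With patterns stored, deciding what to vendor after the build list changes is a filter over the modules that are not yet vendored. Here is a hypothetical sketch. toAutoVendor and its map-based inputs are assumptions, and the wildcard matcher only approximates cmd/go's.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// matchPattern approximates cmd/go's "..." wildcard; "all" matches everything.
func matchPattern(pattern, mod string) bool {
	if pattern == "all" {
		return true
	}
	re := strings.ReplaceAll(regexp.QuoteMeta(pattern), `\.\.\.`, `.*`)
	if strings.HasSuffix(re, `/.*`) {
		re = strings.TrimSuffix(re, `/.*`) + `(/.*)?`
	}
	return regexp.MustCompile(`^` + re + `$`).MatchString(mod)
}

// toAutoVendor returns, sorted, the build-list modules that are not yet
// vendored but match a sticky pattern recorded by an earlier
// "go mod vendor <pattern>".
func toAutoVendor(patterns []string, vendored, buildList map[string]string) []string {
	var out []string
	for mod := range buildList {
		if _, ok := vendored[mod]; ok {
			continue
		}
		for _, p := range patterns {
			if matchPattern(p, mod) {
				out = append(out, mod)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	vendored := map[string]string{"golang.org/x/text": "v0.3.0"}
	// Suppose a later "go get" has added two new dependencies.
	buildList := map[string]string{
		"golang.org/x/text": "v0.3.0",
		"golang.org/x/sync": "v0.1.0",
		"rsc.io/quote":      "v1.5.2",
	}
	fmt.Println(toAutoVendor([]string{"golang.org/x/..."}, vendored, buildList))
	// With "all" recorded, every new dependency is picked up.
	fmt.Println(toAutoVendor([]string{"all"}, vendored, buildList))
}
```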

@bcmills
Contributor Author

bcmills commented Feb 15, 2019

So how about this alternative. For a given module pattern,

  • go mod vendor <pattern>
    • adds # <pattern> to vendor/modules.txt, and
    • vendors in the go.mod files (and any packages found in the import graph) for modules matching <pattern>, adding individual comments to vendor/modules.txt for those modules.
  • go mod vendor -d <pattern> removes from vendor/modules.txt:
    • <pattern> itself, if present;
    • all modules matching <pattern>;
    • and finally, all further patterns that match the removed modules.

And then go mod vendor is defined to be equivalent to go mod vendor all.

@bcmills
Contributor Author

bcmills commented Feb 15, 2019

@thepudds

Could you comment on the interplay with -mod=readonly, and/or options to disable automatic downloads for people who would prefer to fail if vendor is missing something?

Under this proposal, -mod=readonly would continue to disable updates to the go.mod file, but any imports already listed in vendor/modules.txt that are found during a go build would be copied into the vendor directory.

-mod=vendor would continue to exist, and would mean “do not resolve imports that are not found in either GOROOT or vendor”. However, since we would now vendor in go.mod files as well, go -mod=vendor would produce more accurate results from subcommands like list, mod why, and mod graph that examine the structure of the module graph.

@rasky
Member

rasky commented Feb 15, 2019

So how about this alternative. For a given module pattern,

  • go mod vendor <pattern>
    • adds # <pattern> to vendor/modules.txt, and
    • vendors in the go.mod files (and any packages found in the import graph) for modules matching <pattern>, adding individual comments to vendor/modules.txt for those modules.
  • go mod vendor -d <pattern> removes from vendor/modules.txt:
    • <pattern> itself, if present;
    • all modules matching <pattern>;
    • and finally, all further patterns that match the removed modules.

And then go mod vendor is defined to be equivalent to go mod vendor all.

I think this works very well for me, thanks. I couldn't reason through all the cases you listed in your original post (I'll try to go through them over the weekend), but surely this command line API looks good and the sticky mode is really good.

Is there really a need to introduce a third metadata file (vendor/modules.txt), after go.mod and go.sum? Did you think of adding a vendor command to go.mod?

@thepudds
Contributor

thepudds commented Feb 15, 2019

@bcmills In addition to the proposed new behavior described above, is the thinking that this would also land in 1.13:

If so, under the latest proposal, is this an example of what a module author could do if they want to fail if vendor is incomplete:

  1. in the author's own builds or in their CI, they could run with -mod=vendor to fail if vendor is incomplete
  2. for consumers, the author does not have control over what consumers do (and relying on a README stating "please set -mod=vendor" is not a desired solution). However, if the author runs go mod vendor (no args), that provides a complete vendor directory on an on-going basis based on the proposed automatic tracking behavior, and in addition the author could run go mod verify -vendor (or go mod vendor -verify or whatever incantation) to verify that vendor is both correct and complete? And if go mod verify -vendor is successful (say, prior to releasing a new version of a module), the author would have confidence that a consumer would never automatically download new code to populate vendor (even if the consumer is not running with -mod=vendor or -mod=readonly)?

@bcmills
Contributor Author

bcmills commented Feb 15, 2019

Is there really a need to introduce a third metadata file (vendor/modules.txt), after go.mod and go.sum? Did you think of adding a vendor command to go.mod?

I hadn't really considered it: I think @rsc added vendor/modules.txt in 1.11, and given that it's already there I figured we could keep using it.

I suppose that we could record the patterns in go.mod instead, but I have a mild aesthetic preference for keeping them in modules.txt. I'm certainly open to arguments to the contrary, though. 🙂

@bcmills
Contributor Author

bcmills commented Feb 15, 2019

Updated the proposal to incorporate sticky patterns (#30240 (comment)).

@ianthehat

We've considered that, but it really doesn't work well with version control systems: the diffs are incomprehensible and the blobs can end up consuming a lot more space than they ought to (depending on the encoding).

If your use case is that the code cannot live in a public proxy, why do you care about the diff? You would not see the diff if it were in the public proxy. It's also trivial to fix: use an uncompressed text archive. That also fixes the space issue.

@ianthehat

Part of the point of this proposal is to avoid the need for a distinct “vendor mode”. Modules are integrated into the normal go workflow, and if we're serious about supporting vendoring, then I would argue that vendoring should be integrated too.

I think we ought to start by enumerating the actual problems we are hoping to solve with vendoring, and checking it is the right solution to those problems. Vendoring comes with a lot of serious problems, it needs to be worth the cost.

@bcmills
Contributor Author

bcmills commented Feb 15, 2019

If your use case is that the code cannot live in a public proxy, why do you care about the diff? You would not see the diff if it were in the public proxy.

For a start, if you're vendoring the code because it is proprietary, you want to be sure that you are shipping only what was actually promised to the customer.

(In contrast, if the module is already publicly available, you probably don't care which parts you're re-publishing in your vendor directory.)

It's also trivial to fix: use an uncompressed text archive.

That is essentially what the vendor directory is: it just happens to be a text archive format that can also be consumed by pre-module versions of the go command.

@bcmills bcmills modified the milestones: Go1.13, Go1.14 Apr 16, 2019
@roblillack

roblillack commented Apr 26, 2019

@bcmills: What do you think about an alternative solution, where you'd flip a switch in go.mod to turn on "auto vendor" mode. When a module is in "auto vendor" mode, the following things would happen:

  • All changes to the dependencies (go get ...) would automatically be vendored to /vendor/...
  • All respective commands (go build/go run/go test) would always run with -mod=vendor

I feel like this would be the most sane solution for me, and pretty much comparable to dep or glide workflow which worked a treat for us for a long time.

Edit:
To be more specific about my comment above: I'd prefer it if having a /vendor directory were sufficient to signal to the Go tools that I want 100% of my dependencies vendored all the time and that all tools should run in -mod=vendor mode. But I understand that this approach is not really something the Go team is considering, so perhaps a setting in go.mod would be.

@selslack

selslack commented May 8, 2019

@bcmills I want to add a recent story of how proper vendoring saved us a lot of time: https://success.docker.com/article/docker-hub-user-notification.

As part of a security review after receiving this notification, we audited all our dependencies in Java, NPM, etc.

Auditing our Go code took exactly 0 seconds, because we have all the dependencies committed and we don't go online during build process at all.

@MOZGIII

MOZGIII commented May 15, 2019

@bcmills: What do you think about an alternative solution, where you'd flip a switch in go.mod to turn on "auto vendor" mode. When a module is in "auto vendor" mode, the following things would happen:

  • All changes to the dependencies (go get ...) would automatically be vendored to /vendor/...
  • All respective commands (go build/go run/go test) would always run with -mod=vendor

I feel like this would be the most sane solution for me, and pretty much comparable to dep or glide workflow which worked a treat for us for a long time.

Edit:
To be more specific about my comment above: I'd prefer it if having a /vendor directory were sufficient to signal to the Go tools that I want 100% of my dependencies vendored all the time and that all tools should run in -mod=vendor mode. But I understand that this approach is not really something the Go team is considering, so perhaps a setting in go.mod would be.

I agree.

I think we should consider actually removing the -mod=vendor flag and moving it to some sort of per-project configuration.
With that flag as it currently is, we're not going to solve the problem outlined in this issue's original post: per-user configuration via GOFLAGS. That configuration would still be required for some projects as long as the meaning of -mod=vendor depends on what you are trying to do in a project (the current task) rather than on which project you're working in (a global setting).
To be more specific, in some of the projects I'm working on I'd still want -mod=vendor enabled all the time, to force go never to load anything from the network without an explicit go mod vendor invocation.

@myitcv
Member

myitcv commented Jul 31, 2019

Following a good discussion at GopherCon with @ChrisHines, we concluded that a key reason for needing vendor today (in his situation at least) is touched on by the second bullet point in @bcmills' description:

Saved module caches do not interoperate well with version-control and code-review tools.

Put another way: vendor is used because there isn't a better alternative to reviewing dependency changes alongside (and as part of the same process as) changes to one's own code.

We further concluded that all other aspects of his requirements (including reproducible builds, self-contained CI runs etc) could be satisfied by alternative means, not least, for example, an approach similar to #27618. None of these alternatives are currently as polished/easy as the vendor workflow, but they would do the job (and could become more polished).

Back to the point on reviewing dependency changes. This point is obviously critical, not only for those who prefer the vendor flow because it lets them solve this problem as part of their existing workflow, but for everyone who uses Go modules. Today, we can't point to a tool that helps us achieve this.

That said, to avoid the problems of only having parts of modules "vendored", I think this points towards a solution where entire modules are "vendored" with a directory structure similar (identical?) to that found under $GOPATH/pkg/mod. Whether it's all modules or some I defer to others. This keeps modules intact (important in keeping the solution simple for tools etc) and retains the current benefits of being able to review dependency changes alongside one's own changes. Whether this is achieved by implicit replace directives or any other means, I again defer.

Apologies if I'm late to the party on all of this: I just wanted to stress/highlight that this point on code review has a life well beyond this issue that is of much wider interest.

@myitcv
Member

myitcv commented Aug 5, 2019

On the back of my last comment, I've just raised #33466.

@bcmills bcmills changed the title proposal: cmd/go: automatic vendoring in module mode proposal: cmd/go: automatic and partial vendoring in module mode Aug 8, 2019
@myitcv
Member

myitcv commented Aug 12, 2019

I think this points towards a solution where entire modules are "vendored" with a directory structure similar (identical?) to that found under $GOPATH/pkg/mod

Just to row back slightly on this point: whether we need the entire module to be "vendored" is actually something I defer to Bryan (and Ian) on. If cmd/go can make things work in a partial way, then I don't see a reason to "vendor" the entire module (indeed, given the code review point, there is good reason not to), because go/packages et al. will just "work" if cmd/go "works".

I previously wrote "entire module" because I read (perhaps incorrectly) that this had become necessary. But Bryan/Ian are the authorities on that point, so restating that point for clarity.

@gopherbot gopherbot removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Aug 16, 2019
@bcmills
Contributor Author

bcmills commented Aug 26, 2019

I am withdrawing this proposal in favor of #33848. My reasoning is as follows:

Should vendored dependencies be updated automatically?

Here I have proposed that go commands should update and/or add to the contents of the vendor directory automatically.

However, in the time since then, we have observed that users are often confused by the implicitness of updates to the go.mod file. Given that, we probably should not overwrite existing contents in the vendor directory without explicit user intervention — that style of automatic vendoring would add yet another layer of substantial changes driven by the same implicit mechanism, and while diffing and reverting changes in the go.mod file is relatively easy, diffing and reverting unexpected changes in the vendor directory is not.

I now believe that we should not make such updates automatically.

Should we allow vendoring of only a subset of packages?

Here I have proposed that go mod vendor should accept patterns to allow users to vendor only a subset of modules.

I still think that's a good idea in concept, particularly for repositories that contain multiple interdependent modules and replace directives, but it adds enough complexity that it should be considered separately from — and presumably after — changes to automatically use and/or maintain the vendor directory.

@artemgavrilov

It would be great to be able to vendor a single module. We have a library with a directory of .yaml files (OpenAPI types shared by multiple services). Other projects include these files in their own API specifications. Currently we vendor all dependencies, so we can invoke a command from a Makefile that generates Go code from the service API spec and the library's types (referencing them as ./vendor/someRepo/file.yaml).
