Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

Absorb prune into ensure #944

Closed
1 task
sdboyer opened this issue Aug 4, 2017 · 36 comments
Closed
1 task

Absorb prune into ensure #944

sdboyer opened this issue Aug 4, 2017 · 36 comments

Comments

@sdboyer
Copy link
Member

sdboyer commented Aug 4, 2017

dep prune currently exists as a separate command. This is not optimal; we want to incorporate its behavior into dep ensure directly, obviating the need to run the extra command.

Projects relying on pruning will necessarily have to give up on (fast) vendor/ verification - #121, as that verification will need to rely on a hash of each dependency project's entire tree, prior to pruning. That's just going to be a cost of doing business.

This will involve a few things:

  1. Create a set of flags (express as a bitset, e.g. type PruneOptions uint8) in gps representing the orthogonal, combinable pruning modes:
    i. Remove *_test.go files
    ii. Remove non-go files (except LICENSE, COPYING)
    iii. Remove unused packages
    iv. Remove nested vendor dirs (note that gps needs to have this as a flag, but dep unconditionally sets it - we do not give the user the option to NOT strip vendor directories)
  2. Introduce a new top-level section in Gopkg.toml that allows the user to specify each of these separately. I'm imagining something like this:
    [prune]
      non-go = true
      test-go = true
      unused-packages = true
    (All default to false)
  3. Modify gps.WriteDepTree() to take a struct with a PruneOptions bitfield. (Keep an eye on [WIP] Parallelised gps.WriteDepTree #903, this overlaps with that)
  4. Hack up the logic currently in cmd/dep/prune.go, moving it over into unexported functions in gps that gps.WriteDepTree() calls into itself to perform the pruning.

#120 is absurdly long and less relevant now, so I'm going to close it in favor of this.

  • Remember to update the README when we're done with this!
@VojtechVitek
Copy link

Can we keep LICENSE and COPYING files with a non-go option?

@sdboyer
Copy link
Member Author

sdboyer commented Aug 4, 2017

@VojtechVitek 💯 yes, will update OP, i think it'd be needlessly creating pain not to do that

@ibrasho
Copy link
Collaborator

ibrasho commented Aug 4, 2017

I assume you meant gps.WriteDepTree instead of gps.WriteVendorTree. 😉

I already did 1, 2 & 3, I can send a PR (it can even be merged before 4 is completed since it doesn't impact behavior). I'm still working on 4 though.

@ibrasho
Copy link
Collaborator

ibrasho commented Aug 4, 2017

@sdboyer Just to check, after reading the comments in #903 I decided to update the signature of gps.WriteDepTree and make it a method of gps.SourceManager. The new signature is:

func (sm SourceManager) WriteDepTree(basedir string, l Lock, options PruneOptions) error

I've moved the current implementation to SourceMgr and added a dummy implementation for the other dummy sms.

@sdboyer
Copy link
Member Author

sdboyer commented Aug 4, 2017

I assume you meant gps.WriteDepTree instead of gps.WriteVendorTree. 😉

derp, yes. updated OP. once, long ago, it was called WriteVendorTree(), and i still have that stuck in my mind

I already did 1, 2 & 3 already, I can send a PR (it can even be merged before 4 is completed since it doesn't impact behavior)

🎉 🎉 🎊 🎊 killin it!

@karrick
Copy link
Contributor

karrick commented Aug 4, 2017

@sdboyer The code I just uploaded to my version of dep creates a prune tree while it's verifying dependency hashes.

It might be advantageous to look at this and #121 together.

@sdboyer
Copy link
Member Author

sdboyer commented Aug 4, 2017

@karrick there's definitely a lot of overlap...or at least, adjacency...eventually between pruning and verification, but if the prune tree is the one we discussed in #121 (i haven't looked yet), then i think that's a separate kind of pruning from what's going on here.

now, it's possible that we might be able to join these together. however, my current plan (as noted in the OP) is that if you do the kind of pruning this issue is focused on, then you don't get vendor/ verification. that's probably a bridge we COULD cross, but i'd want that to be a separate thing, as there's a long-term goal of having the hash digests we put into Gopkg.lock be the same as the ones that end up being used for content-addressability in the "third space".

@docmerlin
Copy link

I'd like a flag to turn off pruning, for situations where speed is essential such as CI.

@pierrre
Copy link

pierrre commented Aug 14, 2017

@docmerlin if you care about speed:

  • vendor your dependencies
  • don't run dep on CI

@sdboyer
Copy link
Member Author

sdboyer commented Aug 14, 2017

i don't think pruning, at least once absorbed into ensure, is ever likely to contribute much to slowdowns. i'd need to see evidence to the contrary before i could consider adding another flag.

@VojtechVitek
Copy link

I'd like a flag to turn off pruning, for situations where speed is essential such as CI.

Yeah, pruning will likely be even faster, since we can avoid all the extra I/O operations.

Let's not add any more flags. This should be the default behavior imho.

@sdboyer
Copy link
Member Author

sdboyer commented Aug 14, 2017

Yeah, pruning will likely be even faster, since we can avoid all the extra I/O operations.

for now, at least, we generally have to write out the full tree, then prune items out, so we can't realize this gain right now. but you're certainly right that we have the headroom to be able to explore avoiding writing files that will be pruned in the first place.

@GlenDC
Copy link

GlenDC commented Sep 8, 2017

Can we please also have some more fine grained options for when needed.

For example when using a dependency such as https://github.com/capnproto/go-capnproto2, I would like to still vendor the https://github.com/capnproto/go-capnproto2/tree/master/capnpc-go directory (and all its content) and the https://github.com/capnproto/go-capnproto2/blob/master/std/go.capnp file. This is useful as those files are still dependencies for our builds, even though they are not used go packages in our project or go-files.

In such cases it could be great if we could add some exception filters or w/e on a per dependency basis. Perhaps this scenarios is rare. But without such a possibility I would be resorted to use some shell script on top of dep to copy those files over afterwards somehow (during my bi-weekly dependency upgrade cycle).

@pxue
Copy link

pxue commented Sep 8, 2017

question about pruning / might be related, is there similar configuration as the govendor ignore build tag flag? use case is to ignore all *_test.go and appengine related code from vendor. The ignored config in the toml doesn't seem to be working as expected.

@karrick
Copy link
Contributor

karrick commented Sep 8, 2017

I'm not too certain what we're trying to solve with dep prune.

It seems like there's risk in pruning files that need to be around for a project dependency to build.

@sdboyer
Copy link
Member Author

sdboyer commented Sep 9, 2017

I'm not too certain what we're trying to solve with dep prune.

my initial take was more along these lines, but many folks consider this a hard requirement for using dep, as they want to keep the size of their vendor directory minimal. so, i relented - see #120 for the history there.

question about pruning / might be related, is there similar configuration as the govendor ignore build tag flag? use case is to ignore all *_test.go and appengine related code from vendor. The ignored config in the toml doesn't seem to be working as expected.

ah, that's a separate thing. pruning is strictly about removing excess files from vendor/. what you're after, i think, goes deeper, as it speaks to what paths we actually follow when analyzing code. it's on the roadmap, but backburnered, as it's not something people have been really coming out of the woodwork to demand - #291

Can we please also have some more fine grained options for when needed.

it's possible that we could add some more rules, but we need to be careful about this. we're going to have vendor/ verification coming pretty soon, which is going to be a drastic performance improvement (and reliability as well) - we'll no longer need to rewrite all of vendor/ on each dep ensure run. that's going to be predicated on having a new hash digest value in Gopkg.lock - something like tree-digest this:

[[projects]]
  branch = "master"
  name = "github.com/foo/bar"
  packages = ["."]
  revision = "cd8b52f8269e01eb486dfeef29f8fe4d5b397e0b"
  tree-digest = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"

that'll be a hash digest of the contents of the original, unpruned tree. awesome if you're not pruning, but useless if you are - the digests won't match. we need another hash of the tree AFTER pruning for it to be useful:

  tree-digest = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"
  pruned-digest = "1ed417a0bec57ffe988fae1cba8f3d49934fb873394d61844e0b3c96d69573fe"

if we're going to do that, though, then we need new ways of answering the question of "are imports/Gopkg.toml in sync with Gopkg.lock?" - because if the pruning options in Gopkg.toml change, then the pruned-digest in Gopkg.lock will need to change, as will the contents of vendor. so, we can't name it pruned-digest, as that obfuscates the pruning options that went into it.

the main reason i proposed pruning be controlled via a set of boolean flags is because it would make a comprehensible way of capturing what the applied pruning options are. say V represents vendor stripping, U unused packages, T test files, N non-go files. we might then represent the tree digests like this:

  [tree-digests]
    V = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"
    VUTN = "1ed417a0bec57ffe988fae1cba8f3d49934fb873394d61844e0b3c96d69573fe"

note that i'm not saying this will be the final form - maybe we use full words, maybe we express as the integer value of the bitfield; maybe we only include the explicit pruned value, rather than always having just the V type. lots of possibilities, i'm just sketching it out. point is, simply by virtue of what key(s) are present, we know which pruning options were set in Gopkg.toml, and thus can tell when we need to sync those up.

if we provide anything other than boolean controls here, then we'd likely have to just hash the prune options. i think that would probably work, but it'd be a hash key pointing to a hash value - utterly useless for understanding what's going on.

we can certainly think about adding more boolean options - for example, we could divide "non-go" files into strictly *.go files vs. all the other types of possible files (.c, cc, .h, etc...) files. or something else ¯\_(ツ)_/¯.

now, maybe those wouldn't cover your *.capnp example. if so, you'd have to just not set the N flag for that particular dependency. but, i think it's preferable for us to walk this line of comprehensibility and stick with flags, rather than giving more arbitrary control. any individual user can always choose to not care about dep's verification&fast paths, and implement their own, more fine-grained pruner. but minimizing vendor size is an exercise with seriously diminishing returns, and i'll happily trade some small efficiency losses there for greater comprehensibility.

@ibrasho
Copy link
Collaborator

ibrasho commented Sep 9, 2017

@GlenDC It's worth it to reaffirm that pruning (beyond removing nested vendor dirs) is opt-in and not forced. So in the worst case scenario, you can just keep all options off and now files will be deleted.

I was thinking of adding an exception list to prune options, but that would complicate vendor verification in addition to the points @sdboyer mentioned.

@vergenzt
Copy link

What's the status of this? It looks like some relevant PRs have been merged, but I can't tell whether this is nearing completion or not. Does it seem close? :)

@ibrasho
Copy link
Collaborator

ibrasho commented Nov 11, 2017

@vergenzt Almost there. After #1226 gets merged, I'll push one more commit to complete #1219 and then we should have the initial implementation of this feature in dep.

@ixdy
Copy link

ixdy commented Jan 9, 2018

Any update? Both #1226 and #1219 have merged.

@sdboyer
Copy link
Member Author

sdboyer commented Jan 9, 2018 via email

@sdboyer
Copy link
Member Author

sdboyer commented Jan 24, 2018

and we are finally, FINALLY ready.

@sdboyer
Copy link
Member Author

sdboyer commented Jan 24, 2018

It's merged in, closing this. Release coming presently...

@sdboyer sdboyer closed this as completed Jan 24, 2018
@sttts
Copy link

sttts commented Jan 24, 2018

/cc @fejta

bpicode added a commit to bpicode/fritzctl that referenced this issue Feb 10, 2018
The 'prune' command has been absorbed into 'ensure',
see golang/dep#944. Also
it received some makeover, allowing tests to be pruned.
'dep ensure' now got rid of a bunch of files for us.
bpicode added a commit to bpicode/depbot that referenced this issue Feb 10, 2018
'prune' is now part of 'ensure' and configured on per-project-level in
Gopkg.toml. See golang/dep#944.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.