What's the intended workflow of the tarball cache? #393
You mean the offline-mirror setting? |
@bestander yeah |
So we use it for offline installs: CI and internal projects.
|
Personally, this is the single reason why I joined the effort on this project :) |
This is particularly useful in a monorepo setting (we have one at Exponent and then of course FB had theirs). All the packages and applications in the repo can share this same module cache. It makes CI/CD (especially monorepo CI/CD) great again. :-) |
Seems legit. 😄 Just so I'm clear, the primary goal is to make it possible to take a Is that correct? |
That is correct |
As my colleague, @kentaromiura, pointed out. |
So I know the npm guys intentionally made life difficult because they used the resolved field (when present) as the complete and only identity for a package. Their reasoning had to do with private registries: somebody not targeting a public registry at all may have complete module name overlap. Even more ominously, a package caching server (so far the only sane way to use npm in a high-dependability environment) could have been configured to selectively shadow some packages (or provide them past the point in time when the original source had deleted them). To complete the chaos, the npm registry owners proved that they are willing, under appropriate pressure, to reassign a module name to a different project themselves, as happened with kik, prompting the famed left-pad disaster. This makes the logic around upgrading a previously locked package awfully tricky. Npm basically doesn't allow it: either you keep everything locked exactly as it is, or you lose all your locked-down versions at the same time. Or you manually chop out a big chunk of the shrinkwrap and splice in an updated version. o_O How are we handling this? Do we see ourselves as having a certain contract with the user, like, say, never accidentally replacing an installed package at upgrade time with something from a completely different codebase? |
The reason I asked this question is that I want to propose a slightly different workflow that I think satisfies the requirements that people had when designing it in the first place, but with fewer rough edges. The basic idea is that the (In Cargo, the mirroring configuration happens at a level above the individual sources, as I describe below) This is how we designed the mirror feature in Cargo, and it has a few nice properties:
@conartist6 I'm trying to understand the problem you're describing.
The way bundler and Cargo handle (what I believe you mean by) this problem is by using a "precise" version for every dependency in the lockfile that includes enough information to precisely identify it (and its source) but not including mirror information (which is supplied by configuration). In Cargo, mirrors are required to share precisely the same sha as the original upstream source, and any replacements that change the source code are specified in
[replace]
"foo:0.1.0" = { git = 'https://github.com/example/foo' }
"bar:1.0.2" = { path = 'my/local/bar' }
This means: "if you see
In Cargo's (and bundler's) case, we also require that replacements share a name and version number with the original package they're replacing, and the feature is largely used for emergency patches or things like "the bug is fixed on master but the author hasn't gotten around to publishing it yet". I'm not entirely sure whether any of that directly targets the issue you're talking about. Can you clarify it a bit? |
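To make the mirror-versus-replacement distinction above concrete, here is a minimal Python sketch. It is not Cargo's implementation: the `Locked` record, both function names, and the use of SHA-256 are assumptions made for illustration. A mirror copy is accepted only when its bytes hash to the locked checksum, while a replacement is an explicit override looked up by `name:version`, as in the `[replace]` table.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Locked:
    """Hypothetical lockfile record: identity plus checksum, no mirror information."""
    name: str
    version: str
    sha256: str


def accept_mirror_copy(locked: Locked, data: bytes) -> bool:
    """A mirror may serve the package only if the bytes match the locked checksum."""
    return hashlib.sha256(data).hexdigest() == locked.sha256


def find_replacement(locked: Locked, replacements: Dict[str, dict]) -> Optional[dict]:
    """Return the explicit override for name:version, or None when there is none."""
    return replacements.get(f"{locked.name}:{locked.version}")


# Illustrative data only; the replacement entry mirrors the [replace] example above.
upstream_bytes = b"upstream foo 0.1.0"
foo = Locked("foo", "0.1.0", hashlib.sha256(upstream_bytes).hexdigest())
replacements = {"foo:0.1.0": {"git": "https://github.com/example/foo"}}
```

The point of the split is that a cached copy can never change what code you get, whereas a replacement is allowed to, but only because the user asked for it by name and version.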
Yes, yes it definitely does target the issue I'm describing. Npm lacks hashes, and with that restriction they were forced to treat source URLs as the best guarantee of authenticity. The setup that you are describing sounds quite attractive because it understands (on multiple levels) the difference between a cached copy and an override. Npm, infuriatingly, can't, which is why upgrading a cached package is such a nightmare. |
@wycats do you propose moving |
For large single-repo projects the experience looks like this. The developer writes
The new dependency is added to package.json and the yarn.lock file, and the tarball is downloaded to The nice thing about this approach is its simplicity: it is easy to review and easy to connect the dots. If there is another project in the repository and another developer does
Then the existing tarball will be reused and yarn.lock will refer to it. What would be different with the proposed approach? |
If I understand what @wycats is saying, nothing in the workflow you describe is different for the user. The major difference is what data is stored in yarn.lock. I understand your earlier response to suggest that yarn.lock would contain something concrete like:
This is the npm approach. The suggestion here is to, in yarn.lock, store:
This way the cache directory is searched instead of being directly referenced, which means it is trivial to change the cache directory configuration, either as a one-off or between dev/prod/test. It also means that the program can easily know that if the user says |
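A rough Python sketch of "searched instead of directly referenced", as I understand the comment above. The `<name>-<version>.tgz` naming scheme and both function names are assumptions for illustration, not Yarn's actual behavior: the lockfile would carry only the package identity, and the directories to search would come from configuration.

```python
from pathlib import Path
from typing import Iterable, Optional


def tarball_name(name: str, version: str) -> str:
    """Assumed naming scheme for tarballs inside a cache directory."""
    return f"{name}-{version}.tgz"


def locate(name: str, version: str, cache_dirs: Iterable[Path]) -> Optional[Path]:
    """Search the configured directories in order; return the first matching tarball."""
    for directory in cache_dirs:
        candidate = Path(directory) / tarball_name(name, version)
        if candidate.is_file():
            return candidate
    return None  # not cached anywhere: the caller would fall back to the remote
```

Because nothing in the lockfile names a directory, pointing dev, test, and prod at different caches only means passing a different `cache_dirs` list.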
Not quite. The With no configuration, we'd use the "default remote" for a particular package. If you configure a mirror in
The main difference so far is that the way to specify
I agree, it's nice 😄
The main distinction is that the You can still look at the integrity information in the For people who are not using mono-repos, it makes it possible to use the same feature for production deploys without disturbing the normal development workflow, and it paves the way for other kinds of mirrors that can work together with the in-repo mirror strategy. In other words, it's just a more general way of describing the same thing. Finally, it also helps to rationalize what's going on with Generally, decoupling the "original source + unique identification" from "where we actually get the packages in practice" makes interactions between mirrors, links, and other similar features more reliable, but doesn't really change any fundamental capabilities. |
That does make sense, it would also solve #394. In yarn.lock we use strings like |
@bestander the way Cargo works is that there is a notion of "package id", which is a fully qualified package name that is guaranteed to be unique (each resolver gets to decide what is required for uniqueness). Here's an example
[package]
name = "ohai"
version = "0.1.0"
authors = ["Yehuda Katz <[email protected]>"]
[dependencies]
libc = "*"
And here's the lockfile Cargo generates:
[root]
name = "ohai"
version = "0.1.0"
dependencies = [
"libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "libc"
version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
Cargo uses the word "source" for roughly what Yarn calls a "resolver". In this case, since the registry doesn't allow people to mutate existing crates, the fully resolved name of the registry, plus the package's name and version, are sufficient. For illustration, let me add another package to the
[package]
name = "ohai"
version = "0.1.0"
authors = ["Yehuda Katz <[email protected]>"]
[dependencies]
libc = "*"
docopt = { git = "https://github.com/docopt/docopt.rs" }
Here's the output from
$ cargo build
Updating git repository `https://github.com/docopt/docopt.rs`
Updating registry `https://github.com/rust-lang/crates.io-index`
Compiling lazy_static v0.2.1
Compiling regex-syntax v0.3.5
Compiling utf8-ranges v0.1.3
Compiling memchr v0.1.11
Compiling winapi-build v0.1.1
Compiling aho-corasick v0.5.3
Compiling kernel32-sys v0.2.2
Compiling strsim v0.5.1
Compiling rustc-serialize v0.3.19
Compiling winapi v0.2.8
Compiling thread-id v2.0.0
Compiling thread_local v0.2.7
Compiling regex v0.1.77
Compiling docopt v0.6.83 (https://github.com/docopt/docopt.rs#be283ce2)
Compiling ohai v0.1.0 (file:///C:/Code/ohai)
Finished debug [unoptimized + debuginfo] target(s) in 89.36 secs
And the updated
[root]
name = "ohai"
version = "0.1.0"
dependencies = [
"docopt 0.6.83 (git+https://github.com/docopt/docopt.rs)",
"libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "aho-corasick"
version = "0.5.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "docopt"
version = "0.6.83"
source = "git+https://github.com/docopt/docopt.rs#be283ce2a00305998e89d98122cdad06e59dede4"
dependencies = [
"lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)",
"rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)",
"strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "kernel32-sys"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "lazy_static"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "libc"
version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "memchr"
version = "0.1.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "regex"
version = "0.1.77"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",
"regex-syntax 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)",
"utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "regex-syntax"
version = "0.3.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "rustc-serialize"
version = "0.3.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "strsim"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "thread-id"
version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "thread_local"
version = "0.2.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "utf8-ranges"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "winapi"
version = "0.2.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "winapi-build"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[metadata]
"checksum aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ca972c2ea5f742bfce5687b9aef75506a764f61d37f8f649047846a9686ddb66"
"checksum docopt 0.6.83 (git+https://github.com/docopt/docopt.rs)" = "<none>"
"checksum kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "7507624b29483431c0ba2d82aece8ca6cdba9382bff4ddd0f7490560c056098d"
"checksum lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "49247ec2a285bb3dcb23cbd9c35193c025e7251bfce77c1d5da97e6362dffe7f"
"checksum libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)" = "408014cace30ee0f767b1c4517980646a573ec61a57957aeeabcac8ac0a02e8d"
"checksum memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)" = "d8b629fb514376c675b98c1421e80b151d3817ac42d7c667717d282761418d20"
"checksum regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)" = "64b03446c466d35b42f2a8b203c8e03ed8b91c0f17b56e1f84f7210a257aa665"
"checksum regex-syntax 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279401017ae31cf4e15344aa3f085d0e2e5c1e70067289ef906906fdbe92c8fd"
"checksum rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)" = "6159e4e6e559c81bd706afe9c8fd68f547d3e851ce12e76b1de7914bab61691b"
"checksum strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "50c069df92e4b01425a8bf3576d5d417943a6a7272fbabaf5bd80b1aaa76442e"
"checksum thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "a9539db560102d1cef46b8b78ce737ff0bb64e7e18d35b2a5688f7d097d0ff03"
"checksum thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)" = "8576dbbfcaef9641452d5cf0df9b0e7eeab7694956dd33bb61515fb8f18cfdd5"
"checksum utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "a1ca13c08c41c9c3e04224ed9ff80461d97e121589ff27c753a16cb10830ae0f"
"checksum winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)" = "167dc9d6949a9b857f3451275e911c3f44255842c1f7a76f33c55103a909087a"
"checksum winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "2d315eee3b34aca4797b2da6b13ed88266e6d612562a0c46390af8299fc699bc"
The GitHub package we added produced the following entry (plus all of its dependencies, of course):
[[package]]
name = "docopt"
version = "0.6.83"
source = "git+https://github.com/docopt/docopt.rs#be283ce2a00305998e89d98122cdad06e59dede4"
dependencies = [
"lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)",
"rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)",
"strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
We include the name and version of course, but also a fully qualified source name, which in the case of git repositories, includes the precise revision at the point where the lockfile was generated. Also note that all of the package versions in the lockfile are precise versions, rather than a version range, which makes the dependency graph easier to work with. This also allows users to tighten versions (from
The bottom of the lockfile is a series of checksums in a single, non-source-specific form (SHA256):
[metadata]
"checksum aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ca972c2ea5f742bfce5687b9aef75506a764f61d37f8f649047846a9686ddb66"
"checksum docopt 0.6.83 (git+https://github.com/docopt/docopt.rs)" = "<none>"
"checksum kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "7507624b29483431c0ba2d82aece8ca6cdba9382bff4ddd0f7490560c056098d"
"checksum lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "49247ec2a285bb3dcb23cbd9c35193c025e7251bfce77c1d5da97e6362dffe7f"
"checksum libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)" = "408014cace30ee0f767b1c4517980646a573ec61a57957aeeabcac8ac0a02e8d"
"checksum memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)" = "d8b629fb514376c675b98c1421e80b151d3817ac42d7c667717d282761418d20"
"checksum regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)" = "64b03446c466d35b42f2a8b203c8e03ed8b91c0f17b56e1f84f7210a257aa665"
"checksum regex-syntax 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279401017ae31cf4e15344aa3f085d0e2e5c1e70067289ef906906fdbe92c8fd"
"checksum rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)" = "6159e4e6e559c81bd706afe9c8fd68f547d3e851ce12e76b1de7914bab61691b"
"checksum strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "50c069df92e4b01425a8bf3576d5d417943a6a7272fbabaf5bd80b1aaa76442e"
"checksum thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "a9539db560102d1cef46b8b78ce737ff0bb64e7e18d35b2a5688f7d097d0ff03"
"checksum thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)" = "8576dbbfcaef9641452d5cf0df9b0e7eeab7694956dd33bb61515fb8f18cfdd5"
"checksum utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "a1ca13c08c41c9c3e04224ed9ff80461d97e121589ff27c753a16cb10830ae0f"
"checksum winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)" = "167dc9d6949a9b857f3451275e911c3f44255842c1f7a76f33c55103a909087a"
"checksum winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "2d315eee3b34aca4797b2da6b13ed88266e6d612562a0c46390af8299fc699bc"
We added this after the initial release of Cargo, and it ensures that we have a secure hash for any source, even though there are theoretical risks associated with the hashing strategy used by git, for example. Cargo also has a command that you can use to get the fully qualified name of a package in the
$ cargo pkgid docopt
https://github.com/docopt/docopt.rs#docopt:0.6.83
This package id contains just enough information to uniquely identify a package in the dependency graph (it's the identifier used in the dependency graph structure, in fact). When describing a replacement, it's always fine to use a more general name (like |
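For readers who want to poke at the dependency strings in the lockfile above, here is a small Python sketch that splits an entry such as `docopt 0.6.83 (git+https://github.com/docopt/docopt.rs)` into name, version, source kind, and source URL. The `PackageId` tuple and `parse_entry` are names I made up; this only handles the `name version (kind+url)` shape shown in this thread and is not Cargo's own parser.

```python
import re
from typing import NamedTuple


class PackageId(NamedTuple):
    """Hypothetical container for one fully qualified lockfile dependency entry."""
    name: str
    version: str
    kind: str  # e.g. "registry" or "git"
    url: str


# Matches "name version (kind+url)" as it appears in the lockfile shown above.
_ENTRY = re.compile(
    r"^(?P<name>\S+) (?P<version>\S+) \((?P<kind>[a-z]+)\+(?P<url>[^)]+)\)$"
)


def parse_entry(text: str) -> PackageId:
    """Split a lockfile dependency string into its identifying parts."""
    match = _ENTRY.match(text)
    if match is None:
        raise ValueError(f"unrecognized lockfile entry: {text!r}")
    return PackageId(match["name"], match["version"], match["kind"], match["url"])
```

Note that the source is part of the identity: two packages with the same name and version but different sources parse to different `PackageId` values, which is what lets the name, version, and source together act as a unique key.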
@wycats, thanks for giving some background info. In the lock file we have name (implied), version and where it gets resolved to.
Would that be on par with Cargo's features? |
ping @wycats |
I'm unclear from my limited use of yarn how much of a role the The specific flow we wanted was that I would have my development machine point at the public registry, but CI would go via a proxy. I think the way @wycats describes storing a reference to the package source separately from the package location would help enable this workflow. As a strawman, something along the lines of this might work: The checked-in lock file states the expected source and a hash
Those sources would have default locations, and then a separate, environment-specific config that is not checked in could override source locations, possibly as an ordered list? |
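A Python sketch of that strawman, under stated assumptions: the entry fields, the `fetch` callback, and the `*.ci.example` hosts are all hypothetical, and SHA-256 stands in for whatever hash the lock file would really record. The checked-in entry names a logical source and a hash; an environment-specific override maps that source to an ordered list of locations, and the first location that serves bytes matching the hash wins.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

# Default location per logical source; overrides would live in config that is not checked in.
DEFAULT_LOCATIONS: Dict[str, List[str]] = {"npm": ["https://registry.npmjs.org"]}

Fetch = Callable[[str, str, str], Optional[bytes]]  # (location, name, version) -> bytes


def locations_for(source: str, overrides: Optional[Dict[str, List[str]]] = None) -> List[str]:
    """Ordered locations to try for a logical source, preferring the override."""
    if overrides and source in overrides:
        return list(overrides[source])
    return list(DEFAULT_LOCATIONS.get(source, []))


def fetch_verified(entry: dict, fetch: Fetch,
                   overrides: Optional[Dict[str, List[str]]] = None) -> Tuple[str, bytes]:
    """Try each location in order; accept only bytes matching the locked hash."""
    for location in locations_for(entry["source"], overrides):
        data = fetch(location, entry["name"], entry["version"])
        if data is not None and hashlib.sha256(data).hexdigest() == entry["sha256"]:
            return location, data
    raise LookupError(f"no location served a matching {entry['name']}@{entry['version']}")
```

With this shape, a development machine runs with no overrides and hits the public registry, while CI supplies something like `{"npm": ["https://proxy.ci.example"]}` and never touches it; the lock file is identical in both cases.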
I've been testing a custom yarn-cache folder for a few weeks, but I'm encountering a lot of
|
Somewhat related, but maybe a little off topic: are there any thoughts about dealing with node module installation scripts? Plenty of node modules download additional code during installation and thus the results of |
The problem is that Node.js install scripts can execute any bash script, so there is no way to reliably achieve offline mode without the authors' cooperation. |
How about caching the post-install results instead of the pre-install results? I understand that will not work well with any code that has native platform dependencies, but that is not a problem any of our current solutions address either. The worst that can happen is that we cannot use those badly behaved modules, which is as bad as the current situation. |
Then it is as good as saving node_modules somewhere, for example, checking them into source control. |
I agree that will be equivalent, with the added benefits that the offline mirror currently provides:
|
Well, it might work but it may be complex. Yarn is already tracking a diff between caches and whatever happens after install scripts, see phantomFiles https://github.com/yarnpkg/yarn/blob/master/src/package-linker.js#L124 and beforeFiles in https://github.com/yarnpkg/yarn/blob/master/src/package-install-scripts.js#L280. If you feel that you could make sense of it and have some sort of offline storage for build artifacts go ahead, send an RFC. |
@UnrememberMe, better discuss this in a separate issue/RFC |
Agreed. Will open a separate issue/RFC. |
@UnrememberMe did you happen to open one? Could you link to it? |
I have started but have not finished the RFC yet. I should submit the pull request for RFC no later than Thursday 2/16/17. @gregsheremeta |
Is there a concern that the offline mirror will start to bloat if it is stored in source control? Every minor version change will leave the old .tgz files behind. |
Yeah, there is an RFC for that already.
An opt-in cleanup feature is coming
…On 16 February 2017 at 21:13, jackhamburger ***@***.***> wrote:
Is there a concern about the offline mirror will start to bloat if it is
stored in source control? Every minor version change will leave the old
.tgz files.
Is there a plan for a clean up command that empties the offline mirror of
module .tgzs that are not used in the yarn.lock?
|
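Until such a cleanup feature lands, the check being asked for can be approximated by hand. The Python sketch below is not the RFC's design: it assumes the lockfile has `resolved "<url>#<hash>"` lines and that each mirror tarball is named after the last path segment of that URL, and it only reports candidates rather than deleting anything. Both function names are made up.

```python
import re
from pathlib import Path
from typing import List, Set

# Captures the URL part of a line like:  resolved "https://host/pkg/-/pkg-1.0.0.tgz#hash"
_RESOLVED = re.compile(r'^\s*resolved\s+"?([^"#\s]+)')


def referenced_tarballs(lockfile_text: str) -> Set[str]:
    """Collect the tarball file names that the lockfile's resolved URLs point at."""
    names = set()
    for line in lockfile_text.splitlines():
        match = _RESOLVED.match(line)
        if match:
            names.add(match.group(1).rsplit("/", 1)[-1])
    return names


def unused_tarballs(lockfile_text: str, mirror_dir: Path) -> List[str]:
    """List .tgz files in the mirror that no resolved entry refers to."""
    keep = referenced_tarballs(lockfile_text)
    return sorted(p.name for p in Path(mirror_dir).glob("*.tgz") if p.name not in keep)
```

In a monorepo with several lockfiles sharing one mirror, the `keep` set would have to be the union across all of them before anything is removed.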
@gregsheremeta The RFC was posted on Feb 16, 2017. |
@UnrememberMe do you have a link to it? I don't see it in https://github.com/yarnpkg/rfcs |
@gregsheremeta I updated the title for yarnpkg/rfcs#50 The initial RFC title was not correct. |
That's something I'm interested in too. I'll leave my thoughts here: AFAIK, current yarn workflow is the following:
Each step could be improved:
How is it related to the tarball cache feature? Not much. Its intended use case is to completely avoid yarn install during CI by saving all dependencies in the repository. As we can see, a local tarball cache would resolve all CI issues but somewhat complicate the developer workflow. Ideally, we'd want an online repository that is cacheable. TL;DR It would be nice if the workflow didn't change while still gaining performance and stability improvements. |
Yeap, I'm aware of that. That comment was back in April. Still not clear whether you should use |
Closing this issue since it seems to be resolved, mostly by #2970 but possibly with other PRs. Please create a new issue if you want to propose more features or fixes around this. |
The last part of this blog post is referencing this ticket. Is it still valid or should the blog post be updated? |
I understand how the feature works, but as I'm not using it, I'm not quite sure what the exact intended workflow is.
@bestander @skevy @kittens