Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update knit proposal #73

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
93 changes: 93 additions & 0 deletions accepted/0000-link-options.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
- Start Date: 2017-01-13
- RFC PR: https://github.com/yarnpkg/rfcs/pull/41
- Yarn Issue: https://github.com/yarnpkg/yarn/issues/1213

# Summary

This is a proposal to improve upon `yarn link` so that developers can more accurately test in-development versions of their libraries from their apps or other libraries.

# Motivation

`yarn link` (and `npm link` before it) have several problems when working on code bases of non-trivial sizes, especially with multiple apps. The current `link` command doesn't isolate `node_modules` between apps (especially problematic with the advent of Electron), it doesn't allow for working on multiple versions of a library, and it produces a `node_modules` hierarchy that is not faithful to the one produced after the library is published.

# Detailed design

## Desired behavior

The `yarn link` workflow should mimic publishing a package (ex: `dep`) to npm and then installing it in a dependent (ex: `app`), and keep this constraint while you're making changes to the first package. Concretely, `yarn link` should make it so that when you save a change to `dep`, the resulting state is as if you:
1. Ran `npm publish` in `dep` (assume that it can clobber an existing version, and that you're publishing to a local registry on just your computer).
2. Ran `yarn add dep` in `app`.

## Why this behavior is great

This solves several problems that "yarn link" has today:

#### Isolating `node_modules` correctly

You can install `dep` in two different apps without sharing the `node_modules` of `dep`. This is a problem with Electron apps, whose V8 version is different than Node's and uses a different ABI. If you have `node-app` and `electron-app` that both depend on `dep`, the native dependencies of `dep` need to be recompiled separately for each app; `node-app/n_m/dep/n_m` must not be the same as `electron-app/n_m/dep/n_m`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the proposal will solve this issue. Node.js calls realpath on every "module". So even when files are linked instead of the whole folder, any lookup origining from the linked module will resolve in the realpath -> in the node_modules folder of the linked dependency and not in the app's node_modules.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not when run with --preserve-symlinks: https://nodejs.org/api/cli.html#cli_preserve_symlinks. See drawbacks section. I think webpack already supports such an option as well, and I tried to patch browserify. browserify/browserify#1742

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, that makes sense.


#### Working on multiple versions

You can be developing multiple different versions of `dep`. Say you have two directories, `dep-1` and `dep-2`, which have your v1 and v2 branches checked out, respectively. With "yarn link" it's not possible to make both of these directories linkable at the same time.

This is a problem when you are developing & testing `dep-1` with `old-app` and `dep-2` with `new-app`. You don't want to be going back and forth between `dep-1` and `dep-2` running "yarn link" each time you switch which app you're testing.

#### Faithfully representing the `node_modules` hierarchy

Currently `yarn link` symlinks the entire package directory, which brings along its `node_modules` subdirectory with it. With dependency deduping and flattening, bringing in `dep/node_modules` wholesale usually produces a different `node_modules` hierarchy than running `yarn install` in `app` and installing everything from npm. This isn't a problem most of the time but it does go against Yarn's spirit of consistency and the lockfile.

## A practical proposal

This is a proposal that solves all of the problems above and isn't too hard to implement or understand. Conceptually, we find all the files we'd normally publish to npm, pack them up using symlinks instead of copies of the files, publish the pack to a local registry (just a directory), and then when installing we look up packages in the local registry directory instead of npm.

### Running "yarn link --pack" inside of `dep`

This is the step that simulates publishing `dep`. Running `yarn link --pack` in `dep` finds all the files that "yarn publish" would pack up and upload to npm. Crucially, this excludes `node_modules`, and would follow the same algorithm as "yarn publish" such as reading package.json's `files` field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disadvantage is that you need to run yarn link --pack every time you want to iterate.
What do you think of running the --pack at the time of link consumption?

  1. yarn link would register dep as "linkable"
  2. yarn link dep --pack would run the packing and installation

Copy link
Author

@devongovett devongovett Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. You have to run yarn link --pack once to set up, and only again if you add/remove a file in the root. If you add a file to a subdirectory (e.g. src/) it should automatically show up everywhere since we symlink directories in the root. I think this is reasonable. I definitely don't want to have to run a command every time any change is made: might as well use yarn pack + yarn install in that case.

Copy link
Member

@bestander bestander Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I've missed that point.

I understand that your goal for this change is to improve iteration speed when you are actively developing dep that is used in one of your projects.

I think the goal should be to achieve correctness of dep and simulate full publish/install cycle.
We could think of performance tweaks in the next iteration and people who absolutely need speed will just set up their environment to use vanilla yarn link.

Am I right you want to have this setup?

/path/to/app/node_modules/dep ->(symlink) ~/.yarn-links/dex-x.y.z ->(multiple symlinks) /path/to/dep

I think you won't be able to achieve the right isolation with symlinks inside ~/.yarn-links/dep-x.y.z -> dep. You'll end up cherry-picking symlinks of files and folders while respecting rules in .yarnignore/.npmingore.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the right idea. The dir in the app symlinks to a global directory which contains symlinks to individual files/directories in the root of the dependency.

My goal for this feature is to have something that works as closely to yarn link as possible but avoids the problem of having multiple copies of dependencies. This is solved by not including node_modules in the list of symlinks, and installing the dependencies locally in the project you're linking into.

It's a benefit that it happens to be closer to what you'd actually publish, but not the main goal for me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of multiple symlinks that work around node_modules and .yarnignore.
That would be quite hard to debug in non trivial cases IMHO.

And yeah, it goes the other direction with the --pack idea which should be about simulating a publish.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it make it global is so when you re-"publish" (e.g. after adding a new file) you don't also have to re-"install" in every project that uses that dependency. This is because those projects point to the global directory, not to the individual files, so updating the global cascades the updates everywhere.

As for cleaning up references, we have a couple options:

  1. We could clean them up when you run yarn link --pack again - we'd blow away everything inside the global dir for the package and create new symlinks. This way anything that didn't exist anymore would go away.

  2. We could do (1) but across all packages in the global dir as a cleanup step: find any broken links and delete them.

Copy link
Member

@bestander bestander Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense if you have multiple projects refer the same package.
Is it a common use case for you?
I though that yarn link is something that you do temporarily while testing a dependency and don't leave as a link for long.

Copy link
Author

@devongovett devongovett Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah at work we tend to build our apps as a composition of a bunch of libraries, and we have several different apps which all use some of the same libraries. Setting up and maintaining the links between all of our repos has been a huge pain for us, which is why I'm invested in this feature. Our devs work on different parts of our apps which live across repos all the time, so we want to keep things linked generally. I've written some automation to setup the repos locally for everyone and link everything together using the POC of this yarn feature, and it seems to work pretty well. Ideally we'd use workspaces, but since we have many libraries that are used across different applications, it would be hard to go the monorepo route for us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying and investing in this feature, @devongovett.
I agree in general that Yarn is in the position to solve your problem but before merging your RFC I want us to sort out a few things that bother me.

  1. I want to keep knit proposal separate from this one.
    I think knit is about running prepublish scripts and making files available to projects that use a dep .
    This proposal is about improving link command with a trick around symlink so that node_modules and dev files don't leak from dep into a project using it.

  2. Semantics.
    After thinking more, --pack seems misleading.
    I would say yarn link --exclude-ignored-files or something like that, I don't insist on the actual flag name but it should not imply that some "packing" or prepublish will be happening.

  3. I am still not convinced that symlinks should be created in ~/yarn-links or any other global folder.
    Symlinks are cheap to create and any issues with the implementation would be contained inside a project that uses this feature.

  4. Another thing is this issue [docs] create a page for working with non npm dependencies website#573.
    We have quite a few way of working with npm packages locally and I am concerned that we are getting too many very similar solutions that approach the problem from different angles.
    I suggest hiding this RFC implementation behind a flag, similar to "workspaces-experimental true", and running with it for a while as an opt-in feature.
    So that if we find a better solution we could remove or change it without concerns of breaking the world.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the linking behavior is what we (Expo) want, whatever the name of the feature ends up being. These are our needs:

(a) we have a repo with multiple apps and multiple libraries
(b) we want the apps to be able to use the latest code in the libraries
(c) we want the libraries to be installed/linked as if we had published them to npm and then installed them in the apps. npm link/yarn link today has very different behavior than installing published code and we can't trust those commands.
(d) we want to be able to write code in the libraries and easily see how it affects an app. symlinks seem like a good way to do this.


Separately, we also do have a need for packing up a library (to simulate publishing) and installing in an app -- this happens in our continuous integration systems. This behavior is even closer to publishing+installing a library, which is why we use it in CI. We run something like: cd library && yarn pack && cd app && yarn add library.tgz.

However, in CI we don't need a tight feedback loop between editing library code and seeing the effects of those edits in an app. During development, we want to edit library code and see the app change immediately, which is why symlinks are appealing. We have a solution for CI already (yarn pack && yarn add) but no good one for development today.


Then it simulates publishing `dep`: it creates a directory named `dep-X.Y.Z` (where `X.Y.Z` is the version of `dep` in its package.json) inside of a global directory like `~/.yarn-link`. A symlink is created for each file or directory that `yarn publish` would normally have packed up. This step shares some conceptual similarities with publishing to a registry, except it uses symlinks and it's local on your computer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use cache folder for this instead?
If we opt in to a new ~/.yarn-link it will fall out of OS cache radars and becomes a risk of growing out of control.

How about <yarn cache>/links/?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially, though there should only be symlinks in the global registry anyway, so I don't think it should get that large. Putting it in the cache folder might cause your linked modules to suddenly stop working e.g. after a reboot, which would be weird IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we'll defer this discussion once we agree on the model


### Running "yarn add dep --link" inside of `app`

This behaves like normal `yarn add dep` except that it looks at the versions of `dep` that are in the global `~/.yarn-link` folder and takes the latest one rather than installing from a normal registry. (You also could run "yarn add [email protected] --link" if you wanted a more specific version, as usual.)

It then runs most of the same installation steps that `yarn add dep` would. It symlinks `app/node_modules/dep` to `~/.yarn-link/dep-X.Y.Z` as `yarn link dep` does. Then it installs the dependencies of `dep` as usual by fetching them from npm. Finally it runs postinstall scripts.

### Running "yarn install --link" inside of `app`

This behaves like normal `yarn install` but automatically links any dependencies that have been set up locally with `yarn link`. This is very useful when you have an application with many locally developed dependencies so that you don't have to manually run `yarn link dep` for each one inside your app.

# Comparison with other Yarn features

There are already several ways of linking dependencies in Yarn. This section is a comparison of them, and explains why these new options are still valuable to add.

* `yarn link` without the `--pack` option is similar to `yarn link --pack` but links directories instead of files. This means that any `node_modules` inside linked dependencies are also included, breaking the flattened and deduped tree that yarn normally provides. Many other differences have already been discussed in this RFC. Ideally, the behavior of `yarn link` would be what the `--pack` option provides, but for backwards compatibility and to match the behavior of `npm link`, `yarn link` would stay around as is.

* `link://` dependencies are similar to `yarn link dep` but actually closer to what `yarn add dep --link` would provide. `link://` dependencies also cause the linked module's dependencies to be installed locally, so as long as there is no `node_modules` folder inside the linked dependency, it would work identically to `yarn add dep --link`. The existing implementation of `link://` dependencies could be used to implement the `--link` option.

* `file://` dependencies cause files to be copied instead of linked. This is not as useful because it means you must re-install every time a change is made to the dependency.

* `yarn pack` + `yarn install dep.tgz` is similar to `file://` dependencies. The pack + install process must be re-run every time a change is made. It does correctly dedupe dependencies, however, as `node_modules` are excluded by `yarn pack`.

* [Workspaces](https://github.com/yarnpkg/rfcs/pull/66) are similar but solve a different problem. Where workspaces are great for a tree of related modules (e.g. a monorepo), these options are for linking together modules in separate trees, e.g. things that might be shared between multiple workspaces.

# How We Teach This

This proposal is mostly additive and affects only how people work on libraries that they are using in their apps. We would want to document the new options in the "CLI Commands" section of the docs and perhaps add a new section to "The Yarn Workflow".

`yarn link` would stay around, so people migrating from the npm client wouldn't have to learn anything new at first.

# Drawbacks

One issue with this proposal is that it's not clear what to put in the lockfile after running `yarn add dep --link` since we don't have an npm URL for the dep yet -- it hasn't been published to npm.

Another issue is that if you change package.json in `dep`, namely changing a dependency or modifying the `files` entry, you have to run `cd dep; yarn link --pack`. Same if you add a file at the top level of the dependency since each file is linked individually. This isn't so bad as those changes are probably more rare than saving changes to existing files.

Also, if you update the code in dep and bump its version, say from 1.0.0 to 1.1.0, the symlinks in ~/.yarn-link/dep-1.0.0 will still point to the code in your working directory, which now contains 1.1.0 code.

The symlinks might break but I think that's mostly OK since at that point you're done working on dep and have published it to npm and it's easy to go run `yarn add dep` in app and not use the symlinks anymore.

If you want to truly pin the versions of linked packages then you'd need to have a different working directory for each version. (Git worktrees are great for this use case actually. Worktrees let you check out a repo once and then magically create semi-clones of it in separate directories, with the constraint that the worktrees need to be on different branches, which is totally OK in this scenario. The worktrees all share the same Git repo though, so if you commit in one worktree you can cherry pick that commit within another worktree.)

Finally, another issue is with the way the node require resolution algorithm works. Dependencies of symlinked modules are resolved relative to the realpath, not the symlink. This means that you'll still get duplicates if both modules depend on a third dependency, or errors if that dependency is not installed in either place. This is solved by the recently added runtime option for node `--preserve-symlinks`, which skips getting the realpath when resolving modules. Something similar would need to be added to browserify/webpack to solve this there as well. I recently opened a [PR for browserify](https://github.com/substack/node-browserify/pull/1742) to support the same option.

# Unresolved questions
77 changes: 0 additions & 77 deletions accepted/0000-yarn-knit.md

This file was deleted.