Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(linking): Dont integrity check build artifacts #5470

Merged
merged 16 commits into from
Apr 5, 2018

Conversation

rally25rs
Copy link
Contributor

@rally25rs rally25rs commented Mar 7, 2018

fixes #932

Summary

If an install script modified a file that existed in the package, then yarn would view the file as stale on every command, mark the package as "fresh", and re-run the install scripts.

With this change, known build artifacts from the integrity file are checked, and if this package file is a
build artifact, it's size and timestamp are not checked.

In addition it was discovered that on Linux with node 8's fs.copyFile() timestamps are not preserved (they are on Win and OSX). This would cause yarn to always re-copy all files for all packages when doing a yarn add or yarn remove. Yarn will now detect this by checking the first file copy's timestamps, and if the first file is wrong then timestamps will be corrected using fs.futimes() for all file copies.

Test plan

Added test in __tests__/commands/add.js

If an install script modified a file that existed in the package, then yarn would view the file as
stale on every command, mark the package as "fresh", and re-run the install scripts.  With this
change, known build artifacts from the integrity file are checked, and if this package file is a
build atrifact, it's size and timestamp are not checked.

yarnpkg#932
@rally25rs rally25rs requested a review from bestander March 7, 2018 02:07
@buildsize
Copy link

buildsize bot commented Mar 7, 2018

File name Previous Size New Size Change
yarn-[version].noarch.rpm 911.89 KB 912.9 KB 1.01 KB (0%)
yarn-[version].js 3.96 MB 3.97 MB 4.3 KB (0%)
yarn-legacy-[version].js 4.11 MB 4.12 MB 4.49 KB (0%)
yarn-v[version].tar.gz 917.05 KB 918.16 KB 1.11 KB (0%)
yarn_[version]all.deb 677.26 KB 677.86 KB 616 bytes (0%)

@bestander
Copy link
Member

I think it is consistent with Yarn's promises.
Install scripts are evil from package manager's point of view, they can do anything outside of its control anyway.
So probably we could instruct Yarn to ignore the artifacts and not delete them.

Copy link
Member

@bestander bestander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding this, @rally25rs.
Lots of people will be happy

@bestander
Copy link
Member

Are test failures related?

@rally25rs
Copy link
Contributor Author

The CircleCI test is related. I'll take a look at that. 🤔
Strange that they pass on Appveyor (the one that is failing is an unrelated timeout that we see often).

@rally25rs
Copy link
Contributor Author

rally25rs commented Mar 8, 2018

@bestander @arcanis I've added some logging and verified that in the case of the failing test, it's actually taking the correct code branch for my fix. It seems to be failing for some reason that I haven't discovered yet.

However the weird thing is that it consistently fails only on CircleCI Linux Node 8. All other environments pass. 😕

Do you know what linux type test-linux-node8 is? Or does it run a docker image? I'm not really familiar with the CircleCI build config... I want to set up a local env and see if it fails locally, isntead of pushing a million log statements into this branch just to debug the CI tests.

edit:

I reproduced this in Ubuntu 16. It's wild... On Ubuntu it passes in Node v6 and fails in Node v8.10. On OSX it passes in Node 6 and 8. I guess I'll continue digging into this as I have time...

@BYK
Copy link
Member

BYK commented Mar 8, 2018

@rally25rs - I think the problem stems from here:

async bailout(patterns: Array<string>, workspaceLayout: ?WorkspaceLayout): Promise<boolean> {
const lockfileCache = this.lockfile.cache;
if (!lockfileCache) {
return false;
}
const match = await this.integrityChecker.check(patterns, lockfileCache, this.flags, workspaceLayout);
const haveLockfile = await fs.exists(path.join(this.config.lockfileFolder, constants.LOCKFILE_FILENAME));
if (match.integrityFileMissing && haveLockfile) {
// Integrity file missing, force script installations
this.scripts.setForce(true);
}
return false;
}

The bailout function in install.js returns with true when the integrity has not changed whereas the one from add always returns false, no matter what. I think this triggers the regular lifecycle events for all modules causing postinstall script to be run, thus changing the file output.

So it is not the copying, it is the postinstall script running and modifying it.

@rally25rs
Copy link
Contributor Author

Hmm, interesting. Then why the difference between node versions?

@rally25rs
Copy link
Contributor Author

rally25rs commented Mar 8, 2018

@BYK so get this... on linux with node 8, when files are copied from the cache to node_modules, the timestamps are updated. yarn decides the files are different and marks the reference fresh.
This only seems to happen on linux node 8 though.

image

Look at the stat timestamps on the copy in node_modules... all 3 timestamps are changed to the Change Time.

This is the same commands run in node 6:

image

This is going to cause a lot of churn in yarn on linux with node 8. It's going to re-copy every file every time. 😿

Edit:

Oh, this is because Node v8.5 introduced fs.copyFile which made copies much faster. IIRC that pipes down to the native filesystem copy, so that would explain why it works differently across OSs and only in node 8. Our fallback when fs.copyFile doesn't exist does a fs.futimes call to fix the timestamps. I think we might need to add that to the modern fs.copyFile implementation too.

I'll explore this option...

@rally25rs
Copy link
Contributor Author

sigh it looks like calling futimes makes things fail on AppVeyor (windows).

Error: EPERM: operation not permitted, futimes

how were we getting around this lack of futimes before? In master it is always called in older node versions https://github.com/yarnpkg/yarn/blob/master/src/util/fs.js#L69

@rally25rs
Copy link
Contributor Author

🎉 tests pass. The one failing on CircleCI node6 is a random timeout and not related. It passed in a couple of the previous runs.

It seems that on OSX and Linux, you can open a file with 'read' access and still edit the timestamps with fs.futimes() but on windows you have to open the file with 'write' access... good times... 🙄

@BYK
Copy link
Member

BYK commented Mar 9, 2018

Our fallback when fs.copyFile doesn't exist does a fs.futimes call to fix the timestamps. I think we might need to add that to the modern fs.copyFile implementation too.

I tried this and it caused a significant performance drop due to the extra file I/O. fs.copyFile was supposed to keep the timestamps with the original file. I really wouldn't want to do this :(

Copy link
Member

@BYK BYK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about the performance impact with the additional file I/O here. Can you measure before and after on Node 8?

Would love to figure out a different way to do this.

@BYK
Copy link
Member

BYK commented Mar 9, 2018

Seems like this is a bug in Node itself and explains why I haven't seen the issue on Windows: nodejs/node#15793 (comment) (and you haven't seen it on Windows and OS X).

Maybe we can limit the extra operation to Linux only which is the fastest anyway so the performance hit won't be too much of an issue? Alternatively, we may look at the creation time instead of modification time which seems to be consistent across platforms? (need to verify this last part before acting)

@rally25rs
Copy link
Contributor Author

I had a thought to just copy 1 file, and after that first copy check the timestamps. If they changed, set a flag to do the futimes on all subsequent copies.


however, this is still broken is some weird way. Somehow when copying files to the cache, after the copy completes the file still doesn't exist. This seems to happen if you have a file: dependency to a directory that is also a git repo.

verbose 2.163 Error: EACCES: permission denied, open '/Users/jvalore/Library/Caches/Yarn/v1/npm-cxcore-common-ui-0.1.2-9d4812bc-23b7-4d80-8253-121c0da71c33-1520603428055/.git/objects/03/9813c7ac8f77e7e8a849f7002cc37b0a75d6aa'
error An unexpected error occurred: "EACCES: permission denied, open '/Users/jvalore/Library/Caches/Yarn/v1/npm-cxcore-common-ui-0.1.2-9d4812bc-23b7-4d80-8253-121c0da71c33-1520603428055/.git/objects/03/9813c7ac8f77e7e8a849f7002cc37b0a75d6aa'".
info If you think this is a bug, please open a bug report with the information provided in "/Users/jvalore/Projects/cxcore-system-ui/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

This error comes from trying to fs.open a file after fs.copyFile finished.


Eventually I think we need to just stop using timestamps as part of the check, but I'm not sure what else to do. Getting a checksum or hash for every file is probably much slower.

@rally25rs
Copy link
Contributor Author

@BYK I did a bunch more work on this and fixed it up a bit. There are a lot of edge cases (read my code comments) so the code looks ugly, but...

On the first file copy, check the timestamp to see if it's wrong. If it is already corect, then disable the calls to futimes. Because of the way file copy batching works (multiple in parallel), it actually does the check about 10 times, but at least its not every file.

I discovered that my error above was because I was opening the file in write/append mode to call futimes, because you have to on Windows. However if the file is read-only (as is the case with .git objects files) then that fails. On OSX you can call futimes on a file that is open in read-mode, so I try to re-open the file in read mode.

Performance hit... quantifiable but not bad. The first run (yarndev) is this branch. This is in a project with >1,400 packages totaling >25,000 files, so it is quite a large project. Also this is node 8.10 so fileCopy is in use, and on OSX so futimes is not called for every file.

~/Projects/cxcore-system-ui (master) 🐒   rm -rf node_modules/

~/Projects/cxcore-system-ui (master) 🐒   yarndev
yarn install v1.6.0-0
[1/5] 🔍  Validating package.json...
[2/5] 🔍  Resolving packages...
[3/5] 🚚  Fetching packages...
[4/5] 🔗  Linking dependencies...
[5/5] 📃  Building fresh packages...
✨  Done in 18.20s.

~/Projects/cxcore-system-ui (master) 🐒   rm -rf node_modules/

~/Projects/cxcore-system-ui (master) 🐒   yarn
yarn install v1.5.1
[1/5] 🔍  Validating package.json...
[2/5] 🔍  Resolving packages...
[3/5] 🚚  Fetching packages...
[4/5] 🔗  Linking dependencies...
[5/5] 📃  Building fresh packages...
✨  Done in 17.72s.

Performance on Ubuntu with node 8, where futimes is needed. I didn't have the same project to test, so I just used jest because it is pretty big... (518 packages, 7,800 files)

$ yarn install
this branch: 5.3s
yarn 1.5.1: 4.8s

so noticeably slower, but subsequent adds won't re-copy files...

$ yarn add {another package}
this branch: 3.6s
yarn 1.5.1: 5.1s

So you take a perf hit on initial install to save time on later adds.

whatcha think? 👍 / 👎

Copy link
Member

@arcanis arcanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Left a few comments here and there but the PR looks good.

src/util/fs.js Outdated
if (err) {
reject(err);
} else {
fixTimes(0, dest, data).then(() => resolve(err)).catch(ex => reject(ex));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the err in resolve(err) is a mistake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I just preserved what the original code did:

new Promise((resolve, reject) => fs.copyFile(src, dest, flags, err => (err ? reject(err) : resolve(err))))

The code I was replacing would resolve(err) too. I didn't want to make the assumption that it wasn't intentional. 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's ... weird. I think we can just fix this and check that everything is still green.

src/util/fs.js Outdated
if (err) {
reject(err);
} else {
fixTimes(0, dest, data).then(() => resolve(err)).catch(ex => reject(ex));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why the first parameter is 0. The fd 0 is stdin, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was 0 just because it's falsy. In the case of not using copyFile we already have a file open that we can use, instead of having to re-open the file. If the fd is falsey, then fixTimes opens the file. I was going to pass undefined but then Flow gets mad. I'll see if I can make it more understandable...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be able to use null, then in the declaration of fixTime just use fd: ?number (instead of : number). Flow should then accept === null without complaining.

src/util/fs.js Outdated
// On OSX you can open with read permissions and still call `fs.futimes`.
// We first try to open the file in write mode because that works across all OSs.
// However if the file is read-only (not even the owner has write perms) then that will fail.
// We can try to reopen in read mode, which will work on OSX.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit convoluted. Would it make sense to use a third-party package for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a 3rd party package that will handle this? I agree that it's ugly. Maybe I can move it to it's own package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it there for now 👍

@arcanis arcanis self-assigned this Mar 21, 2018
@rally25rs
Copy link
Contributor Author

@arcanis I did a bit of refactoring and moved all the stuff in fs.js that had OS or Node version specific stuff in it into a new fs-normalized.js file. This at least puts all that stuff in one place instead of making fs.js so messy, and should put it in a better position for future removal. I opened #5554 to capture some additional work that could be done to replace this file with a different library. I just don't have time to actually test out a couple libraries across 3 OS's right now.

Hopefully this at least improves the situation...

@arcanis
Copy link
Member

arcanis commented Mar 24, 2018

LGTM, I'll fix the conflict monday 👍

Copy link

@james-rabbit james-rabbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look great and I am looking forward to this being merged.

I am a bit concerned about some of the errors being caught and silenced in fs-normalized though, some of the errors may be important for debugging strange cases down the line. If the errors are really not important, can a comment be made where they are silenced, justifying it so the reasoning is clear?

openfd = await open(dest, 'r', data.mode);
} catch (err) {
// We can't even open this file for reading.
return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is catching er and err without logging or throwing anything here.
Issues with this may be difficult to debug.

Copy link
Contributor Author

@rally25rs rally25rs Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be any issues if it fails. We just won't correct the timestamps (because we can't) and yarn will behave the same way it does now. The comment below tries to explain this, but maybe I didn't do a great job of explaining it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not critical, but if this code is not working on some machine it will look like a fairly big slowdown without a good way to debug it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rally25rs, I agree with @james-rabbit, it makes sense to put verbose logs that open -x failed for the future when we might be debugging why install on one OS is slower than the other.
Would you add it?

}
} catch (er) {
// If `futimes` throws an exception, we probably have a case of a read-only file on Windows.
// In this case we can just return. The incorrect timestamp will just cause that file to be recopied

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we verify that the file is read-only and throw or log this er if that is not the case?

Copy link
Contributor Author

@rally25rs rally25rs Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no logger passed down to the fs-normalized module, so I didn't have any easy way of logging these issues. I was trying to avoid even bigger changes by passing the reporter down another level. I also didn't want fixTimes() to ever cause a yarn install to fail. If we can't fix the time stamps, it will effect performance a bit, but won't really break anything. It'll just work like it does currently where sometimes modules might get rebuilt every time (#932 ) but that's better than failing the install. If I can bubble up an error without failing the install, I'll try to add a line to the verbose log. Otherwise I'd rather not restructure the code any more than I already have.

@rally25rs
Copy link
Contributor Author

@BYK could you re-review so we can unblock merging? thanks!

@bestander
Copy link
Member

@rally25rs, could you fix the merge conflict?
I'll ping the team in chat.

@rally25rs
Copy link
Contributor Author

@bestander yep, working on it now...

@bestander
Copy link
Member

So we see 3% perf reduction for first installs but we get a significant win on consequent installs which I think is a more common case.
Let's merge it

@bestander
Copy link
Member

@rally25rs, this is a huge win! Thank you.
Let's merge, just one nit.

@arcanis arcanis merged commit d4299be into yarnpkg:master Apr 5, 2018
@arcanis
Copy link
Member

arcanis commented Apr 5, 2018

🎉

@Haroenv
Copy link
Member

Haroenv commented Apr 5, 2018

I just waited four times for already-installed native modules today, so ready for this PR !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Native packages rebuilt every time
6 participants