Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update: use fs.copyFile when available #4486

Merged
merged 8 commits into from
Sep 18, 2017
Merged

Update: use fs.copyFile when available #4486

merged 8 commits into from
Sep 18, 2017

Conversation

BYK
Copy link
Member

@BYK BYK commented Sep 16, 2017

Summary

Fixes #4331. Supersedes #3290. Uses the newly added fs.copyFile
on Node 8.5 hen available and falls back to the old buffer based
method otherwise. This patch also refactors the file copy code a
bit making it more efficient. Here are the durations on my computer
with this package.json:

master w/o copyFile w/ copyFile
~23s ~19s ~14s

This is with yarn.lock in place and w/o node_modules.

Test plan

CI should pass.

@BYK BYK changed the title Update: use fs.copyFile when awailable Update: use fs.copyFile when available Sep 16, 2017
**Summary**

Fixes #4331. Supercedes #3290. Uses the newly added `fs.copyFile`
on Node 8.5 hen available and falls back to old buffer based
method otherwise. This patch also refactors the file copy code a
bit making it more efficient. Here are the durations on my computer
with [this package.json](https://github.com/sciolist/fcopy/blob/c0ded8b82bbaa3f4cdb0a67e951eb3f8b0f97fb7/perf/project/alottafiles/package.json):

| master | w/o fsCopy | w/ fsCopy |
| - | - | - |
| ~23s | ~19s | ~15s |

This is with `yarn.lock` in place and w/o `node_modules`.

**Test plan**

CI should pass.
src/util/fs.js Outdated
@@ -527,6 +552,15 @@ export function copy(src: string, dest: string, reporter: Reporter): Promise<voi
return copyBulk([{src, dest}], reporter);
}

const copyFileWithAttributes = async function(data: CopyFileAction, cleanup: Function): Promise<void> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a terrible name, I should probably change this to safeCopyFile with an explanation of unlink and cleanup parts. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name looks fine, but a comment to explain the difference with copyFile would be useful.

I also feel like the arguments are also kinda obfuscated because of the custom CopyFileAction type.

@buildsize
Copy link

buildsize bot commented Sep 16, 2017

This change will increase the build size from 9.69 MB to 9.69 MB, an increase of 2.27 KB (0%)

File name Previous Size New Size Change
yarn-[version].noarch.rpm 834.67 KB 834.92 KB 258 bytes (0%)
yarn-[version].js 3.69 MB 3.69 MB 748 bytes (0%)
yarn-legacy-[version].js 3.74 MB 3.74 MB 874 bytes (0%)
yarn-v[version].tar.gz 838.74 KB 838.96 KB 227 bytes (0%)
yarn_[version]all.deb 634.81 KB 635.03 KB 222 bytes (0%)

src/util/fs.js Outdated
: async (src, dest, flags, data) => {
const fd = await open(dest, 'w', data.mode);
try {
await write(fd, await readFileBuffer(src));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not writeFile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeFile and utimes expect file paths and use open() and close() behind the scenes. Using open, write and close manually here along with futimes which expects a file descriptor, makes the operation more efficient by avoiding an extra file open/close.

I think this is why the patch is faster than master even when fs.copyFile is not used.

Should I add a comment about this in the code?

src/util/fs.js Outdated
@@ -527,6 +552,15 @@ export function copy(src: string, dest: string, reporter: Reporter): Promise<voi
return copyBulk([{src, dest}], reporter);
}

const copyFileWithAttributes = async function(data: CopyFileAction, cleanup: Function): Promise<void> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name looks fine, but a comment to explain the difference with copyFile would be useful.

I also feel like the arguments are also kinda obfuscated because of the custom CopyFileAction type.

src/util/fs.js Outdated
while ((writePromise = currentlyWriting[data.dest])) {
async (data: CopyFileAction): Promise<void> => {
const writePromise = currentlyWriting.get(data.dest);
if (writePromise) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't replace the while by a if, otherwise multiple waiting instances will all get released all at once. The while ensures that a single one will ever be able to cross the condition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this but here's my reasoning:

  • Event loop is single-threaded
  • If I do await promise the rest of the code is essentially gets into promise.then()
  • Since all code after this point is synchronous and since the event loop is single threaded, there can't be a new entry in the map between when the promise get's released and we add a new promise there.

Makes sense, or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the event loop is single threaded, because all iterations will wait on the same promise, all the promises will be released at once (but sequentially) because they won't re-check to make sure they are the first promise to wake up.

That being said, I just saw that this call is inside a promise.queue call, so maybe this utility will already take care of this (but then why await at all?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see! Yes, you're right. For the promise queue part, I don't know why we simply don't return the existing promise. Doesn't make much sense to me since right now we are essentially writing the same file more than 1 time once the promsie is released. I'll change the code accordingly.

@BYK
Copy link
Member Author

BYK commented Sep 17, 2017

Oh boy, dat speed:

src/util/fs.js Outdated
@@ -40,8 +40,34 @@ export const chmod: (path: string, mode: number | string) => Promise<void> = pro
export const link: (src: string, dst: string) => Promise<fs.Stats> = promisify(fs.link);
export const glob: (path: string, options?: Object) => Promise<Array<string>> = promisify(globModule);

const CONCURRENT_QUEUE_ITEMS = 4;

const CONCURRENT_QUEUE_ITEMS = fs.copyFile ? 16 : 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you pick these numbers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I borrowed them from #3290. I did some testing myself with various numbers going as high as 64 and as low as 1. 1 definitely hurts but anything more than 8 doesn't make a huge difference. I think coupling this with the CPU count would be a better idea. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about an elaborate code comment on how you came up with this truth? Numbers tend to be arbitrary in code unless we give explicit meaning to them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll experiment a bit more and then see if I can come up with a better logic. I think simply tying this tp process count should be fine and self-documenting. Otherwise, I'll add the comment as you suggested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad you brought this up @cpojer! I did some repeated testing for powers of 2 and found that 128 is consistently the best concurrency level. Updated the code and added a comment about this

const copier = safeCopyFile(data, () => currentlyWriting.delete(data.dest));
currentlyWriting.set(data.dest, copier);
events.onProgress(data.dest);
return copier;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read: cpojer. lol

@cpojer
Copy link
Contributor

cpojer commented Sep 17, 2017

Wow, awesome speedup for Yarn!

@langpavel
Copy link

Does node support shallow copy for copy on write filesystems like btrfs? https://btrfs.wiki.kernel.org/index.php/Deduplication
It can not only speed up operation but also save disk space

@BYK
Copy link
Member Author

BYK commented Sep 17, 2017

@langpavel I think the new fs.copyFile will be able to leverage that, yeah.

@BYK
Copy link
Member Author

BYK commented Sep 17, 2017

Another record after setting concurrency to 128 on Linux & Node 8: TravisCI test run time

@vjeux
Copy link
Contributor

vjeux commented Sep 17, 2017

I'm curious, for node < 8.5 could you shell out to "cp" and if you do, do you get the same wins?

@BYK
Copy link
Member Author

BYK commented Sep 17, 2017

I'm curious, for node < 8.5 could you shell out to "cp" and if you do, do you get the same wins?

Why you hate cross-platform? :( (Windows)

@BYK
Copy link
Member Author

BYK commented Sep 17, 2017

@vjeux also "shelling out" would probably come with its own costs which may make this option a lot less desirable.

@sciolist
Copy link
Contributor

sciolist commented Sep 17, 2017

I did give shelling out to cp a look, and you could get similar speed benefits.. but you'd pretty much need to prepare one massive bash (+ powershell for windows) script with all the copies you want to run and then execute that, to offset the startup cost.. especially on Windows where it's very, very slow. Terribly ugly.

@Daniel15
Copy link
Member

@langpavel

Does node support shallow copy for copy on write filesystems like btrfs?

Not yet, but coming soon! See libuv/libuv#1491 and the discussion on libuv/libuv#1465

src/util/fs.js Outdated
* `data` contains target file attributes like mode, atime and mtime. Built-in copyFile copies these
* automatically but our polyfill needs the do this manually, thus needs the info.
*/
const safeCopyFile = async function(data: CopyFileAction, cleanup: Function): Promise<void> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use () => void instead of Function for the type of cleanup.

// fs.copyFile uses the native file copying instructions on the system, performing much better
// than any JS-based solution and consumes fewer resources. Repeated testing to fine tune the
// concurrency level revealed 128 as the sweet spot on a quad-core, 16 CPU Intel system with SSD.
const CONCURRENT_QUEUE_ITEMS = fs.copyFile ? 128 : 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if 128 will essentially lock up the system on slower hard drives (particularly if the user is multitasking)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't since this is still single threaded and we leave the scheduling job to the OS and Node as far as I understand. The reason for having a much lower limit for the fallback is mostly old versions of Node having buffering issues with pipe sometimes and possible memory pressure after #3539 since it now reads the whole file contents into memory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this becomes a problem we can always make this a configurable variable.

@BYK BYK merged commit 0ee1c55 into master Sep 18, 2017
@BYK BYK deleted the copyFile branch September 18, 2017 12:47
@cpojer
Copy link
Contributor

cpojer commented Sep 18, 2017

🎉

@bestander
Copy link
Member

john-wesley-shipp-the-flash

bors bot added a commit to jser/jser.github.io that referenced this pull request Sep 19, 2017
441: 2017-09-19のJS: Node.js 8.5.0、MSEdge 16の変更点、CoffeeScript 2 r=azu a=azu

Node

- [Node v8.5.0 (Current) | Node.js](https://nodejs.org/en/blog/release/v8.5.0/)
- [Update: use fs.copyFile when available by BYK · Pull Request #4486 · yarnpkg/yarn](yarnpkg/yarn#4486)

Edge

- [What's Coming in EdgeHTML 16 - Microsoft Edge Development | Microsoft Docs](https://docs.microsoft.com/en-us/microsoft-edge/dev-guide/whats-new/edgehtml-16)
- [Making the web smoother with independent rendering - Microsoft Edge Dev BlogMicrosoft Edge Dev Blog](https://blogs.windows.com/msedgedev/2017/08/17/making-web-smoother-independent-rendering/#d8sy0bGIZkZ6gF01.97)
- [Microsoft Edge at Build 2017 - Microsoft Edge Dev BlogMicrosoft Edge Dev Blog](https://blogs.windows.com/msedgedev/2017/05/19/microsoft-edge-at-build-2017/#CwDc45Kwq1K8M2VW.97)
- [独立レンダリングパイプラインを大幅改良:Windows 10 Fall Creators Updateの「Microsoft Edge」は「EdgeHTML 16」アップデートで何がどれだけ速くなる? - @IT](http://www.atmarkit.co.jp/ait/articles/1708/22/news068.html)

Coffee 

- [Announcing CoffeeScript 2](http://coffeescript.org/announcing-coffeescript-2/)
BYK added a commit that referenced this pull request Sep 28, 2017
**Summary**

Follow up to #4486 which reverted the while loop that waits on
potential multiple copies of the same file. This seems to have
some random breakages and needs more investigation for optimizing.

**Test plan**

N/A
BYK added a commit that referenced this pull request Sep 28, 2017
**Summary**

Follow up to #4486 which reverted the while loop that waits on
potential multiple copies of the same file. This seems to have
some random breakages and needs more investigation for optimizing.

**Test plan**

N/A
joaolucasl pushed a commit to joaolucasl/yarn that referenced this pull request Oct 27, 2017
**Summary**

Fixes yarnpkg#4331. Supersedes yarnpkg#3290. Uses the newly added `fs.copyFile` 
on Node 8.5 hen available and falls back to the old buffer based 
method otherwise. This patch also refactors the file copy code a 
bit making it more efficient. Here are the durations on my computer 
with [this package.json](https://github.com/sciolist/fcopy/blob/c0ded8b82bbaa3f4cdb0a67e951eb3f8b0f97fb7/perf/project/alottafiles/package.json):

| master | w/o copyFile | w/ copyFile |
| - | - | - |
| ~23s | ~19s | ~14s |

This is with `yarn.lock` in place and w/o `node_modules`.

**Test plan**

CI should pass.
joaolucasl pushed a commit to joaolucasl/yarn that referenced this pull request Oct 27, 2017
**Summary**

Follow up to yarnpkg#4486 which reverted the while loop that waits on
potential multiple copies of the same file. This seems to have
some random breakages and needs more investigation for optimizing.

**Test plan**

N/A
@langpavel
Copy link

For CoW file systems support see #5456

@droslean
Copy link

@BYK I found this PR while trying to understand the memory usage of yarn. Is there any way to disable this feature and not use any cache at all? Instead, I would assume yarn to download the packages directly to the node_modules directory instead of using the default cache folder and then copy-paste millions of files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use fs.copyFile
9 participants