Update: use fs.copyFile when available #4486
Conversation
**Summary** Fixes #4331. Supersedes #3290. Uses the newly added `fs.copyFile` on Node 8.5 when available and falls back to the old buffer-based method otherwise. This patch also refactors the file copy code a bit, making it more efficient. Here are the durations on my computer with [this package.json](https://github.com/sciolist/fcopy/blob/c0ded8b82bbaa3f4cdb0a67e951eb3f8b0f97fb7/perf/project/alottafiles/package.json):

| master | w/o `fs.copyFile` | w/ `fs.copyFile` |
| - | - | - |
| ~23s | ~19s | ~15s |

This is with `yarn.lock` in place and w/o `node_modules`.

**Test plan** CI should pass.
src/util/fs.js (outdated)
```diff
@@ -527,6 +552,15 @@ export function copy(src: string, dest: string, reporter: Reporter): Promise<voi
   return copyBulk([{src, dest}], reporter);
 }
 
+const copyFileWithAttributes = async function(data: CopyFileAction, cleanup: Function): Promise<void> {
```
This is a terrible name, I should probably change this to `safeCopyFile` with an explanation of the `unlink` and `cleanup` parts. What do you think?
The name looks fine, but a comment to explain the difference from `copyFile` would be useful. I also feel like the arguments are kinda obfuscated because of the custom `CopyFileAction` type.
This change will increase the build size from 9.69 MB to 9.69 MB, an increase of 2.27 KB (0%)
src/util/fs.js (outdated)
```js
: async (src, dest, flags, data) => {
    const fd = await open(dest, 'w', data.mode);
    try {
      await write(fd, await readFileBuffer(src));
```
Why not `writeFile`?
`writeFile` and `utimes` expect file paths and use `open()` and `close()` behind the scenes. Using `open`, `write`, and `close` manually here, along with `futimes` which expects a file descriptor, makes the operation more efficient by avoiding an extra file open/close. I think this is why the patch is faster than master even when `fs.copyFile` is not used. Should I add a comment about this in the code?
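For illustration, a minimal self-contained sketch of that file-descriptor-based fallback (plain `util.promisify` wrappers here; the real patch wires this through the helpers in `src/util/fs.js`):

```js
const fs = require('fs');
const {promisify} = require('util');

const open = promisify(fs.open);
const write = promisify(fs.write);
const futimes = promisify(fs.futimes);
const close = promisify(fs.close);
const readFile = promisify(fs.readFile);

// Copy src -> dest through a single file descriptor: the data write and
// the timestamp update reuse the same fd, whereas writeFile + utimes
// would each open and close the destination behind the scenes.
async function copyWithFd(src, dest, mode, atime, mtime) {
  const fd = await open(dest, 'w', mode);
  try {
    await write(fd, await readFile(src));
    await futimes(fd, atime, mtime);
  } finally {
    await close(fd);
  }
}
```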
src/util/fs.js (outdated)
```diff
-    while ((writePromise = currentlyWriting[data.dest])) {
+  async (data: CopyFileAction): Promise<void> => {
+    const writePromise = currentlyWriting.get(data.dest);
+    if (writePromise) {
```
You can't replace the `while` with an `if`, otherwise multiple waiting instances will all get released at once. The `while` ensures that only a single one will ever be able to cross the condition.
I thought about this but here's my reasoning:

- The event loop is single-threaded.
- If I do `await promise`, the rest of the code essentially gets put into `promise.then()`.
- Since all code after this point is synchronous, and since the event loop is single-threaded, there can't be a new entry in the map between when the promise gets released and when we add a new promise there.

Makes sense, or am I missing something?
Even if the event loop is single-threaded, because all iterations wait on the same promise, all the waiters will be released at once (but sequentially), because they won't re-check to make sure they are the first promise to wake up.

That being said, I just saw that this call is inside a `promise.queue` call, so maybe this utility will already take care of this (but then why `await` at all?).
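A toy illustration of the difference (hypothetical wiring around the patch's `currentlyWriting` map, not the actual code):

```js
// `if`: check once, then proceed -- every waiter queued on the same
// promise crosses together once it settles.
const pending = currentlyWriting.get(dest);
if (pending) {
  await pending;
}

// `while`: re-check after every wake-up -- a waiter only proceeds once
// no write for `dest` is in flight anymore, even if a new one was
// registered while it slept.
let inFlight;
while ((inFlight = currentlyWriting.get(dest))) {
  await inFlight;
}
```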
Ah, I see! Yes, you're right. For the promise queue part, I don't know why we don't simply return the existing promise. Doesn't make much sense to me, since right now we are essentially writing the same file more than once when the promise is released. I'll change the code accordingly.
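A simplified sketch of the "return the existing promise" approach (`currentlyWriting` and `safeCopyFile` are from the patch; the queue wiring is omitted):

```js
// Deduplicate concurrent copies to the same destination: if a write to
// data.dest is already in flight, reuse its promise instead of queueing
// a second write of the same file.
function queueCopy(data /*: CopyFileAction */) {
  const existing = currentlyWriting.get(data.dest);
  if (existing) {
    return existing;
  }
  const copier = safeCopyFile(data, () => currentlyWriting.delete(data.dest));
  currentlyWriting.set(data.dest, copier);
  return copier;
}
```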
src/util/fs.js (outdated)
```diff
@@ -40,8 +40,34 @@ export const chmod: (path: string, mode: number | string) => Promise<void> = pro
 export const link: (src: string, dst: string) => Promise<fs.Stats> = promisify(fs.link);
 export const glob: (path: string, options?: Object) => Promise<Array<string>> = promisify(globModule);
 
-const CONCURRENT_QUEUE_ITEMS = 4;
+const CONCURRENT_QUEUE_ITEMS = fs.copyFile ? 16 : 4;
```
How did you pick these numbers?
I borrowed them from #3290. I did some testing myself with various numbers, going as high as 64 and as low as 1. 1 definitely hurts, but anything more than 8 doesn't make a huge difference. I think coupling this with the CPU count would be a better idea. What do you think?
How about an elaborate code comment on how you came up with this truth? Numbers tend to be arbitrary in code unless we give explicit meaning to them.
I'll experiment a bit more and then see if I can come up with better logic. I think simply tying this to the CPU count should be fine and self-documenting. Otherwise, I'll add the comment as you suggested.
Glad you brought this up @cpojer! I did some repeated testing for powers of 2 and found that 128 is consistently the best concurrency level. Updated the code and added a comment about this.
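The measurement could have looked roughly like this (a hypothetical harness, not the exact script used; `copyOneFile` stands in for whatever performs a single copy):

```js
// Run `jobs` (functions returning promises) with at most `limit` in flight.
async function runWithConcurrency(jobs, limit) {
  const queue = jobs.slice();
  const worker = async () => {
    let job;
    while ((job = queue.shift())) {
      await job();
    }
  };
  await Promise.all(Array.from({length: limit}, worker));
}

// Time the same bulk copy at each power of two to find the sweet spot.
async function benchmark(files) {
  for (const limit of [1, 2, 4, 8, 16, 32, 64, 128, 256]) {
    const start = Date.now();
    await runWithConcurrency(files.map(f => () => copyOneFile(f)), limit);
    console.log(`concurrency ${limit}: ${Date.now() - start}ms`);
  }
}
```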
```js
const copier = safeCopyFile(data, () => currentlyWriting.delete(data.dest));
currentlyWriting.set(data.dest, copier);
events.onProgress(data.dest);
return copier;
```
I read: cpojer. lol
Wow, awesome speedup for Yarn!

Does node support shallow copy for copy-on-write filesystems like btrfs? https://btrfs.wiki.kernel.org/index.php/Deduplication

@langpavel I think the new

I'm curious: for Node < 8.5, could you shell out to `cp`, and if you do, do you get the same wins?

Why do you hate cross-platform? :( (Windows)

@vjeux also "shelling out" would probably come with its own costs, which may make this option a lot less desirable.

I did give shelling out to `cp` a look, and you could get similar speed benefits, but you'd pretty much need to prepare one massive bash (+ PowerShell for Windows) script with all the copies you want to run and then execute that, to offset the startup cost, especially on Windows where it's very, very slow. Terribly ugly.

Not yet, but coming soon! See libuv/libuv#1491 and the discussion on libuv/libuv#1465
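(For later readers: newer Node releases did grow copy-on-write support, exposed through the `COPYFILE_FICLONE` flag. A minimal sketch, assuming a Node version where `fs.constants.COPYFILE_FICLONE` exists:)

```js
const fs = require('fs');
const {promisify} = require('util');

const copyFile = promisify(fs.copyFile);

// Ask for a copy-on-write clone where the filesystem supports it
// (e.g. btrfs, APFS); Node falls back to a regular copy when
// cloning isn't available.
function cloneOrCopy(src, dest) {
  return copyFile(src, dest, fs.constants.COPYFILE_FICLONE);
}
```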
src/util/fs.js (outdated)
```js
 * `data` contains target file attributes like mode, atime and mtime. Built-in copyFile copies these
 * automatically but our polyfill needs to do this manually, thus needs the info.
 */
const safeCopyFile = async function(data: CopyFileAction, cleanup: Function): Promise<void> {
```
Use `() => void` instead of `Function` for the type of `cleanup`.
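That is, something along these lines (Flow syntax, matching the file's existing annotations):

```js
const safeCopyFile = async function(data: CopyFileAction, cleanup: () => void): Promise<void> {
  // ...
};
```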
```js
// fs.copyFile uses the native file copying instructions on the system, performing much better
// than any JS-based solution and consumes fewer resources. Repeated testing to fine tune the
// concurrency level revealed 128 as the sweet spot on a quad-core, 16 CPU Intel system with SSD.
const CONCURRENT_QUEUE_ITEMS = fs.copyFile ? 128 : 4;
```
I wonder if 128 will essentially lock up the system on slower hard drives (particularly if the user is multitasking)
It shouldn't, since this is still single-threaded and we leave the scheduling job to the OS and Node, as far as I understand. The reason for having a much lower limit for the fallback is mostly old versions of Node sometimes having buffering issues with `pipe`, and possible memory pressure after #3539, since it now reads the whole file contents into memory.
If this becomes a problem we can always make this a configurable variable.
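For instance (hypothetical variable name, just to show the shape):

```js
// Let users on slow disks dial the copy concurrency down.
const CONCURRENT_QUEUE_ITEMS =
  Number(process.env.YARN_COPY_CONCURRENCY) || (fs.copyFile ? 128 : 4);
```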
🎉
441: JS of 2017-09-19: Node.js 8.5.0, changes in MSEdge 16, CoffeeScript 2 r=azu a=azu

Node
- [Node v8.5.0 (Current) | Node.js](https://nodejs.org/en/blog/release/v8.5.0/)
- [Update: use fs.copyFile when available by BYK · Pull Request #4486 · yarnpkg/yarn](yarnpkg/yarn#4486)

Edge
- [What's Coming in EdgeHTML 16 - Microsoft Edge Development | Microsoft Docs](https://docs.microsoft.com/en-us/microsoft-edge/dev-guide/whats-new/edgehtml-16)
- [Making the web smoother with independent rendering - Microsoft Edge Dev Blog](https://blogs.windows.com/msedgedev/2017/08/17/making-web-smoother-independent-rendering/#d8sy0bGIZkZ6gF01.97)
- [Microsoft Edge at Build 2017 - Microsoft Edge Dev Blog](https://blogs.windows.com/msedgedev/2017/05/19/microsoft-edge-at-build-2017/#CwDc45Kwq1K8M2VW.97)
- [Major improvements to the independent rendering pipeline: what gets faster, and by how much, in Windows 10 Fall Creators Update's Microsoft Edge with the EdgeHTML 16 update? - @IT](http://www.atmarkit.co.jp/ait/articles/1708/22/news068.html)

CoffeeScript
- [Announcing CoffeeScript 2](http://coffeescript.org/announcing-coffeescript-2/)
**Summary** Follow-up to #4486, which reverted the while loop that waits on potentially multiple copies of the same file. This seems to have some random breakages and needs more investigation for optimizing. **Test plan** N/A
**Summary** Fixes yarnpkg#4331. Supersedes yarnpkg#3290. Uses the newly added `fs.copyFile` on Node 8.5 when available and falls back to the old buffer-based method otherwise. This patch also refactors the file copy code a bit, making it more efficient. Here are the durations on my computer with [this package.json](https://github.com/sciolist/fcopy/blob/c0ded8b82bbaa3f4cdb0a67e951eb3f8b0f97fb7/perf/project/alottafiles/package.json):

| master | w/o copyFile | w/ copyFile |
| - | - | - |
| ~23s | ~19s | ~14s |

This is with `yarn.lock` in place and w/o `node_modules`.

**Test plan** CI should pass.
**Summary** Follow-up to yarnpkg#4486, which reverted the while loop that waits on potentially multiple copies of the same file. This seems to have some random breakages and needs more investigation for optimizing. **Test plan** N/A
For CoW file system support, see #5456
@BYK I found this PR while trying to understand the memory usage of yarn. Is there any way to disable this feature and not use any cache at all? Instead, I would expect yarn to download packages directly to the node_modules directory, rather than using the default cache folder and then copying millions of files.