-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs: speed up fs.WriteStream#_write #2944
Conversation
@@ -1859,7 +1873,8 @@ WriteStream.prototype._write = function(data, encoding, cb) { | |||
}); | |||
|
|||
var self = this; | |||
fs.write(this.fd, data, 0, data.length, this.pos, function(er, bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is what this PR is all about: avoiding all the stuff in fs.write that we already know does not apply to this use-case.
This currently breaks |
function wrapper(err, written) { | ||
// Retain a reference to buffer so that it can't be GC'ed too soon. | ||
callback(err, written || 0, buffer); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trevnorris Any news on the solution to these wrappers? If we could avoid a closure for each of these write calls everywhere, I think that would improve performance (or lower memory usage) a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You won't see much if any gain in terms of performance. The change is more for code cleanliness. Will have to make the change of passing buffer
to wrapper()
so it can be hoisted. Though that shouldn't be difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what the plan is, but are you saying we would still need a wrapper? Perhaps you can elaborate a bit on what this is really all about? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the buffer has left JS scope then GC will clean it up. Even though it's still being used. One alternative is that the buffer bubbles through the oncomplete
callback. I still need to look into it. While the wrapper is ugly, it doesn't add any noticeable performance overhead in this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, thanks.
Thanks for starting on this. |
return; | ||
} | ||
|
||
this.fd = fd; | ||
this.emit('open', fd); | ||
}.bind(this)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good place to put an arrow function. I'm curious about the perf vs this bind
and const self = this
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaik bind
is notoriously slow (both at bind time and at call time), so wherever we can remove it (which is everywhere), we most definitely should. Arrow functions, I don't believe perform quite well enough (or prevents optimization?) to start using all over the place I believe. Perhaps @mraleph can chime in on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wary about them as well, but after a bunch of testing found that they're just as fast as normal functions in the same situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, bind()
is still the devil's tool. Fortunately arrow functions don't play by the same rules, thus aren't placed under the same performance constraints.
@trevnorris Thanks. I'm quite invested in good file system performance right now, so you can expect me to continue to contribute here. Btw, CI is too soon, as one test is currently broken because it replaces fs.write with its own version (but our stream no longer depends on fs.write). Advice on this would be appreciated (especially if possible today, as I'm about to go offline for 10 days on vacation). |
49a704d
to
de90649
Compare
@ronkorving Is the offending test still relevant? If not then just delete it. If it is then we'll figure out a way to rewrite it. If you have to leave before finishing the test then I can look into finishing that for you. |
@trevnorris To be honest, I fear I'm running short on time. Travel preparations will take up most of my time tomorrow (typing this from a phone, won't see my computer again today), so by all means have a look at it and do as you please :) Alternatively, advise me but allow this to sit until I'm back on Oct 3rd. Either is fine by me. |
The failing test was introduced in d65832c. Reason for it failing isn't obvious to me yet. Will take a look later. |
@trevnorris it fails because the test depends on fs.write being used by WriteStream which is no longer the case in this PR. |
@ronkorving The description of the original commit is "Close the file descriptor when a write operation fails." Does that still apply? If so, can a similar test be created? Either way it seems like this failing test will need to be removed. |
why not do a quick backoff and retry (maybe after a small setTimeout) first before closing the fd? |
@@ -1836,30 +1836,42 @@ fs.FileWriteStream = fs.WriteStream; // support the legacy name | |||
|
|||
|
|||
WriteStream.prototype.open = function() { | |||
const self = this; | |||
|
|||
fs.open(this.path, this.flags, this.mode, function(er, fd) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest node supports arrow functions so why not use them?
Ref: http://jsperf.com/arrow-functions-vs-bind
Ref: http://jsperf.com/arrow-vs-bind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's my call to make. I would like to see the Node core members formally embrace it before throwing my warm and fuzzy ES6 feelings into each PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know about "embracing" it, but we're not going to enforce it. It's a stylistic call to save a few chars in a scenario like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so we're clear on future PRs, are arrow functions permitted in core now? I know they're supported by V8, but is there any reason NOT to use them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're permitted. Performance is on par w/ normal functions. Only place I'd suggest they be used is if this
needs to propagate. Other than that it's nothing more than a style thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to my benchmark, arrow functions actually seem to be reliably faster than regular functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brendanashworth Try reordering the runs. ;)
$ node /tmp/arrow-bench.js
function took 87.64109233 ns / op
regular took 93.22382201 ns / op
$ node /tmp/arrow-bench.js
regular took 87.63654512 ns / op
function took 93.23333455 ns / op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigh, v8 is confusing sometimes.
Is fixing |
I think so. I'll try to give that some love soon, but have been a bit occupied I'm afraid. |
The test ensures that if there are errors passed to the callback (write failures), the stream closes the file descriptor and emits the error. That seems like a good test to have, but it achieves this by creating a fake write error in fs.write (which is public, so it can do that). In the new WriteStream implementation, it no longer uses fs.write, so I'm not sure how I could inject a fake error. Suggestions welcome (as I continue looking into this). |
Here's a diff for a fix: diff --git a/test/parallel/test-fs-write-stream-err.js b/test/parallel/test-fs-write-stream-err.js
index bc289d4..7d64014 100644
--- a/test/parallel/test-fs-write-stream-err.js
+++ b/test/parallel/test-fs-write-stream-err.js
@@ -2,6 +2,7 @@
var common = require('../common');
var assert = require('assert');
var fs = require('fs');
+var binding = process.binding('fs');
common.refreshTmpDir();
@@ -10,21 +11,18 @@ var stream = fs.createWriteStream(common.tmpDir + '/out', {
});
var err = new Error('BAM');
-var write = fs.write;
+var writeBuffer = binding.writeBuffer;
var writeCalls = 0;
-fs.write = function() {
+binding.writeBuffer = function() {
switch (writeCalls++) {
case 0:
- console.error('first write');
// first time is ok.
- return write.apply(fs, arguments);
+ console.error('first write');
+ return writeBuffer.apply(this, arguments);
case 1:
// then it breaks
console.error('second write');
- var cb = arguments[arguments.length - 1];
- return process.nextTick(function() {
- cb(err);
- });
+ return process.nextTick(arguments[5].oncomplete, err);
default:
// and should not be called again!
throw new Error('BOOM!'); |
Smart! Why didn't I think of overriding the methods on the binding object? Thanks @trevnorris :) |
@trevnorris I'm a bit new to this, so please bear with me. I find this very strange, with your patch I can make the test fail or succeed, depending on how I run it: Success:
Failure:
Any idea how this could happen or how the test can behave in 2 completely different ways depending on how its run? PS: I've pushed the test update to this PR, so you can try it out. |
This avoids a number of type checks and a maybeCallback call that was being done by fs.write before.
de90649
to
65b0241
Compare
friendly poke @trevnorris |
I'm not sure. @bnoordhuis I believe you were the one to introduce this test. have any thoughts? |
What happens with |
@bnoordhuis thanks. that doesn't trigger it directly but the following seems to:
|
By changing
|
Ah yes. diff --git a/test/parallel/test-fs-write-stream-err.js b/test/parallel/test-fs-write-stream-err.js
index 7d64014..b97fef9 100644
--- a/test/parallel/test-fs-write-stream-err.js
+++ b/test/parallel/test-fs-write-stream-err.js
@@ -14,6 +14,9 @@ var err = new Error('BAM');
var writeBuffer = binding.writeBuffer;
var writeCalls = 0;
binding.writeBuffer = function() {
+ if (arguments[0] != stream.fd)
+ return writeBuffer.apply(this, arguments);
+
switch (writeCalls++) {
case 0:
// first time is ok. Thanks much @bnoordhuis for the help. |
Wow, thanks so much guys. I wouldn't have been able to figure that one out on my own. |
@ronkorving ... any updates on this one? |
I need to get back on this one. No updates at this time, hope to get back to it soon. |
7da4fd4
to
c7066fb
Compare
Because of the changes that have happened to fs, I'll restart this later. |
This avoids a number of type checks and a maybeCallback call that was being done by fs.write before.
Benchmarking this is not an exact science. I tend to see some improvement, especially when dealing with the small and countless buffers (like in #2167, it's probably best to focus on size=2 and size=1024 for benchmark consistency). In any case, looking at the code change, it should be fairly obvious that it should be at least somewhat faster.
Previous benchmark:
With this PR:
After #2167, you may wonder why this would have any affect on this benchmark at all. But that's because pretty much each _writev is followed by a _write and vice versa. When tracking whether _writev or _write is called, you see both pop up almost equally. For that reason I thought it would be worthwhile to optimize fs.WriteStream#_write. Independently, it would be interesting to try and figure out if we can optimize Stream.Writable to chunk more through _writev when possible.