Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major performance improvement #3

Merged
merged 3 commits into from
May 12, 2017
Merged

Major performance improvement #3

merged 3 commits into from
May 12, 2017

Conversation

mramato
Copy link
Contributor

@mramato mramato commented May 10, 2017

Reader.prototype.addChunk was calling Buffer.concat constantly, which increased garbage collection and just all-around killed performance. The exact implications of this is documented in brianc/node-postgres#1286, which has a test case for showing how performance is affected.

Rather than concatenating buffers to the new buffer size constantly, this version uses a growth strategy that doubles the size of the buffer each time and tracks the functional length in a separate chunkLength variable. This significantly reduces the amount of allocation and provides a 25x
performance in my test cases, the larger the amount of data the query is returning, the greater improvement of performance.

Since this uses a doubling buffer, it was important to avoid unbounded growth, so I also added a reclamation strategy which reduces the size of the buffer by half whenever more than half of the data has been read.

I wasn't sure if it was okay to delete Reader.prototype._save outright, since it now just always returns false in all cases.

I didn't add any new unit tests because the existing ones should cover the refactored code, but let me know if you would like me to add something specific.

Also, thanks for node-postgres, it's awesome and hopefully this helps makes it even more awesome. 😄

`Reader.prototype.addChunk` was calling `Buffer.concat` constantly, which
increased garbage collection and just all-around killed performance. The
exact implications of this is documented in
brianc/node-postgres#1286, which has a test case
for showing how performance is affected.

Rather than concatenating buffers to the new buffer size constantly, this
change uses a growth strategy that doubles the size of the buffer each
time and tracks the functional length in a separate `chunkLength` variable.
This significantly reduces the amount of allocation and provides a 25x
performance in my test cases, the larger the amount of data the query
is returning, the greater improvement of performance.

Since this uses a doubling buffer, it was important to avoid growing
forever, so I also added a reclaimation strategy which reduces the size
of the buffer wever time more than half of the data has been read.
Copy link
Owner

@brianc brianc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This great! Thank you! The perf improvements are really exciting! 💃 I have been doing this for as long as node-postgres with outgoing packets: https://github.com/brianc/node-buffer-writer/blob/master/index.js#L12 I think I've just overlooked the incoming packets for one reason or another.

The only thing I'm not really feeling is resizing the buffer back down doing copies from larger to smaller buffers. I think there's a way to do it more efficiently, while still reclaiming memory at a reasonable pace. Postgres packets are length prefixed and so frequently this.offset === this.chunk.length will be true meaning the entire response will be consumed. I would check for this condition at the top of addChunk and if the entire existing buffered input has been consumed, just set this.chunk = chunk and return. That way the buffer grows to the entire length of a huge incoming packet with your efficient doubling algorithm, but is only shrunk once the whole thing is consumed. I think in practice this will be simpler and hopefully even more performant.

index.js Outdated
@@ -8,32 +8,48 @@ var Reader = module.exports = function(options) {
options = options || {}
this.offset = 0
this.lastChunk = false
this.chunk = null
this.chunk = Buffer.alloc(4);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently node-postgres supports node @v0.10.x onward - does this exist in older versions of node?

note to self: I need to set up travis on this repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized I can just go back to assigning this to null and just keeping a handle to the first chunk added.

index.js Outdated
this.chunk.copy(newBuffer, 0, this.offset);
this.chunk = newBuffer;
this.chunkLength -= this.offset;
this.offset = 0;
}
}

Reader.prototype._save = function() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah if this method is only returning false and doing nothing else we might as well inline it (though v8 probably is anyway)

@mramato
Copy link
Contributor Author

mramato commented May 11, 2017

Thanks for the review, @brianc, working on the requested changes now.

@mramato
Copy link
Contributor Author

mramato commented May 11, 2017

Postgres packets are length prefixed and so frequently this.offset === this.chunk.length will be true meaning the entire response will be consumed.

Thanks, I was actually hoping you might have some insight into this area (I know nothing about the postgres wire protocol), I just didn't want to open a PR for what I thought may end up as unbounded memory growth. Your suggestion is a lot cleaner and predictable.

1. Fix Node @v0.10.x by removing `Buffer.alloc` usage.
2. Remove uneeded Reader.prototype._save/
3. Reset buffer at the end of the packet instead of shrinking dynamically.
4. Remove some semi-colons to keep style consistent.
@mramato
Copy link
Contributor Author

mramato commented May 11, 2017

@brianc, I believe I've addressed all of your review comments. I also tested on v0.10.48 just to confirm I didn't break anything there. Please let me know if there are any other changes you'd like. Thanks again for the review and for pg in general.

@brianc
Copy link
Owner

brianc commented May 12, 2017

This PR is amazing & I really appreciate the way you approached everything! 👏 I'm super excited to test this out & get it merged. I've got a ton on my plate right now, but I'll do everything I can to get this tested and merged in by EOD tomorrow. If for some reason I get sandbagged so badly at work I can't get to this, know it'll be in by the end of the weekend!

If I had travis set up on this repo it'd be a lot easier actually, but this thing has changed so little in the past 5 years, and when I initially set it up I didn't have the rigor I do now....so I'll try & get that set up tomorrow too!

@mramato
Copy link
Contributor Author

mramato commented May 12, 2017

Thank you for the kind words. By the weekend would be awesome, but I'm just glad to know it's on your radar.

If you uncover anything during your testing, just let me know and I can fix it (or if it's easier for you, feel free to just tweak things yourself, I don't mind at all). Thanks again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants