perf(NODE-6246): Use buffer pool for ObjectId to significantly improve memory usage #707
base: main
Conversation
@SeanReece Looks awesome, thanks again for the contribution. Just a few improvements to line up with our practices.
One thing that is not directly part of our API, but that I want to discuss with the team, is how this affects deep equality. We have tests failing because deep equality no longer holds for ObjectIds that do not share the same pool, but that is not the equality we really care about. What matters is that two separate ObjectIds containing the same representation will turn into equal BSON sequences; that is why our ObjectId.equals method returns true for a number of object shapes/types.
I think this optimization is worth that change, but we'll see what the consensus is among maintainers.
@nbbeeken I've made some more updates with your suggestions.
As for deep equality with ObjectIds, what I've done is added a test function. Let me know what you think 😄 🚀
Only some small things remain, thanks for this marathon contribution!
@nbbeeken Thanks for bearing with me on this marathon 😄 As always, I appreciate the thorough reviews. Hope we're looking good now 🚀
LGTM! I will have the team take a final look over, then I think we're good to go!
Hey @SeanReece, just wanted to update that we haven't forgotten about this. We are considering a couple of ways around the deep equality changes and weighing the performance implications of each while still working on our planned work, so we appreciate you bearing with us. If it is of any interest, we're looking at making the pool/offset non-enumerable.
@nbbeeken Thanks for the heads up! Just an FYI, I noticed a significant hit to memory usage and performance when I tried using private fields. FWIW I think defaulting poolSize to 1 is a good choice. It still allows users to configure based on their use case, but keeps deep equality working by default. 😃
Thanks for looking into that, we were considering real private fields as well.
@nbbeeken I believe the performance impact is from some extra operations required for the pool implementation, namely having to copy the buffer into the pool when creating an ObjectId from a buffer, and then again when retrieving the ObjectId's buffer. If poolSize=1 we could potentially skip the pool and avoid those copies.
Let me run some benchmarks and see what we can do.
When you get a chance can you rebase this on main so we can get the same benchmark fixes here? TIA :) |
@nbbeeken Updated 👍 I've been able to get the benchmarks running locally in Docker, but haven't had the time to do a proper comparison of the results yet. Let me know what results you're seeing in the benchmarks :) Edit: I can now run bson-bench natively after rebasing on main. Seeing the regressions on my end — think I've got a good fix, just doing some cleanup, so I should be able to commit soon. In the meantime @nbbeeken what do you think about adding a …
Hey @SeanReece, sorry about the radio silence (it has been a busy week). Thanks for continuing to get the benchmarks running on your end; those results do corroborate what I am seeing. I also still saw a regression in our objectid performance tests, but it is smaller than before, so we're on the right track. I wasn't aware of how …
Just want to share the current concerns from the team here. This is a great contribution, and we are super interested in incorporating it. However, our team has a fully planned quarter ahead, so we won't be able to conduct a thorough review or make final design decisions until next quarter (begins Nov). The PR is currently held up on the following points:
We intend to tackle these issues on our end and don't expect you to continue working on this if you no longer feel inclined to, unless the feedback inspires specific ideas you'd like to propose/implement. We appreciate your patience and continued work on this. cc @dariakp @baileympearson @W-A-James @durran @aditi-khare-mongoDB @addaleax
Hey @SeanReece, can you rebase/merge (either is fine) main when you get the chance? I have some changes proposed in SeanReece#1 |
@nbbeeken Awesome! Does this mean we can enable the pool by default? :) Or we still want it to be opt-in? |
Great question! Let me pull in poolSize=1000 to the driver's benchmarks to get a holistic picture of the perf results |
Description
This PR significantly improves memory usage for ObjectId by introducing a buffer/Uint8Array pool. This is similar to how Node.js `Buffer` uses an internal pool when using `allocUnsafe`, but the difference here is that instead of instantiating a Buffer/Uint8Array for each ObjectId, each ObjectId holds an offset into its pool.

Why?
- Buffer (or TypedArray) is inefficient at storing small amounts of data
- A Buffer of size 12 takes 96 bytes in v8! An ObjectId with its own buffer is 128 bytes
- A pointer and a Number take 16 bytes. An ObjectId using the pool is 48 bytes + its share of the pool
- 1,000 ObjectIds using the pool = 60,048 bytes
- 1,000 ObjectIds without the pool = 128,000 bytes
- ~53% reduction in memory usage
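To make the layout concrete, here is a minimal sketch of the pooling idea (the `PooledId` class, its fields, and the pool-rollover logic are hypothetical illustrations, not the actual js-bson code):

```javascript
const ID_SIZE = 12;
const POOL_SIZE = 1000; // measured in ObjectIds, like ObjectId.poolSize

let pool = new Uint8Array(ID_SIZE * POOL_SIZE);
let poolOffset = 0;

class PooledId {
  constructor(bytes) {
    if (poolOffset + ID_SIZE > pool.length) {
      // Current pool is exhausted: allocate a fresh one.
      pool = new Uint8Array(ID_SIZE * POOL_SIZE);
      poolOffset = 0;
    }
    this.pool = pool;         // reference to the shared backing store
    this.offset = poolOffset; // where this id's 12 bytes begin
    this.pool.set(bytes, this.offset); // copy in (the "extra copy" cost)
    poolOffset += ID_SIZE;
  }

  get buffer() {
    // Copy out so callers cannot corrupt neighbouring ids in the pool.
    return this.pool.slice(this.offset, this.offset + ID_SIZE);
  }
}

const a = new PooledId(new Uint8Array(ID_SIZE).fill(1));
const b = new PooledId(new Uint8Array(ID_SIZE).fill(2));
console.log(a.pool === b.pool); // true: both ids share one backing pool
console.log(a.buffer.length);   // 12
```

Each id carries only a pointer and a small number instead of owning a whole typed array, which is where the savings above come from.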
Lots more discussion in #703
Memory improvements
Using the MFlix sample database (1 million documents):
- Before: 655 MB
- Now: 369 MB
- ~44% reduction
Performance tests
Since we're still making use of Buffer/Uint8Array, performance is mostly the same. There is a slight regression when creating from a `Buffer` (since we are copying instead of just saving the reference), but an improvement when creating from a Uint8Array (since we do not need to convert to a Buffer).

Performance ideas
Now that ObjectId allocates its own Buffer/Uint8Array, we technically never need to call `toLocalBufferType`, since we know we're always operating on the correct buffer type. Removing that check is outside the scope of this change but would increase performance.

Notes
- `ObjectId.poolSize = 1` essentially disables the pool (each ObjectId will have its own buffer)
- The `buffer` property returns a Buffer/Uint8Array

Is there new documentation needed for these changes?
Adding `ObjectId.poolSize` to allow users to set their preferred ObjectId pool size. This is the number of ObjectIds that will be stored in each pool; the actual pool allocated will be 12 * poolSize bytes. This defaults to 1000, as that seemed reasonable.

What is the motivation for this change?
Release Highlight
Improve memory usage by pooling ObjectId buffers
@SeanReece brought to our attention the fact that a `Uint8Array` requires a surprising amount of overhead: the 12 bytes of `ObjectId` data actually take up 96 bytes of memory. You can find an excellent breakdown of what all that overhead is here.

In this release, @SeanReece has significantly improved the memory used for each `ObjectId` by introducing a shared `Uint8Array` pool of memory. The idea is similar to how Node.js' `Buffer` uses an internal pool for the `allocUnsafe` and `from` APIs that overwrite the returned memory. Each `ObjectId` uses the current shared pool and stores an offset to where its bytes begin; all operations internal to the class use this offset, so there is no impact on the public API.

Crunching the numbers
The default `ObjectId.poolSize` is 1000. Since it is measured in ObjectIds, this means a default pool size of 12,000 bytes, plus the same overhead that each `ObjectId` was incurring.

|        | 1 ObjectId       | 1,000 ObjectIds | 1,000,000 ObjectIds |
| ------ | ---------------- | --------------- | ------------------- |
| Before | 128 B            | 128 KB          | 128 MB              |
| After  | 40 B + ~12,000 B | 40 KB + ~12 KB  | 40 MB + ~12.08 MB   |

As you can see, the new cost for a single `ObjectId` is higher than before, but as your data set grows the shared pools lead to huge savings in memory. If the initial cost is too high or your data sets are even larger, the pool's size is configurable!

ObjectId.poolSize
The shared pool is not created until the first `ObjectId` is constructed, so you can modify the `poolSize` to fit your needs.

Tip: Setting the poolSize to 1 essentially disables this change.

Tip: You can change the `poolSize` at any time during your program; the next time the current pool runs out, a new one will be created using the current `poolSize` value.

Thank you so much to @SeanReece for a very thorough and informative bug report and for working so hard on this improvement!
A note about deep equality
`ObjectId`s will no longer satisfy naive deep-equality checks by default. APIs like `util.isDeepStrictEqual` and `lodash.isEqual` ignore what is public or private API and make assumptions about which properties count toward two values' equality.

Workarounds
- Set `ObjectId.poolSize = 1;`, disabling the optimization so each ObjectId will have its own buffer.
- Use `objectid.equals(otherOid)` wherever possible. `lodash.isEqualWith` allows you to define a customizer that can be used to call the ObjectId's `equals` method.
- Use `BSON.serialize` if applicable. If you pass the two objects you wish to compare to the BSON serializer, the bytes returned will be exactly equal if the documents are the same on the server.

Double check the following
- Ran the `npm run check:lint` script
- PR title follows the `type(NODE-xxxx)[!]: description` convention, e.g. `feat(NODE-1234)!: rewriting everything in coffeescript`