Buffer/Uint8Array using lots of memory #173

Closed
SeanReece opened this issue Jun 13, 2024 · 6 comments

@SeanReece

SeanReece commented Jun 13, 2024

Tested in Node.js versions: v22.3.0, v20.14.0, v18.20.3, v16.20.2

It appears as though Buffer/Uint8Array consumes much more memory than I would expect. This is particularly obvious with many small instances.

For example:

const data = new Uint8Array(12) // I would think this would consume ~12 bytes

It appears to have a shallow size of 96 bytes and a retained size of 196 bytes.
[screenshot]

I'm not sure whether this is a V8 issue, but when I try the same in Chrome 126 I see similar overhead, with slightly lower memory use.

[screenshot]

Why this is an issue

I stumbled on this while trying to profile memory issues while pulling large amounts of MongoDB documents into memory, even projecting the documents to just return 2 ObjectIds each (we're building potentially large graphs in memory from the links).

A BSON ObjectId is 12 bytes. So we estimated ~24MB per million edges. (maybe a bit more for object overhead etc)
In reality this uses almost 500MB

At first I thought this was an issue with BSON's implementation but this can be recreated using Uint8Array directly.

Try it out

const arr = []

const heapBefore = process.memoryUsage().heapUsed
for (let i = 0; i < 2000000; i++) {
  arr.push(new Uint8Array(12)) // Same with Buffer.alloc(12)
}
const heapAfter = process.memoryUsage().heapUsed   // Not super accurate but illustrates the issue
const size = Math.round((heapAfter - heapBefore) / 1024 / 1024)
console.log(`Used ${size}MB to store ${arr.length} Uint8Array(12)`)
// Used 473MB to store 2000000 Uint8Array(12)

Memory use barely grows with the size of each Uint8Array. Doubling the size from 12 to 24 bytes only raises the total to 485MB in the above test. This tells me the overhead is probably in the data structure itself, rather than in some data being duplicated.
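
For scale, the 473MB figure above works out to roughly 473 × 1024 × 1024 / 2,000,000 ≈ 248 bytes per instance. Below is a rough sketch for measuring that number directly. The names `bytesPerInstance` and `make` are made up for illustration, and the result is only approximate because `heapUsed` is affected by GC timing.

```javascript
// Rough per-instance cost: heap growth divided by the number of instances.
// `bytesPerInstance` and `make` are illustrative names, not from any library.
function bytesPerInstance(make, count = 200_000) {
  // Preallocate so the list's own storage is excluded from the measurement.
  const keep = new Array(count);
  // global.gc only exists when Node is started with --expose-gc.
  if (typeof global.gc === 'function') global.gc();
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < count; i++) keep[i] = make();
  const after = process.memoryUsage().heapUsed;
  // Returning keep.length keeps the instances reachable until after the second reading.
  return { perInstance: (after - before) / count, kept: keep.length };
}

const result = bytesPerInstance(() => new Uint8Array(12));
console.log(`~${Math.round(result.perInstance)} bytes per Uint8Array(12)`);
```

Running it with `node --expose-gc` should give a steadier reading. Swapping the factory for `() => Buffer.alloc(12)` or `() => new Uint8Array(24)` repeats the comparisons described above.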

Curiously, when I try the same thing with Buffer.from(new Uint8Array(12)) it only reports ~240MB. I assume this is because Buffer doesn't keep a reference to something(?) and GC runs sometime before heapUsed is captured.

See the screenshot below: with Buffer.from(new Uint8Array(12)), each instance retains 100 bytes less 🤔
[screenshot]

Thanks

Big thanks to the Node.js Performance Team in advance. You're doing amazing work 👍 Please let me know if this is an issue with V8 directly or if this is completely expected behaviour. It really caught me off guard.

@lemire
Member

lemire commented Jun 13, 2024

const data = new Uint8Array(12) // I would think this would consume ~12 bytes

I would not make this assumption. I would expect at least, say, 48 bytes and up to 256 bytes even for an empty Uint8Array instance.

Have a look at these blog posts:

Merely storing a single integer in a set in C++ can take 32 bytes !!!

There is just no way that creating a whole new Uint8Array instance is nearly free even if it were empty.

Now, if you create a sizeable Uint8Array (e.g., 128 bytes), you should expect the array buffers to grow by roughly 128 bytes, but even there, you are ignoring the instance overhead.

Can you run the following code and tell me what you get?

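const arr = [];
const unit = 128; // bytes requested per Uint8Array
let count = 0;    // total bytes requested so far
for (let i = 0; i < 10000; i++) {
  arr.push(new Uint8Array(unit));
  count += unit;
  // Read memoryUsage() once per iteration, then print:
  // bytes requested, bytes reported for array buffers, and their ratio.
  const { arrayBuffers } = process.memoryUsage();
  console.log(`${count} ${arrayBuffers} ${arrayBuffers / count}`);
}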

I stumbled on this while trying to profile memory issues while pulling large amounts of MongoDB documents into memory, even projecting the documents to just return 2 ObjectIds each (we're building potentially large graphs in memory from the links). A BSON ObjectId is 12 bytes. So we estimated ~24MB per million edges. (maybe a bit more for object overhead etc)
In reality this uses almost 500MB

I would allocate a buffer new Uint8Array(24000000) and then store my ObjectIds at index 0, 12, 24, ...
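
A minimal sketch of that layout, assuming fixed 12-byte ids and a capacity known up front. `PackedIdStore` and its methods are hypothetical names for illustration, not part of bson or Node.js.

```javascript
const ID_BYTES = 12; // size of one BSON ObjectId

// Hypothetical helper: stores ids back to back in one Uint8Array,
// so id number i lives at byte offset i * 12.
class PackedIdStore {
  constructor(capacity) {
    this.bytes = new Uint8Array(capacity * ID_BYTES);
    this.length = 0; // number of ids stored so far
  }

  // Copies the 12 bytes in and returns the slot index of the new id.
  push(id) {
    if (id.length !== ID_BYTES) throw new RangeError('expected a 12-byte id');
    const offset = this.length * ID_BYTES;
    if (offset + ID_BYTES > this.bytes.length) throw new RangeError('store is full');
    this.bytes.set(id, offset);
    return this.length++;
  }

  // Returns a view over the stored bytes (no copy of the data).
  get(index) {
    if (index < 0 || index >= this.length) throw new RangeError('index out of range');
    return this.bytes.subarray(index * ID_BYTES, (index + 1) * ID_BYTES);
  }
}

const store = new PackedIdStore(2_000_000); // one 24,000,000-byte allocation
const slot = store.push(new Uint8Array(12).fill(7));
console.log(slot, store.get(slot));
```

Note that `get` still creates a small view object per call. The saving comes from not keeping two million of those objects alive at once, so edges are best held as integer slot indices rather than as views.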

@H4ad
Member

H4ad commented Jun 13, 2024

Buffer.from uses an internal pool to avoid allocating many small buffers; maybe this helps reduce the memory allocation a little.
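
One way to check whether a given Buffer came from the pool is to look at its backing ArrayBuffer. A pooled Buffer is a slice of a larger shared ArrayBuffer, while `Buffer.alloc` is documented to never use the pool. The sketch below only prints what it finds, since whether `Buffer.from(new Uint8Array(12))` is pooled may depend on the Node.js version. `backingInfo` is a made-up helper name.

```javascript
// Hypothetical helper: describes how a Buffer sits inside its ArrayBuffer.
function backingInfo(buf) {
  return {
    length: buf.length,                 // bytes visible through this Buffer
    byteOffset: buf.byteOffset,         // where it starts inside the ArrayBuffer
    backingBytes: buf.buffer.byteLength // size of the underlying ArrayBuffer
  };
}

console.log('Buffer.poolSize       ', Buffer.poolSize);
console.log('Buffer.alloc(12)      ', backingInfo(Buffer.alloc(12)));
console.log('Buffer.allocUnsafe(12)', backingInfo(Buffer.allocUnsafe(12)));
console.log('Buffer.from(u8)       ', backingInfo(Buffer.from(new Uint8Array(12))));
```

If `backingBytes` is much larger than `length`, the Buffer shares its ArrayBuffer with other small Buffers, which would explain a lower per-instance cost.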

@SeanReece
Author

Thanks for the info @lemire. You're correct that there seems to be lots of overhead for each TypedArray created, and creating a single large typed array really does only consume the memory I was expecting.

I've been doing some digging and found this interesting explanation from a V8 developer:

https://stackoverflow.com/questions/45803829/memory-overhead-of-typed-arrays-vs-strings/45808835#45808835

I also tried the same with ArrayBuffer + DataView, with very slightly better memory efficiency. But that is somewhat moot, since an ObjectId can be represented as a 24-character hex string, which consumes only 40 bytes in V8. That is much better than a Buffer consuming 96 bytes to represent the same raw 12 bytes.
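
A small sketch of the hex round trip, using only Node's built-in Buffer. `idToHex` and `hexToId` are illustrative names, not bson APIs.

```javascript
// Hypothetical helpers for converting between raw 12-byte ids and hex strings.
function idToHex(bytes) {
  if (bytes.length !== 12) throw new RangeError('expected 12 bytes');
  return Buffer.from(bytes).toString('hex'); // 24 lowercase hex characters
}

function hexToId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) throw new RangeError('expected 24 hex characters');
  return new Uint8Array(Buffer.from(hex, 'hex')); // copies into a plain Uint8Array
}

const hex = idToHex(Uint8Array.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11));
console.log(hex, hexToId(hex));
```

Strings also compare by value, so they can be used directly as Map keys when building the graph, whereas two Uint8Array instances holding the same bytes are distinct keys.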

I would allocate a buffer new Uint8Array(24000000) and then store my ObjectIds at index 0, 12, 24, ...

I don't really have much control over this in our implementation since bson is instantiating lots of Buffers under the hood.

Do you know of any good libraries for managing disparate data within a large ArrayBuffer? There's some complexity around removing unused elements and redistributing the available space.

Thanks again for your insight here. I think we can close this since it does not seem to be a Node.js issue directly.

@lemire
Member

lemire commented Jul 1, 2024

I don't really have much control over this in our implementation since bson is instantiating lots of Buffers under the hood.

You can grab the returned buffer and copy it to your own larger buffer.
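
As a rough sketch of that idea, assuming the library hands back each id as a 12-byte Buffer or Uint8Array: copy the bytes into a growable buffer you own and drop the original, so only an integer slot index is kept per id. `IdArena` is a hypothetical name, and the doubling growth policy is one choice among many.

```javascript
// Hypothetical append-only arena for 12-byte ids.
class IdArena {
  constructor(initialCapacity = 1024) {
    this.bytes = new Uint8Array(Math.max(1, initialCapacity) * 12);
    this.count = 0; // number of ids copied in so far
  }

  // Copies the 12 bytes out of `src` and returns the slot index.
  add(src) {
    if (src.length !== 12) throw new RangeError('expected a 12-byte id');
    const offset = this.count * 12;
    if (offset + 12 > this.bytes.length) {
      // Out of room: double the storage and copy the existing ids across.
      const grown = new Uint8Array(this.bytes.length * 2);
      grown.set(this.bytes);
      this.bytes = grown;
    }
    this.bytes.set(src, offset);
    return this.count++;
  }

  // Compares two stored ids byte by byte without creating any view objects.
  equals(i, j) {
    for (let k = 0; k < 12; k++) {
      if (this.bytes[i * 12 + k] !== this.bytes[j * 12 + k]) return false;
    }
    return true;
  }
}

const arena = new IdArena();
const returned = Buffer.alloc(12, 1); // stand-in for a buffer returned by a library
const slot = arena.add(returned);     // `returned` can now be garbage collected
console.log(slot, arena.count);
```

This does not handle removal or compaction, which is where the extra engineering effort mentioned below comes in.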

There's some complexity around removing unused elements and redistributing the available space.

Your project does end up looking like you are trying to build your own custom database engine... which is unavoidably going to require some engineering effort.

@joyeecheung
Member

joyeecheung commented Jul 1, 2024

FWIW when I investigated nodejs/node#53579 I noticed that even an empty array buffer in V8 takes 88 bytes, which is surprisingly big if you ask me. But that also has something to do with us not turning on pointer compression + V8 sandbox (otherwise it would've been ~44 bytes). Also not all the fields are strictly necessary for all array buffers but they are there in advance, or there should've been some clever ways to encode them to save space. But that could incur additional code complexity in V8 that makes it not worth it, and it's mostly a V8 issue.

@lemire
Member

lemire commented Jul 1, 2024

@joyeecheung So an empty buffer is made of 11 pointers? That sounds like a lot.
