Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Buffer equals when comparing ObjectId #287

Closed
wants to merge 2 commits into from

Conversation

siboulet
Copy link

@siboulet siboulet commented Nov 6, 2018

This PR updates ObjectId #equals to use internal Node.js buffer #equals function when comparing two ObjectId objects.

While tracing performance issue in our MongoDB based app, I found #equals was calling toString() when comparing two ObjectId, which turns out to be quite inefficient. In our own testing this change appears to be much, much faster when comparing two ObjectIds.

Update ObjectId #equals to use internal Node.js buffer #equals function
when comparing two ObjectId objects.
Compare last random bytes from ObjectId prior to doing a full ObjectId
comparison.
@siboulet
Copy link
Author

siboulet commented Nov 6, 2018

I'm not sure how far we want to go with this in term of optimization, but I've just pushed an updated version that checks the last random bytes from ObjectId before doing a full equality check. In my testing, the first version was about ~200% faster (doing thousands of comparisons), and the refined version that adds the 12th byte check is another ~200% faster.

@daprahamian
Copy link
Contributor

daprahamian commented Nov 12, 2018

@siboulet how much of a performance impact are you seeing? Is the toString call taking up a significant part of your execution?

@siboulet
Copy link
Author

@daprahamian i'm seeing a huge impact ! 200%+ slowdown when dealing with thousands of documents. toString() was the culpit. Why toString() when we can use Buffer equals?

@daprahamian
Copy link
Contributor

@siboulet I'm mostly asking about the performance impact in ms. It may take twice as long, but a 2x increase on a negligible number can still be negligible.

The main hesitation on our part is that we need bson to run in the browser, where Buffer is a polyfill. Do you have performance numbers for this change in the browser version of bson?

@siboulet
Copy link
Author

@daprahamian OK let me try to come up with some benchmark. My use case is on Node.js side doing filtering of a large array of ObjectId.

@mbroadst
Copy link
Member

@siboulet could you also give us a little more details about your actual use case here? Specifically, how do you find yourself comparing object id's this often? Perhaps a small code same could illuminate the case for us all!

@kYem
Copy link

kYem commented May 10, 2020

Any update on this one?

The performance difference is huge, and I though it was quite common to filter, find something by id.

Sample test

const ObjectId = require("bson").ObjectId;

const data = [];
const id = "5c9031000000000000000000";
const lastId = new ObjectId(id);

const runs = 1000000;
for (let i = 1; i < runs; i++) {
  data.push(new ObjectId());
}
data.push(lastId);
const results = [];

function runFilter(name, cb) {
  const now = Date.now();
  let length = data.filter(cb).length;
  results.push({ name, length, time: Date.now() - now });
}

console.log(`Running tests on ${data.length} ids array`);
runFilter("Filter toString to string", s => s.toString() === id);
runFilter("Filter toStringBoth", s => s.toString() === lastId.toString());
runFilter("Filter String() Both", s => String(s) === String(lastId));
runFilter("Filter equals objectId", s => s.equals(lastId));
runFilter("Filter equals to string", s => s.equals(id));
runFilter("Filter direct id", s => s.id === lastId.id);
runFilter(
  "Filter buffer equals",
  s => s.id[11] === lastId.id[11] && s.id.equals(lastId.id)
);
console.log(JSON.stringify(results, null, 2));

Results:

[
  {
    "name": "Filter toString to string",
    "length": 1,
    "time": 1905
  },
  {
    "name": "Filter toStringBoth",
    "length": 1,
    "time": 2903
  },
  {
    "name": "Filter String() Both",
    "length": 1,
    "time": 3126
  },
  {
    "name": "Filter equals objectId",
    "length": 1,
    "time": 3030
  },
  {
    "name": "Filter equals to string",
    "length": 1,
    "time": 2103
  },
  {
    "name": "Filter direct id",
    "length": 1,
    "time": 20
  },
  {
    "name": "Filter buffer equals",
    "length": 1,
    "time": 26
  }
] 

Sandbox example: https://codesandbox.io/s/mystifying-cookies-7tlv3?file=/src/index.js

We are talking about ~20x-100x on suggested current equals implementation.
In nodejs its closer to 10x-20x improvements.

It has been a real performance issue where _id conversion to string was taking significant % of the execution time, to the point that we were trying to use other properties for filter/find functionality in nodejs.

@pooooodles pooooodles added the tracked-in-jira There is a ticket in Mongo's Jira instance tracking this issue/PR label Apr 8, 2021
@pooooodles
Copy link

https://jira.mongodb.org/browse/NODE-3015

@Uzlopak
Copy link
Contributor

Uzlopak commented Dec 29, 2021

What happens if you compare byte by byte instead of using equals? Would it slow down? If it is working properly it could be merged as it is not only working in node

@Uzlopak
Copy link
Contributor

Uzlopak commented Jan 19, 2022

const ObjectId = require("./lib/objectid").ObjectId;

const data = [];
const id = new ObjectId().toString();
const lastId = new ObjectId();

const runs = 10000000;
for (let i = 1; i < runs; i++) {
data.push(newObjectId());
}
data.push(lastId);
const results = [];

function runFilter(name, cb) {
constnow = Date.now();
letlength = data.filter(cb).length;
results.push(\{ name, length, time:Date.now() - now });
}

console.log(`Running tests on ${data.length} ids array`);
runFilter("Filter toString to string", s => s.toString() === id);
runFilter("Filter toStringBoth", s => s.toString() === lastId.toString());
runFilter("Filter String() Both", s => String(s) === String(lastId));
runFilter("Filter equals objectId", s => s.equals(lastId));
runFilter("Filter equals to string", s => s.equals(id));
runFilter("Filter direct id", s => s.id === lastId.id);
runFilter(
"Filter buffer equals",
s=>Buffer.compare(s.id, lastId.id) === 0
);
runFilter(
"Check all bytes in order",
s=> (
// first two timestamp bytes
s.id[0] === lastId.id[0] &&
s.id[1] === lastId.id[1] &&
// last two timestamp bytes
s.id[2] === lastId.id[2] &&
s.id[3] === lastId.id[3] &&
// three bytes machine id
s.id[4] === lastId.id[4] &&
s.id[5] === lastId.id[5] &&
s.id[6] === lastId.id[6] &&
// two bytes process id
s.id[7] === lastId.id[7] &&
s.id[8] === lastId.id[8] &&
// last three incrementor bytes
s.id[9] === lastId.id[9] &&
s.id[10] === lastId.id[10] &&
s.id[11] === lastId.id[11]
)
);
runFilter(
"Check all bytes reverse",
s=> (
// last three incrementor bytes
s.id[11] === lastId.id[11] &&
s.id[10] === lastId.id[10] &&
s.id[9] === lastId.id[9] &&
// two bytes process id
s.id[8] === lastId.id[8] &&
s.id[7] === lastId.id[7] &&
// three bytes machine id
s.id[6] === lastId.id[6] &&
s.id[5] === lastId.id[5] &&
s.id[4] === lastId.id[4] &&

// last two timestamp bytes
s.id[3] === lastId.id[3] &&
s.id[2] === lastId.id[2] &&
// first two timestamp bytes
s.id[1] === lastId.id[1] &&
s.id[0] === lastId.id[0]
)
);
runFilter(
"Check all bytes in significance order",
// last three incrementor bytes
s=> (
s.id[11] === lastId.id[11] &&
s.id[10] === lastId.id[10] &&
s.id[9] === lastId.id[9] &&

// last two timestamp bytes
s.id[3] === lastId.id[3] &&
s.id[2] === lastId.id[2] &&
s.id[1] === lastId.id[1] &&
s.id[0] === lastId.id[0] &&
// two bytes process id
s.id[8] === lastId.id[8] &&
s.id[7] === lastId.id[7] &&
// three bytes machine id
s.id[6] === lastId.id[6] &&
s.id[5] === lastId.id[5] &&
s.id[4] === lastId.id[4]
// first two timestamp bytes
)
);
runFilter(
"Check the most significant bytes first",
s=> (
// last two timestamp bytes
s.id[3] === lastId.id[3] &&
// s.id[2] === lastId.id[2] &&
 
// last three incrementor bytes
s.id[11] === lastId.id[11] &&
s.id[10] === lastId.id[10] &&
s.id[9] === lastId.id[9] &&
 
Buffer.compare(s.id, lastId.id) === 0
)
);
runFilter(
"Filter buffer using equals",
s=>s.id[11] === lastId.id[11] && s.id.equals(lastId.id)
);
runFilter(
"Filter buffer using compare",
s=>s.id[11] === lastId.id[11] && Buffer.compare(s.id, lastId.id) === 0
);
console.log(JSON.stringify(results, null, 2));

And the result:

Running tests on 10000000 ids array
[
  {
    "name": "Filter toString to string",
    "length": 0,
    "time": 1650
  },
  {
    "name": "Filter toStringBoth",
    "length": 1,
    "time": 3018
  },
  {
    "name": "Filter String() Both",
    "length": 1,
    "time": 4747
  },
  {
    "name": "Filter equals objectId",
    "length": 1,
    "time": 2997
  },
  {
    "name": "Filter equals to string",
    "length": 0,
    "time": 9798
  },
  {
    "name": "Filter direct id",
    "length": 1,
    "time": 104
  },
  {
    "name": "Filter buffer equals",
    "length": 1,
    "time": 972
  },
  {
    "name": "Check all bytes in order",
    "length": 1,
    "time": 188
  },
  {
    "name": "Check all bytes reverse",
    "length": 1,
    "time": 175
  },
  {
    "name": "Check all bytes in significance order",
    "length": 1,
    "time": 177
  },
  {
    "name": "Check the most significant bytes first",
    "length": 1,
    "time": 177
  },
  {
    "name": "Filter buffer using equals",
    "length": 1,
    "time": 172
  },
  {
    "name": "Filter buffer using compare",
    "length": 1,
    "time": 173
  }
]

@Uzlopak
Copy link
Contributor

Uzlopak commented Jan 20, 2022

Created a followup PR #478

@dariakp
Copy link
Contributor

dariakp commented Jan 20, 2022

Closing this in favor of #478 that does not have merge conflicts

@dariakp dariakp closed this Jan 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tracked-in-jira There is a ticket in Mongo's Jira instance tracking this issue/PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants