-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Buffer equals when comparing ObjectId #287
Conversation
Update ObjectId #equals to use internal Node.js buffer #equals function when comparing two ObjectId objects.
Compare last random bytes from ObjectId prior to doing a full ObjectId comparison.
I'm not sure how far we want to go with this in term of optimization, but I've just pushed an updated version that checks the last random bytes from ObjectId before doing a full equality check. In my testing, the first version was about ~200% faster (doing thousands of comparisons), and the refined version that adds the 12th byte check is another ~200% faster. |
@siboulet how much of a performance impact are you seeing? Is the toString call taking up a significant part of your execution? |
@daprahamian i'm seeing a huge impact ! 200%+ slowdown when dealing with thousands of documents. toString() was the culpit. Why toString() when we can use Buffer |
@siboulet I'm mostly asking about the performance impact in ms. It may take twice as long, but a 2x increase on a negligible number can still be negligible. The main hesitation on our part is that we need |
@daprahamian OK let me try to come up with some benchmark. My use case is on Node.js side doing filtering of a large array of ObjectId. |
@siboulet could you also give us a little more details about your actual use case here? Specifically, how do you find yourself comparing object id's this often? Perhaps a small code same could illuminate the case for us all! |
Any update on this one? The performance difference is huge, and I though it was quite common to filter, find something by id. Sample test const ObjectId = require("bson").ObjectId;
const data = [];
const id = "5c9031000000000000000000";
const lastId = new ObjectId(id);
const runs = 1000000;
for (let i = 1; i < runs; i++) {
data.push(new ObjectId());
}
data.push(lastId);
const results = [];
function runFilter(name, cb) {
const now = Date.now();
let length = data.filter(cb).length;
results.push({ name, length, time: Date.now() - now });
}
console.log(`Running tests on ${data.length} ids array`);
runFilter("Filter toString to string", s => s.toString() === id);
runFilter("Filter toStringBoth", s => s.toString() === lastId.toString());
runFilter("Filter String() Both", s => String(s) === String(lastId));
runFilter("Filter equals objectId", s => s.equals(lastId));
runFilter("Filter equals to string", s => s.equals(id));
runFilter("Filter direct id", s => s.id === lastId.id);
runFilter(
"Filter buffer equals",
s => s.id[11] === lastId.id[11] && s.id.equals(lastId.id)
);
console.log(JSON.stringify(results, null, 2)); Results: [
{
"name": "Filter toString to string",
"length": 1,
"time": 1905
},
{
"name": "Filter toStringBoth",
"length": 1,
"time": 2903
},
{
"name": "Filter String() Both",
"length": 1,
"time": 3126
},
{
"name": "Filter equals objectId",
"length": 1,
"time": 3030
},
{
"name": "Filter equals to string",
"length": 1,
"time": 2103
},
{
"name": "Filter direct id",
"length": 1,
"time": 20
},
{
"name": "Filter buffer equals",
"length": 1,
"time": 26
}
] Sandbox example: https://codesandbox.io/s/mystifying-cookies-7tlv3?file=/src/index.js We are talking about ~20x-100x on suggested current equals implementation. It has been a real performance issue where _id conversion to string was taking significant % of the execution time, to the point that we were trying to use other properties for filter/find functionality in nodejs. |
What happens if you compare byte by byte instead of using equals? Would it slow down? If it is working properly it could be merged as it is not only working in node |
const ObjectId = require("./lib/objectid").ObjectId;
const data = [];
const id = new ObjectId().toString();
const lastId = new ObjectId();
const runs = 10000000;
for (let i = 1; i < runs; i++) {
data.push(newObjectId());
}
data.push(lastId);
const results = [];
function runFilter(name, cb) {
constnow = Date.now();
letlength = data.filter(cb).length;
results.push(\{ name, length, time:Date.now() - now });
}
console.log(`Running tests on ${data.length} ids array`);
runFilter("Filter toString to string", s => s.toString() === id);
runFilter("Filter toStringBoth", s => s.toString() === lastId.toString());
runFilter("Filter String() Both", s => String(s) === String(lastId));
runFilter("Filter equals objectId", s => s.equals(lastId));
runFilter("Filter equals to string", s => s.equals(id));
runFilter("Filter direct id", s => s.id === lastId.id);
runFilter(
"Filter buffer equals",
s=>Buffer.compare(s.id, lastId.id) === 0
);
runFilter(
"Check all bytes in order",
s=> (
// first two timestamp bytes
s.id[0] === lastId.id[0] &&
s.id[1] === lastId.id[1] &&
// last two timestamp bytes
s.id[2] === lastId.id[2] &&
s.id[3] === lastId.id[3] &&
// three bytes machine id
s.id[4] === lastId.id[4] &&
s.id[5] === lastId.id[5] &&
s.id[6] === lastId.id[6] &&
// two bytes process id
s.id[7] === lastId.id[7] &&
s.id[8] === lastId.id[8] &&
// last three incrementor bytes
s.id[9] === lastId.id[9] &&
s.id[10] === lastId.id[10] &&
s.id[11] === lastId.id[11]
)
);
runFilter(
"Check all bytes reverse",
s=> (
// last three incrementor bytes
s.id[11] === lastId.id[11] &&
s.id[10] === lastId.id[10] &&
s.id[9] === lastId.id[9] &&
// two bytes process id
s.id[8] === lastId.id[8] &&
s.id[7] === lastId.id[7] &&
// three bytes machine id
s.id[6] === lastId.id[6] &&
s.id[5] === lastId.id[5] &&
s.id[4] === lastId.id[4] &&
// last two timestamp bytes
s.id[3] === lastId.id[3] &&
s.id[2] === lastId.id[2] &&
// first two timestamp bytes
s.id[1] === lastId.id[1] &&
s.id[0] === lastId.id[0]
)
);
runFilter(
"Check all bytes in significance order",
// last three incrementor bytes
s=> (
s.id[11] === lastId.id[11] &&
s.id[10] === lastId.id[10] &&
s.id[9] === lastId.id[9] &&
// last two timestamp bytes
s.id[3] === lastId.id[3] &&
s.id[2] === lastId.id[2] &&
s.id[1] === lastId.id[1] &&
s.id[0] === lastId.id[0] &&
// two bytes process id
s.id[8] === lastId.id[8] &&
s.id[7] === lastId.id[7] &&
// three bytes machine id
s.id[6] === lastId.id[6] &&
s.id[5] === lastId.id[5] &&
s.id[4] === lastId.id[4]
// first two timestamp bytes
)
);
runFilter(
"Check the most significant bytes first",
s=> (
// last two timestamp bytes
s.id[3] === lastId.id[3] &&
// s.id[2] === lastId.id[2] &&
// last three incrementor bytes
s.id[11] === lastId.id[11] &&
s.id[10] === lastId.id[10] &&
s.id[9] === lastId.id[9] &&
Buffer.compare(s.id, lastId.id) === 0
)
);
runFilter(
"Filter buffer using equals",
s=>s.id[11] === lastId.id[11] && s.id.equals(lastId.id)
);
runFilter(
"Filter buffer using compare",
s=>s.id[11] === lastId.id[11] && Buffer.compare(s.id, lastId.id) === 0
);
console.log(JSON.stringify(results, null, 2)); And the result:
|
Created a followup PR #478 |
Closing this in favor of #478 that does not have merge conflicts |
This PR updates ObjectId #equals to use internal Node.js buffer #equals function when comparing two ObjectId objects.
While tracing performance issue in our MongoDB based app, I found #equals was calling toString() when comparing two ObjectId, which turns out to be quite inefficient. In our own testing this change appears to be much, much faster when comparing two ObjectIds.