
Slower than JSON.parse #28

Open
dalisoft opened this issue Apr 3, 2020 · 20 comments
Labels: enhancement, help wanted, performance issue

Comments

@dalisoft

dalisoft commented Apr 3, 2020

Hi @luizperes

I know this library was made to handle large JSON files, but I ran into some strange performance results when I parsed my JSON and benchmarked it: this library is slow, up to 6x slower.

Here is the result:

json.parse - large: 985.011ms
simdjson - large: 4.756s

Also, lazyParse does not return the expected result for me (or I'm doing something wrong), and even with lazyParse, performance is still slow. How can we improve this?

Code to test

const simdJson = require("simdjson");

const bench = (name, fn) => {
  console.time(name);
  for (let i = 0; i < 200000; i++) {
    fn();
  }
  console.timeEnd(name);
};

// Create large JSON file
let JSON_BUFF_LARGE = {};
for (let i = 0; i < 20; i++) { // 20 keys is close to 0.5 Kb, which is very small, but you can increase this value
  JSON_BUFF_LARGE["key_" + i] = Math.round(Math.random() * 1e16).toString(16);
}
JSON_BUFF_LARGE = JSON.stringify(JSON_BUFF_LARGE);

console.log(
  "JSON buffer LARGE size is ",
  (JSON_BUFF_LARGE.length / 1024).toFixed(2),
  "Kb"
);

bench("json.parse - large", () => JSON.parse(JSON_BUFF_LARGE));
bench("simdjson - large", () => simdJson.parse(JSON_BUFF_LARGE));
@dalisoft
Author

dalisoft commented Apr 3, 2020

Tested on an AVX-supported device too

Benchmark

@luizperes
Owner

Hi @dalisoft,
simdjson.parse is always slower, as expected. Please take a look at issue #5 and the Documentation.md file.

Thank you so much for making your tests available to me so I could save some time. As you mentioned, simdjson is not doing better than the standard JSON for your case. It could be that there is something related to parsing numbers (which is a little slow in simdjson, but it should still be faster). In your case you are generating random numbers, and your JSON string JSON_BUFF_LARGE may be difficult for simdjson to parse, but that shouldn't be the case. I am speculating that it could be a problem with the wrapper (only if there is something very wrong) or some sort of bug upstream (explanation below).

I changed the parameters of the code you asked me to test. Instead of a 0.5 Kb file, I am using a 25+ Kb file: just replace i < 20 with i < 200000.

For all three functions of simdjson, here is the output I get (my machine supports AVX2):

simdjson.parse
JSON buffer LARGE size is  25.78 Kb
json.parse - large: 33672.126ms
simdjson - large: 159626.570ms
simdjson.lazyParse

For this case, lazyParse is faster (by around 30%) than the standard JSON.

JSON buffer LARGE size is  25.81 Kb
json.parse - large: 33321.596ms
simdjson - large: 21988.679ms
simdjson.isValid

isValid is nearly the same thing as lazyParse, as lazyParse only validates the JSON but does not construct the JS object, so they should both run at around the same speed. I will check whether this is a problem in the wrapper (likely) or upstream (by running it without the wrapper and getting its perf stats).

JSON buffer LARGE size is  25.80 Kb
json.parse - large: 33484.594ms
simdjson - large: 5665.534ms
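
(For reference, a minimal sketch of how the three entry points can be benchmarked side by side; it assumes the module exposes parse, lazyParse, and isValid as used above, and the key count and iteration count are arbitrary.)

const simdjson = require("simdjson");

const bench = (name, fn) => {
  console.time(name);
  for (let i = 0; i < 10000; i++) fn();
  console.timeEnd(name);
};

// Build a test document the same way as in the snippet above
let doc = {};
for (let i = 0; i < 1000; i++) {
  doc["key_" + i] = Math.round(Math.random() * 1e16).toString(16);
}
doc = JSON.stringify(doc);

bench("JSON.parse", () => JSON.parse(doc));                 // builds the full JS object
bench("simdjson.parse", () => simdjson.parse(doc));         // builds the full JS object through the wrapper
bench("simdjson.lazyParse", () => simdjson.lazyParse(doc)); // parses, but keeps the result in the C++ buffer
bench("simdjson.isValid", () => simdjson.isValid(doc));     // validates only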

One interesting thing that you will see with simdjson is that it scales well and becomes much faster than regular state-machine-parsing algorithms. But as stated above, there is something wrong going on. I will only have time to check around the third week of April.

Thanks again for the contribution!

cc @lemire

@luizperes
Owner

Oh, here is the usage of lazyParse:

const simdjson = require('simdjson');

const jsonString = "{ \"foo\": { \"bar\": [ 0, 42 ] } }";
const JSONbuffer = simdjson.lazyParse(jsonString); // external (C++) parsed JSON object
console.log(JSONbuffer.valueForKeyPath("foo.bar[1]")); // 42

Note that it does not construct the actual JS object; it keeps an external pointer to the C++ buffer, and for this reason you can only access the keys with the valueForKeyPath function returned on the object.
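
A minimal sketch of what that means in practice (assuming the lazyParse/valueForKeyPath API shown above; the extra key is only for illustration):

const simdjson = require("simdjson");

const buffer = simdjson.lazyParse('{ "foo": { "bar": [ 0, 42 ] }, "baz": "qux" }');

// Each lookup goes through the C++ buffer; no JS object tree is built up front.
console.log(buffer.valueForKeyPath("foo.bar[1]")); // 42
console.log(buffer.valueForKeyPath("baz"));        // "qux"

// Plain property access (e.g. buffer.foo) is not how the data is reached here;
// per the explanation above, only valueForKeyPath goes through to the parsed buffer.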

@lemire
Contributor

lemire commented Apr 4, 2020

It could be that there is something related to parsing numbers (which is a little slow in simdjson, but it should still be faster)

It is generally a challenging task but simdjson should still be faster than the competition.

@lemire
Contributor

lemire commented Apr 4, 2020

I know this library was made to handle large JSON file

The simdjson library itself is faster even on small files.

@lemire
Contributor

lemire commented Apr 4, 2020

cc @croteaucarine @jkeiser

@dalisoft
Author

dalisoft commented Apr 4, 2020

Either I'm doing something wrong, or the cost of the bindings means the performance isn't what I want.

@luizperes

const simdjson = require('simdjson');

const jsonString = "{ \"foo\": { \"bar\": [ 0, 42 ] } }";
const JSONbuffer = simdjson.lazyParse(jsonString); // external (C++) parsed JSON object
console.log(JSONbuffer.valueForKeyPath("foo.bar[1]")); // 42

I see, that's good, but I'm using it for a different case:

const simdjson = require('simdjson');

// some code
// all of the code below is repeated many times
const JSONbuffer = simdjson.lazyParse(req.body); // req.body - JSON string
console.log(JSONbuffer.valueForKeyPath("")); // to get the whole object

I want to use this library within my backend framework for Node.js as a JSON.parse alternative for higher performance, but performance only gets worse.

@luizperes
Owner

Hi @dalisoft, I see your point now. There are only a few cases where you actually need the whole JSON, but for the root case it should have the same performance with this wrapper.
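
A minimal sketch of that pattern for the backend case discussed above (the handler shape and key paths are hypothetical; only the fields actually read are converted to JS values):

const simdjson = require("simdjson");

// Hypothetical request handler: req.body is the raw JSON string.
function handleOrder(req, res) {
  const body = simdjson.lazyParse(req.body);
  // Pull out only the fields this endpoint needs, instead of
  // materializing the whole object with valueForKeyPath("").
  const userId = body.valueForKeyPath("user.id");        // hypothetical key path
  const firstSku = body.valueForKeyPath("items[0].sku"); // hypothetical key path
  res.end(JSON.stringify({ userId, firstSku }));
}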

I will think of new approaches to improve the library, but will only be able to do it in the future. I will also take a close look at the repo https://github.com/croteaucarine/simdjson_node_objectwrap by @croteaucarine; she's working on improvements for the wrapper. I will leave this issue open until we fix it. Hopefully it won't take (that) long. Cheers!

@dalisoft
Author

dalisoft commented Apr 4, 2020

@luizperes Thanks, I'll wait :)

@luizperes
Owner

Note to self: there are a few leads on PR #33

@luizperes self-assigned this Apr 10, 2020
@luizperes added the enhancement, help wanted, and performance issue labels Apr 10, 2020
@xamgore

xamgore commented Dec 15, 2020

@luizperes did you consider the approach that node-expat has chosen?

@luizperes
Owner

@xamgore can you elaborate on your question?

@dalisoft
Author

@luizperes Hi
For better debugging you can try https://github.com/nanoexpress/json-parse-expirement

@xamgore

xamgore commented Dec 22, 2020

@luizperes with node-expat you can add JS callbacks for events like "opening tag", "new attribute with name x", etc., so only the required properties are picked, copied, and passed back to the JavaScript thread.

It's the opposite of the smart-proxy-object approach, and it still doesn't require a large amount of data to be passed between the addon and V8.
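
For context, node-expat's event style looks roughly like this (it parses XML rather than JSON, but the idea is the same: only what the callbacks pick out is copied back into JS):

const expat = require("node-expat");

const parser = new expat.Parser("UTF-8");
const names = [];

// Only the values touched inside these callbacks cross into JS;
// everything else stays inside the native parser.
parser.on("startElement", (name, attrs) => {
  if (name === "user") names.push(attrs.name);
});
parser.on("error", (err) => console.error(err));

parser.parse('<users><user name="alice"/><user name="bob"/></users>');
console.log(names); // [ 'alice', 'bob' ]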

@RichardWright

So just to confirm: if you want to get the entire object from a string (e.g. lazy usage isn't possible), this probably isn't the library to use in its current state?

@lemire
Contributor

lemire commented Jan 28, 2022

@RichardWright I cannot speak specifically for this library, but one general concern is that constructing the full JSON representation in JavaScript, with all the objects, strings, arrays... is remarkably expensive. In some respects, that's independent of JSON.

cc @jkeiser

@RichardWright

Passing around a buffer and using key access is the preferred method then?

@luizperes
Owner

luizperes commented Jan 28, 2022

That is correct @RichardWright. My idea, as mentioned in other threads, would be to have simdjson implemented as the native JSON parser directly in the engine, e.g. V8. That would possibly speed up the process. At the moment I am finishing my master's thesis and don't have time to try it, so we will have to wait a little bit on that. :)

CC @jkeiser @lemire

@Uzlopak

Uzlopak commented Jan 28, 2022

CC @mcollina

@RichardWright

@luizperes cool, thanks for the response!
