Store parser directly instead of extracting document #36
Conversation
Hi @jkeiser, I checked out your branch (locally) and the performance is a little worse than before. Here is what the first item of the benchmark looks like:
Yeah, it depends on which file you're using. Some get worse, some get better. It actually might be real, and I have no explanation. I ran a few more times and threw out the highest results for each, and got this:

[benchmark table: This PR vs. master]
Regardless, this is what simdjson_nodejs used to do, and it is the only way to avoid std::move. I don't recommend checking it in unless we see a more consistent win from it; I really think we want to be stealing the document from simdjson, at least to reduce memory pressure.
I agree. I won't check it in for now. What is discussed in #35 might help this approach, since the cost now seems to be related to allocating a new parser.
@luizperes where does this come from, BTW? Did simdjson_nodejs really go this fast before? I went back and looked; the old code did std::move on the parser, which this change doesn't even do:

```cpp
ParsedJson *pjh = new ParsedJson(std::move(pj));
Napi::External<ParsedJson> buffer = Napi::External<ParsedJson>::New(env, pjh,
    [](Napi::Env /*env*/, ParsedJson *data) {
      delete data;
    });
```

I know I must seem like some kind of std::move partisan here; I'm really not. I just suspect we're jumping at it because it's the most obscure thing in the code (which it definitely is!) :)
@jkeiser I know it can (possibly) go that fast (and maybe even a little faster than that; I will try to explain below). I got that value from the code below (using
simdjson_nodejs has three methods, as I think you know (documented here). Also, you can see that, up to here, isValid
Keeping a reference should be very fast, but as there is some extra work (of course), I believe that the cost of doing a

Let me know if you see something wrong in my explanation! :)
I see what you're saying. I don't know that we've made a small enough change to say it's the std::move, though. There are big things the compiler could do if it figures out that all External<dom::document> instances have identical documents with nullptrs in them. In particular, it could potentially compile the destructor down to nothing. My supposition to this point has been that attaching a destructor function to Napi::External<> is the thing that makes things slow (a lot of GC languages optimize the crap out of things with no destructors). I think to compare apples to apples we need to actually fill in the document.

I pushed up a jkeiser/std-move-experiment branch of simdjson_nodejs here that removes std::move(). It also makes clear what I believe std::move is doing under the covers by removing unique_ptr: now the document is just two plain old pointers, and construction copies the pointers and nulls the original document's pointers. Might be a good starting point for experimentation, at least. I don't see a significant difference between this and master. Maybe you will, though! For fun, you can also go back one commit on that branch and see what difference std::move makes.
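A minimal sketch of what that describes, assuming a document that is just two raw pointers (the member names are illustrative, not the real simdjson fields): the "move" is nothing more than copying two pointers into the new object and nulling them in the old one.

```cpp
#include <cstdint>

// Toy stand-in for the "two plain old pointers" document described above.
struct toy_document {
  uint64_t *tape = nullptr;        // structural tape (illustrative name)
  uint8_t  *string_buf = nullptr;  // string buffer (illustrative name)

  toy_document() = default;
  toy_document(const toy_document &) = delete;

  // Moving copies the two pointers and nulls them out in the source,
  // which is all std::move boils down to once unique_ptr is stripped away.
  toy_document(toy_document &&other) noexcept
      : tape(other.tape), string_buf(other.string_buf) {
    other.tape = nullptr;        // null the original's pointers...
    other.string_buf = nullptr;  // ...so its destructor frees nothing
  }

  ~toy_document() {
    delete[] tape;
    delete[] string_buf;
  }
};
```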
Be careful here :) The compiler is quite capable of deleting huge amounts of code if it can figure out that you're not using its results. I once thought I'd doubled simdjson's speed with a change of mine, until I fixed a bug: I had forgotten to check the utf-8 validation's result. I can't say that's what's happening here. But I can say that there's a lot of stuff the compiler could do given that it knows no one else could possibly be using the results of that parse.
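As a small, hypothetical illustration of that benchmarking trap (validate_utf8 is a stand-in, not a real simdjson entry point): if nothing inside the timing loop ever reads the result, and the compiler can see the function body, it is free to prove the work has no observable effect and delete it.

```cpp
#include <cstddef>
#include <cstdio>

// Stand-in for an expensive validation routine (placeholder logic only).
static bool validate_utf8(const char *buf, std::size_t len) {
  bool ok = true;
  for (std::size_t i = 0; i < len; i++)
    ok = ok && (static_cast<unsigned char>(buf[i]) < 0xF5);
  return ok;
}

int main() {
  static const char data[] = "{\"key\": 12345}";
  bool all_ok = true;
  for (int i = 0; i < 1000000; i++) {
    // If `ok` were never read, the optimizer could prove the call has no
    // observable effect and drop the whole loop, giving an impossibly fast
    // "benchmark" result.
    bool ok = validate_utf8(data, sizeof(data) - 1);
    all_ok = all_ok && ok;  // consume the result so the work stays in the build
  }
  std::printf("all_ok=%d\n", all_ok);  // observable output keeps the loop honest
}
```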
That is correct. It could be that, after compiler optimizations, the two versions were computationally equivalent. I will take a look at your experiments, thanks a lot!
FYI @jkeiser, when I do something like:
I get: (Not a
As for your branch
To complement my question: it is just that copying by reference would be much faster than copying by value. I think that keeping
We're talking about doing something like that. It's worth seeing what the performance is, at least! Note: you shouldn't really make changes to the simdjson.h and simdjson.cpp you have here, except if you're just experimenting like this.
Yes, it's copying by value. A reference copy would be a single word write. This involves a 2-word malloc and 4 word writes (2 of them to null out the old pointer). Copying by reference will absolutely be faster. However, it seems really, REALLY unlikely that a 2-word malloc and 3 stores are dominating the runtime, especially given that they happen exactly once per parse. It seems more likely to be a caching or inlining effect. But hypotheses are cheap :) It will come down to experimenting and measuring until it's nailed down.
Yep. Again, this could either be because of the 2-word malloc and 3 extra word stores, or an effect on inlining / cache since the pointers now live in two different places over the life of the document, or something else we haven't thought of.
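To put the two hand-offs side by side, here is a toy sketch (made-up types, not the binding code; only the pointer shuffling matters): keeping a reference is a single pointer hand-off, while stealing the document is a small heap allocation plus a few word-sized stores.

```cpp
#include <cstdint>

// Made-up stand-ins for the real types.
struct toy_document { uint64_t *tape = nullptr; uint8_t *string_buf = nullptr; };
struct toy_parser   { toy_document doc; /* reusable buffers elided */ };

// (a) Keep a reference: the wrapper just records where the parser's document lives.
toy_document *share(toy_parser &p) {
  return &p.doc;                        // one pointer handed out
}

// (b) Steal by move: allocate a standalone document and take over the buffers.
toy_document *steal(toy_parser &p) {
  toy_document *d = new toy_document;   // 2-word allocation
  d->tape = p.doc.tape;                 // word store 1
  d->string_buf = p.doc.string_buf;     // word store 2
  p.doc.tape = nullptr;                 // word store 3: null out the old pointer
  p.doc.string_buf = nullptr;           // word store 4: null out the old pointer
  return d;
}
```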
I'm going to close this so it doesn't accidentally get merged, but am happy to continue talking about it :) Note that I haven't forgotten about bindings, but I plan to focus on making streaming parsing work for a little bit before I come back to it :)
Sounds good, thank you @jkeiser
This gets rid of the std::move(document) by just storing the parser. It obviously requires the parser to be new'd up. It definitely doesn't make up the difference, and it's even pretty hard to tell if it helps: some things are better and some things are worse. But it could at least eliminate that as a source of confusion. I'd love to see if other folks get more reproducible results out of this :/
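Sketched out, the approach looks roughly like the following (names and error handling are illustrative, using the simdjson dom API and node-addon-api, not the exact binding code): new up a parser, parse into it, and hand the parser itself to the External, so the document never has to be moved out.

```cpp
#include <napi.h>
#include "simdjson.h"

// Rough sketch: the parser owns the document, and the finalizer frees both
// once the JS side lets go of the wrapper. Nothing is std::move'd out.
Napi::External<simdjson::dom::parser> wrap_parse(Napi::Env env,
                                                 const char *json, size_t len) {
  auto *parser = new simdjson::dom::parser();   // parser must be new'd up
  simdjson::dom::element root;                  // a view into parser's document
  auto error = parser->parse(json, len).get(root);
  if (error) {
    delete parser;
    Napi::Error::New(env, simdjson::error_message(error))
        .ThrowAsJavaScriptException();
    return Napi::External<simdjson::dom::parser>();
  }
  // root stays valid exactly as long as *parser does, which is what the
  // finalizer below guarantees.
  return Napi::External<simdjson::dom::parser>::New(
      env, parser,
      [](Napi::Env /*env*/, simdjson::dom::parser *p) { delete p; });
}
```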
[benchmark table: This PR vs. master]