
TextDecoder.decode/JSON.parse vs makeReader Performance... #616

Closed
jfuehner opened this issue May 28, 2020 · 10 comments

@jfuehner commented May 28, 2020

A follow-up to my question regarding Ion's laziness: I tried using an Ion reader on a large object, but I'm seeing a substantial performance difference between makeReader and TextDecoder.decode/JSON.parse.

The object structure below has very large numeric arrays in positions (178443), lineWidths (178443), lineColors (237924).

{
     data: {
          id: '123',
          featureCollections: [{
                id: '456',
                positions: [],
                lineWidths: [],
                lineColors: []
          }]
     }
}

With TextDecoder.decode/JSON.parse it takes ~65-150ms to convert an ArrayBuffer to a JSON object. With Ion's makeReader, creating the reader takes ~565-765ms.

Obviously the TextDecoder/JSON parsers are native, but I was curious whether this drastic difference is to be expected with Ion from a performance perspective?
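For reference, a minimal sketch of the baseline being measured here (the payload is a stand-in for the real data; timings will vary by machine):

```javascript
// Sketch of the JSON baseline: decode a buffer of UTF-8 JSON text with
// the native TextDecoder, then parse it into an object with JSON.parse.
const payload = {
  data: {
    id: '123',
    featureCollections: [{ id: '456', positions: [], lineWidths: [], lineColors: [] }]
  }
};
const buffer = new TextEncoder().encode(JSON.stringify(payload));

const start = performance.now(); // global in Node 16+ and browsers
const text = new TextDecoder().decode(buffer);
const parsed = JSON.parse(text);
const elapsed = performance.now() - start;

console.log(parsed.data.id, `${elapsed.toFixed(2)}ms`);
```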

@zslayton (Contributor) commented Jun 1, 2020

It sounds like you're using Ion text, is that true? If so, please also try Ion binary. Its performance should be much better, especially for larger data sets.

With the ion makeReader I'm seeing it taking ~565-765ms to create the ion reader.

Could you elaborate on what you're doing? Does it take hundreds of milliseconds to "create the Ion reader", or to actually read all of the data with it? Could you share your benchmark code?

Obviously the TextDecoder/JSON parsers are native. Was curious however if this drastic difference is to be expected with ion from a performance perspective?

I won't be able to answer this without better understanding what's being measured.

@jfuehner (Author) commented Jun 1, 2020

@zslayton on the server-side I am using the Java Jackson Ion serializer to create the ion byte array that gets sent to the client...

ObjectMapper mapper = new IonObjectMapper();
byte[] encoded = mapper.writeValueAsBytes(data);

From there, the client receives an arrayBuffer via web socket which I pass directly to the ion makeReader.

const reader = ion.makeReader(buffer);

The line above takes ~565-767 milliseconds to complete...

@zslayton (Contributor) commented Jun 1, 2020

Thanks for clarifying. From a skim of the Jackson source, it looks like you'll need to call setCreateBinaryWriters(true) to produce binary Ion.

const reader = ion.makeReader(buffer);

The line above takes ~565-767 milliseconds to complete...

That's definitely not expected. makeReader should just be doing some basic initialization work -- it doesn't actually perform any parsing. Would you be able to share an example document I could use for testing/benchmarking?

@jfuehner (Author) commented Jun 2, 2020

@zslayton Using setCreateBinaryWriters(true) did the trick. After setting that, the client's makeReader call dropped to ~0.44-0.75 milliseconds.

Now I just wish the reader was easier to work with ;-). Would be great if the DOM Api was able to just load the properties associated with a structure and only load the value of a property when accessed (i.e., lazily loading values). That would effectively give the DOM Api a similar feel to working with a reader... Or perhaps helper/util functions to make finding keys in a reader easier...
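The lazy-loading idea could, in principle, be sketched with a Proxy that defers materializing each field until its first access. This is purely an illustration of the desired behavior, not part of the ion-js API; lazyStruct and its field thunks are hypothetical names:

```javascript
// Hypothetical sketch of a lazily-materialized struct: each field is a
// thunk that is only invoked (and cached) the first time it is accessed.
function lazyStruct(fieldThunks) {
  const cache = new Map();
  return new Proxy({}, {
    get(_target, key) {
      if (!cache.has(key)) cache.set(key, fieldThunks[key]());
      return cache.get(key);
    },
    has(_target, key) { return key in fieldThunks; }
  });
}

// Usage: 'positions' is only parsed when it is actually read.
let parses = 0;
const doc = lazyStruct({
  id: () => '456',
  positions: () => { parses += 1; return [1, 2, 3]; }
});
console.log(doc.id);        // reading 'id' does not touch 'positions'
console.log(parses);        // 0
console.log(doc.positions); // first access triggers the parse
console.log(parses);        // 1
```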

@zslayton (Contributor) commented Jun 9, 2020

Now I just wish the reader was easier to work with ;-). Would be great if the DOM Api was able to just load the properties associated with a structure and only load the value of a property when accessed (i.e., lazily loading values). That would effectively give the DOM Api a similar feel to working with a reader... Or perhaps helper/util functions to make finding keys in a reader easier...

I've opened #617 to track this.

@zslayton zslayton closed this as completed Jun 9, 2020
@jfuehner (Author)

@zslayton With the same object structure/data mentioned above, the ion.load DOM API with an Ion binary ArrayBuffer takes ~8400-8700 milliseconds, vs ~65-150ms with TextDecoder.decode/JSON.parse. Is that to be expected?

It was my understanding from a previous question that Ion would only load up to the first value it encounters, which in this case would be the data property. Given that, I would have thought performance would be quite speedy on the above structure...

@zslayton (Contributor)

It was my understanding from a previous question ion would only load up to the first value it encounters which in this case would be the data property? With that, I would have thought the performance would be quite speedy on the above structure...

I believe you're referring to this quote:

ion.load() will eagerly load the first value it encounters in the provided Reader or serialized data source. (Compare this to ion.loadAll(), which will eagerly load all of the values in the data source.)

The example document you provided is considered a stream with a single (deeply nested) value. A longer stream might look like this:

{ // <-- The first value is this enclosing struct
     data: {
          id: '123',
          featureCollections: [{
                id: '456',
                positions: [],
                lineWidths: [],
                lineColors: []
          }]
     }
}
{ // <-- Value 2
     data: {
          id: 'abc',
          featureCollections: [{
                id: 'def',
                positions: [],
                lineWidths: [],
                lineColors: []
          }]
     }
}
{ // <-- Value 3
     data: {
          id: 'geh',
          featureCollections: [{
                id: 'ijk',
                positions: [],
                lineWidths: [],
                lineColors: []
          }]
     }
}
// ....

You can get the incremental parsing behavior you're describing by using the (less friendly) streaming Reader interface instead of the DOM.

@zslayton with the same object structure/data mentioned above the ion.load DOM Api with a ion binary arrayBuffer takes ~8400-8700 milliseconds vs with TextDecoder.decode/JSON.parse it takes ~65-150ms. Is that to be expected?

While I wouldn't expect a pure-JS (or TS) implementation like ours to outperform the native functions baked into the browser, that's a wider performance gap than I would hope for. However, without seeing an example document it will be hard to diagnose any performance problems. How large is your file? Are you able to share it?

You switched to binary Ion earlier in this thread, but just now you mentioned:

vs with TextDecoder.decode/JSON.parse

Are you comparing the Ion text parser or the binary?

@jfuehner (Author) commented Jun 13, 2020

@zslayton I actually tested both ion binary and ion text with the load function of the DOM Api.

You can grab a copy of the sample data I have been testing with here

Slight correction to my original time measurements. I found the ion load function with the binary format took ~6200 milliseconds; with the text format it took ~9000 milliseconds. Compare that to the makeReader function with the binary format, which took ~0.4 milliseconds.

I then compared those measurements to TextDecoder.decode/JSON.parse with an encoded JSON string, which took ~81 milliseconds.

I agree Ion may not be able to beat the native baked-in browser functions. I guess I was hoping Ion could provide a smaller binary payload from the server to the client while remaining a competitive alternative to those native functions, all while being schemaless...

@zslayton (Contributor) commented Jun 15, 2020

Thanks for the example data!

I took some time to dig into this this morning and wanted to share some of my initial findings.

decimal vs float

When parsing Ion text, numeric values without an exponent (e0, for example) are considered decimal values rather than float values. It looks like a good amount of the slowness (though by no means all!) is coming from the fact that all of your numeric values are decimals, which require several JS object allocations (BigInts, among other things) and extra validation logic to parse compared to regular JS number values.
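The distinction can be illustrated with a simplified classifier for Ion text numeric literals (an illustration of the rules, not ion-js code; it ignores special values like nan, ±inf, underscores, and hex/binary ints):

```javascript
// Simplified sketch of how Ion text classifies numeric literals:
// an 'e'/'E' exponent makes a float; a '.' or a 'd'/'D' exponent
// (without 'e') makes a decimal; otherwise it's an int.
function classifyIonNumber(literal) {
  if (/[eE]/.test(literal)) return 'float';
  if (/[.dD]/.test(literal)) return 'decimal';
  return 'int';
}

console.log(classifyIonNumber('17'));     // 'int'
console.log(classifyIonNumber('1.5'));    // 'decimal' -- needs extra object allocations to parse
console.log(classifyIonNumber('1.5e0'));  // 'float'   -- parses to a plain JS number
```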

I made a second version of your file that converted all of the decimal values to float values by appending an e0 to the end of each one.

cat sample-data.json | sed 's#\([0-9]\),#\1e0,#g' > sample-data.floats.ion
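The same transformation could be done in JavaScript with a regex equivalent of that sed expression (a sketch; like the sed, it only converts numbers immediately followed by a comma):

```javascript
// Sketch: append 'e0' to each number immediately followed by a comma,
// mirroring the sed expression `s#\([0-9]\),#\1e0,#g`.
// Note: like the sed, this misses a trailing number with no comma after it.
function decimalsToFloats(text) {
  return text.replace(/([0-9]),/g, '$1e0,');
}

console.log(decimalsToFloats('[1.5, 2.25, 3.75]'));
// [1.5e0, 2.25e0, 3.75]
```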

I then made a binary version of both files using the ion-c CLI tool.

# Make binary Ion file with all decimal values
ion process --output-format binary sample-data.json > sample-data.10n
# Make binary Ion file with all float values
ion process --output-format binary sample-data.floats.ion > sample-data.floats.10n

(Note: Using floats actually made the converted files noticeably larger. In text we were adding two characters (e0) to each number, and in binary floats are always encoded as 4 or 8 bytes while decimal values can be smaller.)

I then wrote a quick Node program to get some timings:

let ion = require('ion-js/dist/es6/es6/Ion.js');
let fs = require('fs');
let path = require('path');

let fileName = process.argv[2];
let filePath = path.join(__dirname, fileName);
let buffer = fs.readFileSync(filePath);

let value = ion.load(buffer);

For each file, I ran

time node demo.js [fileName]

a few times to let the file system's cache warm up and took the median result from 3 runs. Here are some example timings:

file                     seconds
sample-data.json         4.426
sample-data.floats.ion   2.649
sample-data.10n          2.579
sample-data.floats.10n   1.868

While I'd like these results to be faster across the board, their relative performance aligns with my expectations; reading binary is faster than reading text, and reading floats is faster than reading decimals. If you don't require the precision offered by decimal values, encoding them as floats instead will save you some processing time (but not necessarily space!).

ion.makeReader()

We expect calls to makeReader() to be essentially free, since the code just constructs a new Reader object and doesn't do any parsing. However, as you've seen, there are some cases where it's strangely slow.

I've modified my demo code from above to this:

let ion = require('ion-js/dist/es6/es6/Ion.js');
let fs = require('fs');
let path = require('path');

let fileName = process.argv[2];
let filePath = path.join(__dirname, fileName);
let buffer = fs.readFileSync(filePath);

let reader = ion.makeReader(buffer);  // <--- Replaced ion.load()

Example timings:

file               seconds
sample-data.json   0.741
sample-data.10n    0.110

The result for sample-data.10n is about what I'd expect -- a tenth of a second is about as long as it takes to start a Node process and import ion. As you noticed, the path for sample-data.json is weirdly slow.

This is caused by the argument handling in makeReader. If the provided data source is a string, it's used to create a TextReader directly. If it's a buffer, makeReader checks whether it starts with the binary Ion version marker; if so, it's used to create a BinaryReader directly. If not, the buffer is first decoded to a string using decodeUtf8, a pure-JS custom implementation.
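That dispatch logic can be sketched roughly as follows (an illustrative reconstruction, not the actual ion-js source; the returned objects stand in for the real reader constructors):

```javascript
// Rough sketch of makeReader's argument handling as described above.
// Binary Ion always begins with the Ion version marker: E0 01 00 EA.
const IVM = [0xE0, 0x01, 0x00, 0xEA];

function startsWithIvm(bytes) {
  return bytes.length >= 4 && IVM.every((b, i) => bytes[i] === b);
}

function makeReaderSketch(source) {
  if (typeof source === 'string') {
    return { kind: 'text', source };          // -> TextReader
  }
  const bytes = new Uint8Array(source);
  if (startsWithIvm(bytes)) {
    return { kind: 'binary', source: bytes }; // -> BinaryReader
  }
  // Slow path: decode the whole buffer to a string first. Using the
  // native TextDecoder here is what makes this path cheap.
  return { kind: 'text', source: new TextDecoder().decode(bytes) };
}

console.log(makeReaderSketch('{ data: {} }').kind);                 // text
console.log(makeReaderSketch(new Uint8Array(IVM).buffer).kind);     // binary
console.log(makeReaderSketch(new TextEncoder().encode('{}')).kind); // text
```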

If I bypass decodeUtf8, either by asking Node.js to decode the file's bytes for me:

let text = fs.readFileSync(filePath, 'utf8');
let reader = ion.makeReader(text);

file               seconds
sample-data.json   0.121

...or by using TextDecoder to decode the buffer myself:

let buffer = fs.readFileSync(filePath);
let text = new TextDecoder().decode(buffer);
let reader = ion.makeReader(text);

file               seconds
sample-data.json   0.133

...then the slow path disappears.

This was originally done because some (pretty old) runtimes didn't universally support TextDecoder, but I suspect that's a restriction we can lift now. I've opened this issue to track that.

2+ seconds to load a 4.5MB json file into memory is still slower than we'd like. Hopefully I can find some time to do some profiling in the near future to find some other low-hanging fruit.

Tests were run on a 2015 MacBook Pro using Node v12.17.0.

@desaikd (Contributor) commented Dec 3, 2020

I modified the Unicode decode method used in ion-js per the experiments mentioned in #618; the changes are in this PR. Here are the findings for the sample data provided in this issue:

Using current ion-js decode logic (decodeUtf8):
 0.43392964469999995
 Sample JSON data x 2.30 ops/sec ±15.98% (10 runs sampled)
Using TextDecoder.decode:
 0.00966902717236025
 Sample JSON data x 103 ops/sec ±4.29% (69 runs sampled)

From the above, it's clear that using TextDecoder.decode improves performance significantly.
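The shape of that comparison can be reproduced with a toy pure-JS decoder standing in for the hand-rolled decodeUtf8 path (a simplified decoder with no validation or error handling, just to show both paths produce the same string; the native one is typically much faster):

```javascript
// Toy pure-JS UTF-8 decoder (no error handling), standing in for the
// hand-rolled decodeUtf8 path, compared against the native TextDecoder.
function naiveDecodeUtf8(bytes) {
  let out = '';
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let cp;
    if (b < 0x80)      { cp = b; i += 1; }                         // 1-byte sequence
    else if (b < 0xE0) { cp = ((b & 0x1F) << 6)                    // 2-byte sequence
                              | (bytes[i + 1] & 0x3F); i += 2; }
    else if (b < 0xF0) { cp = ((b & 0x0F) << 12)                   // 3-byte sequence
                              | ((bytes[i + 1] & 0x3F) << 6)
                              | (bytes[i + 2] & 0x3F); i += 3; }
    else               { cp = ((b & 0x07) << 18)                   // 4-byte sequence
                              | ((bytes[i + 1] & 0x3F) << 12)
                              | ((bytes[i + 2] & 0x3F) << 6)
                              | (bytes[i + 3] & 0x3F); i += 4; }
    out += String.fromCodePoint(cp);
  }
  return out;
}

const bytes = new TextEncoder().encode('Ion ☃ 🎉');
console.log(naiveDecodeUtf8(bytes) === new TextDecoder().decode(bytes)); // true
```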
