Adds JSON/CBOR support and an Io-type option #243

cheqianh · 2023-03-10T02:10:14Z

Description:

This PR integrates JSON/CBOR into the benchmark CLI. It also adds an IO-type option that selects whether the data is benchmarked through in-memory buffers or files.

Details

In detail, this PR adds the following features/components:

Enables the CLI tool to benchmark Ion data alongside modern JSON/CBOR libraries in the same run.
Adds an IO-type option that lets the tool time reads/writes against either in-memory buffers or files.
Various refactoring and additional unit tests.
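For illustration only, a minimal sketch of what the IO-type option distinguishes, with the stdlib `json` module standing in for any format: the `buffer` case times `dumps` into an in-memory string, and the `file` case times `dump` into a file on disk. The helper name `time_write` and its parameters are hypothetical; they mirror the `--io-type` values but are not the PR's code.

```python
import json
import os
import tempfile
import timeit


def time_write(obj, io_type, number=100):
    """Return total seconds for `number` writes of `obj` using the given IO type.

    Hypothetical helper: "buffer" serializes to an in-memory str with json.dumps,
    "file" serializes straight to a file on disk with json.dump.
    """
    if io_type == "buffer":
        def run():
            json.dumps(obj)
        return timeit.timeit(run, number=number)
    if io_type == "file":
        fd, path = tempfile.mkstemp(suffix=".json")
        os.close(fd)
        try:
            def run():
                with open(path, "w") as fp:
                    json.dump(obj, fp)
            return timeit.timeit(run, number=number)
        finally:
            os.remove(path)  # clean up the scratch file
    raise ValueError(f"unknown io_type: {io_type!r}")
```

Calling `time_write(data, "buffer")` and `time_write(data, "file")` on the same object gives two numbers whose difference is roughly the file IO cost, which is the comparison the option is meant to expose.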

Output Example

Example One

Benchmarks the load API for all JSON libraries as well as Ion

Command and Output:

python amazon/ionbenchmark/ion_benchmark_cli.py read --format json --format orjson --format simplejson --format ion_text --format ion_binary --format ujson

Example Two

Benchmarks the dumps API (--io-type buffer) for cbor2 and ion_binary

Command and Output:

python amazon/ionbenchmark/ion_benchmark_cli.py write --io-type buffer --format cbor2 --format ion_binary test_unit_int

Example Three

Benchmarks the dump API (--io-type file) for cbor2 and ion_binary

Command and Output:

python amazon/ionbenchmark/ion_benchmark_cli.py write --io-type file --format cbor2 --format ion_binary test_unit_int

Follow-up issues

Memory profiling enhancement
Currently, the tool only profiles peak memory usage. Because of optimizations Python may perform, the memory-usage metrics may be inaccurate after different formats have been run multiple times. Memory-related metrics need more investigation - Benchmark CLI memory profiling feature enhancement. #245
Needs a tool to convert between file formats.
Benchmark-cli read command should support --format option (format conversion feature). #234
Needs pretty-printed format options
Right now, the options shown in the output table are unclear. For example, (simpleion, ion_binary, file) means that --api is simpleion, --format is ion_binary, and --io-type is file. This needs improvement - Benchmark CLI needs pretty printed option log. #244.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


cheqianh · 2023-03-10T09:07:22Z

amazon/ionbenchmark/Format.py

+def rewrite_file_to_format(file, format_option):
+    return file


The file-format conversion logic will go here.

cheqianh

CI/CD failed because some JSON libraries, e.g., orjson, do not support PyPy. We can either:

Treat orjson as the regular built-in json library when running under the PyPy interpreter, or
Remove orjson support on PyPy.

Without orjson, it works as expected, apart from the 10 known failing tests tracked in #249.

amazon/ionbenchmark/ion_benchmark_cli.py

tests/test_benchmark_cli.py

tgregg · 2023-03-15T18:22:52Z

> CI/CD failed because some JSON libraries, e.g., `orjson`, do not support PyPy. We can either:
> 1. Treat `orjson` as the regular built-in json library when using the `pypy` interpreter, or
> 2. Remove orjson support on PyPy.
>
> Without orjson, it works as expected, apart from the 10 known failing tests tracked in #249.

Let's go with option 2
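A minimal sketch of option 2, assuming the CLI keeps a list of supported format names (the list below is hypothetical, built from the `--format` values used in this PR) and filters it by interpreter via the stdlib `platform.python_implementation()`. The optional `implementation` argument exists only so both branches can be exercised.

```python
import platform

# Hypothetical list of --format values offered by the CLI.
ALL_FORMATS = ["json", "simplejson", "ujson", "orjson", "cbor2", "ion_text", "ion_binary"]
# Per the discussion above, orjson does not support PyPy, so it is dropped there.
PYPY_UNSUPPORTED = {"orjson"}


def available_formats(implementation=None):
    """Return the formats to offer for the given (or current) Python implementation."""
    implementation = implementation or platform.python_implementation()
    if implementation == "PyPy":
        return [f for f in ALL_FORMATS if f not in PYPY_UNSUPPORTED]
    return list(ALL_FORMATS)
```

Option 1 would instead keep `orjson` in the list and route it to the built-in `json` module under PyPy; filtering is simpler and avoids reporting `json` numbers under an `orjson` label.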

tgregg · 2023-03-15T18:28:40Z

ion-c

Since this introduces test failures that haven't yet been resolved, let's leave the ion-c submodule update for a separate PR, and revert it here.

tgregg

Since format conversion is left as a TODO, can you explain what is actually happening today in each of your examples in the description? Do those come from an actual execution?

Once we have data conversion, I'd expect the data size stat to be different for each of the formats, as it will reflect the size of the converted data, for both read and write.

tgregg · 2023-03-15T18:37:55Z

amazon/ionbenchmark/ion_benchmark_cli.py

For the --api option, the allowed values should be changed to something like load_dump and streaming. The option simple_ion is no longer accurate for the other formats, and looks weird in the options list when those formats are selected.
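To make the suggestion concrete, a hypothetical sketch of the renamed `--api` values using the stdlib `argparse` module. The real CLI's parsing library, option set, and defaults are not shown in this thread, so everything besides the flag names taken from the examples above (including the `load_dump`/`ion_binary`/`file` defaults) is an assumption.

```python
import argparse

# Hypothetical choices: the --api values suggested in the review above.
API_CHOICES = ["load_dump", "streaming"]
IO_TYPES = ["buffer", "file"]


def parse_cli(argv):
    """Parse benchmark CLI arguments (illustrative sketch, not the real CLI)."""
    parser = argparse.ArgumentParser(prog="ion_benchmark_cli")
    parser.add_argument("command", choices=["read", "write"])
    parser.add_argument("input_file")
    # action="append" lets each option be repeated, as in the examples above.
    parser.add_argument("--api", action="append", choices=API_CHOICES)
    parser.add_argument("--format", action="append", dest="formats")
    parser.add_argument("--io-type", action="append", dest="io_types", choices=IO_TYPES)
    args = parser.parse_args(argv)
    # Defaults are filled in afterwards: with a list default, argparse's "append"
    # would add user values to the default instead of replacing it.
    args.api = args.api or ["load_dump"]
    args.formats = args.formats or ["ion_binary"]
    args.io_types = args.io_types or ["file"]
    return args


args = parse_cli(["write", "--io-type", "buffer", "--format", "cbor2",
                  "--format", "ion_binary", "test_unit_int"])
print(args.api, args.formats, args.io_types)
```

With `choices` in place, an old value such as `--api simple_ion` is rejected at parse time, and the option tuple printed in the results table would read `load_dump` regardless of which format is selected.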

tgregg · 2023-03-15T18:39:05Z

amazon/ionbenchmark/ion_benchmark_cli.py

+
+
+# Generates benchmark code for json/cbor load/loads APIs
+def generate_read_test_code(file, memory_profiling, format_option, binary, io_type):


Is there a reason we can't follow the same path for Ion?

Merged them into one.
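A rough sketch of what a single read path for every format, Ion included, could look like: each `--format` value maps to a module exposing `load`/`loads`, and `--io-type` decides whether the timed callable decodes an in-memory payload or reads from the file. The `FORMAT_MODULES` table, `BINARY_FORMATS` set, and `make_read_test` name are hypothetical; only `json` ships with Python, and the real Ion path also passes options such as `single_value` and `emit_bare_values` that are omitted here.

```python
import importlib

# Hypothetical mapping from --format value to a module exposing load()/loads().
# Only "json" ships with Python; the other modules must be installed separately.
FORMAT_MODULES = {
    "json": "json",
    "simplejson": "simplejson",
    "ujson": "ujson",
    "cbor2": "cbor2",
    "ion_text": "amazon.ion.simpleion",
    "ion_binary": "amazon.ion.simpleion",
}
# Binary formats are read as bytes rather than text (assumed set).
BINARY_FORMATS = {"cbor2", "ion_binary"}


def make_read_test(path, format_option, io_type):
    """Return a zero-argument callable that decodes `path` with the chosen format."""
    module = importlib.import_module(FORMAT_MODULES[format_option])
    mode = "rb" if format_option in BINARY_FORMATS else "r"
    if io_type == "buffer":
        # Read the file once up front so the timed call only decodes (loads).
        with open(path, mode) as fp:
            payload = fp.read()

        def test_func():
            return module.loads(payload)
        return test_func
    if io_type == "file":
        # The timed call includes opening and reading the file (load).
        def test_func():
            with open(path, mode) as fp:
                return module.load(fp)
        return test_func
    raise ValueError(f"unknown io_type: {io_type!r}")
```

Because every format goes through the same table, adding a format is one entry rather than a new code generator, which is the motivation behind merging the Ion and JSON/CBOR paths.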

tgregg · 2023-03-15T18:40:37Z

amazon/ionbenchmark/ion_benchmark_cli.py

+
+
+# Generates benchmark code for json dump API
+def generate_write_test_code(obj, memory_profiling, format_option, io_type, binary):


Is there a reason Ion can't follow this path?

Same as the comment above.

tgregg · 2023-03-15T18:42:16Z

amazon/ionbenchmark/ion_benchmark_cli.py

+        # reset each option configuration
+        api, format_option, io_type = reset_for_each_execution(each_option)
+        binary = format_is_binary(format_option)
+        # TODO. currently, we must provide the tool a corresponding file format for read benchmarking. For example,


The tool to convert to a corresponding file format for read benchmarking?

tgregg · 2023-03-15T18:44:19Z

amazon/ionbenchmark/ion_benchmark_cli.py

+                def test_func():
+                    tracemalloc.start()
+                    data = ion.loads(benchmark_data, single_value=single_value, emit_bare_values=emit_bare_values)
+                    global read_memory_usage_peak
+                    read_memory_usage_peak = tracemalloc.get_traced_memory()[1] / BYTES_TO_MB
+                    tracemalloc.stop()
+                    return data


test_func() is what is being timed, correct? That means the cost of using tracemalloc is included in the results. Is there a way for us to extract this logic outside of the timed block?

Yes, we execute it twice; once with memory_profiling enabled, and once with it disabled.

We profile memory here, before any performance benchmarking, and then benchmark performance separately.
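A minimal sketch of the two-pass approach described above, with the stdlib `json` module standing in for any format: the first pass runs the test function once under `tracemalloc` to capture peak memory, and the second pass times it with tracing off, so tracing overhead stays out of the timing. The helper names are hypothetical, and the `BYTES_TO_MB` value (1024 * 1024) is an assumption, since the diff does not show it.

```python
import json
import timeit
import tracemalloc

BYTES_TO_MB = 1024 * 1024  # assumed value; the diff only shows the name


def profile_peak_memory_mb(func):
    """Run func once under tracemalloc and return its peak traced allocation in MB."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return peak / BYTES_TO_MB


def benchmark(func, number=100):
    """Time func with tracemalloc off so tracing overhead is excluded from the timing."""
    return timeit.timeit(func, number=number)


payload = json.dumps({"values": list(range(10_000))})


def test_func():
    return json.loads(payload)


peak_mb = profile_peak_memory_mb(test_func)  # pass 1: memory only
seconds = benchmark(test_func)               # pass 2: timing only
print(f"peak={peak_mb:.3f} MB, time={seconds:.4f} s")
```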

cheqianh · 2023-03-15T22:00:00Z

> Since format conversion is left as a TODO, can you explain what is actually happening today in each of your examples in the description? Do those come from an actual execution?

Yes, these metrics were generated directly by the tool. Overall, the tool assumes the format conversion functionality already exists, but it does not yet perform any conversion on files.

For each example,
(1)
Benchmarking both read and write for JSON/Ion_text with simple files does NOT require format conversion.

One thing to point out: ion_binary is included in the comparison, but it currently does exactly the same thing as ion_text. Enhancement ticket #234 describes this. Once format conversion is supported, the tool will pick it up automatically without further changes.

(2) and (3)
Benchmarking the write command for CBOR/Ion does NOT require format conversion; we provide Python-level objects, and the benchmark doesn't care how they are encoded and written to files. Both examples above target write.

One thing I found that breaks the fairness of Ion/CBOR benchmarking is a slight difference in the Python objects each library generates (e.g., cbor2 reads "9233720363654371807" as -12852 in Python). However, the tool will automatically generate the correct objects once format conversion is supported. (This is one benefit of the tenet I followed while working on this PR: always assume the tool already supports format conversion. Once the conversion tool exists, cbor2 will automatically generate correct cbor2 objects from the desired file format.)

> Once we have data conversion, I'd expect the data size stat to be different for each of the formats, as it will reflect the size of the converted data, for both read and write.

Yes, the converted file is used to calculate the file size. Once format conversion is supported, the tool will report the new file size automatically.


cheqianh · 2023-03-28T22:42:00Z

The new commit addresses all the feedback above, passes all CI/CD checks, resolves the PyPy incompatibility, and deprecates orjson for now.

Here is the GH issue to add orjson back.

Adds JSON/CBOR support and an Io-type option to benchmark different t…

e9914bc

…ype of sources.

cheqianh requested a review from tgregg March 10, 2023 02:24

cheqianh commented Mar 10, 2023


Adds dependencies.

8bcd385

cheqianh force-pushed the json branch from eaa8ef2 to 8bcd385 Compare March 14, 2023 08:06

cheqianh commented Mar 14, 2023


linlin-s reviewed Mar 15, 2023

amazon/ionbenchmark/ion_benchmark_cli.py (outdated, resolved)

amazon/ionbenchmark/ion_benchmark_cli.py (resolved)

amazon/ionbenchmark/ion_benchmark_cli.py (resolved)

tests/test_benchmark_cli.py (outdated, resolved)

tgregg reviewed Mar 15, 2023


Adds formats to documentation, changes a test namespace.

87e7f65

cheqianh and others added 2 commits March 19, 2023 23:34

Uses specific ion-c version to build ion-python C extension. (amazon-…

ed55973

…ion#250)

Address comments

3c99a29

Adds. table

cheqianh force-pushed the json branch from d1afbe2 to 3c99a29 Compare March 20, 2023 06:47

tgregg reviewed Mar 20, 2023

amazon/ionbenchmark/ion_benchmark_cli.py (outdated, resolved)

cheqianh force-pushed the json branch from 14785d3 to 2406184 Compare March 28, 2023 22:39

Addressed feedback, passed all CI/CD, and deprecated orjson for now.

2cbbdd8

cheqianh force-pushed the json branch from 2406184 to 2cbbdd8 Compare March 28, 2023 22:46

tgregg approved these changes Mar 29, 2023


cheqianh merged commit b08b9a7 into amazon-ion:master Mar 30, 2023
