-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds JSON/CBOR support and an Io-type option #243
Conversation
def rewrite_file_to_format(file, format_option): | ||
return file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file format convert logic will go in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI/CD failed due some json libraries does not support pypy
. E.g., orjson. We can either
- Treat
orjson
as regular built-in json library when usingpypy
interpreter, or - Remove orjson support on pypy.
Without that, it works as expected with known 10 failed tests - #249
Let's go with option 2 |
ion-c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this introduces test failures that haven't yet been resolved, let's leave the ion-c submodule update for a separate PR, and revert it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since format conversion is left as a TODO, can you explain what is actually happening today in each of your examples in the description? Do those come from an actual execution?
Once we have data conversion, I'd expect the data size stat to be different for each of the formats, as it will reflect the size of the converted data, for both read and write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the --api
option, the allowed values should be changed to something like load_dump
and streaming
. The option simple_ion
is no longer accurate for the other formats, and looks weird in the options list when those formats are selected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
|
||
|
||
# Generates benchmark code for json/cbor load/loads APIs | ||
def generate_read_test_code(file, memory_profiling, format_option, binary, io_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we can't follow the same path for Ion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged them into one.
|
||
|
||
# Generates benchmark code for json dump API | ||
def generate_write_test_code(obj, memory_profiling, format_option, io_type, binary): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason Ion can't follow this path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above comment.
# reset each option configuration | ||
api, format_option, io_type = reset_for_each_execution(each_option) | ||
binary = format_is_binary(format_option) | ||
# TODO. currently, we must provide the tool a corresponding file format for read benchmarking. For example, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tool to convert to a corresponding file format for read benchmarking. ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
def test_func(): | ||
tracemalloc.start() | ||
data = ion.loads(benchmark_data, single_value=single_value, emit_bare_values=emit_bare_values) | ||
global read_memory_usage_peak | ||
read_memory_usage_peak = tracemalloc.get_traced_memory()[1] / BYTES_TO_MB | ||
tracemalloc.stop() | ||
return data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_func()
is what is being timed, correct? That means the cost of using tracemalloc
is included in the results. Is there a way for us to extract this logic outside of the timed block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, these metrics are generated by the tool directly. Overall, the tool is assuming that we had the format conversion functionality already but it doesn't do any conversion work for files. For each example, One thing I want to point out is that ion_binary is compared in the example but it's doing the exact same thing as format (2) and (3) I found one thing that breaks the fairness of Ion/CBOR benchamrking is the slight difference in python objects generated by them (E.g., cbor2 reads
Yeah, the converted file is being used for calculating the file size. Once we support the formats conversion, it will generate new file size automatically. |
The new commit addressed all the feedback above, passed all the CI/CD and Here is the GH issue to add orjson back. |
Description:
This PR integrates JSON/CBOR into the benchmark CLI. It additionally adds an IO type option to decide which IO type of the Ion data will be benchmarked.
Details
In details, this PR adds below features/components
Output Example
Example One
Benchmark load API for all JSON libraries as well as ION
Command and Output:
Example Two
Benchmark dumps API (--io-types buffer) for CBOR2, and Ion_binary
Command and Output:
Example Three
Benchmark dump (--io-types file) API for CBOR2, and Ion_binary
Command and Output:
Follow-up issues
Currently, the tool only profile a memory usage peak. Due to the potential optimization Python may do, the memory usage metrics might be inaccurate after running different formats multiple execution times. Need more investigation for memory related metrics - Benchmark CLI memory profiling feature enhancement. #245
Benchmark-cli read command should support --format option (format conversion feature). #234
Right now, the options output in the output table is unclear. For example, (simpleion, ion_binary, file) represents that
--api
is simpleion,--format
is ion_binary and--io-type
is file. This needs improvement - Benchmark CLI needs pretty printed option log. #244.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.