Pass batch size to JSON reader using environment variable #16502
/ok to test
Co-authored-by: Nghia Truong <[email protected]>
This looks good. I did not realize the environment variable is just for the test. Thanks for doing that. What is the execution time for the test now? Can you try it with compute-sanitizer too?
Also, just a nit: can you change plain `size_t` to `std::size_t` in places where this PR changed code?
The execution time has dropped from 52s to 3s now. With compute-sanitizer, the runtime is 123s.
CMake approval.
/ok to test
/merge
Description
The JSON reader set the batch size to `INT_MAX` bytes since the motivation for implementing a batched JSON reader was to parse source files whose total size is larger than `INT_MAX` (#16138, #16162). However, we can use a much smaller batch size to evaluate the correctness of the reader and speed up tests significantly.

This PR focuses on reducing the runtime of the batched reader test by passing the batch size to the reader through an environment variable.
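
For readers unfamiliar with the pattern, here is a minimal sketch of an environment-variable override for a batch size. It is not the code from this PR: the variable name `EXAMPLE_JSON_BATCH_SIZE` and the helpers `get_batch_size` and `make_batches` are hypothetical names chosen for illustration. The sketch also splits purely by byte count, whereas a real reader would additionally need to align batch boundaries with record delimiters.

```cpp
#include <algorithm>
#include <cctype>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variable name, used only for this sketch.
constexpr char const* kBatchSizeEnvVar = "EXAMPLE_JSON_BATCH_SIZE";

// Returns the batch size in bytes: the environment override when it is set to a
// positive decimal integer, otherwise the INT_MAX default.
std::size_t get_batch_size()
{
  constexpr auto default_size = static_cast<std::size_t>(INT_MAX);
  char const* value           = std::getenv(kBatchSizeEnvVar);
  if (value == nullptr) { return default_size; }

  std::string const text{value};
  bool const all_digits =
    !text.empty() && std::all_of(text.begin(), text.end(), [](unsigned char c) {
      return std::isdigit(c) != 0;
    });
  if (!all_digits) { return default_size; }

  try {
    auto const parsed = std::stoull(text);
    return parsed > 0 ? static_cast<std::size_t>(parsed) : default_size;
  } catch (std::out_of_range const&) {
    return default_size;  // value does not fit; fall back to the default
  }
}

// Splits [0, total_size) into consecutive (offset, length) batches of at most
// batch_size bytes each. Returns no batches when batch_size is zero.
std::vector<std::pair<std::size_t, std::size_t>> make_batches(std::size_t total_size,
                                                              std::size_t batch_size)
{
  std::vector<std::pair<std::size_t, std::size_t>> batches;
  if (batch_size == 0) { return batches; }
  std::size_t offset = 0;
  while (offset < total_size) {
    auto const length = std::min(batch_size, total_size - offset);
    batches.emplace_back(offset, length);
    offset += length;
  }
  return batches;
}
```

With an override like this, a test can set the variable to a few bytes so that even a small input spans several batches, while production runs that leave it unset keep the `INT_MAX` default.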
The runtime of `JsonLargeReaderTest.MultiBatch` in the `LARGE_STRINGS_TEST` gtest drops from ~52s to ~3s.

Checklist