Add avro reader support [databricks] #4956
Signed-off-by: remzi <[email protected]>
Signed-off-by: Bobby Wang <[email protected]>
use data gen instead of testing file Signed-off-by: remzi <[email protected]>
Add basic avro reading test
build
  # Only 3 jars: cudf.jar dist.jar integration-test.jar
- ALL_JARS="$CUDF_JARS $PLUGIN_JARS $TEST_JARS"
+ ALL_JARS="$CUDF_JARS $PLUGIN_JARS $TEST_JARS $AVRO_JARS"
NIT: Better to check whether the avro jar exists. If not, set AVRO_JARS="" and CI_EXCLUDE_AVRO=true as well.
DONE
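The check suggested above could look roughly like the following. This is a minimal sketch, not the code merged in the PR: the function name `set_avro_vars` and the idea of passing a single jar path are assumptions made for illustration, and the real script may locate the jar differently. It is written with POSIX `[ ]` tests so it runs under any `sh`.

```shell
# Hypothetical helper (not from the PR): decide whether the avro jar can be
# put on the classpath, and skip the avro tests when it cannot.
set_avro_vars() {
    avro_jar_path="$1"   # assumed: path where the avro jar is expected
    if [ -f "$avro_jar_path" ]; then
        # Jar is present: add it to the jar list. CI_EXCLUDE_AVRO is left
        # untouched so an explicit user setting is still honored.
        AVRO_JARS="$avro_jar_path"
    else
        # Jar is missing: keep it off the jar list and exclude the avro tests.
        AVRO_JARS=""
        CI_EXCLUDE_AVRO=true
    fi
}
```

Calling the helper before `ALL_JARS` is assembled would keep a missing jar from reaching the classpath and would turn the avro tests off in the same step.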
#
# `CI_EXCLUDE_AVRO=true ./run_pyspark_from_build.sh -k not avro_test.py` run all tests excluding
# those in avro_test.py
if [[ "${CI_EXCLUDE_AVRO}" != "" ]];
- if [[ "${CI_EXCLUDE_AVRO}" != "" ]];
+ if [[ "${CI_EXCLUDE_AVRO}" == "true" ]];
Otherwise, setting CI_EXCLUDE_AVRO=false would also skip the tests.
Done
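To illustrate the difference between the two conditions in this thread, here is a small sketch. The helper names are hypothetical, and the tests use POSIX `[ ]` rather than the `[[ ]]` form in the script so the snippet runs under any `sh`; the comparison logic is otherwise the same.

```shell
# Hypothetical helpers (not from the PR) mirroring the two conditions.
skips_when_non_empty() {
    # Original check: any non-empty value, including "false", skips the tests.
    [ "$1" != "" ]
}

skips_when_true() {
    # Suggested check: only the literal string "true" skips the tests.
    [ "$1" = "true" ]
}

# Print how each condition treats an unset/empty value, "true", and "false".
for value in "" true false; do
    if skips_when_non_empty "$value"; then old=skip; else old=run; fi
    if skips_when_true "$value"; then new=skip; else new=run; fi
    echo "CI_EXCLUDE_AVRO='$value' original=$old suggested=$new"
done
```

The two checks only disagree for non-empty values other than `true`, such as `false`, which is the case the reviewer points out.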
This PR adds Avro reader support for basic types.
Because there is no meta info describing the block data, the reader first iterates over the whole Avro file to collect the info for every block. It does not read the entire file to do this. Instead, it seeks to each desired position and reads only a few bytes there. It then filters the blocks according to the PartitionedFile, and finally reads the selected blocks into CPU memory and sends them to the GPU to decode ...
Close #4935.
Re #4831.