-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Struct support for ParquetWriter #2514
Conversation
Signed-off-by: Raza Jafri <[email protected]>
@@ -2815,7 +2815,8 @@ object GpuOverrides { | |||
exec[DataWritingCommandExec]( | |||
"Writing data", | |||
ExecChecks((TypeSig.commonCudfTypes + | |||
TypeSig.DECIMAL.withPsNote(TypeEnum.DECIMAL, "Only supported for Parquet")).nested(), | |||
TypeSig.DECIMAL.withPsNote(TypeEnum.DECIMAL, "Only supported for Parquet") + | |||
TypeSig.STRUCT.withPsNote(TypeEnum.STRUCT, "Only supported for Parquet")).nested(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing a test? When the struct support went in, the fact that we forgot to update this should have been caught by the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out @jlowe I remembered that we added struct support for GpuInMemoryTableScanExec
and confused it with DataWritingCommandExec
. We have to turn this on as part of this PR. I have updated the PR name to reflect it
Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Raza Jafri <[email protected]>
build |
Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Raza Jafri <[email protected]>
pytest.param([decimal_gen_default, parquet_basic_struct_gen, | ||
StructGen([['child0', StructGen([['child1', byte_gen]])]]), | ||
decimal_gen_scale_precision, decimal_gen_same_scale_precision, decimal_gen_64bit], | ||
marks=pytest.mark.allow_non_gpu("CoalesceExec"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is CoalesceExec
not on the GPU? and is there an open issue that will put it on the GPU? If not we should file one and reference it here. If so we should just reference it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is fixed by #2530 which added struct support for coalesce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the PR to not allow Coalesce with Structs to run on CPU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code still has allow_non_gpu("CoalesceExec")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, its not refflecting it here for some reason, unless you go to the last commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to a commit that has yet to be pushed to his PR? Because I still see it in https://github.com/NVIDIA/spark-rapids/pull/2514/files and at https://github.com/razajafri/spark-rapids-1/blob/parquet-doc/integration_tests/src/main/python/parquet_write_test.py#L49
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK you are seeing the latest, I wasn't seeing it until I looked at the commit specifically.
So that is needed because CoalesceExec doesn't support Decimals yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that is needed because CoalesceExec doesn't support Decimals yet
#2531 explicitly states it adds decimal support and appears to be testing it. So if that is somehow not working in practice an issue needs to be filed and fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for wasting your time. I didn't see _commonTypes
was added and thought it was TypeSig.commonCudfTypes
Signed-off-by: Raza Jafri <[email protected]>
…ts now Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Raza Jafri <[email protected]>
build |
* updated Parquet writer support docs Signed-off-by: Raza Jafri <[email protected]> * adding Struct to allow list for ParquetWriter Signed-off-by: Raza Jafri <[email protected]> * add nested struct test Signed-off-by: Raza Jafri <[email protected]> * removed duplicate data type from test Signed-off-by: Raza Jafri <[email protected]> * removed CoalesceExec from allowedOnCpu list because it supports structs now Signed-off-by: Raza Jafri <[email protected]> * Decimals are supported by CoalesceExec Signed-off-by: Raza Jafri <[email protected]> Co-authored-by: Raza Jafri <[email protected]>
* updated Parquet writer support docs Signed-off-by: Raza Jafri <[email protected]> * adding Struct to allow list for ParquetWriter Signed-off-by: Raza Jafri <[email protected]> * add nested struct test Signed-off-by: Raza Jafri <[email protected]> * removed duplicate data type from test Signed-off-by: Raza Jafri <[email protected]> * removed CoalesceExec from allowedOnCpu list because it supports structs now Signed-off-by: Raza Jafri <[email protected]> * Decimals are supported by CoalesceExec Signed-off-by: Raza Jafri <[email protected]> Co-authored-by: Raza Jafri <[email protected]>
Signed-off-by: Raza Jafri [email protected]