-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Struct support for ParquetWriter #2514
Changes from 5 commits
59dfd75
9561dc2
d95ddd8
6ceeaf9
450acd6
7b86505
c97dd2d
94d2569
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -761,7 +761,7 @@ object GpuOverrides { | |
(ParquetFormatType, FileFormatChecks( | ||
cudfRead = (TypeSig.commonCudfTypes + TypeSig.DECIMAL + TypeSig.STRUCT + TypeSig.ARRAY + | ||
TypeSig.MAP).nested(), | ||
cudfWrite = TypeSig.commonCudfTypes + TypeSig.DECIMAL, | ||
cudfWrite = (TypeSig.commonCudfTypes + TypeSig.DECIMAL + TypeSig.STRUCT).nested(), | ||
sparkSig = (TypeSig.atomics + TypeSig.STRUCT + TypeSig.ARRAY + TypeSig.MAP + | ||
TypeSig.UDT).nested())), | ||
(OrcFormatType, FileFormatChecks( | ||
|
@@ -2835,7 +2835,8 @@ object GpuOverrides { | |
exec[DataWritingCommandExec]( | ||
"Writing data", | ||
ExecChecks((TypeSig.commonCudfTypes + | ||
TypeSig.DECIMAL.withPsNote(TypeEnum.DECIMAL, "Only supported for Parquet")).nested(), | ||
TypeSig.DECIMAL.withPsNote(TypeEnum.DECIMAL, "Only supported for Parquet") + | ||
TypeSig.STRUCT.withPsNote(TypeEnum.STRUCT, "Only supported for Parquet")).nested(), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we missing a test? When the struct support went in, the fact that we forgot to update this should have been caught by the test. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for pointing this out @jlowe I remembered that we added struct support for |
||
TypeSig.all), | ||
(p, conf, parent, r) => new SparkPlanMeta[DataWritingCommandExec](p, conf, parent, r) { | ||
override val childDataWriteCmds: scala.Seq[DataWritingCommandMeta[_]] = | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is
CoalesceExec
not on the GPU? and is there an open issue that will put it on the GPU? If not we should file one and reference it here. If so we should just reference it here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is fixed by #2530 which added struct support for coalesce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the PR to not allow Coalesce with Structs to run on CPU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code still has
allow_non_gpu("CoalesceExec")
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, its not refflecting it here for some reason, unless you go to the last commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to a commit that has yet to be pushed to his PR? Because I still see it in https://github.com/NVIDIA/spark-rapids/pull/2514/files and at https://github.com/razajafri/spark-rapids-1/blob/parquet-doc/integration_tests/src/main/python/parquet_write_test.py#L49
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK you are seeing the latest, I wasn't seeing it until I looked at the commit specifically.
So that is needed because CoalesceExec doesn't support Decimals yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#2531 explicitly states it adds decimal support and appears to be testing it. So if that is somehow not working in practice an issue needs to be filed and fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for wasting your time. I didn't see
_commonTypes
was added and thought it wasTypeSig.commonCudfTypes