Avoid unnecessary Table instances after contiguous split #1593
Signed-off-by: Jason Lowe <[email protected]>
I have done a single pass and am not seeing anything odd. It is nice to see all the columnar-meta code go away in favor of the packed format. I'll do a second pass today.
Review threads (all resolved):
* sql-plugin/src/main/scala/com/nvidia/spark/rapids/CopyCompressionCodec.scala
* sql-plugin/src/main/scala/org/apache/spark/sql/rapids/RapidsShuffleInternalManagerBase.scala
* sql-plugin/src/main/java/com/nvidia/spark/rapids/GpuCompressedColumnVector.java
* sql-plugin/src/main/java/com/nvidia/spark/rapids/GpuColumnVectorFromBuffer.java
* sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsDeviceMemoryStore.scala
* sql-plugin/src/main/java/com/nvidia/spark/rapids/GpuPackedTableColumn.java
LGTM
* Avoid unnecessary Table instances after contiguous split
  Signed-off-by: Jason Lowe <[email protected]>
* Address review comments
  Signed-off-by: Jason Lowe <[email protected]>
* Remove ColumnMeta
  Signed-off-by: Jason Lowe <[email protected]>
* Remove extra license file
  Signed-off-by: Jason Lowe <[email protected]>
This depends on rapidsai/cudf#7127.
This change avoids instantiating cudf `Table` and `ColumnVector` objects after partitioning via contiguous split, since these buffers are likely to be shipped to other nodes over the network rather than used as columnar batch input in the same process. Instantiating all of the table and column objects for many splits can take a significant amount of time per task.

This also changes the table metadata used for UCX shuffle to the new opaque libcudf metadata generated by the `contiguous_split` and `pack` methods, eliminating many of the `ColumnMeta` flatbuffer classes.
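A minimal Java sketch of the idea, assuming each split is represented as an opaque metadata blob plus one contiguous data buffer. It is not the plugin's or cudf's actual API: `PackedSplit`, `toWire`, `fromWire` and `unpack` are hypothetical names, and the length-prefixed framing is an illustrative assumption rather than the UCX shuffle wire format. The point it shows is that a split can be framed and handed to the transport without creating any per-column objects; decoding into columns happens only in `unpack`, when a consumer actually needs columnar access.

```java
import java.nio.ByteBuffer;
import java.util.function.BiFunction;

/**
 * Hypothetical stand-in for one output of a contiguous split: an opaque
 * metadata blob plus a single contiguous data buffer. No per-column
 * objects are created unless unpack() is called.
 */
final class PackedSplit {
    private final byte[] metadata; // opaque here; only the decoder interprets it
    private final ByteBuffer data;  // all column data for this split, back to back

    PackedSplit(byte[] metadata, ByteBuffer data) {
        this.metadata = metadata.clone();
        this.data = data.asReadOnlyBuffer();
    }

    int metadataLength() { return metadata.length; }

    int dataLength() { return data.remaining(); }

    /** Frames the split as [metaLen][dataLen][metadata][data] (illustrative format). */
    ByteBuffer toWire() {
        ByteBuffer payload = data.duplicate(); // leave this split's buffer position untouched
        ByteBuffer out = ByteBuffer.allocate(8 + metadata.length + payload.remaining());
        out.putInt(metadata.length).putInt(payload.remaining());
        out.put(metadata).put(payload);
        out.flip();
        return out;
    }

    /** Reverses toWire() without interpreting either section. */
    static PackedSplit fromWire(ByteBuffer wire) {
        ByteBuffer in = wire.duplicate(); // do not consume the caller's buffer
        int metaLen = in.getInt();
        int dataLen = in.getInt();
        byte[] meta = new byte[metaLen];
        in.get(meta);
        ByteBuffer slice = in.slice(); // zero-copy view of the data section
        slice.limit(dataLen);
        return new PackedSplit(meta, slice);
    }

    /** Materializes a columnar view only when a consumer actually needs one. */
    <T> T unpack(BiFunction<byte[], ByteBuffer, T> decoder) {
        return decoder.apply(metadata.clone(), data.duplicate());
    }
}
```

On the sending side only `toWire` runs, so the cost per split is one allocation and two bulk copies regardless of how many columns the split holds; the per-column object construction the PR description mentions is pushed to the receiver, and only if it calls `unpack`.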