Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid unnecessary Table instances after contiguous split #1593

Merged
merged 5 commits into from
Feb 5, 2021

Conversation

jlowe
Copy link
Member

@jlowe jlowe commented Jan 26, 2021

This depends on rapidsai/cudf#7127.

This change avoids manifesting cudf Table and ColumnVector instances after a partition via contiguous split, as these buffers are likely to be shipped off to other nodes over the network rather than used as a columnar batch input in the same process. Instantiating all of the tables and column objects for many splits can take a significant amount of time per task.

This also changes the table metadata used for UCX shuffle to use the new libcudf opaque metadata generated by the contiguous_split and pack methods, eliminating a lot of ColumnMeta flatbuffer classes.

@jlowe jlowe added performance A performance related task/issue shuffle things that impact the shuffle plugin labels Jan 26, 2021
@jlowe jlowe added this to the Jan 18 - Jan 29 milestone Jan 26, 2021
@jlowe jlowe self-assigned this Jan 26, 2021
@abellina abellina self-requested a review January 27, 2021 14:56
Copy link
Collaborator

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a single pass, and I am not seeing anything odd. It is nice to see all the columnar-meta code go away in favor of the packed format. I'll do a second pass today.

jlowe added 3 commits January 27, 2021 17:21
Signed-off-by: Jason Lowe <[email protected]>
Signed-off-by: Jason Lowe <[email protected]>
Signed-off-by: Jason Lowe <[email protected]>
@jlowe jlowe marked this pull request as ready for review February 4, 2021 17:32
@jlowe
Copy link
Member Author

jlowe commented Feb 4, 2021

build

@jlowe jlowe requested review from gerashegalov and revans2 February 4, 2021 19:21
Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jlowe jlowe merged commit 00e9118 into NVIDIA:branch-0.4 Feb 5, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Avoid unnecessary Table instances after contiguous split

Signed-off-by: Jason Lowe <[email protected]>

* Address review comments

Signed-off-by: Jason Lowe <[email protected]>

* Remove ColumnMeta

Signed-off-by: Jason Lowe <[email protected]>

* Remove extra license file

Signed-off-by: Jason Lowe <[email protected]>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Avoid unnecessary Table instances after contiguous split

Signed-off-by: Jason Lowe <[email protected]>

* Address review comments

Signed-off-by: Jason Lowe <[email protected]>

* Remove ColumnMeta

Signed-off-by: Jason Lowe <[email protected]>

* Remove extra license file

Signed-off-by: Jason Lowe <[email protected]>
@jlowe jlowe deleted the contig-packed branch September 10, 2021 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue shuffle things that impact the shuffle plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants