Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] cuIO Statistics calculation code is redundant #6920

Closed
devavret opened this issue Dec 4, 2020 · 7 comments
Closed

[FEA] cuIO Statistics calculation code is redundant #6920

devavret opened this issue Dec 4, 2020 · 7 comments
Assignees
Labels
cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code

Comments

@devavret
Copy link
Contributor

devavret commented Dec 4, 2020

cuIO has common code for statistics calculation between Parquet writer and ORC writer. This uses custom logic to perform reductions across chunks of rows. These chunks of rows are defined by the unit for which the statistics is generated. e.g. pages in case of parquet and stripes in case of ORC.

This can be refactored to use cub::DeviceSegmentedReduce with a custom iterator that that creates a statistics_val from each column element and a custom reduce operator that reduces between two statistics_vals.

We should also think about using input columns' cudf types rather than specially mapped output types to perform the reduction in. Once the reduction is complete, if the format calls for it, we can convert the type while encoding. This will allow us to replace switch cases made for these dtypes (e.g. here here here) with cudf's type dispatcher.

This will have following advantages:

  1. By using cub's optimized kernels we reduce line count by a lot, thus de-duplicating functionality.
  2. We can use cudf's standard DeviceMin, DeviceMax, and DeviceSum operators that define the min/max/sum operators but more importantly, the respective identity for all current and future cudf types.

Concerns:

The current kernel gpuMergeColumnStatistics is launched only once for the entire table but with cub::DeviceSegmentedReduce, we'd have one async launch per column. This can be an issue when the table has a high number of columns.

Profiling for feasibility

As per some preliminary profiling, I found that the cub kernel performs faster in case of a single 1GB column as compared to the existing approach.
comparison
To predict the effect of launching multiple kernels for columns, I tried launching 64 cub kernels totaling 1GB data. The total resulting time loses to single column scenario but still performs a bit better than the current approach. (3.1 ms vs 5.7 ms)
Screenshot 2020-12-05 at 1 46 50 AM

@devavret devavret added feature request New feature or request Needs Triage Need team to review and classify labels Dec 4, 2020
@devavret devavret added cuIO cuIO issue proposal Change current process or code code quality and removed feature request New feature or request labels Dec 4, 2020
@kkraus14 kkraus14 added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Dec 7, 2020
@vuule
Copy link
Contributor

vuule commented Dec 11, 2020

@devavret assigning to you since you already got the POC implementation. Feel free to hand off if needed.

@github-actions
Copy link

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@github-actions github-actions bot added the stale label Feb 16, 2021
@vuule
Copy link
Contributor

vuule commented Mar 31, 2021

Do the recent changes to statistics calculation affect this issue?

@devavret
Copy link
Contributor Author

Do the recent changes to statistics calculation affect this issue?

Quite possibly. The new statistics code is much simpler and maybe obviate the need for this.

rapids-bot bot pushed a commit that referenced this issue May 26, 2021
Addresses #6920 

Use type dispatched functors to calculate statistics in Parquet and ORC.

Authors:
  - Kumar Aatish (https://github.com/kaatish)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Devavret Makkar (https://github.com/devavret)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Ashwin Srinath (https://github.com/shwina)
  - Michael Wang (https://github.com/isVoid)

URL: #8191
@vuule
Copy link
Contributor

vuule commented May 26, 2021

@devavret can we close this issue now?

@devavret
Copy link
Contributor Author

Sure. I don't remember why #8191 didn't close this and called it as being "addressed". Most of what this issue aims to achieve has been completed in #8191

@JohnZed
Copy link
Contributor

JohnZed commented May 27, 2021

Per @devavret's comment above, closing this. Can reopen a more targeted issue for few remaining redundancies if needed.

@JohnZed JohnZed closed this as completed May 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code
Projects
None yet
Development

No branches or pull requests

5 participants