[BUG] really slow GPU parsing of parquet file #7114
Comments
This issue has been labeled
This issue is still valid.
This issue has been labeled
This is still valid.
@revans2, what driver version are/were you using?
Sorry, I was out on vacation for the past week. But I am running with the CUDA 10.1 runtime. I will upgrade to 11.2 soon and see whether that fixes the issue.
Describe the bug
I have a Parquet file that is very slow to unsnap (Snappy-decompress) on the GPU but very fast on the CPU.
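Snappy decoding is inherently sequential within one compressed block: each tag's position depends on the previous tag, and back-references can overlap their own output. The following is a minimal CPU sketch of a raw (unframed) Snappy block decoder, assuming page data is stored as raw Snappy blocks (as Parquet's SNAPPY codec does). It is only meant to illustrate that dependency chain. It is not cuDF's unsnap kernel, and `snappy_decompress` and `read_varint` are hypothetical helper names.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a little-endian base-128 varint; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def snappy_decompress(buf: bytes) -> bytes:
    """Decode one raw Snappy block, one tag at a time (inherently serial)."""
    expected_len, pos = read_varint(buf, 0)  # preamble: uncompressed length
    out = bytearray()
    while pos < len(buf):
        tag = buf[pos]
        pos += 1
        kind = tag & 0x03
        if kind == 0:  # literal run
            length = tag >> 2
            if length >= 60:  # 60..63: (length - 1) stored in 1..4 extra bytes
                nbytes = length - 59
                length = int.from_bytes(buf[pos:pos + nbytes], "little")
                pos += nbytes
            length += 1
            out += buf[pos:pos + length]
            pos += length
            continue
        if kind == 1:  # copy with 1-byte offset (length 4..11)
            length = 4 + ((tag >> 2) & 0x07)
            offset = ((tag >> 5) << 8) | buf[pos]
            pos += 1
        else:  # kind 2: 2-byte offset, kind 3: 4-byte offset (length 1..64)
            nbytes = 2 if kind == 2 else 4
            length = 1 + (tag >> 2)
            offset = int.from_bytes(buf[pos:pos + nbytes], "little")
            pos += nbytes
        if offset == 0 or offset > len(out):
            raise ValueError("invalid back-reference")
        start = len(out) - offset
        for i in range(length):  # byte by byte, because copies may overlap their own output
            out.append(out[start + i])
    if len(out) != expected_len:
        raise ValueError("decoded length does not match preamble")
    return bytes(out)
```

For example, `snappy_decompress(bytes([0x09, 0x08]) + b"abc" + bytes([0x09, 0x03]))` decodes a 3-byte literal followed by an overlapping 6-byte copy at offset 3, which yields `b"abcabcabc"`. Because no tag can be located without decoding the ones before it, the natural unit of parallelism is the page (or block), not the bytes within it.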
Steps/Code to reproduce bug
I am happy to share the file with whoever needs it. It is a 326 MB Parquet file based on the store sales table from a TPC-DS run, with decimal data instead of doubles for the money-related columns. All I have to do is read the data in: on the GPU that takes about 1.8 seconds, whereas on the CPU through Spark it typically takes about 300 ms single-threaded (including some computation at the end and Spark overhead).
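A small timing harness makes the comparison repeatable. The sketch below is hypothetical and not part of the original report. It takes the median of several runs of any reader function and converts the reported numbers into throughput over the compressed file size.

```python
import statistics
import time
from typing import Callable


def median_read_seconds(read_fn: Callable[[str], object], path: str, repeats: int = 5) -> float:
    """Call read_fn(path) `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn(path)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def throughput_mb_per_s(file_mb: float, seconds: float) -> float:
    """Throughput measured against the compressed file size, in MB/s."""
    return file_mb / seconds


# Figures reported in this issue: 326 MB file, ~1.8 s on the GPU vs ~0.3 s on the CPU.
gpu_rate = throughput_mb_per_s(326, 1.8)  # ~181 MB/s
cpu_rate = throughput_mb_per_s(326, 0.3)  # ~1087 MB/s
print(f"GPU {gpu_rate:.0f} MB/s, CPU {cpu_rate:.0f} MB/s, ratio {cpu_rate / gpu_rate:.1f}x")
```

For `read_fn` one could pass `cudf.read_parquet` on the GPU side, or `pyarrow.parquet.read_table` as a CPU baseline (the path, for example a hypothetical `store_sales.parquet`, is up to the reader). A separate warm-up call before timing is advisable on the GPU side, so that one-time CUDA initialization is not counted. Note that the Spark CPU figure also includes some computation and Spark overhead, so the roughly 6x gap is approximate.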
Expected behavior
When I profile with nsys, nearly all of the time (98%) is spent in the unsnap kernel.
I realize that there are not many columns (6) and only 9 row groups, but there are about 701 data/dictionary pages, so I would expect the pages to be unsnapped in parallel and to see some speedup. Instead, it is much slower. I also didn't see much, if any, data-size skew between the pages (at least according to the Parquet CLI tool).
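The figures above support a rough back-of-the-envelope estimate of how much parallel work is available and how slowly each page is being decoded. Apart from the numbers reported in the issue, everything here is an assumption. The V100's 80 SMs is a hardware fact. The "one page at a time per SM" model is purely hypothetical and may not match how the unsnap kernel actually maps work.

```python
# Figures reported in this issue.
FILE_MB = 326        # Parquet file size (includes footer/metadata, so page data is slightly less)
TOTAL_S = 1.8        # GPU read time
UNSNAP_SHARE = 0.98  # fraction of time spent in the unsnap kernel according to nsys
ROW_GROUPS = 9
COLUMNS = 6
PAGES = 701          # data + dictionary pages

# Assumption (not from the issue): a V100 has 80 SMs, and in this hypothetical
# model each SM works through its share of pages one at a time.
V100_SMS = 80

column_chunks = ROW_GROUPS * COLUMNS                # 54 column chunks
pages_per_chunk = PAGES / column_chunks             # ~13 pages per chunk
avg_page_kb = FILE_MB * 1000 / PAGES                # ~465 KB per page (1 MB = 1000 KB)
unsnap_s = TOTAL_S * UNSNAP_SHARE                   # ~1.76 s spent in unsnap
pages_per_sm = PAGES / V100_SMS                     # ~8.8 pages per SM
seconds_per_page = unsnap_s / pages_per_sm          # ~0.20 s per page (hypothetical model)
kb_per_s_per_page = avg_page_kb / seconds_per_page  # ~2300 KB/s per page worker

print(f"{column_chunks} chunks, {pages_per_chunk:.1f} pages/chunk, "
      f"{avg_page_kb:.0f} KB/page, {unsnap_s:.2f} s in unsnap, "
      f"~{seconds_per_page:.2f} s/page, ~{kb_per_s_per_page / 1000:.1f} MB/s per page worker")
```

Under this model there are about 8.8 pages per SM, which is enough to occupy every SM, yet each page would be decoding at only about 2.3 MB/s. That points to per-page decode speed, not a shortage of pages, as the likely bottleneck. This is only an estimate, and profiling the kernel itself would be needed to confirm it.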
Environment overview
Bare-metal desktop with a V100 16 GB GPU (CUDA 10.1)
Additional context
I am running this on commit 11ebc3e.
I tried to do a bit of profiling with Nsight Compute, but I am not an expert in any of this, so I couldn't work out what might be wrong.