[BUG] Slow kernel_agent (make_string_columns) calls when parsing a Parquet File #7124
Comments
Are the strings large? If so, this will be fixed by #7576
@tgravescs Can you check if the issue persists with the latest code?
Sorry, I missed your questions. I had to delete the original file that reproduced this because it had to be deleted after a set number of days. We can close this for now, and I'll reopen it if we see the issue again.
Describe the bug
I have a Parquet file that, during load, shows very high times in a _kernel_agent kernel. The nsys profile seems to indicate that this comes from a make_string_column call. That kernel alone takes 25+ ms to run. For comparison, the unsnap kernel takes about 48 ms, gpuDecodePageData only 7 ms, and GpuBuildStringDictionaryIndex only 6.4 ms.
The time for make_string_column seems large here.
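To put those numbers in proportion, here is a rough sketch that computes each kernel's share of the summed time. It only uses the four kernel times quoted above, treats "25+ ms" as exactly 25.0 ms (so the make_string_column share is a lower bound), and assumes these four kernels are the ones worth comparing; the trace may contain others. The dictionary keys and the `share_of_total` helper are just names picked for illustration.

```python
# Kernel times in milliseconds as quoted in this report.
# "25+" is taken as 25.0, so that entry's share is a lower bound.
kernel_ms = {
    "_kernel_agent (make_string_column)": 25.0,
    "unsnap": 48.0,
    "gpuDecodePageData": 7.0,
    "GpuBuildStringDictionaryIndex": 6.4,
}


def share_of_total(times):
    """Return each kernel's fraction of the summed kernel time."""
    total = sum(times.values())
    return {name: ms / total for name, ms in times.items()}


shares = share_of_total(kernel_ms)

# Print the kernels from largest to smallest share.
for name, frac in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:40s} {kernel_ms[name]:6.1f} ms  {frac:6.1%}")
```

By this estimate the make_string_column kernel accounts for more than a quarter of the time spent in these four kernels, well above the page decode and dictionary index kernels.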
Steps/Code to reproduce bug
I can provide a file to reproduce this, but I can't share it publicly. I can also provide an nsys trace if needed.
Using the cuDF Java API from Scala, I can read the Parquet file as shown below and get an nsys trace of it:
schema of data:
Expected behavior
I was investigating Parquet read performance, and when I use nsys profile, the time spent in this kernel seems very high. I would have expected it to be lower.
Environment overview (please complete the following information)
Bare metal, running on a T4 box with Ubuntu
Environment details