[FEA] Address performance issue with decimal types in ORC reader #12677
Labels
1 - On Deck
To be worked on next
cuIO
cuIO issue
feature request
New feature or request
libcudf
Affects libcudf (C++/CUDA) code.
Performance
Performance related issue
Comments
GregoryKimball added the feature request, 1 - On Deck, libcudf, cuIO, and Performance labels on Feb 2, 2023
GregoryKimball changed the title from "[FEA] Address performance issues decimal types in ORC reader" to "[FEA] Address performance issue with decimal types in ORC reader" on Feb 2, 2023
Looks like there's a bit to unpack here. Local benchmark results with separated decimal types:
At first glance, decimal32 and decimal64 are even slower than the mixed table benchmark suggests. Relatedly, dec128 looks broken: the files are too small and the throughput is too large to be correct.
Closing in favor of #13251
Is your feature request related to a problem? Please describe.
In the ORC reader benchmarks, the decimal data_type in orc_read_decode is much slower than other fixed-width types. The profiles below used low cardinality, high run length settings, but the slowdown applies to high cardinality, low run length data as well. The extra time is being spent in the ORC gpuDecodeOrcColumnData kernel.
read_orc takes about 25 ms for signed integers
read_orc takes about 150 ms for decimal
Describe the solution you'd like
I'd like decimal types to read with data ingest throughput that is similar to integer and float types.
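To make the gap concrete, the two read_orc timings quoted above (about 25 ms and about 150 ms) can be turned into a slowdown factor and, given a data size, into throughput. The sketch below is only arithmetic on those two numbers; the helper names are made up for illustration, and the 512 MiB size is a hypothetical placeholder that does not come from this issue.

```python
def slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """How many times slower the candidate is than the baseline."""
    return candidate_ms / baseline_ms


def throughput_gib_per_s(num_bytes: int, elapsed_ms: float) -> float:
    """Bytes processed per second, expressed in GiB/s."""
    return num_bytes / (elapsed_ms / 1000.0) / 2**30


# Timings quoted in the issue (approximate).
INT_READ_MS = 25
DECIMAL_READ_MS = 150

# Hypothetical data size, NOT taken from the issue; substitute the real one.
DATA_BYTES = 512 * 2**20

ratio = slowdown(INT_READ_MS, DECIMAL_READ_MS)
int_tp = throughput_gib_per_s(DATA_BYTES, INT_READ_MS)
dec_tp = throughput_gib_per_s(DATA_BYTES, DECIMAL_READ_MS)

print(f"decimal read is {ratio:.1f}x slower than signed integers")
print(f"integer: {int_tp:.2f} GiB/s, decimal: {dec_tp:.2f} GiB/s (hypothetical size)")
```

The slowdown factor does not depend on the placeholder size, so it is the number to track when comparing decimal against integer and float reads.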
Describe alternatives you've considered
n/a
Additional context
Here are the commands I used to generate the profiles:
Running rapids devel image 0022659d9d65 on cudf commit 3c39be5a9d7d69adaad47121359e0c084b76decf
A100 + AMD Epyc hardware
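For background on what the decode step has to handle: the ORC specification stores each decimal value in the DATA stream as a variable-length, zigzag-encoded base-128 varint, with the scale kept in a separate SECONDARY stream. The Python sketch below decodes such a stream on the CPU. It illustrates the file format only; it is not libcudf's gpuDecodeOrcColumnData kernel and is not offered as a diagnosis of the slowdown. The function names are made up for this example, and a single scale for the whole stream is assumed for simplicity.

```python
from decimal import Decimal
from typing import List, Tuple


def decode_zigzag_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one zigzag-encoded base-128 varint starting at buf[pos].

    Returns (signed_value, position_of_next_byte).
    """
    raw = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift  # low 7 bits carry payload
        shift += 7
        if byte & 0x80 == 0:           # high bit clear marks the last byte
            break
    # Undo zigzag: even raw values are non-negative, odd ones negative.
    value = (raw >> 1) ^ -(raw & 1)
    return value, pos


def decode_decimal_stream(buf: bytes, scale: int) -> List[Decimal]:
    """Decode every varint in buf and apply one assumed, shared scale."""
    values = []
    pos = 0
    while pos < len(buf):
        unscaled, pos = decode_zigzag_varint(buf, pos)
        values.append(Decimal(unscaled).scaleb(-scale))
    return values


# 0xAC 0x02 encodes 150, 0x01 encodes -1, 0x02 encodes 1.
stream = bytes([0xAC, 0x02, 0x01, 0x02])
print(decode_decimal_stream(stream, scale=2))
```

Because each value occupies a data-dependent number of bytes, the start of value N is only known after decoding the values before it, unlike fixed-width integers stored at predictable offsets.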