Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add column_device_view to orc writer #7676

Merged
merged 13 commits into from
Mar 25, 2021
Merged

Add column_device_view to orc writer #7676

merged 13 commits into from
Mar 25, 2021

Conversation

kaatish
Copy link
Contributor

@kaatish kaatish commented Mar 23, 2021

This PR adds column_device_view members to EncChunk, DictionaryChunk and StripeDictionary structures which are used in the ORC writer. The idea is to replace members in these structures which replicate the same information. Usage of nvstrdesc_s has also been eliminated in the ORC writer.

Fixes #7347, Addresses #5682, Addresses #7334

@kaatish kaatish added 2 - In Progress Currently a work in progress code quality libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 23, 2021
@kaatish kaatish requested review from vuule and rgsl888prabhu March 23, 2021 04:51
@kaatish kaatish self-assigned this Mar 23, 2021
@kaatish kaatish requested a review from a team as a code owner March 23, 2021 04:51
@kaatish kaatish added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Mar 23, 2021
@codecov
Copy link

codecov bot commented Mar 23, 2021

Codecov Report

Merging #7676 (d3a573e) into branch-0.19 (7871e7a) will increase coverage by 0.66%.
The diff coverage is n/a.

❗ Current head d3a573e differs from pull request most recent head 0a80942. Consider uploading reports for the commit 0a80942 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #7676      +/-   ##
===============================================
+ Coverage        81.86%   82.53%   +0.66%     
===============================================
  Files              101      101              
  Lines            16884    17453     +569     
===============================================
+ Hits             13822    14404     +582     
+ Misses            3062     3049      -13     
Impacted Files Coverage Δ
python/cudf/cudf/core/column/decimal.py 92.95% <0.00%> (-1.92%) ⬇️
python/cudf/cudf/core/column/lists.py 90.00% <0.00%> (-1.40%) ⬇️
python/cudf/cudf/core/column/numerical.py 94.83% <0.00%> (-0.20%) ⬇️
python/cudf/cudf/core/column/column.py 87.61% <0.00%> (-0.15%) ⬇️
python/cudf/cudf/utils/utils.py 85.36% <0.00%> (-0.07%) ⬇️
python/cudf/cudf/io/feather.py 100.00% <0.00%> (ø)
python/cudf/cudf/utils/ioutils.py 78.71% <0.00%> (ø)
python/cudf/cudf/comm/serialize.py 0.00% <0.00%> (ø)
python/cudf/cudf/_fuzz_testing/io.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/struct.py 100.00% <0.00%> (ø)
... and 42 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b271c06...0a80942. Read the comment docs.

@vuule
Copy link
Contributor

vuule commented Mar 24, 2021

Fixes #7347, #5682 Addresses #7334

You cannot apply "fixes" to multiple issues like this. Has to be "Fixes #7347, fixes #5682, addresses #7334"

@vuule
Copy link
Contributor

vuule commented Mar 24, 2021

Should this PR close #5682? I see uses of nvstrdesc_s that are not removed in this PR. I would expect for all nvstrdesc_s declaration to be removed before we close #5682.

Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big fan of the changes. Just needs more span and to add missing docs. And the span use can be pushed to another PR IMO.

Have you run the benchmarks?

cpp/src/io/orc/writer_impl.hpp Outdated Show resolved Hide resolved
cpp/src/io/orc/writer_impl.cu Outdated Show resolved Hide resolved
Comment on lines +123 to +129
gpuInitDictionaryIndices(DictionaryChunk *chunks,
const table_device_view view,
uint32_t *dict_data,
uint32_t *dict_index,
size_t row_index_stride,
size_type *str_col_ids,
uint32_t num_columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should (almost) all be spans/2dspans, but we might want to leave this for another PR, given the urgency of this PR.

cpp/src/io/orc/dict_enc.cu Outdated Show resolved Hide resolved
cpp/src/io/orc/dict_enc.cu Outdated Show resolved Hide resolved
@ttnghia
Copy link
Contributor

ttnghia commented Mar 24, 2021

Fixes #7347, #5682 Addresses #7334

Please add more description. It's not obvious to know what this PR is doing unless we have to click/hover on those links to open these issues.

@davidwendt
Copy link
Contributor

Fixes #7347, #5682 Addresses #7334

Please add more description. It's not obvious to know what this PR is doing unless we have to click/hover on those links to open these issues.

Also, the description gets added to the changelog entry so even a simple description here would be helpful.

@kaatish
Copy link
Contributor Author

kaatish commented Mar 24, 2021

Have you run the benchmarks?

Performance impact is captured here.

@kaatish
Copy link
Contributor Author

kaatish commented Mar 24, 2021

Should this PR close #5682? I see uses of nvstrdesc_s that are not removed in this PR. I would expect for all nvstrdesc_s declaration to be removed before we close #5682.

I have removed the link to that issue for now but it talks about removing nvstrdesc_s in the writers only which has been done in this PR.

@vuule
Copy link
Contributor

vuule commented Mar 24, 2021

Should this PR close #5682? I see uses of nvstrdesc_s that are not removed in this PR. I would expect for all nvstrdesc_s declaration to be removed before we close #5682.

I have removed the link to that issue for now but it talks about removing nvstrdesc_s in the writers only which has been done in this PR.

@devavret to clarify the issue. My interpretation is that all nvstrdesc_s uses should be replaced.

@devavret
Copy link
Contributor

Should this PR close #5682? I see uses of nvstrdesc_s that are not removed in this PR. I would expect for all nvstrdesc_s declaration to be removed before we close #5682.

I have removed the link to that issue for now but it talks about removing nvstrdesc_s in the writers only which has been done in this PR.

@devavret to clarify the issue. My interpretation is that all nvstrdesc_s uses should be replaced.

That is true. It just might need one more PR. So it could be an "addresses" here.

@kaatish kaatish requested a review from vuule March 24, 2021 17:55
@vuule vuule requested a review from devavret March 24, 2021 18:43
@vuule
Copy link
Contributor

vuule commented Mar 24, 2021

rerun tests

@vuule
Copy link
Contributor

vuule commented Mar 25, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit f1f1d0f into rapidsai:branch-0.19 Mar 25, 2021
rapids-bot bot pushed a commit that referenced this pull request Mar 27, 2021
In PR #7676 the length of the current string being referred to while building stripe dictionaries was always set to 0 while incrementing the dictionary character count of a StripeDictionary. This led to corrupted strings when the dictionary encoding was used as noted in issue #7741. This has been fixed in this PR.

Fixes #7741

Authors:
  - Kumar Aatish (@kaatish)

Approvers:
  - Vukasin Milovanovic (@vuule)
  - Nghia Truong (@ttnghia)

URL: #7744
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team cuIO cuIO issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Refactor stats_column_desc in cuio
5 participants