Increase max RLE stream size estimate to avoid potential overflows #9568

Merged 5 commits into rapidsai:branch-21.12 on Nov 1, 2021


vuule
Contributor

@vuule vuule commented Oct 29, 2021

Issue #9478

Increase the max RLE stream size to account for the fact that varint encoding stores only 7 bits of payload per byte.
Remove unused flush parameter in integerRLE kernel.
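
To make the 7-bits-per-byte point concrete, here is a minimal Python sketch of unsigned base-128 varint encoding. The helper names `varint_encode` and `max_varint_bytes` are illustrative assumptions and are not taken from libcudf; the sketch only shows why a size estimate based on the raw byte width of a value comes out too small.

```python
def varint_encode(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint.

    Hypothetical helper for illustration only; not the libcudf kernel.
    """
    if value < 0:
        raise ValueError("varint_encode expects a non-negative integer")
    out = bytearray()
    while True:
        low7 = value & 0x7F           # each output byte carries 7 payload bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low7)          # high bit clear: last byte
            return bytes(out)


def max_varint_bytes(bit_width: int) -> int:
    """Worst-case encoded size, in bytes, of an unsigned value of this width."""
    return (bit_width + 6) // 7       # ceil(bit_width / 7)


for bits in (8, 16, 32, 64):
    worst = varint_encode((1 << bits) - 1)  # all payload bits set
    assert len(worst) == max_varint_bytes(bits)
    print(f"{bits}-bit value: {bits // 8} raw bytes, up to {len(worst)} varint bytes")
```

A 64-bit value can therefore take up to 10 bytes once encoded, two more than its raw width, which is the kind of gap a maximum stream size estimate has to cover.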

@vuule vuule added bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change labels Oct 29, 2021
@vuule vuule requested a review from a team as a code owner October 29, 2021 20:25
@vuule vuule self-assigned this Oct 29, 2021
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Oct 29, 2021
@codecov

codecov bot commented Oct 29, 2021

Codecov Report

Merging #9568 (6b79de2) into branch-21.12 (ab4bfaa) will decrease coverage by 0.13%.
The diff coverage is n/a.

❗ Current head 6b79de2 differs from pull request most recent head a8986ed. Consider uploading reports for the commit a8986ed to get more accurate results

@@               Coverage Diff                @@
##           branch-21.12    #9568      +/-   ##
================================================
- Coverage         10.79%   10.65%   -0.14%     
================================================
  Files               116      117       +1     
  Lines             18869    19331     +462     
================================================
+ Hits               2036     2059      +23     
- Misses            16833    17272     +439     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/sorting.py 92.90% <0.00%> (-1.21%) ⬇️
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/series.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/reshape.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/dtypes.py 0.00% <0.00%> (ø)
... and 42 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 77c6f1d...a8986ed.

@vuule vuule removed the request for review from devavret October 29, 2021 22:56
@rgsl888prabhu
Contributor

Is it possible to add a small test case?

@vuule
Contributor Author

vuule commented Oct 29, 2021

Is it possible to add a small test case?

Working on it, it's a bit tricky to avoid all benefits of RLE encoding :)

@github-actions github-actions bot added the Python Affects Python cuDF API. label Oct 30, 2021
@vuule vuule requested a review from a team as a code owner October 30, 2021 00:17
@vuule vuule requested review from shwina and bdice October 30, 2021 00:17
@vuule
Contributor Author

vuule commented Oct 30, 2021

@rgsl888prabhu Update on testing:
Added a test that reads a valid file (with correct stream sizes, so no overlap), writes it back out as an ORC file, and tries to read the result with pyarrow. Without this PR, pyarrow segfaults; with the PR, it reads the data correctly. It's a bit of a weird test, but AFAICT it's valid.

@vuule
Contributor Author

vuule commented Oct 30, 2021

rerun tests

Contributor

@bdice bdice left a comment

Python looks good. 👍

@vuule
Contributor Author

vuule commented Oct 30, 2021

@gpucibot merge

@vuule vuule added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs cuDF (Python) Reviewer labels Oct 30, 2021
@vuule
Contributor Author

vuule commented Oct 30, 2021

rerun tests

@vuule
Contributor Author

vuule commented Oct 30, 2021

looks like branch checker is stuck :\

@galipremsagar
Contributor

rerun tests

@galipremsagar galipremsagar changed the base branch from branch-21.12 to branch-21.10 November 1, 2021 04:53
@galipremsagar galipremsagar changed the base branch from branch-21.10 to branch-21.12 November 1, 2021 04:53
@galipremsagar galipremsagar requested review from a team as code owners November 1, 2021 04:53
@galipremsagar
Contributor

rerun tests

@galipremsagar
Contributor

looks like branch checker is stuck :\

Switching the base branch to 21.10 and back again to 21.12 did the trick. Tagging @ajschmidt8 for visibility in case you see this again in the future.

@rapids-bot rapids-bot bot merged commit ca347ff into rapidsai:branch-21.12 Nov 1, 2021
@vuule vuule deleted the bug-orc-stream-overlap branch April 20, 2022 23:47
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.

5 participants