Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix OOB memory access in CSV reader when reading without NA values #13011

Merged
merged 6 commits into from
Apr 5, 2023

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Mar 25, 2023

Description

CSV reader uses a trie to read field with special values as nulls. The creation of the trie does not work correctly when there are not special values. This can happen when the NA filter is enabled, but the default NA values are removed, and user does not specify custom values. In this case, use of this trie leads to OOB memory access.
This PR fixes the trie creation to create an empty trie when there are not special values to look for.
Included a C++ test that crashes without the fix.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule vuule self-assigned this Mar 25, 2023
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 25, 2023
@vuule vuule added bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. labels Mar 25, 2023
@vuule vuule changed the base branch from branch-23.04 to branch-23.06 March 25, 2023 07:17
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 27, 2023
@vuule vuule marked this pull request as ready for review April 3, 2023 17:28
@vuule vuule requested a review from a team as a code owner April 3, 2023 17:28
@@ -83,8 +83,8 @@ trie create_serialized_trie(const std::vector<std::string>& keys, rmm::cuda_stre
__host__ __device__ inline bool serialized_trie_contains(device_span<serial_trie_node const> trie,
device_span<char const> key)
{
if (trie.data() == nullptr || trie.empty()) return false;
if (key.empty()) return trie.front().is_leaf;
if (trie.empty()) { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong with trie.data() == nullptr?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it's strictly redundant since a device span with data() == nullptr will always have size 0.

@@ -83,8 +83,8 @@ trie create_serialized_trie(const std::vector<std::string>& keys, rmm::cuda_stre
__host__ __device__ inline bool serialized_trie_contains(device_span<serial_trie_node const> trie,
device_span<char const> key)
{
if (trie.data() == nullptr || trie.empty()) return false;
if (key.empty()) return trie.front().is_leaf;
if (trie.empty()) { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it's strictly redundant since a device span with data() == nullptr will always have size 0.

Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@vuule
Copy link
Contributor Author

vuule commented Apr 5, 2023

/merge

@rapids-bot rapids-bot bot merged commit 9a770f6 into rapidsai:branch-23.06 Apr 5, 2023
@vuule vuule deleted the bug-read_csv-no-na-values branch April 5, 2023 02:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants