Fail loudly to avoid data corruption with unsupported input in `read_orc` (#12325)

Issue #11890

Motivating issue: The ORC reader reads nulls in row groups after the first one when reading a string column written by Pandas with direct encoding. The root cause is that cuDF reads offsets from the row group index as larger than the stream sizes.

This PR does not fix the issue, but it ensures that the reader fails loudly when the row group index offsets are read as too large to be correct. This should prevent data corruption until the fix is implemented.

This PR also sets up a mechanism to report decode errors from unsupported data.

Authors:
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Mike Wilson (https://github.com/hyperbolic2346)
- Nghia Truong (https://github.com/ttnghia)

URL: #12325
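The check described above can be illustrated with a minimal Python sketch. This is not cuDF's implementation (the change itself is in the reader, and its code is not shown here); `RowGroupIndexError` and `check_row_group_offsets` are hypothetical names, and the sketch assumes the row group index has already been parsed into one offset per row group for a single stream.

```python
class RowGroupIndexError(ValueError):
    """Hypothetical error type: a row group index offset cannot be correct."""


def check_row_group_offsets(offsets, stream_size):
    """Raise instead of decoding when an offset points past the stream.

    offsets     -- hypothetical list with one stream offset per row group
    stream_size -- size of the stream in bytes
    """
    for row_group, offset in enumerate(offsets):
        # An offset larger than the stream size cannot be valid; decoding
        # from it would silently produce wrong values (e.g. nulls).
        if offset > stream_size:
            raise RowGroupIndexError(
                f"row group {row_group}: offset {offset} exceeds "
                f"stream size {stream_size}"
            )


# Usage: in-range offsets pass silently, an out-of-range offset is reported.
check_row_group_offsets([0, 10, 20], stream_size=32)
try:
    check_row_group_offsets([0, 10, 40], stream_size=32)
except RowGroupIndexError as err:
    print(f"unsupported input: {err}")
```

Raising an error trades availability for correctness: the affected file cannot be read at all, but no incorrect nulls reach the caller.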
Showing 4 changed files with 32 additions and 3 deletions.