ORC reader supports struct #2887
Signed-off-by: Firestarman <[email protected]>
This is a draft for now because some nested-type tests fail: the CPU and GPU produce different values. Filed rapidsai/cudf#8704 to track the test failures.
Updated the tests to avoid generating null struct rows, to unblock this PR.
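As a rough illustration of "avoid generating null struct rows", here is a minimal sketch of a data generator whose struct rows are never null while their child fields still may be. `gen_struct_rows`, its parameters, and the field names are hypothetical and are not the plugin's data-generation API; whether the real tests keep nullable children is also an assumption.

```python
import random


def gen_struct_rows(n, seed=0, field_null_rate=0.2):
    """Generate n struct-like rows as dicts (hypothetical helper).

    The struct itself is never None; only its child fields may be None.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rows.append({
            # Each child field is independently nulled with field_null_rate.
            "id": rng.randint(0, 1000) if rng.random() >= field_null_rate else None,
            "name": f"n{rng.randint(0, 9)}" if rng.random() >= field_null_rate else None,
        })
    return rows


rows = gen_struct_rows(100)
```

Fixing the seed keeps the generated data reproducible, which matters when CPU and GPU results are compared row by row.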
build
In general this looks good, but we can't default to turning ORC struct support on if we know it cannot load null struct rows properly.
TimestampGen(start=datetime(1590, 1, 1, tzinfo=timezone.utc))],
TimestampGen(start=datetime(1590, 1, 1, tzinfo=timezone.utc))]

# Set to true after the issue https://github.com/rapidsai/cudf/issues/8704 is fixed.
We can't check this in as-is -- it can silently corrupt data on an ORC read. Either we need to make the struct support qualified by a separate incompatible config that defaults to off or wait until it is fixed.
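The first option mentioned here (a separate config that defaults to off) could look roughly like the sketch below. The key name `example.orc.struct.read.enabled` and the function `can_read_on_gpu` are hypothetical and only illustrate the gating idea; they are not the plugin's actual config or code.

```python
# Hypothetical config key; the real plugin's key name may differ.
STRUCT_READ_KEY = "example.orc.struct.read.enabled"


def can_read_on_gpu(schema_types, conf):
    """Return True if a read with these column types may run on the GPU.

    Struct columns are only allowed when the user has explicitly opted in,
    so the default behaviour never risks silently corrupting data.
    """
    has_struct = any(t == "struct" for t in schema_types)
    enabled = conf.get(STRUCT_READ_KEY, "false").lower() == "true"
    return (not has_struct) or enabled
```

With an empty config, a schema containing a struct falls back to the CPU path, while schemas without structs are unaffected.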
The issue rapidsai/cudf#8704 is marked as "v21.08 Release", so I chose to wait for the fix.
The fix for rapidsai/cudf#8704 is rapidsai/cudf#8819.
Found a bug when reading ORC data where the column order in the read schema differs from the column order in the file schema. @wbo4958 is working on it.
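To make the scenario concrete, the sketch below projects a row from file-schema order into read-schema order by matching column names instead of positions. It only illustrates the situation in which the two orders differ; it does not claim to describe the actual cause of the bug or its fix, and `reorder_to_read_schema` is a hypothetical helper.

```python
def reorder_to_read_schema(file_columns, file_row, read_columns):
    """Project one row from file order into read-schema order by name.

    Columns requested by the read schema but absent from the file
    come back as None.
    """
    index_by_name = {name: i for i, name in enumerate(file_columns)}
    return [
        file_row[index_by_name[name]] if name in index_by_name else None
        for name in read_columns
    ]


# File stores (a, b, c); the query asks for (c, a).
projected = reorder_to_read_schema(["a", "b", "c"], [1, 2, 3], ["c", "a"])
```

Matching by position here would have returned the values of `a` and `b` under the names `c` and `a`, which is the kind of mismatch a test with a shuffled read schema is meant to catch.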
* Re-enable struct support for the ORC reader. Also add tests for nested predicate pushdown and add support for nested column pruning. Relevant PRs: #3079 #2887 Signed-off-by: Firestarman <[email protected]>
Add struct support to the ORC reader, now that cudf supports structs via rapidsai/cudf#8599.
Also add tests for pruning nested fields.
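Pruning nested fields means reading only the requested children of a struct instead of the whole struct. A minimal sketch of the idea, using a plain nested dict as a stand-in for a schema; `prune_schema` and the dotted-path convention are assumptions for illustration, not the plugin's implementation:

```python
def prune_schema(schema, wanted):
    """Keep only the fields named by dotted paths such as "s.a".

    schema: nested dict mapping field name -> type string, or -> dict
            for a struct. Raises KeyError for a path not in the schema.
    """
    pruned = {}
    for path in wanted:
        src, dst = schema, pruned
        parts = path.split(".")
        for i, part in enumerate(parts):
            if not isinstance(src, dict) or part not in src:
                raise KeyError(path)
            if i == len(parts) - 1:
                # Last path component: keep this field (or whole subtree).
                dst[part] = src[part]
            else:
                # Intermediate struct: descend, creating it in the output.
                dst = dst.setdefault(part, {})
                src = src[part]
    return pruned


schema = {"id": "int", "s": {"a": "string", "b": "long", "c": {"d": "double"}}}
pruned = prune_schema(schema, ["s.a", "s.c.d"])
```

Here `pruned` keeps `s.a` and `s.c.d` and drops `id` and `s.b`, which is the shape a nested-pruning test would expect the reader to request.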
closes #2879
closes #1481
Signed-off-by: Firestarman [email protected]