Implement initial support for avro logical types (#6482) #12788
Pull requests from external contributors require approval from a
Looks like I don't have permission to edit the labels for this PR, maybe because I created it off an issue that I didn't raise?
python/cudf/cudf/tests/test_avro_reader_fastavro_integration.py
/ok to test
Thanks for the contribution @tpn! @vuule and @galipremsagar are probably the best people to answer the open questions here.
Force-pushed from 71de769 to 80f28de.
Ignore the 80f28de force-push; I'm about to squash and remove the time-millis/micros stuff.
Force-pushed from cb82e82 to 41790ff.
@vyasr can I get another And friendly ping to @vuule & @galipremsagar for review. I'd like to get this PR merged so I can tackle a couple of follow-up avro items. Thanks!
Force-pushed from 41790ff to 397ba8b.
/ok to test
@tpn I'm OOTO this week, will review as soon as I can next week.
No problem, thanks for the update.
import datetime

dates = [
    datetime.date(1970, 1, 1),
    datetime.date(1970, 1, 2),
    datetime.date(1981, 10, 25),
    datetime.date(2012, 5, 18),
    datetime.date(2019, 9, 3),
]
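For context, the Avro specification defines the `date` logical type as an `int` counting days since the Unix epoch (1970-01-01). The test dates above should therefore round-trip through a plain integer. The following is a minimal stdlib sketch of that mapping. The helper names `date_to_avro_days` and `avro_days_to_date` are hypothetical and are not part of cudf or fastavro.

```python
import datetime

# Avro "date" logical type: an int holding days since the Unix epoch.
# These helpers are illustrative only, not a cudf or fastavro API.
EPOCH = datetime.date(1970, 1, 1)


def date_to_avro_days(d: datetime.date) -> int:
    """Days since 1970-01-01 (negative for earlier dates)."""
    return (d - EPOCH).days


def avro_days_to_date(days: int) -> datetime.date:
    """Inverse of date_to_avro_days."""
    return EPOCH + datetime.timedelta(days=days)


dates = [
    datetime.date(1970, 1, 1),
    datetime.date(1970, 1, 2),
    datetime.date(1981, 10, 25),
    datetime.date(2012, 5, 18),
    datetime.date(2019, 9, 3),
]
encoded = [date_to_avro_days(d) for d in dates]
# Converting back should reproduce the original dates exactly.
assert [avro_days_to_date(n) for n in encoded] == dates
print(encoded)
```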
Would maybe be good to have some nulls mixed into a variant of this test as well, to ensure that we properly handle the situations where we get a UNION type with nulls and dates.
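A nullable date column is typically declared as the union `["null", {"type": "int", "logicalType": "date"}]`. In Avro's binary encoding, a union value is written as its zero-based branch index, varint-encoded like any Avro int/long, followed by the payload for that branch. The `null` branch has no payload. The sketch below shows the bytes a reader has to handle for both cases, assuming that branch order. `encode_nullable_date` and `zigzag_varint` are hypothetical helpers, not cudf APIs.

```python
import datetime
from typing import Optional

EPOCH = datetime.date(1970, 1, 1)

# Hypothetical nullable date schema: branch 0 is "null", branch 1 is the date.
NULLABLE_DATE_SCHEMA = ["null", {"type": "int", "logicalType": "date"}]


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit value the way Avro encodes int/long."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_nullable_date(d: Optional[datetime.date]) -> bytes:
    """Encode one value of NULLABLE_DATE_SCHEMA."""
    if d is None:
        return zigzag_varint(0)  # branch index 0 ("null"), no payload
    days = (d - EPOCH).days
    return zigzag_varint(1) + zigzag_varint(days)  # branch 1, then the int
```

A reader that sees branch 0 should mark the row null and move on without consuming a date payload.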
Just pushed a change that rewrites the test to test nulls where applicable. Can you take a look? Thanks!
I like that idea.
Force-pushed from 397ba8b to e83f52e.
@tpn looks like the Java failure is related to this PR, not a random issue.
The Java tests are here: https://github.com/rapidsai/cudf/blob/branch-23.04/java/src/test/java/ai/rapids/cudf/TableTest.java#L973-L1030
Ah! I'll investigate.
Force-pushed from 326b171 to 1731f4f.
Yeah, it looks like either the Java tests or the Java Avro test file will need some tweaking. Still investigating, but finishing up for the week shortly. I'll pick it up first thing next week.
I think I need some help with the Java test failures. If I run
I'm still trying to get a local Java env set up -- the
Try running (reference: line 41 in 55ed347)
Hey @davidwendt, yeah, that didn't seem to work either. Fails with the exact same Don't suppose you're able to share your local conda env + build.sh invocation? Feel free to email if that's easier: trent at trent dot me.
/ok to test
I don't think I will be able to help you with the Java debugging. However, I was able to recreate the error with a C++ test. Here is the source:
It is odd we do not have any avro gtests for this.
(Here the test file is
I'm not sure why it fails only with your new code. One thing I did notice that was different with the Python tests is that this one passes in column names. If you comment out passing in the column names in the
One last thing I noticed is that the I'm not sure if that helps. I certainly can provide better help with the C++ test/debug. I'm confident that if this is fixed, the Java test will work as well.
Oh, this is incredibly helpful, thanks! Investigating now.
Found the issue, heh, @davidwendt: it was an artifact of that switch statement refactoring I did. I also noticed a subtle logic error whilst traipsing through the code in cuda-gdb related to kind/logical_kind and type_union. Here's the patch. I'm going to pick it up tomorrow and rebase etc.
diff --git a/cpp/src/io/avro/avro_gpu.cu b/cpp/src/io/avro/avro_gpu.cu
index b6f56cbb5c..4d33d3f0e7 100644
--- a/cpp/src/io/avro/avro_gpu.cu
+++ b/cpp/src/io/avro/avro_gpu.cu
@@ -97,6 +97,8 @@ avro_decode_row(schemadesc_s const* schema,
}
if (i >= schema_len || skip_after < 0) break;
kind = schema[i].kind;
+ logical_kind = schema[i].logical_kind;
+ if (is_supported_logical_type(logical_kind)) { kind = static_cast<type_kind_e>(logical_kind); }
skip = skip_after;
}
@@ -110,13 +112,17 @@ avro_decode_row(schemadesc_s const* schema,
break;
case type_int: {
- int64_t v = avro_decode_zigzag_varint(cur, end);
- static_cast<int32_t*>(dataptr)[row] = static_cast<int32_t>(v);
+ int64_t v = avro_decode_zigzag_varint(cur, end);
+ if (dataptr != nullptr && row < max_rows) {
+ static_cast<int32_t*>(dataptr)[row] = static_cast<int32_t>(v);
+ }
} break;
case type_long: {
- int64_t v = avro_decode_zigzag_varint(cur, end);
- static_cast<int64_t*>(dataptr)[row] = v;
+ int64_t v = avro_decode_zigzag_varint(cur, end);
+ if (dataptr != nullptr && row < max_rows) {
+ static_cast<int64_t*>(dataptr)[row] = v;
+ }
} break;
case type_bytes: [[fallthrough]];
Semi-related question: I'd like to add in the C++ unit tests for the
#include <string>
#include <experimental/filesystem>
namespace fs = std::experimental::filesystem;
...
TEST_F(AvroTest, TestFile)
{
fs::path basedir = fs::path(__FILE__).parent_path();
fs::path avro_path = basedir / "../../../java/src/test/resources/alltypes_plain.avro";
auto srcinfo = cudf::io::source_info{avro_path.string()};
...
However, depending on
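The patch still calls `avro_decode_zigzag_varint()` unconditionally, so the cursor always advances past the value, and guards only the store with `dataptr != nullptr && row < max_rows`. The Python sketch below illustrates that pattern under the standard Avro int/long encoding, which is a zigzag mapping followed by a little-endian base-128 varint. It simplifies the row-oriented decoder down to a single int column. `decode_zigzag_varint` and `decode_int_column` are illustrative stand-ins, not the CUDA implementation.

```python
from typing import List, Optional, Tuple


def decode_zigzag_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one Avro int/long at buf[pos]; return (value, next_pos)."""
    u = 0
    shift = 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        u |= (byte & 0x7F) << shift  # little-endian groups of 7 bits
        shift += 7
        if not byte & 0x80:  # high bit clear: last byte of this value
            break
    return (u >> 1) ^ -(u & 1), pos  # undo zigzag: 1->-1, 2->1, 3->-2, ...


def decode_int_column(buf: bytes, num_rows: int,
                      out: Optional[List[int]], max_rows: int) -> int:
    """Decode num_rows ints; store only when out exists and row < max_rows."""
    pos = 0
    for row in range(num_rows):
        v, pos = decode_zigzag_varint(buf, pos)  # always consume the bytes
        if out is not None and row < max_rows:
            out[row] = v
    return pos
```

Skipping the decode whenever nothing is stored would presumably leave the cursor in the middle of a value, which is why only the write is made conditional.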
I'm glad you found the issue. I'll let @vuule answer the question about testing against an existing test file in our repo.
Great news @tpn! As for
Actually, yeah, I need to figure out why the existing Python unit tests didn't hit this. There shouldn't be any reason preventing this logic from being exercised via the Python tests. I'll investigate.
Ah, maybe it's `std::filesystem::path` I'm thinking of that's still experimental. Nevertheless, it's probably moot if I can get the Python tests to exercise this stuff.
Co-authored-by: Vukasin Milovanovic <[email protected]>
- Use `Optional` instead of `Union`.
- In `test_can_parse_avro_date_logical_type()`:
  - Tweak `avro_type` initialization.
  - Improve clarity of nullable data initialization.
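For reference, in Python's `typing` module `Optional[X]` is shorthand for `Union[X, None]`. The first change above should therefore be purely cosmetic, assuming the hints in question had `None` as one of the `Union` members. The sketch below demonstrates this. `NullableDates` and `count_nulls` are hypothetical names, not taken from the test file.

```python
import datetime
from typing import List, Optional, Union

# Optional[X] is the same type as Union[X, None].
assert Optional[datetime.date] == Union[datetime.date, None]

# Hypothetical alias for a nullable date column in a test.
NullableDates = List[Optional[datetime.date]]


def count_nulls(values: NullableDates) -> int:
    """Count the missing entries in a nullable list of dates."""
    return sum(v is None for v in values)
```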
Only specify our test dates once.
- Remove `using` directive.
- Fix typo in `CUDF_UNREACHABLE()` message.
Comment-out dead/unreachable avro device code for now.
Tweak the switch statement in avro_decode_row().
Force-pushed from b0da10f to 2141345.
Just rebased and pushed a fix for the CUDA kernel crashes. Figured out how to replicate the crash via a Python unit test (and verified it actually did crash prior to this fix being applied). Friendly request for (hopefully the last!)
Edit: pushed again to fix a small typo.
Force-pushed from 2141345 to 4d2a317.
/ok to test
Hurrah! Thanks, folks, for all your assistance in getting this merged. Much appreciated!
Description
This implements initial support for Avro logical types, starting with the date type. Scaffolding has been put in place to handle the other logical types. This closes #6482.
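As a rough illustration of what the logical-type scaffolding has to decide, the sketch below resolves a schema node to either a supported logical type or its underlying primitive. It follows the Avro specification's rule that readers ignore unknown logical types and fall back to the underlying type, and it treats a mismatched underlying type the same way. The names `SUPPORTED_LOGICAL_TYPES`, `EXPECTED_UNDERLYING`, and `resolve_kind` are hypothetical. They only loosely mirror the `is_supported_logical_type()` check visible in the patch.

```python
from typing import Any, Dict, Union

# Hypothetical set of logical types a reader supports; this PR adds "date".
SUPPORTED_LOGICAL_TYPES = {"date"}

# Per the Avro spec, "date" annotates an underlying "int".
EXPECTED_UNDERLYING = {"date": "int"}

Schema = Union[str, Dict[str, Any]]


def resolve_kind(schema: Schema) -> str:
    """Return the logical type name if supported, else the underlying type."""
    if isinstance(schema, str):
        return schema  # bare primitive, e.g. "int"
    base = schema["type"]
    logical = schema.get("logicalType")
    if logical in SUPPORTED_LOGICAL_TYPES and EXPECTED_UNDERLYING[logical] == base:
        return logical
    # Unknown (e.g. "time-micros" here) or mismatched logical types
    # fall back to the underlying Avro type.
    return base
```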