Refactor LogicalType for Parquet #14264
The ultimate goal is to allow greater use of |
/ok to test
couple of small questions/suggestions
// isset.TIME or isset.TIMESTAMP or isset.INTEGER or isset.UNKNOWN or isset.JSON or isset.BSON)
// {
if (isset.TIMESTAMP or isset.TIME) { c.field_struct(10, s.logical_type); }
if (s.field_id.has_value()) { c.field_int(9, s.field_id.value()); }
Note: we could probably get rid of this "if has_value -> write value" pattern with SFINAE in ProtobufWriter, assuming that the field_xyz names map to the actual type of the parameter.
If all types were supported with an overload set, optional support could be a part of the set, and simply delegate to the <T::value> implementation if has_value.
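A rough sketch of the overload-set idea, assuming a toy writer: ToyWriter and its field overloads are hypothetical and are not the writer touched by this PR. It records "id:value" strings instead of encoding bytes, only to make the delegation visible.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the writer discussed above; it records
// "id:value" strings instead of emitting real encoded bytes.
struct ToyWriter {
  std::vector<std::string> out;

  // One overload per supported value type.
  void field(int id, int32_t v) { out.push_back(std::to_string(id) + ":" + std::to_string(v)); }
  void field(int id, std::string const& v) { out.push_back(std::to_string(id) + ":" + v); }

  // Optional support as part of the overload set: write only when a value
  // is present, delegating to the overload for the contained type.
  template <typename T>
  void field(int id, std::optional<T> const& v)
  {
    if (v.has_value()) { field(id, v.value()); }
  }
};

// Usage: an empty optional writes nothing, a populated one writes its value.
inline void demo_writer()
{
  ToyWriter w;
  w.field(9, std::optional<int32_t>{});    // skipped
  w.field(9, std::optional<int32_t>{42});  // recorded as "9:42"
  assert(w.out.size() == 1);
  assert(w.out[0] == "9:42");
}
```

With something like this, call sites would pass the optional directly and the has_value check would live in one place.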
That's a lot of template foo to avoid a few invocations of has_value. We can leave this for another day :D
I think it would have a larger impact than that, but maybe I'm mixing it up with the reader side. Either way it's not a suggestion for this PR.
(Not sure if I'm on the same page)
How about std::visit to deal with such template issue and optional?
That can be in the next refactor of CompactProtocolReader/Writer 🤣 (or should I say compact_protocol_reader/writer?)
I don't think std::visit is applicable here. Outside of reflection that would allow us to iterate over data members (which AFAIK does not exist), I think an overload set is as good as this gets. I'd be happy to learn about other solutions.
/ok to test
NullType UNKNOWN;
JsonType JSON;
BsonType BSON;
enum Type {
enum class?
The Type enums are already buried inside the struct, so we're already getting the benefits of a scoped enum. And keeping it non-scoped allows me to use the enum values as the positional argument to the field_struct calls in the writer without having to do a cast.
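To illustrate the trade-off, here is a minimal sketch. ToyNested, ToyScoped, the enumerator names and values, and takes_field_id are all hypothetical, chosen only to show the implicit conversion; they are not the types or ids from this PR.

```cpp
#include <cassert>

// Hypothetical struct with a nested, unscoped enum. The enumerators are
// still qualified by the struct name, so they do not leak into the
// enclosing namespace.
struct ToyNested {
  enum Type { UNDEFINED = 0, ALPHA = 1, BETA = 2 };
  Type type = UNDEFINED;
};

// The same values as a scoped enum, for comparison.
struct ToyScoped {
  enum class Type { UNDEFINED = 0, ALPHA = 1, BETA = 2 };
};

// Hypothetical stand-in for a writer call whose first parameter is an
// integer field id.
inline int takes_field_id(int id) { return id; }

inline void demo_enum()
{
  // Unscoped: converts to int implicitly at the call site.
  assert(takes_field_id(ToyNested::BETA) == 2);
  // Scoped: every call site needs an explicit cast.
  assert(takes_field_id(static_cast<int>(ToyScoped::Type::BETA)) == 2);
}
```

The cost of staying unscoped is that the implicit conversion to int is available everywhere, not just at the writer calls.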
/ok to test
/merge
@galipremsagar - As far as I can tell the |
Not sure, but I don't immediately think so. The test failure we are investigating suggests that the behavior of |
@rjzamora can you please point us to the failing tests? Is there an issue open?
Failing test is here:
I'll try to boil this down to a simpler reproducer (if possible).
oof
🤯 🤯 🤯
@rjzamora I think I'm running this to ground...it seems that the blob passed in to |
Thanks for looking into this @etseidl !
Dask is using whatever type
Okay, either approach sounds reasonable to me. To be completely honest, I'm somewhat doubtful that there is anyone actually depending on the behavior covered in this test. In fact, we are already expecting the "wrong" result for |
Description
A continuation of #14097, this PR refactors the LogicalType struct to use the new way of treating unions defined in the parquet thrift (more enum-like than struct-like).
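One way to read "more enum-like than struct-like" is sketched below. This is an assumption about the general shape only, not the definition from this PR: ToyUnion, ToyTimeType, and the enumerator values are hypothetical. A tag records which union member is active, and only members that carry data get storage.

```cpp
#include <cassert>
#include <optional>

// Hypothetical payload for a union member that carries data.
struct ToyTimeType {
  bool is_adjusted_to_utc = false;
};

// Enum-like union: the tag says which member is active. Members with no
// payload (JSON here) are represented by the tag alone.
struct ToyUnion {
  enum Type { UNDEFINED = 0, JSON = 1, TIME = 2 };
  Type type = UNDEFINED;
  std::optional<ToyTimeType> time;  // populated only when type == TIME

  ToyUnion() = default;
  ToyUnion(Type t) : type(t) {}
  ToyUnion(ToyTimeType t) : type(TIME), time(t) {}

  bool is_time() const { return type == TIME; }
};
```

Compared with one empty struct member plus an isset flag per alternative, the tag makes "which member is set" a single comparison.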