ARROW-17289: [C++] Add type category membership checks #13783
I'm extremely lukewarm about this. In any case, the implementation is needlessly inefficient.
cpp/src/arrow/type.cc
Outdated
@@ -2494,6 +2519,61 @@ const std::vector<std::shared_ptr<DataType>>& PrimitiveTypes() {
  return g_primitive_types;
}

bool IsBaseBinaryType(std::shared_ptr<DataType> type) {
  std::call_once(static_data_initialized, InitStaticData);
  return g_base_binary_types_set.find(type) != g_base_binary_types_set.end();
This is a really inefficient way to do it, compared with simply switching on the id().
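As a rough illustration of the switch-based approach suggested above, here is a minimal sketch. The TypeId enum and DataType class are simplified stand-ins invented for this example, not Arrow's actual Type::type or DataType, and IsBaseBinary is a hypothetical name. The constexpr switch assumes C++14 or later.

```cpp
// Simplified stand-ins for illustration only; NOT Arrow's real definitions.
enum class TypeId { INT32, STRING, BINARY, LARGE_STRING, LARGE_BINARY, LIST };

class DataType {
 public:
  explicit constexpr DataType(TypeId id) : id_(id) {}
  constexpr TypeId id() const { return id_; }

 private:
  TypeId id_;
};

// Hypothetical predicate: membership is decided by a switch on the id,
// with no static set, no std::call_once, and no hashing of shared_ptrs.
constexpr bool IsBaseBinary(TypeId id) {
  switch (id) {
    case TypeId::STRING:
    case TypeId::BINARY:
    case TypeId::LARGE_STRING:
    case TypeId::LARGE_BINARY:
      return true;
    default:
      return false;
  }
}

// Convenience overload that forwards to the id-based check.
constexpr bool IsBaseBinary(const DataType& type) {
  return IsBaseBinary(type.id());
}

// Because the check is constexpr, it can be verified at compile time.
static_assert(IsBaseBinary(TypeId::STRING), "STRING is in the category");
static_assert(!IsBaseBinary(TypeId::LIST), "LIST is not in the category");
```

A switch like this typically compiles down to a range check or a small jump table, whereas the set-based version pays for one-time initialization plus a lookup on every call.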
Agreed.
cpp/src/arrow/type.h
Outdated
@@ -2151,4 +2151,32 @@ const std::vector<std::shared_ptr<DataType>>& IntervalTypes();
ARROW_EXPORT
const std::vector<std::shared_ptr<DataType>>& PrimitiveTypes();

ARROW_EXPORT
bool IsSignedIntType(std::shared_ptr<DataType>);
I'm leaning towards -1 on adding these APIs. There are similar APIs, with different names and inlined, that take a type id. It's not significantly easier to write IsSignedIntType(type) rather than is_signed_integer(type->id()).
In addition, the signature is inefficient, as it always copies a shared_ptr (implying an atomic reference count increment). Instead, this could just take a const DataType&.
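The cost difference between the two signatures can be sketched as follows. The TypeId enum, the DataType struct, and both function names are hypothetical stand-ins made up for this example; they are not Arrow's API. The by-value variant records the reference count it observes so that the extra copy becomes visible.

```cpp
#include <memory>

// Simplified stand-ins for illustration only; NOT Arrow's real definitions.
enum class TypeId { INT32, UINT32 };

struct DataType {
  TypeId id;
};

// Takes the shared_ptr by value: the argument is copied on each call, which
// increments the (atomic) reference count for the duration of the call.
// use_count_seen is an illustrative out-parameter exposing that count.
bool IsSignedIntByValue(std::shared_ptr<DataType> type, long* use_count_seen) {
  *use_count_seen = type.use_count();
  return type->id == TypeId::INT32;
}

// Takes the pointee by const reference: no shared_ptr copy and no
// reference count traffic.
bool IsSignedInt(const DataType& type) {
  return type.id == TypeId::INT32;
}
```

With a single owner `t`, calling `IsSignedIntByValue(t, &seen)` observes a use count of 2 inside the function, while `IsSignedInt(*t)` leaves the count untouched. The reference version also works for types that are not held in a shared_ptr at all.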
It turns out the set of *Types() functions does not correspond to the set of is_* ones. This can be really confusing to the user; for example, one could easily but incorrectly assume that PrimitiveTypes() corresponds to is_primitive. So I think some cleanup is warranted around this code, while considering backwards compatibility.
I'll soon add a commit here that shows the is_* functions I needed to add to obtain correspondence. However, I don't like the mess this makes of the function names, so I'd appreciate a suggestion here.
It sounds like it would be better to document the existing predicates, rather than add new ones with different names.
The existing set of predicates is both confusing and incomplete, so I'd argue for both documenting and adding predicates. I'm not sure what to do with the names, considering backwards compatibility, though. Suppose we agree that a predicate corresponding to
I added docs that clarify the meaning of and correspondence between
Hmm... can you point out the causes for confusion? Perhaps there's a way to improve that. Same for incompleteness: is the set of
Why is
Sure. Check out this commit:
So, part of the confusion is the unexpected non-correspondence of
As compared to
Ok, so there are two different areas here:
I think the former (case number 1 above) is the most interesting to solve, because it directly affects readability. Would you like to defer number 2 to a separate JIRA (and potentially PR, if people agree it's a worthwhile thing to do)?
Also, for the record, functions such as
Now to answer the problems with the current predicates:
Also note that the
Does that make sense? Also @lidavidm what do you think?
Note #13753 removes the
I'm -1 on the PR as is, but I think we should add the missing
"primitive" is hard to define and it depends on what the user is after. I might favor a set of more specific predicates like the ones we have for template metaprogramming, e.g.
OK, so I'll avoid changes around the "primitive" functions and fix this PR to only:
Does anyone know what the errors in the Dev jobs mean?
I think you can ignore those; some files need to be bumped post-release.
cpp/src/arrow/type_traits.h
Outdated
///
/// \param[in] type_id the type-id to check
/// \return whether type-id is a primitive-like type one
static inline bool is_primitive_like(Type::type type_id) {
Would is_not_nested or something similar be clearer here? "Primitive-like" is a little confusing (though I suppose that then raises the question of why fixed-width binary and decimal are also missing here).
I forgot to remove this predicate and will do so soon. It was only added to correspond to one of the *Types() functions, all of which are going to be removed.
Side note: the existing (pre-PR) code has an is_nested predicate, though it is not the complement of the function discussed here, e.g., due to timestamp and interval types.
Thanks for the update! I posted a couple suggestions but this looks good overall.
@pitrou, is this good to go?
Co-authored-by: Antoine Pitrou <[email protected]>
Sorry for the delay, @rtpsw. I rebased from master and made a couple of very minor changes.
LGTM!
Will wait for CI now...
Benchmark runs are scheduled for baseline = cef6894 and contender = f0688d0. f0688d0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
See https://issues.apache.org/jira/browse/ARROW-17289 Lead-authored-by: Yaron Gvili <[email protected]> Co-authored-by: rtpsw <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>