[REVIEW] disallow SUM and MEAN of timestamp types #5319
Please update the changelog in order to start CI tests. View the gpuCI docs here.
We shouldn't be allowing MEAN of timestamp at all. The user should be required to first cast to a duration or integer type.
You were not so firm about this here, which may have led to this PR: #4074 (comment)
@karthikeyann I think the first step is to add support for duration types... #5272
Casting to a duration or integer type is a workaround. Duration types will still be implemented, but they are not required for implementing MEAN of timestamp.
libcudf is not Pandas. libcudf is a C++ library that tries to follow the semantics of STL types (std::chrono) as closely as possible. As such, we shouldn't allow any operation that requires summing timestamp types. If Python wants to allow sum/mean of timestamps, then it can zero-copy cast from a timestamp to a duration/integer type and do so. But that's Python's responsibility, not libcudf's.
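The std::chrono behaviour referred to above can be checked at compile time. The following is a minimal sketch and not libcudf code: `is_addable`, `timestamp_ms` and `duration_ms` are names invented here for illustration, and `system_clock` with millisecond resolution is an arbitrary choice.

```cpp
#include <chrono>
#include <type_traits>
#include <utility>

// Hypothetical detection trait: true when the expression `A + B` compiles.
template <typename A, typename B, typename = void>
struct is_addable : std::false_type {};

template <typename A, typename B>
struct is_addable<A, B, decltype(void(std::declval<A>() + std::declval<B>()))>
  : std::true_type {};

// Illustrative aliases (assumptions, not the cudf timestamp types).
using duration_ms  = std::chrono::milliseconds;
using timestamp_ms = std::chrono::time_point<std::chrono::system_clock, duration_ms>;

// std::chrono provides no operator+ for two time points, so a SUM
// (and therefore a MEAN) cannot be written directly on timestamps.
static_assert(!is_addable<timestamp_ms, timestamp_ms>::value,
              "time_point + time_point should not compile");

// Durations can be summed, and a duration can be added to a time point.
static_assert(is_addable<duration_ms, duration_ms>::value,
              "duration + duration should compile");
static_assert(is_addable<timestamp_ms, duration_ms>::value,
              "time_point + duration should compile");
```

Under these assumptions, the only route to a sum is to leave the timestamp type first, for example through `time_since_epoch()`, which is the cast the comment asks callers to perform themselves.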
template <typename OutputType,
          typename agg_op,
          bool is_mean_of_timestamp,
          std::enable_if_t<!is_mean_of_timestamp>* = nullptr>
I've always found this explicit "is_mean" bool that gets passed around a little strange (the actual rolling code did it too). The logic for this can be inferred from the other parameters, i.e.
          std::enable_if_t<!is_mean_of_timestamp>* = nullptr>
template <typename OutputType,
          typename agg_op,
          std::enable_if_t<!(agg_op == MEAN && OutputType == cudf::is_timestamp<OutputType>())>* = nullptr
          >
Then this could be bubbled up further so that is_mean can be removed from the template params for create_reference_output
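The condition in the suggestion is written as shorthand, since `agg_op` and `OutputType` are types and cannot be compared with `==`. One way the same idea could be spelled out is sketched below. Everything in it is a stand-in chosen for illustration: `sum_op`, `mean_op`, the local `is_timestamp` trait, `is_mean_of_timestamp()` and `uses_timestamp_mean_path()` are hypothetical and are not the names used in the cudf tests.

```cpp
#include <chrono>
#include <type_traits>

// Hypothetical aggregation tags standing in for the real agg_op types.
struct sum_op {};
struct mean_op {};

// Hypothetical timestamp trait: here, any std::chrono::time_point.
template <typename T>
struct is_timestamp : std::false_type {};

template <typename Clock, typename Duration>
struct is_timestamp<std::chrono::time_point<Clock, Duration>> : std::true_type {};

// The flag is derived from the other template parameters instead of
// being passed around as a separate bool.
template <typename OutputType, typename AggOp>
constexpr bool is_mean_of_timestamp()
{
  return std::is_same<AggOp, mean_op>::value && is_timestamp<OutputType>::value;
}

// Selected for every combination except MEAN of a timestamp.
template <typename OutputType,
          typename AggOp,
          std::enable_if_t<!is_mean_of_timestamp<OutputType, AggOp>()>* = nullptr>
constexpr bool uses_timestamp_mean_path()
{
  return false;
}

// Selected only for MEAN of a timestamp.
template <typename OutputType,
          typename AggOp,
          std::enable_if_t<is_mean_of_timestamp<OutputType, AggOp>()>* = nullptr>
constexpr bool uses_timestamp_mean_path()
{
  return true;
}

// Illustrative timestamp type for the checks below.
using timestamp_s =
  std::chrono::time_point<std::chrono::system_clock, std::chrono::seconds>;

static_assert(uses_timestamp_mean_path<timestamp_s, mean_op>(), "MEAN of timestamp");
static_assert(!uses_timestamp_mean_path<timestamp_s, sum_op>(), "SUM of timestamp");
static_assert(!uses_timestamp_mean_path<int, mean_op>(), "MEAN of int");
```

With the condition computed from `OutputType` and the aggregation type, callers would no longer need to supply the extra bool, which is what would let it be dropped from helpers further up the call chain.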
Issue #5466 is created to track this.
template <typename OutputType,
          typename agg_op,
          bool is_mean_of_timestamp,
          std::enable_if_t<is_mean_of_timestamp>* = nullptr>
Same as above
cpp/tests/rolling/rolling_test.cpp
Outdated
template <typename OutputType,
          typename agg_op,
          bool is_mean_of_timestamp,
          std::enable_if_t<!is_mean_of_timestamp>* = nullptr>
Same as comment from grouped_rolling_test.cpp
I believe this PR should just disable SUM of timestamp types. Emulating the Pandas MEAN of timestamps can be done by casting to a duration type.
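As a rough illustration of the cast-to-duration approach using plain std::chrono (not the libcudf API), the sketch below averages the durations since the epoch and wraps the result back into a timestamp. `timestamp_ms` and `mean_timestamp` are hypothetical names, and the choice of `system_clock` with millisecond resolution is an assumption.

```cpp
#include <chrono>
#include <cstddef>

// Illustrative timestamp type (an assumption, not a cudf type).
using timestamp_ms =
  std::chrono::time_point<std::chrono::system_clock, std::chrono::milliseconds>;

// Hypothetical helper: mean of N timestamps, computed on durations.
template <std::size_t N>
constexpr timestamp_ms mean_timestamp(const timestamp_ms (&values)[N])
{
  std::chrono::milliseconds total{0};
  for (std::size_t i = 0; i < N; ++i) {
    // time_since_epoch() is the "cast to duration"; durations can be summed.
    total = total + values[i].time_since_epoch();
  }
  // Dividing a duration by a count yields a duration (integer division),
  // which is then turned back into a time point.
  return timestamp_ms{total / static_cast<std::chrono::milliseconds::rep>(N)};
}
```

For example, timestamps at 1000 ms, 2000 ms and 6000 ms after the epoch would average to 3000 ms after the epoch. The running total is held in the duration's integer representation, so a very large number of recent timestamps could overflow it; a real implementation would need to account for that.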
Codecov Report
@@             Coverage Diff              @@
##           branch-0.15    #5319   +/-   ##
===============================================
+ Coverage        86.09%   86.38%   +0.29%
===============================================
  Files               75       75
  Lines            12667    12983    +316
===============================================
+ Hits             10906    11216    +310
- Misses            1761     1767      +6
closes #4074