-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cuDF/libcudf exponentially weighted moving averages #9027
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some smaller things, then hopefully we can wrap this up!
// Use the null mask produced by the op for EWM | ||
if (agg.kind != aggregation::EWMA) { output->set_null_mask(std::move(mask), null_count); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this, but open question for other reviewers if they would prefer a more systematic approach to handling this. I'm inclined to leave it as-is for now and not try to overgeneralize to a single case.
auto const expected_ewma_vals_adjust = cudf::test::fixed_width_column_wrapper<TypeParam>{ | ||
{1.0, 1.75, 2.61538461538461497469, 3.54999999999999982236, 4.52066115702479365268}}; | ||
|
||
auto const expected_ewma_vals_noadjust = | ||
cudf::test::fixed_width_column_wrapper<TypeParam>{{1.0, | ||
1.66666666666666651864, | ||
2.55555555555555535818, | ||
3.51851851851851815667, | ||
4.50617283950617242283}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are a lot of digits of precision. Maybe we should consider rounding off some digits and comparing that instead or something? I'm not sure.
/ok to test |
/ok to test |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has stalled for a long time. I would like to approve and merge as-is, but please follow up with a new PR to address the comments in this review.
* | ||
* @param center_of_mass the center of mass. | ||
* @param history which assumption to make about the first value | ||
* @return A EWM aggregation object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specific to moving average, right? Please check me.
* @return A EWM aggregation object | |
* @return A EWMA aggregation object |
} | ||
|
||
struct ewma_functor { | ||
template <typename T, CUDF_ENABLE_IF(!std::is_floating_point<T>::value)> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template <typename T, CUDF_ENABLE_IF(!std::is_floating_point<T>::value)> | |
template <typename T, CUDF_ENABLE_IF(!std::is_floating_point_v<T>)> |
CUDF_FAIL("Unsupported type for EWMA."); | ||
} | ||
|
||
template <typename T, CUDF_ENABLE_IF(std::is_floating_point<T>::value)> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template <typename T, CUDF_ENABLE_IF(std::is_floating_point<T>::value)> | |
template <typename T, CUDF_ENABLE_IF(std::is_floating_point_v<T>)> |
#include <thrust/transform_scan.h> | ||
|
||
namespace cudf { | ||
namespace detail { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we wrap some parts of this in an anonymous namespace? Anything that is not declared in a header should be anonymous.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ewm.html | ||
for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a Sphinx reference instead of a raw link. Something like
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ewm.html | |
for details. | |
See :meth:`pandas.DataFrame.ewm` for details. |
Please check that this renders nicely in the docs.
2 1.615385 | ||
3 1.615385 | ||
4 3.670213 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be consistent with blank lines or no blank lines.
# libcudf ewm has special casing for nulls only | ||
# and come what may with nans. It treats those nulls like | ||
# pandas does nans in the same positions mathematically. | ||
# as such we need to convert the nans to nulls before | ||
# passing them in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this comment clearer? What does "come what may" mean here, precisely?
python/cudf/cudf/tests/test_ewm.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need tests for all the error modes (unsupported arguments, invalid values, etc.), and probably more input types, too.
No problem @bdice . It shouldn't be too big of a jump from here to |
I'd prefer a cleanup before expanding the feature set, to avoid stalling in review. 😉 |
/ok to test |
/merge |
Adds an exponentially weighted moving average aggregation to
cudf::scan
and plumbs it up throughcudf.Series.ewm
, similar topandas.Series.ewm
.partially resolves #1263