cuDF/libcudf exponentially weighted moving averages #9027

Merged
merged 162 commits into from
Jun 24, 2024

Conversation

Contributor

@brandon-b-miller brandon-b-miller commented Aug 12, 2021

Adds an exponentially weighted moving average aggregation to cudf::scan and plumbs it up through cudf.Series.ewm, similar to pandas.Series.ewm.

partially resolves #1263
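
For readers unfamiliar with the two modes that `pandas.Series.ewm` exposes, here is a minimal pure-Python sketch of the adjusted and unadjusted exponentially weighted moving average. It follows the definitions in the pandas documentation, with `alpha = 1 / (1 + com)`. The function names are hypothetical and this is not the libcudf implementation, only a reference for what the aggregation is expected to compute.

```python
def alpha_from_com(com: float) -> float:
    """Smoothing factor for a given center of mass (com >= 0)."""
    return 1.0 / (1.0 + com)


def ewma_adjusted(values, com):
    """adjust=True: weighted mean of all values seen so far.

    The value i steps in the past gets weight (1 - alpha) ** i.
    """
    alpha = alpha_from_com(com)
    out = []
    for t in range(len(values)):
        weights = [(1.0 - alpha) ** i for i in range(t + 1)]
        num = sum(w * values[t - i] for i, w in enumerate(weights))
        out.append(num / sum(weights))
    return out


def ewma_unadjusted(values, com):
    """adjust=False: y[0] = x[0]; y[t] = (1 - alpha) * y[t-1] + alpha * x[t]."""
    alpha = alpha_from_com(com)
    out = []
    for x in values:
        out.append(x if not out else (1.0 - alpha) * out[-1] + alpha * x)
    return out


print(ewma_adjusted([1.0, 2.0, 3.0], com=0.5))
print(ewma_unadjusted([1.0, 2.0, 3.0], com=0.5))
```

The adjusted form is quadratic as written; it is spelled out this way only to make the weights explicit.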

@brandon-b-miller added the labels feature request, 2 - In Progress, libcudf, Python, and non-breaking on Aug 12, 2021
@brandon-b-miller brandon-b-miller self-assigned this Aug 12, 2021
Contributor

@vyasr vyasr left a comment


Some smaller things, then hopefully we can wrap this up!

cpp/src/reductions/scan/ewm.cu Outdated Show resolved Hide resolved
cpp/src/reductions/scan/ewm.cu Outdated Show resolved Hide resolved
cpp/src/reductions/scan/ewm.cu Outdated Show resolved Hide resolved
cpp/src/reductions/scan/ewm.cu Outdated Show resolved Hide resolved
cpp/src/reductions/scan/ewm.cu Outdated Show resolved Hide resolved
Comment on lines +185 to +186
// Use the null mask produced by the op for EWM
if (agg.kind != aggregation::EWMA) { output->set_null_mask(std::move(mask), null_count); }

I'm OK with this, but it's an open question for other reviewers whether they would prefer a more systematic approach to handling this. I'm inclined to leave it as-is for now and not generalize from a single case.

cpp/tests/reductions/ewm_tests.cpp Outdated Show resolved Hide resolved
Comment on lines +50 to +58
  auto const expected_ewma_vals_adjust = cudf::test::fixed_width_column_wrapper<TypeParam>{
    {1.0, 1.75, 2.61538461538461497469, 3.54999999999999982236, 4.52066115702479365268}};

  auto const expected_ewma_vals_noadjust =
    cudf::test::fixed_width_column_wrapper<TypeParam>{{1.0,
                                                       1.66666666666666651864,
                                                       2.55555555555555535818,
                                                       3.51851851851851815667,
                                                       4.50617283950617242283}};

That is a lot of digits of precision. Maybe we should round the expected values and compare against those with a tolerance instead? I'm not sure.
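
To make the trade-off concrete, here is a small pure-Python sketch comparing full-precision and rounded expected values under a relative tolerance. It assumes the test input is `[1, 2, 3, 4, 5]` with a center of mass of `0.5`; neither appears in the excerpt above, but those values reproduce the expected columns. The `ewma` and `all_close` helpers are hypothetical and are not the libcudf test utilities.

```python
import math


def ewma(values, com, adjust=True):
    """Single-pass EWMA written as a running recurrence."""
    alpha = 1.0 / (1.0 + com)
    beta = 1.0 - alpha
    out = []
    if adjust:
        # Running weighted sum and running sum of weights.
        num = den = 0.0
        for x in values:
            num = beta * num + x
            den = beta * den + 1.0
            out.append(num / den)
    else:
        for x in values:
            out.append(x if not out else beta * out[-1] + alpha * x)
    return out


def all_close(actual, expected, rel_tol):
    """True when every pair of values agrees within rel_tol."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=rel_tol) for a, e in zip(actual, expected)
    )


# Assumed input and parameter (not shown in the excerpt above).
data = [1.0, 2.0, 3.0, 4.0, 5.0]
com = 0.5

# Literals copied from the test under review.
full_adjust = [1.0, 1.75, 2.61538461538461497469,
               3.54999999999999982236, 4.52066115702479365268]
full_noadjust = [1.0, 1.66666666666666651864, 2.55555555555555535818,
                 3.51851851851851815667, 4.50617283950617242283]

# The same values rounded to six decimal places.
rounded_adjust = [1.0, 1.75, 2.615385, 3.55, 4.520661]
rounded_noadjust = [1.0, 1.666667, 2.555556, 3.518519, 4.506173]

print(all_close(ewma(data, com, adjust=True), full_adjust, rel_tol=1e-12))
print(all_close(ewma(data, com, adjust=False), rounded_noadjust, rel_tol=1e-6))
```

Rounded literals are easier to read but only pass with a looser tolerance, so the choice depends on how tight the comparison in the test needs to be.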

python/cudf/cudf/core/window/ewm.py Show resolved Hide resolved
python/cudf/cudf/core/window/ewm.py Outdated Show resolved Hide resolved
@brandon-b-miller
Contributor Author

/ok to test

@vyasr
Contributor

vyasr commented Jun 18, 2024

/ok to test

@brandon-b-miller
Contributor Author

/ok to test

Contributor

@bdice bdice left a comment


This PR has stalled for a long time. I would like to approve and merge as-is, but please follow up with a new PR to address the comments in this review.

*
* @param center_of_mass the center of mass.
* @param history which assumption to make about the first value
* @return A EWM aggregation object
Contributor

@bdice bdice Jun 18, 2024


This is specific to moving average, right? Please check me.

Suggested change
- * @return A EWM aggregation object
+ * @return A EWMA aggregation object

}

struct ewma_functor {
template <typename T, CUDF_ENABLE_IF(!std::is_floating_point<T>::value)>

Suggested change
- template <typename T, CUDF_ENABLE_IF(!std::is_floating_point<T>::value)>
+ template <typename T, CUDF_ENABLE_IF(!std::is_floating_point_v<T>)>

CUDF_FAIL("Unsupported type for EWMA.");
}

template <typename T, CUDF_ENABLE_IF(std::is_floating_point<T>::value)>

Suggested change
- template <typename T, CUDF_ENABLE_IF(std::is_floating_point<T>::value)>
+ template <typename T, CUDF_ENABLE_IF(std::is_floating_point_v<T>)>

#include <thrust/transform_scan.h>

namespace cudf {
namespace detail {

Can we wrap some parts of this in an anonymous namespace? Anything that is not declared in a header should be anonymous.

Comment on lines +38 to +39
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ewm.html
for details.

Use a Sphinx reference instead of a raw link. Something like

Suggested change
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ewm.html
- for details.
+ See :meth:`pandas.DataFrame.ewm` for details.

Please check that this renders nicely in the docs.

2 1.615385
3 1.615385
4 3.670213


Be consistent with blank lines or no blank lines.

Suggested change

Comment on lines +139 to +143
# libcudf ewm has special casing for nulls only
# and come what may with nans. It treats those nulls like
# pandas does nans in the same positions mathematically.
# as such we need to convert the nans to nulls before
# passing them in.

Can we make this comment clearer? What does "come what may" mean here, precisely?
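
One way to read the comment under review: nulls are skipped but still age the older observations, while NaNs get no special handling and simply flow through the arithmetic. The pure-Python sketch below illustrates that reading, using `None` for null and the pandas default (`adjust=True`, `ignore_na=False`) as the model. The helper names are hypothetical and the code is not the cuDF or libcudf implementation.

```python
import math


def nans_to_nulls(values):
    """Replace NaN with None so the EWMA below treats it as missing."""
    return [None if (v is not None and math.isnan(v)) else v for v in values]


def ewma_with_nulls(values, com):
    """Adjusted EWMA where None is skipped but still decays older weights."""
    alpha = 1.0 / (1.0 + com)
    beta = 1.0 - alpha
    num = den = 0.0
    out = []
    for x in values:
        # Every position, null or not, ages the existing weights.
        num *= beta
        den *= beta
        if x is not None:
            num += x
            den += 1.0
        # Before the first valid value there is nothing to average.
        out.append(num / den if den > 0.0 else None)
    return out


raw = [0.0, 1.0, 2.0, float("nan"), 4.0]

# With conversion: the missing position repeats the previous result.
print(ewma_with_nulls(nans_to_nulls(raw), com=0.5))

# Without conversion: the NaN poisons every later result.
print(ewma_with_nulls(raw, com=0.5))
```

With the conversion, the result at the missing position repeats the previous value, which matches the `1.615385`, `1.615385`, `3.670213` tail shown in the docstring example.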


We need tests for all the error modes (unsupported arguments, invalid values, etc.), and probably more input types, too.
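
As a rough idea of what such error-mode tests could cover, the sketch below validates the decay arguments the way pandas documents them: exactly one of `com`, `span`, `halflife`, or `alpha`, with `com >= 0`, `span >= 1`, `halflife > 0`, and `0 < alpha <= 1`. The `center_of_mass` and `raises_value_error` helpers are hypothetical stand-ins, not functions from cuDF, and the real tests would exercise `cudf.Series.ewm` itself.

```python
import math


def center_of_mass(com=None, span=None, halflife=None, alpha=None):
    """Normalize one decay argument to a center of mass, or raise ValueError."""
    given = [v for v in (com, span, halflife, alpha) if v is not None]
    if len(given) != 1:
        raise ValueError("specify exactly one of com, span, halflife, alpha")
    if com is not None:
        if com < 0:
            raise ValueError("com must satisfy com >= 0")
        return float(com)
    if span is not None:
        if span < 1:
            raise ValueError("span must satisfy span >= 1")
        return (span - 1.0) / 2.0
    if halflife is not None:
        if halflife <= 0:
            raise ValueError("halflife must satisfy halflife > 0")
        decay = 1.0 - math.exp(math.log(0.5) / halflife)
        return 1.0 / decay - 1.0
    if not (0 < alpha <= 1):
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    return (1.0 - alpha) / alpha


def raises_value_error(**kwargs):
    """True when center_of_mass rejects the given arguments."""
    try:
        center_of_mass(**kwargs)
    except ValueError:
        return True
    return False


# Each entry is one invalid combination a test suite might cover.
invalid_cases = [
    {},
    {"com": 0.5, "span": 2},
    {"com": -1},
    {"span": 0.5},
    {"halflife": 0},
    {"alpha": 0},
    {"alpha": 1.5},
]
print(all(raises_value_error(**case) for case in invalid_cases))
```

A parametrized test over cases like these keeps each error mode visible as its own failure.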

@brandon-b-miller
Contributor Author

> This PR has stalled for a long time. I would like to approve and merge as-is, but please follow up with a new PR to address the comments in this review.

No problem, @bdice. It shouldn't be too big of a jump from here to ewmvar and ewmstd. How about we take care of some of this cleanup at the same time?

@bdice
Contributor

bdice commented Jun 18, 2024

> No problem, @bdice. It shouldn't be too big of a jump from here to ewmvar and ewmstd. How about we take care of some of this cleanup at the same time?

I'd prefer a cleanup before expanding the feature set, to avoid stalling in review. 😉

@brandon-b-miller
Contributor Author

/ok to test

@github-actions github-actions bot added the cudf.pandas Issues specific to cudf.pandas label Jun 24, 2024
@brandon-b-miller
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit bd76bf6 into rapidsai:branch-24.08 Jun 24, 2024
77 checks passed
@vyasr vyasr mentioned this pull request Jun 24, 2024
5 tasks
sdrp713 added a commit to sdrp713/cudf that referenced this pull request Jul 9, 2024
Labels
3 - Ready for Review, CMake, cudf.pandas, feature request, libcudf, non-breaking, pylibcudf, Python
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA] need official EWM function
9 participants