cuDF/libcudf exponentially weighted moving averages #9027

Merged
merged 162 commits into from
Jun 24, 2024
Changes from 141 commits
Commits
9bec99f
initial
brandon-b-miller Jul 27, 2021
4c479f6
a few python updates
brandon-b-miller Jul 28, 2021
b99c0b3
need to jump to something else
brandon-b-miller Jul 28, 2021
f90addf
Merge branch 'branch-21.10' into fea-ewm
brandon-b-miller Aug 5, 2021
adbcb59
testing
brandon-b-miller Aug 6, 2021
f4666f0
up and running
brandon-b-miller Aug 9, 2021
7f2bd97
style, use a functor
brandon-b-miller Aug 9, 2021
9e4bc0a
updates
brandon-b-miller Aug 9, 2021
6021d78
very basic test
brandon-b-miller Aug 10, 2021
33b9731
add tests and generalize over kwargs
brandon-b-miller Aug 10, 2021
1127fef
what?
brandon-b-miller Aug 11, 2021
1dfb4ba
move stuff around
brandon-b-miller Aug 11, 2021
d75f1bc
refactor
brandon-b-miller Aug 11, 2021
a3c55e6
up and running!
brandon-b-miller Aug 11, 2021
002de60
Merge branch 'branch-21.10' into fea-ewm
brandon-b-miller Aug 13, 2021
4383246
Merge branch 'branch-21.10' into fea-ewm
brandon-b-miller Aug 23, 2021
1855852
add adjust kwarg that so far does nothing, but compiles
brandon-b-miller Aug 23, 2021
8827f57
cleanup
brandon-b-miller Aug 23, 2021
edd54eb
building out adjust=False
brandon-b-miller Aug 23, 2021
ca6ab06
add tests for adjust=False for ewma
brandon-b-miller Aug 23, 2021
318dff5
progress, some failing tests
brandon-b-miller Aug 24, 2021
dba2c03
Merge branch 'branch-21.10' into fea-ewm
brandon-b-miller Aug 25, 2021
5413bbc
fix ewma noadjust
brandon-b-miller Aug 25, 2021
668e9a3
ewmvar plumbed to return ewma as a placeholder
brandon-b-miller Aug 25, 2021
2414cf3
ewmvar up and running!
brandon-b-miller Aug 31, 2021
a9b9afd
add ewmstd
brandon-b-miller Aug 31, 2021
27ec9bf
slowly refactoring
brandon-b-miller Sep 1, 2021
2de8ab0
refactoring
brandon-b-miller Sep 1, 2021
800abb1
reformatting
brandon-b-miller Sep 1, 2021
d252c74
more refactoring
brandon-b-miller Sep 1, 2021
0994e4d
aggregations is_equal
brandon-b-miller Sep 2, 2021
f3aa3b1
merge latest
brandon-b-miller Sep 8, 2021
3e67839
Merge branch 'branch-21.10' into fea-ewm
brandon-b-miller Sep 8, 2021
2ceb8bc
fix bad merge
brandon-b-miller Sep 8, 2021
d271a20
migrate methods to frame
brandon-b-miller Sep 9, 2021
1455415
import fix
brandon-b-miller Sep 9, 2021
82e388c
basic type casting and error handling
brandon-b-miller Sep 9, 2021
c76740d
ewm allocates memory for pairs and reuses everything else
brandon-b-miller Sep 9, 2021
83c73e7
switch to device_uvector and fix bugs
brandon-b-miller Sep 9, 2021
520247b
update aggregations
brandon-b-miller Sep 10, 2021
eb74b0d
partial non working commit - have to jump to something else
brandon-b-miller Sep 10, 2021
fc08fda
updates and null handling
brandon-b-miller Sep 15, 2021
80f16ea
refactoring
brandon-b-miller Sep 16, 2021
d7f3a6d
ewmvar nulls pass for adjust=True
brandon-b-miller Sep 16, 2021
54710c4
minor cleanup
brandon-b-miller Sep 16, 2021
fc9065e
minor cleanup
brandon-b-miller Sep 17, 2021
71599e0
factor out null_roll_up
brandon-b-miller Sep 17, 2021
6513f69
correct null handling for ewm noadjust
brandon-b-miller Sep 20, 2021
2019564
cleanup
brandon-b-miller Sep 20, 2021
f51e884
partial bias plumbing
brandon-b-miller Sep 20, 2021
728890a
little more plumbing
brandon-b-miller Sep 20, 2021
6d43161
refactor tests
brandon-b-miller Sep 21, 2021
641ac93
partial bias cases
brandon-b-miller Sep 21, 2021
298d26c
partial
brandon-b-miller Sep 22, 2021
29e00bf
attempt to merge latest - possibly unrelated build error
brandon-b-miller Nov 4, 2021
d3b5c3d
merge 21.12
brandon-b-miller Nov 18, 2021
eaf9a41
cut things down for now
brandon-b-miller Nov 18, 2021
08814c6
test cleanup
brandon-b-miller Nov 19, 2021
d774d56
move files around
brandon-b-miller Nov 23, 2021
0db0605
refactor, generalize over types (?)
brandon-b-miller Nov 23, 2021
b5e01d9
fix bug where null roll up was not being correctly computed
brandon-b-miller Nov 23, 2021
c904e2c
refactor to conserve memory
brandon-b-miller Nov 23, 2021
af8ce79
style
brandon-b-miller Nov 23, 2021
41193a0
some docs
brandon-b-miller Nov 23, 2021
d02b377
greatly refactor and address reviews
brandon-b-miller Nov 24, 2021
72aff2e
add docs on the python side
brandon-b-miller Nov 24, 2021
b5851ad
minor change
brandon-b-miller Nov 24, 2021
36af4b9
Merge branch 'branch-22.02' into fea-ewm
brandon-b-miller Nov 24, 2021
f03cf25
basic framework for tests
brandon-b-miller Nov 30, 2021
d357a4f
basic tests
brandon-b-miller Nov 30, 2021
3f67ba6
add tests with nulls
brandon-b-miller Nov 30, 2021
eabdefe
minor touchups
brandon-b-miller Nov 30, 2021
6c236cf
remove debugging code
brandon-b-miller Nov 30, 2021
1beac09
convert nans to nulls in the input
brandon-b-miller Nov 30, 2021
a1a65c0
fix segfault
brandon-b-miller Dec 1, 2021
d344a01
minor cleanup
brandon-b-miller Dec 1, 2021
8437772
merge latest and fix bugs
brandon-b-miller Dec 3, 2021
c17a876
Apply suggestions from code review
brandon-b-miller Dec 3, 2021
a29787f
small compilation bug fix
brandon-b-miller Dec 3, 2021
6d2bb74
partially address reviews
brandon-b-miller Dec 3, 2021
e279915
address more reviews
brandon-b-miller Dec 3, 2021
fa976ca
python style
brandon-b-miller Dec 3, 2021
dbe50a9
cython style
brandon-b-miller Dec 3, 2021
107ceef
cmake style ..git add cpp/tests/CMakeLists.txt
brandon-b-miller Dec 3, 2021
5a205b3
avoid double allocation
brandon-b-miller Dec 15, 2021
3ea290b
Merge branch 'branch-22.02' into fea-ewm
brandon-b-miller Dec 16, 2021
d3bcaf1
Merge branch 'branch-22.02' into fea-ewm
brandon-b-miller Jan 5, 2022
8210601
switch to an enum
brandon-b-miller Jan 5, 2022
d0d47b0
inline compute_recurrence
brandon-b-miller Jan 5, 2022
22ebffa
Apply suggestions from code review
brandon-b-miller Jan 5, 2022
e14144f
reorganize if blocks
brandon-b-miller Jan 5, 2022
092a49f
move to a counting iterator in unadjusted case
brandon-b-miller Jan 5, 2022
2e73f8c
updates to ewma
brandon-b-miller Jan 5, 2022
e9ea142
python updates
brandon-b-miller Jan 5, 2022
219444e
fuse in pair_beta_adjust in compute_ewma_adjust
brandon-b-miller Jan 8, 2022
beb4d7e
more kernel fusion in compute_ewma_adjust
brandon-b-miller Jan 8, 2022
81e10fb
Merge branch 'branch-22.02' into fea-ewm
brandon-b-miller Jan 10, 2022
9a69760
fuse more kernels and add test for untested branch in lambda
brandon-b-miller Jan 10, 2022
8cfc207
continue fusing kernels
brandon-b-miller Jan 10, 2022
a9d2c8c
updates
brandon-b-miller Jan 10, 2022
1048428
style
brandon-b-miller Jan 10, 2022
222d516
Merge branch 'branch-22.02' into fea-ewm
brandon-b-miller Jan 11, 2022
283b2c7
cleanup
brandon-b-miller Jan 11, 2022
ea5d348
Merge branch 'branch-22.02' into fea-ewm
brandon-b-miller Jan 13, 2022
3ba7cfc
Merge branch 'branch-22.02' into fea-ewm
brandon-b-miller Jan 18, 2022
b50f247
working through transform_inclusive_scan stuff
brandon-b-miller Jan 18, 2022
9bd6611
continue moving towards transform_inclusive_scan
brandon-b-miller Jan 18, 2022
0f55802
finish up compute_ewma_adjust
brandon-b-miller Jan 18, 2022
6a62a07
last transform_inclusive_scan is in
brandon-b-miller Jan 19, 2022
fad6e10
check for a failed cast
brandon-b-miller Jan 19, 2022
b469116
Merge branch 'branch-22.02' into fea-ewm
brandon-b-miller Jan 19, 2022
5de4a2a
fix tests
brandon-b-miller Jan 19, 2022
feb2efe
Apply suggestions from code review
brandon-b-miller Jan 24, 2022
db0714e
copyrights, docs
brandon-b-miller Jan 24, 2022
91893b1
Apply suggestions from code review
brandon-b-miller Jan 24, 2022
c9eb8a5
renaming
brandon-b-miller Jan 24, 2022
b2b92e3
resolve conflicts and restore changes
brandon-b-miller Jan 24, 2022
7fd8766
resolve merge issues but undefined symbol error
brandon-b-miller Apr 25, 2022
92c63c2
missing files from previous commit
brandon-b-miller Apr 25, 2022
fe43578
passing tests
brandon-b-miller Apr 25, 2022
decba30
Merge branch 'branch-22.06' into fea-ewm
brandon-b-miller Apr 27, 2022
1792034
start combining functors
brandon-b-miller Apr 27, 2022
447e5a0
continue combining functors
brandon-b-miller Apr 27, 2022
e9661b7
even more fusing functors
brandon-b-miller Apr 28, 2022
bd09dbc
just style updates
brandon-b-miller Apr 28, 2022
e62f362
Apply suggestions from code review
brandon-b-miller May 2, 2022
aef2e94
Apply suggestions from code review
brandon-b-miller May 2, 2022
ef68d0a
Apply suggestions from code review
brandon-b-miller May 2, 2022
02c88fa
merge latest and resolve conflicts
brandon-b-miller May 2, 2022
5953a22
move stuff to parent class, invert constexprs
brandon-b-miller May 2, 2022
167fa7f
cleanup
brandon-b-miller May 2, 2022
05d684d
continue addressing reviews
brandon-b-miller May 2, 2022
0e02b40
Merge branch 'fea-ewm' of github.com:brandon-b-miller/cudf into fea-ewm
brandon-b-miller May 2, 2022
6233774
minor updates
brandon-b-miller May 2, 2022
84fb67c
refactor ewma_adjust_functor
brandon-b-miller May 2, 2022
c09a510
merge upstream
brandon-b-miller Oct 18, 2022
de5e529
minor update
brandon-b-miller Oct 18, 2022
76ed9b6
address reviews
brandon-b-miller Oct 24, 2022
9804908
Merge branch 'branch-22.12' into fea-ewm
brandon-b-miller Oct 24, 2022
c98c37f
convert to new scan api
brandon-b-miller Oct 25, 2022
fdd7a9c
fix style
brandon-b-miller Oct 25, 2022
1b1d9fa
Merge branch 'branch-22.12' into fea-ewm
brandon-b-miller Oct 27, 2022
1964579
partially address reviews
brandon-b-miller Nov 1, 2022
04cd300
Apply suggestions from code review
brandon-b-miller Nov 4, 2022
525bc4c
Merge remote-tracking branch 'upstream/branch-24.08' into fea-ewm
vyasr Jun 12, 2024
8cc0e20
Fuse the kernels
vyasr Jun 13, 2024
c2938b9
Merge remote-tracking branch 'upstream/branch-24.08' into fea-ewm
vyasr Jun 13, 2024
dcd5211
move code around
brandon-b-miller Jan 29, 2024
ea8c738
using CUDF_ENABLE_IF
brandon-b-miller Jan 29, 2024
c5b7f1c
small changes
brandon-b-miller Jan 29, 2024
a7006f3
put everything back under an if/else that checks for nulls
brandon-b-miller Jan 29, 2024
9e5f271
Merge branch 'branch-24.08' into fea-ewm
brandon-b-miller Jun 17, 2024
8a91c47
Address cpp reviews
brandon-b-miller Jun 18, 2024
09a4d39
Address python reviews
brandon-b-miller Jun 18, 2024
997a66b
Various simplifications
vyasr Jun 18, 2024
c9e79a3
A bit more cleanup
vyasr Jun 18, 2024
3fd918d
Style nits
vyasr Jun 18, 2024
4404922
Merge remote-tracking branch 'upstream/branch-24.08' into fea-ewm
vyasr Jun 18, 2024
9ba441c
Style
vyasr Jun 18, 2024
3327e1c
Merge branch 'branch-24.08' into fea-ewm
brandon-b-miller Jun 24, 2024
e21554d
docs
brandon-b-miller Jun 24, 2024
619c9c3
make_intermediate_proxy_type for ewm
brandon-b-miller Jun 24, 2024
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -423,6 +423,7 @@ add_library(
src/reductions/product.cu
src/reductions/reductions.cpp
src/reductions/scan/rank_scan.cu
src/reductions/scan/ewm.cu
src/reductions/scan/scan.cpp
src/reductions/scan/scan_exclusive.cu
src/reductions/scan/scan_inclusive.cu
39 changes: 39 additions & 0 deletions cpp/include/cudf/aggregation.hpp
@@ -103,6 +103,7 @@ class aggregation {
NUNIQUE, ///< count number of unique elements
NTH_ELEMENT, ///< get the nth element
ROW_NUMBER, ///< get row-number of current index (relative to rolling window)
EWMA, ///< get exponential weighted moving average at current index
RANK, ///< get rank of current index
COLLECT_LIST, ///< collect values into a list
COLLECT_SET, ///< collect values into a list without duplicate entries
@@ -248,6 +249,8 @@ class segmented_reduce_aggregation : public virtual aggregation {
enum class udf_type : bool { CUDA, PTX };
/// Type of correlation method.
enum class correlation_type : int32_t { PEARSON, KENDALL, SPEARMAN };
/// Type of treatment of the first value of the EWM input
enum class ewm_history : int32_t { INFINITE, FINITE };

/// Factory to create a SUM aggregation
/// @return A SUM aggregation object
@@ -404,6 +407,42 @@ std::unique_ptr<Base> make_nth_element_aggregation(
template <typename Base = aggregation>
std::unique_ptr<Base> make_row_number_aggregation();

/**
* @brief Factory to create an EWMA aggregation
*
* `EWMA` returns a non-nullable column with the same type as the input,
* whose values are the exponentially weighted moving average of the input
* sequence. Let these values be known as the y_i.
*
* EWMA aggregations are parameterized by a center of mass (`com`) which
* affects the contribution of the previous values (y_{i-1} ... y_0) in
* computing the y_i.
*
* EWMA aggregations are also parameterized by a history `cudf::ewm_history`.
* Special considerations have to be given to the mathematical treatment of
* the first value of the input sequence. There are two approaches to this,
* one which considers the first value of the sequence to be the exponential
* weighted moving average of some infinite history of data, and one which
* takes the first value to be the only datapoint known. These assumptions
* lead to two different formulas for the y_i. `ewm_history` selects which.
*
* EWMA aggregations have special null handling. Nulls have two effects. The
* first is to propagate forward the last valid value as far as it has been
* computed. This could be thought of as the nulls not affecting the average
* in any way. The second effect changes the way the y_i are computed. Since
* a moving average is conceptually designed to weight contributing values by
* their recency, nulls ought to count as valid periods even though they do
* not change the average. For example, if the input sequence is {1, NULL, 3}
* then when computing y_2 one should weigh y_0 as if it occurs two periods
* before y_2 rather than just one.
*
* @param center_of_mass the center of mass.
* @param history which assumption to make about the first value
* @return A EWM aggregation object
Review comment from @bdice (Contributor), Jun 18, 2024:

This is specific to moving average, right? Please check me.

Suggested change
- * @return A EWM aggregation object
+ * @return A EWMA aggregation object

*/
template <typename Base = aggregation>
std::unique_ptr<Base> make_ewma_aggregation(double const center_of_mass, ewm_history history);

/**
* @brief Factory to create a RANK aggregation
*
44 changes: 44 additions & 0 deletions cpp/include/cudf/detail/aggregation/aggregation.hpp
@@ -73,6 +73,8 @@ class simple_aggregations_collector { // Declares the interface for the simple
class nth_element_aggregation const& agg);
virtual std::vector<std::unique_ptr<aggregation>> visit(data_type col_type,
class row_number_aggregation const& agg);
virtual std::vector<std::unique_ptr<aggregation>> visit(data_type col_type,
class ewma_aggregation const& agg);
virtual std::vector<std::unique_ptr<aggregation>> visit(data_type col_type,
class rank_aggregation const& agg);
virtual std::vector<std::unique_ptr<aggregation>> visit(
@@ -122,6 +124,7 @@ class aggregation_finalizer { // Declares the interface for the finalizer
virtual void visit(class nunique_aggregation const& agg);
virtual void visit(class nth_element_aggregation const& agg);
virtual void visit(class row_number_aggregation const& agg);
virtual void visit(class ewma_aggregation const& agg);
virtual void visit(class rank_aggregation const& agg);
virtual void visit(class collect_list_aggregation const& agg);
virtual void visit(class collect_set_aggregation const& agg);
@@ -631,6 +634,40 @@ class row_number_aggregation final : public rolling_aggregation {
void finalize(aggregation_finalizer& finalizer) const override { finalizer.visit(*this); }
};

/**
* @brief Derived class for specifying an ewma aggregation
*/
class ewma_aggregation final : public scan_aggregation {
public:
double const center_of_mass;
cudf::ewm_history history;

ewma_aggregation(double const center_of_mass, cudf::ewm_history history)
: aggregation{EWMA}, center_of_mass{center_of_mass}, history{history}
{
}

std::unique_ptr<aggregation> clone() const override
{
return std::make_unique<ewma_aggregation>(*this);
}

std::vector<std::unique_ptr<aggregation>> get_simple_aggregations(
data_type col_type, simple_aggregations_collector& collector) const override
{
return collector.visit(col_type, *this);
}

bool is_equal(aggregation const& _other) const override
{
if (!this->aggregation::is_equal(_other)) { return false; }
auto const& other = dynamic_cast<ewma_aggregation const&>(_other);
return this->center_of_mass == other.center_of_mass and this->history == other.history;
}

void finalize(aggregation_finalizer& finalizer) const override { finalizer.visit(*this); }
};

/**
* @brief Derived class for specifying a rank aggregation
*/
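In `ewma_aggregation::is_equal` above, the base-class comparison runs before the `dynamic_cast` to a reference. The order matters because a failed `dynamic_cast` to a reference type throws `std::bad_cast`. The sketch below reproduces the pattern with simplified, hypothetical stand-in classes (`agg_base`, `sum_agg`, `ewma_agg`); it assumes the base comparison checks the aggregation kind, which this diff does not show.

```cpp
// Hypothetical, simplified stand-ins for the classes in the diff.
enum class kind_t { SUM, EWMA };

struct agg_base {
  kind_t kind;
  explicit agg_base(kind_t k) : kind{k} {}
  virtual ~agg_base() = default;
  // Assumed base behaviour: two aggregations match if their kinds match.
  virtual bool is_equal(agg_base const& other) const { return kind == other.kind; }
};

struct sum_agg final : agg_base {
  sum_agg() : agg_base{kind_t::SUM} {}
};

struct ewma_agg final : agg_base {
  double center_of_mass;
  bool finite_history;

  ewma_agg(double com, bool finite)
    : agg_base{kind_t::EWMA}, center_of_mass{com}, finite_history{finite}
  {
  }

  bool is_equal(agg_base const& other_base) const override
  {
    // Check the kind first; only then is the reference cast known to succeed.
    if (!agg_base::is_equal(other_base)) { return false; }
    auto const& other = dynamic_cast<ewma_agg const&>(other_base);
    return center_of_mass == other.center_of_mass && finite_history == other.finite_history;
  }
};
```

Comparing an `ewma_agg` against a `sum_agg` returns false at the kind check and never reaches the cast.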
@@ -1273,6 +1310,11 @@ struct target_type_impl<Source, aggregation::ROW_NUMBER> {
using type = size_type;
};

template <typename Source>
struct target_type_impl<Source, aggregation::EWMA> {
using type = double;
};

// Always use size_type accumulator for RANK
template <typename Source>
struct target_type_impl<Source, aggregation::RANK> {
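The `target_type_impl` specialization in this hunk fixes the result type of an EWMA to `double` for every source type. A cut-down sketch of how such a trait resolves at compile time is shown below. The names `kind_t` and `target_type_t` are hypothetical, and the primary template's default (result type equals source type) is an assumption made for the sketch, since the real default is not shown in this hunk.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical, cut-down version of the trait in the diff.
enum class kind_t { MIN, EWMA };

// Primary template: by default the result has the same type as the input.
// (An assumption for this sketch; the real default is not shown in the hunk.)
template <typename Source, kind_t K>
struct target_type_impl {
  using type = Source;
};

// Specialization mirroring the diff: EWMA always produces double.
template <typename Source>
struct target_type_impl<Source, kind_t::EWMA> {
  using type = double;
};

template <typename Source, kind_t K>
using target_type_t = typename target_type_impl<Source, K>::type;

// Resolved entirely at compile time.
static_assert(std::is_same_v<target_type_t<std::int32_t, kind_t::MIN>, std::int32_t>);
static_assert(std::is_same_v<target_type_t<std::int32_t, kind_t::EWMA>, double>);
static_assert(std::is_same_v<target_type_t<float, kind_t::EWMA>, double>);
```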
@@ -1437,6 +1479,8 @@ CUDF_HOST_DEVICE inline decltype(auto) aggregation_dispatcher(aggregation::Kind
return f.template operator()<aggregation::NTH_ELEMENT>(std::forward<Ts>(args)...);
case aggregation::ROW_NUMBER:
return f.template operator()<aggregation::ROW_NUMBER>(std::forward<Ts>(args)...);
case aggregation::EWMA:
return f.template operator()<aggregation::EWMA>(std::forward<Ts>(args)...);
case aggregation::RANK:
return f.template operator()<aggregation::RANK>(std::forward<Ts>(args)...);
case aggregation::COLLECT_LIST:
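The new `case aggregation::EWMA:` follows the dispatcher's existing pattern: a runtime enum value is turned into a compile-time template argument by calling the functor's templated `operator()`. A self-contained sketch of that pattern, using hypothetical names (`kind_t`, `dispatch_kind`, `kind_name`) rather than the library's own, could look like this:

```cpp
#include <stdexcept>
#include <string>
#include <utility>

enum class kind_t { ROW_NUMBER, EWMA, RANK };

// Hypothetical dispatcher: maps a runtime kind to a compile-time template argument.
template <typename F, typename... Ts>
decltype(auto) dispatch_kind(kind_t k, F&& f, Ts&&... args)
{
  switch (k) {
    case kind_t::ROW_NUMBER:
      return f.template operator()<kind_t::ROW_NUMBER>(std::forward<Ts>(args)...);
    case kind_t::EWMA:
      return f.template operator()<kind_t::EWMA>(std::forward<Ts>(args)...);
    case kind_t::RANK:
      return f.template operator()<kind_t::RANK>(std::forward<Ts>(args)...);
    default: throw std::invalid_argument("unknown aggregation kind");
  }
}

// Example functor: the kind is available as a constant expression inside.
struct kind_name {
  template <kind_t K>
  std::string operator()() const
  {
    if constexpr (K == kind_t::EWMA) {
      return "ewma";
    } else if constexpr (K == kind_t::RANK) {
      return "rank";
    } else {
      return "row_number";
    }
  }
};
```

A kind without its own `case` would fall through to the default branch at runtime, which is why adding a new enumerator also requires adding a case to the dispatcher.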
22 changes: 22 additions & 0 deletions cpp/src/aggregation/aggregation.cpp
@@ -148,6 +148,12 @@ std::vector<std::unique_ptr<aggregation>> simple_aggregations_collector::visit(
return visit(col_type, static_cast<aggregation const&>(agg));
}

std::vector<std::unique_ptr<aggregation>> simple_aggregations_collector::visit(
data_type col_type, ewma_aggregation const& agg)
{
return visit(col_type, static_cast<aggregation const&>(agg));
}

std::vector<std::unique_ptr<aggregation>> simple_aggregations_collector::visit(
data_type col_type, rank_aggregation const& agg)
{
@@ -317,6 +323,11 @@ void aggregation_finalizer::visit(row_number_aggregation const& agg)
visit(static_cast<aggregation const&>(agg));
}

void aggregation_finalizer::visit(ewma_aggregation const& agg)
{
visit(static_cast<aggregation const&>(agg));
}

void aggregation_finalizer::visit(rank_aggregation const& agg)
{
visit(static_cast<aggregation const&>(agg));
@@ -622,6 +633,17 @@ std::unique_ptr<Base> make_row_number_aggregation()
template std::unique_ptr<aggregation> make_row_number_aggregation<aggregation>();
template std::unique_ptr<rolling_aggregation> make_row_number_aggregation<rolling_aggregation>();

/// Factory to create an EWMA aggregation
template <typename Base>
std::unique_ptr<Base> make_ewma_aggregation(double const com, cudf::ewm_history history)
{
return std::make_unique<detail::ewma_aggregation>(com, history);
}
template std::unique_ptr<aggregation> make_ewma_aggregation<aggregation>(double const com,
cudf::ewm_history history);
template std::unique_ptr<scan_aggregation> make_ewma_aggregation<scan_aggregation>(
double const com, cudf::ewm_history history);

/// Factory to create a RANK aggregation
template <typename Base>
std::unique_ptr<Base> make_rank_aggregation(rank_method method,
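The EWMA factory in this hunk is a function template defined in a `.cpp` file and followed by explicit instantiations for `aggregation` and `scan_aggregation`. Because the definition is not visible to other translation units, only the explicitly instantiated versions can be linked against. The sketch below shows the same shape with simplified stand-in types in a hypothetical `sketch` namespace; the class layout (including the virtual base) is an assumption for illustration.

```cpp
#include <memory>

namespace sketch {

// Hypothetical, simplified stand-ins for the library's class hierarchy.
struct aggregation {
  virtual ~aggregation() = default;
};
struct scan_aggregation : virtual aggregation {};

struct ewma_aggregation final : scan_aggregation {
  double center_of_mass;
  explicit ewma_aggregation(double com) : center_of_mass{com} {}
};

// Factory template: the caller picks which base interface to receive.
template <typename Base = aggregation>
std::unique_ptr<Base> make_ewma(double com)
{
  // unique_ptr<Derived> converts implicitly to unique_ptr<Base>.
  return std::make_unique<ewma_aggregation>(com);
}

// Explicit instantiations: the only versions emitted when the template
// definition lives in a source file rather than a header.
template std::unique_ptr<aggregation> make_ewma<aggregation>(double);
template std::unique_ptr<scan_aggregation> make_ewma<scan_aggregation>(double);

}  // namespace sketch
```

Calling `sketch::make_ewma<sketch::scan_aggregation>(0.5)` returns the same concrete object as the default form, only typed as the narrower interface that scan entry points accept.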