[FEA] Add per algorithm aggregation derived types #7106
Comments
The present code throws an error if the aggregation is not supported by the respective algorithm; if support is added later, the call starts working without any change for the user of the code. The suggested solution adds an extra cost for users of the libcudf APIs (Python and Spark), who would each have to maintain a matrix of supported operations in their code. That matrix would have to be updated in lockstep whenever libcudf adds new support. Which is preferred: getting a runtime error, or maintaining a matrix of supported APIs in each user?
The lazy part of me says runtime exceptions, because we have code that works now and I am too lazy to change it.
There are some aggregations that will never be supported by a particular algorithm and the current hierarchy makes it seem as if they should be supported. This is confusing at best and misleading at worst. When it's possible to do so, detecting an error at compile time is always preferable to detecting an error at runtime.
That's a feature, not a bug. The fact that the current system shares a common, single set of types was never something I was happy about. Users have absolutely no clue which aggregations are supported by a particular algorithm without checking documentation (which gets stale) or simply trying all the possible aggs and seeing what works. What's more, when new agg support is added to an algorithm, it still requires callers to update their code to know that they can pass the new agg to libcudf. As aggregation support becomes more complex, the current system is not going to be sustainable.
I totally agree, and the longer we wait, the harder it will be to fix.
Still relevant. |
…ement. (#8052)

Partially addresses #7106

Fundamentally, this changes the aggregation class hierarchy in the following ways:

- The base `aggregation` class becomes abstract, with the `clone()` and `finalize()` functions being pure virtual.
- Every aggregation type now has a concrete class associated with it, derived from `aggregation`.
- "Intermediate" classes such as `rolling_aggregation` are used to allow individual algorithms to accept only the aggregation types that are valid for them (as opposed to enforcing this internally at runtime).

All of the `rolling_window` interfaces have been updated to take a `rolling_aggregation`. Other algorithms such as groupby are not yet converted and still take generic `aggregation` objects.

Marking this as Do Not Merge for now since this is a breaking change with immediate implications for Spark.

Authors:
- https://github.com/nvdbaranec

Approvers:
- Jake Hemstad (https://github.com/jrhemstad)
- Mike Wilson (https://github.com/hyperbolic2346)
- Vyas Ramasubramani (https://github.com/vyasr)
- Ashwin Srinath (https://github.com/shwina)

URL: #8052
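To make the hierarchy described in #8052 concrete, here is a minimal, simplified sketch. It is not the libcudf source: only `clone()` is modeled (not `finalize()`), and `non_rolling_aggregation` and `rolling_window_sketch` are hypothetical names standing in for an aggregation that does not opt in to rolling windows and for a `rolling_window` entry point. The point it shows is that the parameter type alone decides which aggregations a call accepts.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Abstract base: clone() is pure virtual, so `aggregation` cannot be instantiated directly.
class aggregation {
 public:
  virtual ~aggregation() = default;
  virtual std::unique_ptr<aggregation> clone() const = 0;
};

// "Intermediate" class: only aggregation types valid for rolling windows derive from it.
// Virtual inheritance lets a concrete type later derive from several intermediates
// while still containing a single `aggregation` base subobject.
class rolling_aggregation : public virtual aggregation {};

// Concrete type that opts in to rolling windows.
class lead_aggregation final : public rolling_aggregation {
 public:
  std::unique_ptr<aggregation> clone() const override {
    return std::make_unique<lead_aggregation>(*this);
  }
};

// Hypothetical concrete type that does NOT opt in to rolling windows.
class non_rolling_aggregation final : public virtual aggregation {
 public:
  std::unique_ptr<aggregation> clone() const override {
    return std::make_unique<non_rolling_aggregation>(*this);
  }
};

// Hypothetical stand-in for a rolling_window API: it only accepts a rolling_aggregation,
// so passing a non_rolling_aggregation is a compile-time error rather than a runtime throw.
std::string rolling_window_sketch(rolling_aggregation const& agg) {
  auto copy = agg.clone();  // exercises the virtual clone()
  return copy != nullptr ? "accepted" : "clone failed";
}
```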
Done
@nvdbaranec should this issue still be open? It's not quite clear how much of it was addressed in #8052, what's being done now in #8906, and whether there's further work after #8906 is merged (that perhaps should be in a different issue?).
…e their usage. (#8906)

Followup to #8052
Partially addresses #7106

- Adds the `groupby_aggregation` class and forces usage of that type when calling `groupby::aggregate()`.
- Adds the `groupby_scan_aggregation` class and forces usage of that type when calling `groupby::scan()`.

Authors:
- https://github.com/nvdbaranec

Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)
- Jake Hemstad (https://github.com/jrhemstad)
- David Wendt (https://github.com/davidwendt)
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Nghia Truong (https://github.com/ttnghia)
- Devavret Makkar (https://github.com/devavret)

URL: #8906
Is your feature request related to a problem? Please describe.
libcudf uses a common `aggregation` polymorphic base class for uniformly specifying the requested aggregation to APIs like `cudf::reduce`, `cudf::scan`, `cudf::groupby`, etc. All of these APIs take a base class pointer `aggregation*`, which can then be cast to a more derived type like `min_aggregation`.

The problem is, not all APIs that take an `aggregation*` support all possible aggregation kinds. For example, `cudf::scan` doesn't support a `LEAD` aggregation, as `LEAD` is specific to rolling. However, nothing tells a user of `cudf::scan` that calling it with a `LEAD` aggregation wouldn't work. The only recourse a user has is documentation (which is currently lacking for aggregations) or attempting to call an invalid combination and (hopefully) getting a runtime error in the form of an exception.
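The failure mode described above can be sketched as follows. This is a simplified model, not libcudf code: the `Kind` enum, the two concrete classes, `scan_sketch`, and the use of `std::invalid_argument` (rather than whatever exception type libcudf actually throws) are assumptions for illustration. Any `aggregation` compiles as an argument, and the unsupported `LEAD` case is only caught when the call runs.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Single shared base class carrying a runtime "kind" tag (simplified model).
struct aggregation {
  enum class Kind { MIN, LEAD };
  explicit aggregation(Kind k) : kind{k} {}
  virtual ~aggregation() = default;
  Kind kind;
};

struct min_aggregation : aggregation {
  min_aggregation() : aggregation{Kind::MIN} {}
};

struct lead_aggregation : aggregation {
  lead_aggregation() : aggregation{Kind::LEAD} {}
};

// Hypothetical stand-in for a scan API: the signature accepts every aggregation,
// so an unsupported kind can only be rejected at runtime.
std::string scan_sketch(aggregation const& agg) {
  if (agg.kind == aggregation::Kind::MIN) { return "min scan"; }
  // Nothing in the type system prevented this call; the mistake surfaces only here.
  throw std::invalid_argument("scan: unsupported aggregation kind");
}
```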
Describe the solution you'd like
We should use the type system to make it more explicit which aggregations are supported by an algorithm like `scan` or `reduce`.

One way I've thought of doing this is introducing per-algorithm tag types that an aggregation selectively inherits from, depending on whether it is supported by a particular algorithm. For example:

(Note that the `virtual` inheritance is important here to avoid the diamond problem. Multiple inheritance can be tricky. See https://isocpp.org/wiki/faq/multiple-inheritance)

We'd need to modify the aggregation factories to specify which base to use:
Using it would then look like:
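A minimal, self-contained sketch of this approach follows; it is an independent illustration, not the code from the linked example. `scan_aggregation` and `rolling_aggregation` are the per-algorithm tag types, `make_min_aggregation` and `make_lead_aggregation` are factories templated on the requested base, and `scan_sketch` and `rolling_sketch` are hypothetical algorithm entry points. None of these names are claimed to match libcudf, and which algorithm supports which aggregation is assumed for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Common base shared by every aggregation.
struct aggregation {
  virtual ~aggregation() = default;
  virtual std::string name() const = 0;
};

// Per-algorithm tag types. Inheriting `virtual`ly means a type that derives from
// several tags still has exactly one `aggregation` subobject (no diamond ambiguity).
struct scan_aggregation : virtual aggregation {};
struct rolling_aggregation : virtual aggregation {};

// Assumed for the sketch: MIN works with both algorithms, LEAD only with rolling windows.
struct min_aggregation final : scan_aggregation, rolling_aggregation {
  std::string name() const override { return "min"; }
};
struct lead_aggregation final : rolling_aggregation {
  std::string name() const override { return "lead"; }
};

// Factories take the requested base as a template parameter. The static_assert gives a
// readable error when the concrete type does not inherit from the requested tag.
template <typename Base = aggregation>
std::unique_ptr<Base> make_min_aggregation() {
  static_assert(std::is_base_of_v<Base, min_aggregation>, "MIN does not support this algorithm");
  return std::make_unique<min_aggregation>();
}

template <typename Base = aggregation>
std::unique_ptr<Base> make_lead_aggregation() {
  static_assert(std::is_base_of_v<Base, lead_aggregation>, "LEAD does not support this algorithm");
  return std::make_unique<lead_aggregation>();
}

// Hypothetical algorithm entry points that only accept their own tag type.
std::string scan_sketch(scan_aggregation const& agg) { return "scan:" + agg.name(); }
std::string rolling_sketch(rolling_aggregation const& agg) { return "rolling:" + agg.name(); }
```

With this shape, requesting `make_lead_aggregation<scan_aggregation>()` or passing a `lead_aggregation` to `scan_sketch` is rejected by the compiler instead of throwing when `scan` runs. The `static_assert` in each factory is optional; without it the `return` conversion would still fail to compile, only with a less readable error.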
Here's a working example: https://godbolt.org/z/6b8KjY