Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate and standardize grouping API #1212

Merged
merged 3 commits into from
Mar 15, 2022
Merged

Consolidate and standardize grouping API #1212

merged 3 commits into from
Mar 15, 2022

Conversation

reuster986
Copy link
Collaborator

This PR standardizes the grouping API and allows user-defined classes to support GroupBy with other arrays by defining the API methods.

Integral pdarrays, Strings, and Categoricals now suppport the grouping API below. For a user-defined class to be groupable, it must inherit from ak.pdarray and define or overload the grouping API:

  1. a ._get_grouping_keys() method that returns a list of pdarrays that can be (co)argsorted.
  2. (Optional) a .group() method that directly returns the permutation that groups the array

If the input is a single array with a .group() method defined, method 2 will be used to find the permuation; otherwise, method 1 will be used. To find the segment offsets and unique keys, the keys from ._get_grouping_keys() are always used, so that method must always be defined.

In the future, I plan to consolidate grouping even further under the ak.unique() function, so that it returns the permutation, segments, and unique keys all from one place. However, this requires more extensive changes to the server, whereas this PR helps us move incrementally in that direction while meeting an immediate need for grouping on user-defined classes with non-standard grouping keys.

Copy link
Contributor

@Ethan-DeBandi99 Ethan-DeBandi99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@mhmerrill mhmerrill merged commit e479ed0 into master Mar 15, 2022
@reuster986 reuster986 deleted the grouping-API branch April 21, 2022 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants