I have two goals for this issue, and wanted to start a discussion about them:
Separate grouping and sorting semantics
argsort and coargsort should actually sort the array(s). Currently, calling coargsort on a list that includes a Strings or Categorical object only groups the values; it does not sort them.
GroupBy should guarantee grouping, but not necessarily sorting. Strings and Categorical should have separate APIs for sorting and grouping, and GroupBy should call the latter.
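To make the distinction concrete, here is a minimal pure-Python sketch (not Arkouda code). The helper names `group_permutation` and `sort_permutation` are hypothetical and chosen for illustration only. Both return a permutation that makes equal keys contiguous, but only the second guarantees sorted order.

```python
def group_permutation(keys):
    """Hypothetical helper: permutation that makes equal keys contiguous.

    Groups appear in first-seen order, so the result is grouped but
    not necessarily sorted.
    """
    buckets = {}  # dicts preserve insertion order
    for i, k in enumerate(keys):
        buckets.setdefault(k, []).append(i)
    return [i for indices in buckets.values() for i in indices]


def sort_permutation(keys):
    """Hypothetical helper: stable argsort, which both groups and sorts."""
    return sorted(range(len(keys)), key=keys.__getitem__)


keys = ["pear", "apple", "pear", "fig", "apple"]

grouped = [keys[i] for i in group_permutation(keys)]
# ['pear', 'pear', 'apple', 'apple', 'fig']  (grouped, not sorted)

ordered = [keys[i] for i in sort_permutation(keys)]
# ['apple', 'apple', 'fig', 'pear', 'pear']  (grouped and sorted)
```

Under this reading, GroupBy would only need the weaker guarantee of the first function, while argsort and coargsort would need the second.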
Consolidate GroupBy/findSegments logic and migrate to Chapel
The uniqueMsg function in Chapel actually does 95% of what GroupBy needs. I propose refactoring uniqueMsg and its sub-functions to:
handle all groupable types: int64 pdarray, Strings, and Categorical, as well as lists of these
optionally return a permutation, segments, and unique key indices, in addition to the unique values. These are already computed internally (or are trivially derivable from what is), and comprise all the information necessary to construct a GroupBy.
Doing so will reduce code (by rendering findSegmentsMsg unnecessary) and improve performance in some cases (e.g. when arrays are packed into Chapel tuples for coargsort).
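As a rough illustration of the proposed return values, the following pure-Python sketch computes unique values together with a permutation, segment offsets, and unique key indices. It is not the Chapel implementation; the function name `unique_with_grouping` and the return order are assumptions made for this example, and it uses a sort to do the grouping purely for simplicity.

```python
def unique_with_grouping(keys):
    """Hypothetical sketch of a 'unique' that also returns grouping info.

    Returns (unique_values, permutation, segments, unique_key_indices):
      permutation        - indices that make equal keys contiguous
      segments           - start offset of each group in the permuted keys
      unique_key_indices - original index of the first key in each group
    """
    # Stable argsort; a sort is one simple way to obtain a grouping.
    perm = sorted(range(len(keys)), key=keys.__getitem__)
    # A new segment starts wherever the permuted key changes.
    segments = [
        pos for pos in range(len(perm))
        if pos == 0 or keys[perm[pos]] != keys[perm[pos - 1]]
    ]
    unique_key_indices = [perm[s] for s in segments]
    unique_values = [keys[i] for i in unique_key_indices]
    return unique_values, perm, segments, unique_key_indices


# Single column of integer keys.
values, perm, segments, first = unique_with_grouping([3, 1, 3, 2, 1])
# values == [1, 2, 3], perm == [1, 4, 3, 0, 2]
# segments == [0, 2, 3], first == [1, 3, 0]

# Multiple columns: pack each row into a tuple, loosely analogous to
# packing arrays into tuples for coargsort.
col_a = [1, 0, 1, 0]
col_b = ["x", "y", "x", "x"]
multi = unique_with_grouping(list(zip(col_a, col_b)))
```

The point of the sketch is that the permutation and segments fall out of the same pass that finds the unique values, so a separate segment-finding step is not needed.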
This work would be greatly simplified by having a MultiArray class in the server -- a GenSymEntry that holds multiple equal-length columns, like a dataframe but without all the methods.
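For discussion, the shape of such a container might look like the Python sketch below. This is only an illustration of the idea: the real class would live in the Chapel server, and the attribute and method names here (`columns`, `size`, `row`) are hypothetical.

```python
class MultiArray:
    """Hypothetical sketch: several equal-length columns, no dataframe methods."""

    def __init__(self, columns):
        columns = [list(c) for c in columns]
        lengths = {len(c) for c in columns}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.size = lengths.pop() if lengths else 0

    def row(self, i):
        """Return row i as a tuple, usable as a composite grouping key."""
        return tuple(c[i] for c in self.columns)


m = MultiArray([[1, 0, 1], ["x", "y", "x"]])
composite_keys = [m.row(i) for i in range(m.size)]
# [(1, 'x'), (0, 'y'), (1, 'x')]
```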