Make string sorting more consistent #677
Comments
Good question. Ideally, arkouda would offer users both "grouping" semantics and "sorting" semantics for strings, with So ideally I would like to rework
Ah, ok -- yeah, that makes a lot of sense. I didn't understand how sorting the hashes was useful for actually sorting the raw strings, but it makes sense now that I know coargsort was primarily used for grouping. I agree, let's make coargsort sort the raw strings, and have groupby call a new server command.
@reuster986 @ronawho did we satisfy this issue yet?
Not so far as I know.
The high-level actions required by this issue are:
Personally, I think it makes sense to wait on this issue until we have support for complex objects in the symbol table. Then, we can design the new In short, this is a pretty major effort that we will have to plan for in the future.
@reuster986 @ronawho can we close this?
I think we should close this issue in favor of #1348, and roll over any requirements that are not currently captured there.
String argsort operates on the raw variable-length strings, but coargsort operates on the hash of the strings. Should these be unified? It's not obvious to me why they differ or when it's appropriate to sort the hash vs the raw strings.
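To make the distinction concrete, here is a minimal pure-Python sketch of the two behaviors. It is not Arkouda code: `hash128`, `argsort_raw`, `argsort_hash`, and `is_grouped` are hypothetical helper names, and MD5 is only a stand-in for whatever 128-bit hash the server uses. Sorting by hash still places equal strings next to each other (enough for grouping), but only sorting the raw strings is guaranteed to give lexicographic order.

```python
import hashlib


def hash128(s: str) -> int:
    """Return a 128-bit digest of s as an int (MD5 is only a stand-in hash)."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest(), "big")


def argsort_raw(strings):
    """Permutation that orders the raw strings lexicographically."""
    return sorted(range(len(strings)), key=lambda i: strings[i])


def argsort_hash(strings):
    """Permutation that orders the strings by their 128-bit hash."""
    return sorted(range(len(strings)), key=lambda i: hash128(strings[i]))


def is_grouped(strings, perm):
    """True if every distinct string occupies one contiguous run under perm."""
    seen = set()
    prev = None
    for i in perm:
        s = strings[i]
        if s != prev:
            if s in seen:
                return False  # this string already appeared in an earlier run
            seen.add(s)
            prev = s
    return True


strings = ["pear", "apple", "fig", "apple", "pear", "kiwi", "fig"]
raw_perm = argsort_raw(strings)
hash_perm = argsort_hash(strings)

# Both permutations group equal strings together ...
print(is_grouped(strings, raw_perm), is_grouped(strings, hash_perm))
# ... but only the raw-string permutation is guaranteed to be in sorted order.
print([strings[i] for i in raw_perm])
print([strings[i] for i in hash_perm])
```

Barring hash collisions, the hash ordering is a valid grouping permutation, which is all a groupby needs; a user calling a sort would reasonably expect the raw-string ordering instead.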
From a performance perspective, it seems to be slightly faster to sort small strings directly, but at some point sorting the hash is faster. Sorting the hash should cost roughly the time to hash plus 4x the time to radix sort int(32) values (since we're sorting 128-bit hashes).
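One way to read the 4x estimate: a 128-bit key is four 32-bit words, so a least-significant-digit radix sort over the hash does about four times the passes of a sort over 32-bit ints. The sketch below illustrates that idea in plain Python, with one stable pass per 32-bit word. The function name `radix_argsort_128` is hypothetical, Python's stable `list.sort` stands in for a single 32-bit radix pass, and none of this reflects the actual Chapel implementation.

```python
import random


def radix_argsort_128(keys):
    """Argsort 128-bit integer keys with four stable passes over 32-bit words.

    Passes run from the least significant word to the most significant one,
    which is the usual LSD radix sort structure. Each pass here uses Python's
    stable sort as a stand-in for one 32-bit radix sort.
    """
    perm = list(range(len(keys)))
    for word in range(4):
        shift = 32 * word
        # Stability preserves the order established by the lower words.
        perm.sort(key=lambda i: (keys[i] >> shift) & 0xFFFFFFFF)
    return perm


rng = random.Random(0)
keys = [rng.getrandbits(128) for _ in range(1000)]
keys += keys[:10]  # add a few duplicates to exercise tie handling

perm = radix_argsort_128(keys)
reference = sorted(range(len(keys)), key=lambda i: keys[i])
print(perm == reference)
```

Under this model the hash-based sort has a fixed cost per element regardless of string length, which is consistent with it overtaking the direct string sort once strings get long enough.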
Here's a branch that compares twoPhaseArgSort to hash+radixSort: master...ronawho:str-argsort-hash
16-node-cs-hdr performance results with varying string sizes: