searchsorted function implementation #1284
Comments
The first version of the searchsorted function was implemented with a MapReduce approach, but its performance turned out to be worse than pure pandas. The reason is that numpy.searchsorted (which pandas calls) is very cheap, so the Modin conversions and the overhead of the reduce stage become significant. Looking for a way to solve this issue.
@amyskov Is it possible with just a MapFunction.register(lambda x, value, side, sorter: x.squeeze(axis=1).searchsorted(value, side, sorter))? That is a start, you will need to convert the result of
@devin-petersohn I don't think that helps solve this problem: after calling the map function on each partition, we get a separate result from each call, and those results still have to be processed (reduced) anyway.
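To make the map and reduce stages discussed above concrete, here is a minimal sketch in plain pandas/NumPy. It is not the Modin implementation: `partitioned_searchsorted` is a hypothetical helper, the partitions are assumed to be contiguous slices of one globally sorted Series, and the `sorter` argument is left out.

```python
import numpy as np
import pandas as pd


def partitioned_searchsorted(partitions, value, side="left"):
    """Hypothetical helper: map searchsorted over partitions, then reduce."""
    # Map stage: each partition reports how many of its own elements
    # would come before `value`.
    local_positions = [
        np.asarray(part.searchsorted(value, side=side)) for part in partitions
    ]
    # Reduce stage: the global insertion index is the sum of the local counts.
    return np.sum(local_positions, axis=0)


# A globally sorted Series, split into three contiguous partitions.
series = pd.Series(np.arange(0, 100, 2))
partitions = [series.iloc[:20], series.iloc[20:35], series.iloc[35:]]
values = [5, 40, 99]

result = partitioned_searchsorted(partitions, values)
expected = series.searchsorted(values)
print(result, expected)
```

Even in this toy form the map stage yields one result per partition, so a reduce step is still required. Each local call is only a binary search, which is why the surrounding overhead can dominate.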
What is the execution time of the |
Script for execution time measurement
Signed-off-by: Alexander Myskov <[email protected]>
The implementation of this function was reverted and now defaults to pandas (#2655); a new implementation is needed.