Refactor isin implementations #10165
… categorical DataFrame.
Codecov Report
@@               Coverage Diff                @@
##           branch-22.04   #10165      +/-   ##
================================================
+ Coverage       10.42%     10.64%    +0.21%
================================================
  Files             119        122        +3
  Lines           20603      20939      +336
================================================
+ Hits             2148       2228       +80
- Misses          18455      18711      +256
Continue to review full report at Codecov.
Thanks for this very sorely needed PR, @vyasr. I am reviewing this over the next few days. One general thought I had going over this is that I think we should create a file
Co-authored-by: brandon-b-miller <[email protected]>
I'm not opposed, but I feel like this fits into our broader discussion in #9999 about how tests should be organized. Do we want a separate file for each type of functionality? We shouldn't make this decision differently for different functions, otherwise it will become very difficult to know where to look for things. Also, let me know when you've finished reviewing enough for me to come back to this!
rerun tests
@gpucibot merge
This PR fixes a number of error cases around the implementation of isin, particularly involving categorical dtypes and index alignment when called on a DataFrame. It also makes significant changes to simplify and improve the performance of DataFrame.isin, resulting in a 10-40% speedup when called with a DataFrame or Series as the argument (depending on the data sizes).