Is your feature request related to a problem? Please describe.
pandas supports merge operations only for DataFrame and Series objects, whereas cudf's merging code also supports merges involving Index objects. This extra support makes the internals of the merge code excessively convoluted and likely adds performance overhead to the user-facing merge APIs, because additional logic is required to handle Index objects (and therefore to handle input objects that do not themselves have indexes).
Describe the solution you'd like
We cannot simply disable merges for Index objects because various internal code paths in cudf assume that Index objects may be merged. Instead, we should separate the logic for merging Indexes from the implementation of the public merge APIs. Identifying the exact use cases for index merging may let us significantly accelerate the code paths that rely on these merges, since a dedicated implementation is likely to be much simpler than the current merge implementation, which has to handle all the complexities of the pandas merge API. The change should also save us from needing to introduce complex multiple dispatch patterns as proposed in #9807 (comment).
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
After #10689 and #11184, index merging is based only on DataFrame merging. There is nothing index-specific left to do here, although in the future we may consider extracting a simpler version of merging that does not need to support every piece of the pandas merge API.
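For anyone reading this later, here is a rough sketch of what "index merging based only on DataFrame merging" can look like. It is not the code from #10689 or #11184. It uses pandas as a stand-in for cudf so it runs without a GPU, and the helper name `join_indexes_via_frame_merge` and the temporary column name `"key"` are made up for illustration.

```python
import pandas as pd


def join_indexes_via_frame_merge(
    left: pd.Index, right: pd.Index, how: str = "inner"
) -> pd.Index:
    """Hypothetical helper: join two indexes by merging one-column frames."""
    # Wrap each index in a single-column DataFrame so the ordinary
    # DataFrame merge path can be reused without index-specific branches.
    lhs = left.to_frame(index=False, name="key")
    rhs = right.to_frame(index=False, name="key")
    merged = lhs.merge(rhs, on="key", how=how)

    # Unwrap the key column back into an Index. Keep the name only when
    # both inputs agree on it (an assumption made for this sketch).
    name = left.name if left.name == right.name else None
    return pd.Index(merged["key"].to_numpy(), name=name)


left = pd.Index([1, 2, 3, 4], name="k")
right = pd.Index([3, 4, 5], name="k")
result = join_indexes_via_frame_merge(left, right)
```

The point of the wrap/merge/unwrap shape is that the merge implementation itself never has to know it was handed an Index, which is the separation this issue asked for.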