Use KDTrees to support nearest neighbor queries/joins on MultiIndexes? #9365
Comments
I think a limited method might be in-scope for pandas (esp if we are using it for something else). How to prevent this from growing into a full-fledged (which needs support) for arbitrary distance functions (e.g. would this be different from
I agree, we definitely don't want to reimplement KDTrees in pandas. We should use scipy or scikit-learn as run-time dependencies for this feature -- both of them have suitable KDTree implementations. The scikit-learn version is slightly more flexible, though, with support for more distance metrics. See here and here for more background.
Just because scipy is already a partial dep (meaning we import it in several different places), I would lean towards that a bit.
Adding more metrics to cKDTree should not be too difficult, but I have not had time to look at it.
A more comprehensive issue regarding this topic is #38650, so closing in favor of that.
If we're willing to construct KDTrees when necessary, we can support efficient nearest neighbor queries even in multiple dimensions (i.e., on a MultiIndex) and even for unsorted indexes.
For an example of what you can do with this, take a look at this question I just answered on SO: http://stackoverflow.com/a/28186940/809705
See also: 1d nearest neighbor queries (#8845) and an implementation for sorted indexes (#9258).
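To make the idea concrete, here is a minimal sketch of a nearest neighbor lookup on a MultiIndex, assuming `scipy.spatial.cKDTree` as the backend and numeric index levels. The helper name `nearest_on_multiindex` and the sample data are hypothetical, made up for illustration; this is not an existing or proposed pandas API.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def nearest_on_multiindex(df, points):
    """Hypothetical helper: return the row of `df` nearest to each query point."""
    # Stack the MultiIndex levels into an (n_rows, n_levels) float array.
    coords = np.column_stack(
        [
            df.index.get_level_values(i).to_numpy(dtype=float)
            for i in range(df.index.nlevels)
        ]
    )
    # Build the tree on demand; the index does not need to be sorted.
    tree = cKDTree(coords)
    # For each query point, get the Euclidean distance to and the position of
    # its nearest neighbor.
    dist, pos = tree.query(np.asarray(points, dtype=float))
    result = df.iloc[pos].copy()
    result["distance"] = dist
    return result


# An unsorted two-level index of (x, y) coordinates (made-up sample data).
index = pd.MultiIndex.from_tuples(
    [(3.0, 1.0), (0.0, 0.0), (1.0, 2.0), (5.0, 5.0)], names=["x", "y"]
)
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

nearest = nearest_on_multiindex(df, [(0.9, 1.8), (4.0, 4.5)])
print(nearest)
```

This sketch rebuilds the tree on every call. A real implementation would presumably cache the tree on the index and decide how to handle non-numeric levels.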