Use KDTrees to support nearest neighbor queries/joins on MultiIndexes? #9365

shoyer · 2015-01-28T07:49:55Z

If we're willing to construct KDTrees when necessary, we can support efficient nearest neighbor queries even in multiple dimensions (i.e., on a MultiIndex) and even for unsorted indexes.

For an example of what you can do with this, take a look at this question I just answered on SO: http://stackoverflow.com/a/28186940/809705

See also: 1d nearest neighbor queries (#8845) and an implementation for sorted indexes (#9258).
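
To illustrate the idea, here is a minimal sketch (not a proposed pandas API) of a nearest neighbor lookup against an unsorted two-level MultiIndex using `scipy.spatial.cKDTree`. The data and the variable names (`series`, `points`, `targets`) are made up for the example, and it assumes every index level is numeric.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Hypothetical data: an unsorted two-level MultiIndex of (x, y) coordinates.
index = pd.MultiIndex.from_tuples(
    [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (10.0, 0.0)], names=["x", "y"]
)
series = pd.Series([10, 20, 30, 40], index=index)

# Stack the index levels into an (n_points, n_dims) float array for the tree.
points = np.column_stack(
    [index.get_level_values(i).to_numpy(dtype=float) for i in range(index.nlevels)]
)
tree = cKDTree(points)

# Query points that do not appear exactly in the index.
targets = np.array([[0.9, 1.2], [9.0, 1.0]])
distances, positions = tree.query(targets, k=1)

# positions are integer locations into the original index, so iloc applies.
nearest = series.iloc[positions]
```

Because the tree returns integer positions, the result can be fed straight into positional indexing; no sorting of the index is required.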

jreback · 2015-01-28T11:04:21Z

I think a limited method might be in scope for pandas (especially if we are using it for something else). How do we prevent this from growing into a full-fledged implementation (which needs support) for arbitrary distance functions (e.g., would this be different from the scikit-learn implementation)? Should we in effect use their implementation for arbitrary things (e.g., import at run time like we do with scipy) and only have an implementation that is needed internally instead?

shoyer · 2015-01-28T18:27:25Z

I agree, we definitely don't want to reimplement KDTrees in pandas. We should use scipy or scikit-learn as run-time dependencies for this feature -- both of them have suitable KDTree implementations. The scikit-learn version is slightly more flexible, though, with support for more distance metrics. See here and here for more background.

jreback · 2015-01-28T22:32:34Z

So just because scipy is already a partial dependency (meaning we import it in several different places), I would lean towards that a bit.
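
A rough sketch of what a run-time scipy import could look like is below. The helper name `nearest_positions` and its `max_distance` parameter are hypothetical, not part of pandas; the only scipy calls used are `cKDTree` and its `query` method with `distance_upper_bound`. Returning `-1` for "no match" is an assumption chosen to mirror the convention pandas indexers use for missing values.

```python
import numpy as np


def nearest_positions(points, targets, max_distance=np.inf):
    """Hypothetical helper: position in `points` nearest to each row of `targets`."""
    try:
        # Imported only when the feature is actually used.
        from scipy.spatial import cKDTree
    except ImportError as err:
        raise ImportError(
            "multi-dimensional nearest neighbor lookup requires scipy"
        ) from err

    tree = cKDTree(np.asarray(points, dtype=float))
    distances, positions = tree.query(
        np.asarray(targets, dtype=float), k=1, distance_upper_bound=max_distance
    )
    # cKDTree reports a miss as an infinite distance; map those to -1.
    return np.where(np.isfinite(distances), positions, -1)


result = nearest_positions(
    [[0.0, 0.0], [1.0, 1.0]], [[0.9, 0.9], [50.0, 50.0]], max_distance=1.0
)
```

Keeping the import inside the function means scipy stays optional for users who never call the nearest neighbor path.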

sturlamolden · 2015-01-29T07:53:06Z

Adding more metrics to cKDTree should not be too difficult, but I have not had time to look at it.

mroeschke · 2024-01-27T20:41:04Z

A more comprehensive issue regarding this topic is #38650, so closing in favor of that.

shoyer added the Ideas (Long-Term Enhancement Discussions) and Indexing (Related to indexing on series/frames, not to indexes themselves) labels on Jan 28, 2015

shoyer mentioned this issue Feb 19, 2015

API: Index.get_nearest method #8845

Closed

shoyer mentioned this issue Jun 27, 2015

ENH: tolerance argument for limiting pad, backfill and nearest neighbor reindexing #10411

Merged

mroeschke added the Performance (Memory or execution speed performance) label and removed the Ideas (Long-Term Enhancement Discussions) label on Apr 4, 2020

jreback mentioned this issue Dec 23, 2020

ENH: improve partial key indexing performance #38650

Open

mroeschke added the Enhancement label Apr 12, 2021

mroeschke closed this as completed Jan 27, 2024
