API: Index.get_nearest method #8845
Comments
I added a couple of cross-refs above. This should ideally encompass / reconcile with:
So having a Only tricky part here is that in the time-domain you can have a simultaneous |
See also my recently opened issue on the scope of @shoyer Some questions:
|
Ah, I knew we had talked about this before. Somehow I forgot about @jreback Thanks for adding the references! I agree that this should be reconciled with
Yes, this would be quite similar, except for the differences you outline. For example, if it really can't find any matches, it should raise an exception rather than returning NA.
Yes, I think so, unless there is an exact match. I'm generally 👎 on methods that make it easy to do inefficient things without realizing it.
I really would like a method that returns the "nearest" nearest. Returning the lower and upper nearest are both useful things to do, but it would be surprising if they were the default for a method named "nearest". A keyword argument Note: Based on autocomplete considerations, I am now thinking that the right name would be |
Hey Stephan, Nice to see this functionality being built-in. I have been using a hacked-together version of this for scikit-spectra for a while, and really think anyone who uses float-indexed data will find this extremely useful. Just to add my two cents, I think that the "nearest nearest" mentality makes the most sense. The keyword
As opposed to
Or am I misunderstanding? I also think the name One issue we ran into was what to do when the user oversteps the bounds of the data: do you raise an error or just return the nearest value? For our purposes, it made more sense to throw an error, but the data was strictly monotonically increasing and had clear upper and lower limits. I guess the more general case would be that index floats would have no clear limits and would not necessarily be sorted/monotonic. What would happen in the case of duplicate values in the index? And that's how you build a bikeshed. |
Not sure I follow. Suppose the index in your example is given by
Hmm. We could certainly do
For this use case, I think you'll want an
Not supported in my current PR (the result is ambiguous for looking up an indexer). The index needs to have unique and sorted values (either ascending or descending). |
Cool. Sorry, I haven't had a chance to use the IntervalIndex because I'm still bogged down in 0.14. I see what you mean about the side argument now. I was stuck in my own use cases I guess, where we generally know our index, but the float rounding is the pain. IE our data is Are you planning to have a nearest indexer that would work like
IE 2D nearest indexing? |
@hugadams IntervalIndex hasn't been merged yet -- still sitting in a PR :). I think something like |
nice question to show perf of nearest |
xref #3004
xref #841
xref #7873
xref #7223
xref #8815
Building on @immerrr's excellent refactor in #8753, I would like to propose adding a get_nearest method to pandas.Index that does nearest neighbor lookups. The idea is that nearest neighbor lookups are usually the desirable/sane thing to do when using inexact indexes. Eventually, we might want to add an alternative "wrapper index" (like IntervalIndex), e.g., NearestNeighborIndex, which switches the default behavior; this would be an intermediate step in that direction.

The implementation would be a simple wrapper that calls Index.get_slice_bound twice, once to the left and once to the right. Ideally, this would even work for array-like arguments, though perhaps there should be separate get_nearest_loc and get_nearest_indexer methods.
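
To make the "two bounds" idea concrete, here is a rough sketch in plain Python. It is not the code from the PR: it uses the standard library's bisect module as a stand-in for the two Index.get_slice_bound calls, and the function name get_nearest_loc is only borrowed from the proposal for illustration. It assumes the values are unique and sorted ascending, and it makes two choices that are still open in this thread: ties go to the lower neighbor, and lookups outside the data range return the nearest endpoint rather than raising.

```python
from bisect import bisect_left, bisect_right


def get_nearest_loc(values, label):
    """Return the position in `values` whose entry is closest to `label`.

    Sketch only: `values` is assumed to be a unique, ascending list.
    """
    if not values:
        # Nothing to match against, so raise instead of returning NA.
        raise KeyError(label)

    # Lower candidate: last position with value <= label.
    lo = bisect_right(values, label) - 1
    # Upper candidate: first position with value >= label.
    hi = bisect_left(values, label)

    if lo < 0:
        # label is below the first value (assumed: clamp to the start).
        return hi
    if hi == len(values):
        # label is above the last value (assumed: clamp to the end).
        return lo

    # Exact matches give lo == hi; otherwise pick the closer neighbor,
    # with ties going to the lower one (an arbitrary choice here).
    if label - values[lo] <= values[hi] - label:
        return lo
    return hi


index_values = [0.0, 0.5, 1.0, 2.5]
print(get_nearest_loc(index_values, 0.6))  # position of 0.5
print(get_nearest_loc(index_values, 2.0))  # position of 2.5
```

An array-like version (the get_nearest_indexer idea) could apply the same logic to each label, though a vectorized implementation would presumably be preferable for performance.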