Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_.sortedFind to go with _.sortedIndex #68

Closed
shazow opened this issue Dec 1, 2010 · 11 comments
Closed

_.sortedFind to go with _.sortedIndex #68

shazow opened this issue Dec 1, 2010 · 11 comments

Comments

@shazow
Copy link

shazow commented Dec 1, 2010

Function request:

A _.sortedFind function to go with _.sortedIndex (or perhaps call it _.sortedIndexOf), which would behave similarly to the Array's native indexOf except it would assume it's a sorted array thus using binary search.

_.sortedFind(array, obj, ...) would return the index position of the first occurrence of obj within array, else -1 if not found.

The code to implement this is almost identical to _.sortedIndex (and therefore could use it) except it would return a -1 if the object is not found.

@jashkenas
Copy link
Owner

Hmm. Do you really need Underscore to add a function for this?

_.sortedFind = function(list, object) {
  return list[_.sortedIndex(list, object)];
};

@shazow
Copy link
Author

shazow commented Dec 1, 2010

Well really it would be more like...

_.sortedFind = function(list, object) {
  var i = _.sortedIndex(list, object);
  if(list[i] == object) return i;
  return -1;
};

The point isn't that it's difficult to implement but rather that it feels incomplete to have a function which inserts into an array in-order but no function to query for a location of an object in said array while taking advantage of its sortedness.

As far as I can think of, there's only two uses for sortedIndex: Keeping a sorted list for display-purposes (simple iteration), and for maintaining an efficient datastructure. My request is for the latter scenario. If there's no interest in supporting that, then I'll maintain my own helpers. :)

@jashkenas
Copy link
Owner

I'm sorry -- I misunderstood the proposal. I now understand that this is an optimization of indexOf, to (perhaps) perform better on arrays that are already sorted. Unfortunately, I don't think that the case has been made for this particular optimization: you would need to prove that on a real-world array, sortedIndexOf could perform faster than indexOf already does, taking into account the fact that it almost always uses the native version -- and I think that's a very open question. Closing the ticket as a wontfix. If you can demonstrate a dramatic performance difference, we can reopen it.

@shazow
Copy link
Author

shazow commented Dec 3, 2010

Sorry, I didn't think there was any doubt that an O(logn) binary search performs better than an O(n) full scan. Here's a benchmark, inserts a sorted array and tries to find each element in the array plus same number of elements that don't exist in the array:

function sortedIndexOf(array, obj) {
  var low = 0, high = array.length;
  while (low < high) {
    var mid = (low + high) >> 1;
    array[mid] < obj ? low = mid + 1 : high = mid;
  }
  if(array[low] == obj) return low;
  return -1;
}

function run_tests(test_size) {
    var a = [];

    for(var i=0; i<test_size; i++) a.push(i);
    var now = new Date();

    for(var i=test_size*2; i>0; i--) a.indexOf(i);
    console.log("Native indexOf: " + ((new Date()) - now) + 'ms');

    now = new Date();
    for(var i=test_size*2; i>0; i--) sortedIndexOf(a, i);
    console.log("sortedIndexOf: " + ((new Date()) - now) + 'ms');
}

run_tests(10000);

And the results on my MBP:

Native indexOf: 858ms
sortedIndexOf: 6ms

@jashkenas
Copy link
Owner

Thanks -- it's been a bit of a policy at this point to ask for benchmarks before starting on an optimization expansion to the API, regardless of the underlying idea. One more question: do you think it's problematic that, if the sorted array contains more than one copy of the value you're looking for, you'll find the index of a random copy of the value? Adding an unrelated element to the end of the array could change the index returned, the next time you call sortedIndexOf...

@shazow
Copy link
Author

shazow commented Dec 3, 2010

The algorithm could be altered to always return the first such element in the array with minimal impact to the performance to match the behaviour of the native indexOf (just be careful not to fall back into O(n) for cases when the array is full of the same elements), and since JavaScript 1.6 there is also a lastIndexOf if that's something you're interested in replicating.

In my use cases, this doesn't affect me, but I can see an upside to replicating the behaviour of the native indexOf (even at the cost of a few extra comparisons).

Edit: If it were up to me, I'd leave the implementation as is (returning whatever index as long as it matches the element) until there is demand for a different behaviour. Perhaps put a note in the docs.

@shazow
Copy link
Author

shazow commented Dec 3, 2010

Also I think I accidentally closed the ticket, my bad.

@jashkenas
Copy link
Owner

Sure thing -- reopened the ticket.

@jashkenas
Copy link
Owner

Alright -- I've added it as sortedIndexOf, here:

8c2570b

Thanks for tolerating my denseness earlier in the ticket -- I don't know what I was thinking ;)

@jashkenas
Copy link
Owner

Whoops, nevermind. Sam Clay reminded me that since the API is identical, this should really be a flag passed to _.indexOf ... and that works well:

d2d6cfe

@shazow
Copy link
Author

shazow commented Dec 17, 2010

Thank you. :)

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants