Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sorted collection: use binary search to insert items #3243

Closed
wants to merge 1 commit into from

Conversation

megawac
Copy link
Collaborator

@megawac megawac commented Jul 26, 2014

Use _.sortedIndex to determine the new position of an element when inserting items in the simple iterator case. Continue delegating to sort for the comparator case.

Resolves #3242

@dts
Copy link

dts commented Jul 26, 2014

The comparator case is pretty important to me - should I open a ticket over at underscore to have _.sortedIndex extended to include support for comparators?

@jashkenas
Copy link
Owner

Can we get a JSPerf along with this change? (this is how it used to work)

@megawac
Copy link
Collaborator Author

megawac commented Jul 28, 2014

It actually turns out to be breaking as sortedIndex doesn't align with sortBy for several cases. See discussion: lodash/lodash@b07337f#commitcomment-7164123
Was going to file an issue in a day or two

var array = [{a: NaN}, {a: NaN}, {a: {}}, {a: 1}, {a: 2}];
_.sortedIndex(array, {a: 3}, 'a'); // => 0
var sorted = _.sortBy(array.concat({a: 3}), 'a');
_.findIndex(sorted, {a: 3}); // => 5

@joshuabambrick
Copy link

If the pull request #3207 is merged, then the search method may provide the functionality you are looking for.

Furthermore, as I have mentioned in another issue, one thing that you may want to consider is whether or not your changes will compromise performance in the condition that the models parameter has many models. I haven't looked at your code in detail so I am not sure whether or not you have accounted for this. At present, the changes you suggested will not make a significant difference when individual models are added unless many are added in separate loop iterations. It is quite common that many models will be added with one call to set, especially on the initial load of an app when performance optimisation is key. The current implementation of adding all these elements, unsorted, to the collection and then re-sorting it all will be faster than adding each of them individually via the binary insert method you mentioned.

As I discussed, if you are adding u items to a collection of ultimately n items, the current implementation will take O(n lg n) time but a binary insertion of each model will take O(u n) time. If u is 1, then the current approach could be better, but if u very large it is optimal. The difference in time for many models will be far worse if this is the approach that is taken, and the improvements offered for individual items are unlikely to be noticed - unless many items will be added separately, in which case we should recommend that users group them together.

@dts
Copy link

dts commented Aug 10, 2014

I don't know the performance impact of inserting items in the middle of a JS list. If insertion is an O(1) operation, then binary insertion sort is O(u log n), if it is slow, then you wind up with O(u n). I had been assuming that it was negligible. Either way, we could do sorting all-at-once with the .sort() method if there are more than one element being added, and sort one-at-a-time if there is only one.

@joshuabambrick
Copy link

Insertion will almost certainly be O(n), see stackoverflow.

It may be possible to estimate on the fly what insertion implementation will be faster, and I am not necessarily opposed to the idea. However, it would certainly make the code more complicated and increase the file size. This decision is a tradeoff between those issues and the performance gains.

My immediate thoughts are that if users are so concerned with the speed of their code to care, then they could easily speed it up themselves by either pausing sorting while items are added and then explicitly calling sort, or grouping models into one array before adding.

@megawac
Copy link
Collaborator Author

megawac commented Aug 10, 2014

@dts run time analysis can be sketchy for this sort of thing... The current approach is technically O([m + ]3(m + n)ln(m + n)) whereas this approach is O(mnln(n)). Requires benchmarking for expected case

@disruptek
Copy link

One must also consider that there may be performance side-effects of emitting a sort event from the Collection; a listener to sort often has to essentially start from scratch with respect to assumptions about the Collection's order.

What we currently convey with a sort event is just this -- "The order may have changed, and we don't know where or how." What we'd like to convey in an ordered insertion is, "There's a new member in the list, and we positioned it here." Thus the sort event confuses the semantics of the operation in a way that is often detrimental to performance without any regard to run time analysis. Debouncing doesn't solve this semantic issue, and it of course comes with its own performance and complexity issues.

For these reasons, I'm currently using Backbone.VirtualCollection in order to add this binary search functionality for insertions. This must of course be layered over a standard unordered Backbone Collection. This solution is working well for me, but I consider the semantic issue above to be a design defect -- albeit one that users are likely relying upon.

@akre54 akre54 mentioned this pull request Apr 16, 2015
3 tasks
@jashkenas
Copy link
Owner

If it's breaking still, let's leave this be for now...

As I said up top — sortedIndex insert at position is how this used to be implemented. We changed to sort'ing for some reason at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Poor "add" performance on sorted collections
5 participants