Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

paper cut: is a slice / Iterator sorted? #44370

Closed
3 tasks
gnzlbg opened this issue Sep 6, 2017 · 21 comments
Closed
3 tasks

paper cut: is a slice / Iterator sorted? #44370

gnzlbg opened this issue Sep 6, 2017 · 21 comments
Labels
A-collections Area: `std::collection` C-feature-accepted Category: A feature request that has been accepted pending implementation. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 6, 2017

I just used binary_search inside a loop and realized that it would be nice to check before the loop if my immutable slice was sorted. So I spent 10 minutes going through the standard library just to find out that there aren't any is_sorted, is_sorted_by, is_sorted_by_key, etc. algorithms implemented for neither slices nor iterators.

We should add:

  • Iterator::is_sorted
  • Iterator::is_sorted_by
  • Iterator::is_sorted_by_key

and since Iterator is implemented for slices, we can add those methods to slices as well by just forwarding to the Iterator implementation.

@gnzlbg gnzlbg changed the title paper cut: check if a slice is sorted before using binary search paper cut: is a slice / Iterator sorted? Sep 6, 2017
@arielb1
Copy link
Contributor

arielb1 commented Sep 6, 2017

We also shouldn't change it to be O(n) on debug-assertions builds - debug-assertions should be for the last 2x of performance, not for potential order-of-magnitude slowdowns.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 6, 2017

@arielb1 sorry this issue turned out to be two issues, I've split that part to #44371.

@leonardo-m
Copy link

In the D language standard library there is also a wrapper for sorted iterables (that performs statistical tests inside its constructor), with it you can create functions that document in the code that return/accept sorted iterables.

@bluss
Copy link
Member

bluss commented Sep 6, 2017

and since Iterator is implemented for slices those methods should work on slices as well.

Just to help clarify here: slices are iterable (IntoIterator for &[T], &mut [T]), but not themselves iterators.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 7, 2017

@bluss what I meant is that we should probably implement those methods on slices as well, but that implementation should just be forwarded to the Iterator version. I've reworded the issue.

@Thiez
Copy link
Contributor

Thiez commented Sep 7, 2017

How would it make sense for this operation to be defined on an iterator? It would have to consume the entire iterator to get the result, leaving you only with the information that the iterator that you now can't use anymore would indeed have been sorted. .is_sorted_by_key(select_the_key) seems particularly redundant, given that it would just be .map(select_the_key).is_sorted().

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 7, 2017

It would have to consume the entire iterator to get the result,

Of course.

leaving you only with the information that the iterator that you now can't use anymore would indeed have been sorted.

There are many useful algorithms that consume an iterator. Iterators don't necessarily own the items; some iterators do, some don't. If you don't want the items to get lost then don't consume an iterator that owns them. You can either copy/clone the iterator before you consume it, or you can create it again. For example, given:

let v: Vec<u32> = vec![0, 1, 2, 3, 4, 5];

then this works

if v.iter().is_sorted() {
  for i in v.iter().map(|i| i* 2) {
    println!("{}", i);
  }
} 

but this won't compile

if v.into_iter().is_sorted() { // vec is consumed here
  for i in v.iter().map(|i| i * 2) {
    println!("{}", i);
  }
} 

Given that iterating over an Iterator consumes the iterator:

let v: Vec<u32> = vec![0, 1, 2, 3, 4, 5];
let it = v.iter().map(|i| i* 2);
for i in it { println!("{}", i); }  // it is moved here
for i in it { println!("{}", i); }  // ERROR: it was already moved

and that is_sorted must, by necessity, look at each element on the sequence, maybe you could elaborate on why you think that consuming an iterator in this algorithm is bad, and suggest what could be done about it instead?

.is_sorted_by_key(select_the_key) seems particularly redundant, given that it would just be .map(select_the_key).is_sorted().

I've added them for both convenience and consistency. The stable and unstable sort, sort_by, and sort_by_key functions are redundant as well, yet they are very convenient. To be consistent with the sort algorithms, I think is_sorted should provide these alternatives as well. However, I also think that having them is convenient.

@the8472
Copy link
Member

the8472 commented Sep 7, 2017

How about a non-terminal .while_sorted() operation? All it would have to do is compare adjacent elements and stop iterating when they are not monotonic. .require_sorted() could panic instead. Maybe this could be generalized into .while_pair(|a,b| ...) and put into itertools?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 7, 2017

@the8472 itertools already has .tuple_window which combined with take_while is more generic than .while_pair.

Anyhow, I am neither in favour nor opposed to doing that, but I think that discussion should belong in a different issue.

This issue is about std neither checking the precondition of binary_search nor providing users the algorithm to check it themselves.

There is some prior art for is_sorted in other standard libraries, e.g., in C++, although there it is implemented on top of is_sorted_until.

@the8472
Copy link
Member

the8472 commented Sep 7, 2017

is_sorted_until seems like a good model. In the rust case that could be sorted_prefix() -> (impl Iterator, impl Iterator). Then a consumer could do a binary search with the prefix and either panic if the tail is non-empty or do something else with it.

@Thiez
Copy link
Contributor

Thiez commented Sep 7, 2017

That seems like a lot of complexity to avoid a single call to sort() (or collect::<Vec<_>>() followed by sort()). Especially since your proposed sorted_prefix would have to perform hidden allocations to do its work anyway.

@the8472
Copy link
Member

the8472 commented Sep 7, 2017

Ok, how about num_sorted() -> usize for iterators? That could get you the sorted prefix of a slice for example. It's more powerful than is_sorted() without additional cost.

@alexcrichton alexcrichton added C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 7, 2017
@TimNN TimNN added the A-collections Area: `std::collection` label Sep 17, 2017
@Ixrec
Copy link
Contributor

Ixrec commented Sep 17, 2017

Are there any concrete use cases for a "check sorted-ness while iterating" method like is_sorted_until() or while_sorted()? I can imagine those theoretically being used for optimizations like allowing if v is sorted { for x in v ... } to do only one iteration over v, or abort the iteration very early if v is likely to be randomly shuffled, but I have no idea what use case would require those particular optimizations. In particular, I don't know of a use case where "iterate only over the sorted prefix" is part of the desired semantics. So methods like that seem niche enough that they should probably start out in itertools or some other crate.

In contrast, it's quite obvious what the is_sorted{,_by{,_key}} trio would be useful for. So much so that it's almost surprising they aren't in std already.

@dtolnay
Copy link
Member

dtolnay commented Nov 15, 2017

I agree that it would be good to have an easy way to check whether a slice is sorted. This exists in Go as sort.IsSorted, C++ as std::is_sorted, D std.algorithm.sorting.is_sorted. Plenty of people are already defining their own version of this in Rust (135 results at present).

I am a bit skeptical of the equivalent on Iterator just because the return value does not seem actionable -- you aren't going to "sort" the iterator after you find out it is not already sorted. What are some use cases for this in real code that does not involve iterating over a slice?

@bluss
Copy link
Member

bluss commented Nov 15, 2017

I don't think it is a problem o consume the iterator, we make and consume iterators for many different tasks. For example any use of Iterator::position consumes an iterator just to answer the position question.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 15, 2017

@dtolnay

Consuming an Iterator is orthogonal to consuming an underlying slice:

let mut v: Vec<_> = vec![1,3,2,4];
let not_sorted: bool = !v.iter().is_sorted();
if  not_sorted {
  v.sort();  // works
}
let not_sorted: bool = !v.iter_mut().is_sorted();
if not_sorted {
  v.sort(); // works
}

vs:

let mut v: Vec<_> = vec![1,3,2,4];
let not_sorted: bool = !v.into_iter().is_sorted();
if  not_sorted {
  v.sort();  // ERROR: v has been moved
}

The point of implementing this on Iterator is that it allows the algorithm to work on everything that can be iterated upon, not only slices.

You mentioned:

This exists in Go as sort.IsSorted, C++ as std::is_sorted, D std.algorithm.sorting.is_sorted

C++ std::is_sorted and D's std.algorithm.sorting.is_sorted require only a ForwardRange, that is, anything that can be Iterated except for streaming iterators.

If they were to work only on slices they would require a ContiguousRange or a RandomAccessRange in C++ parlance.

@dtolnay dtolnay added C-feature-accepted Category: A feature request that has been accepted pending implementation. and removed C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Nov 19, 2017
@dtolnay
Copy link
Member

dtolnay commented Nov 19, 2017

I would be prepared to consider a PR or RFC for this (whichever is easier). Either way please include a list of design alternatives that you evaluated -- method on slice, trait method on Iterator, free function like std::cmp::is_sorted that takes an IntoIterator, etc.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Feb 5, 2018

I've implemented this in this PR for the itertools crate in case someone wants to play with them: rust-itertools/itertools#261

@bluss
Copy link
Member

bluss commented Feb 6, 2018

Since Iterator/Itertools conflicts are pretty thorny, I'd prefer if we can merge them to Iterator in libcore directly if that's an option. But they do make good companions with wherever the .sorted() method is on an Iterator too (the method that sorts the elements and returns an iterator).

@LukasKalbertodt
Copy link
Member

LukasKalbertodt commented Feb 24, 2018

Just FYI: I'll try to write an RFC about this in the next couple of days. I hope to summarize everything that has been said in this thread. I will link it here, once I'm finished (or update this issue if I give up :<).

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 30, 2018

The RFC has been merged: #53485

@gnzlbg gnzlbg closed this as completed Aug 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collection` C-feature-accepted Category: A feature request that has been accepted pending implementation. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests