-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #2351 from LukasKalbertodt/rfc-add-is-sorted
RFC: Add `is_sorted` to the standard library
- Loading branch information
Showing
1 changed file
with
302 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,302 @@ | ||
- Feature Name: `is_sorted` | ||
- Start Date: 2018-02-24 | ||
- RFC PR: [rust-lang/rfcs#2351](https://github.com/rust-lang/rfcs/pull/2351) | ||
- Rust Issue: [rust-lang/rust#53485](https://github.com/rust-lang/rust/issues/53485) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Add the methods `is_sorted`, `is_sorted_by` and `is_sorted_by_key` to `[T]`; | ||
add the methods `is_sorted` and `is_sorted_by` to `Iterator`. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
In quite a few situations, one needs to check whether a sequence of elements | ||
is sorted. The most important use cases are probably **unit tests** and | ||
**pre-/post-condition checks**. | ||
|
||
The lack of an `is_sorted()` function in Rust's standard library has led to | ||
[countless programmers implementing their own](https://github.com/search?l=Rust&q=%22fn+is_sorted%22&type=Code&utf8=%E2%9C%93). | ||
While it is possible to write a one-liner using iterators (e.g. | ||
`(0..arr.len() - 1).all(|i| arr[i] <= arr[i + 1])`¹), it is still unnecessary | ||
mental overhead while writing *and* reading the code. | ||
|
||
In [the corresponding issue on the main repository](https://github.com/rust-lang/rust/issues/44370) | ||
(from which a few comments are referenced) everyone seems to agree on the | ||
basic premise: we want such a function. | ||
|
||
Having `is_sorted()` and friends in the standard library would: | ||
- prevent people from spending time on writing their own, | ||
- improve readbility of the code by clearly showing the author's intent, | ||
- and encourage to write more unit tests and/or pre-/post-condition checks. | ||
|
||
Another proof of this functions' usefulness is the inclusion in the | ||
standard library of many other languages: | ||
C++'s [`std::is_sorted`](http://en.cppreference.com/w/cpp/algorithm/is_sorted), | ||
Go's [`sort.IsSorted`](https://golang.org/pkg/sort/#IsSorted), | ||
D's [`std.algorithm.sorting.is_sorted`](https://dlang.org/library/std/algorithm/sorting/is_sorted.html) | ||
and others. (Curiously, many (mostly) more high-level programming language – | ||
like Ruby, Javascript, Java, Haskell and Python – seem to lack such a function.) | ||
|
||
¹ In the initial version of this RFC, this code snippet contained a bug | ||
(`<` instead of `<=`). This subtle mistake happens very often: in this RFC, | ||
[in the discussion thread about this RFC](https://github.com/rust-lang/rfcs/pull/2351#issuecomment-370126518), | ||
in [this StackOverflow answer](https://stackoverflow.com/posts/51272639/revisions) | ||
and in many more places. Thus, avoiding this common bug is another good | ||
reason to add `is_sorted()`. | ||
|
||
## Fast Implementation via SIMD | ||
|
||
Lastly, it is possible to implement `is_sorted` for many common types with SIMD | ||
instructions which improves speed significantly. It is unlikely that many | ||
programmers will take the time to write SIMD code themselves, thus everyone | ||
would benefit if this rather difficult implementation work is done in the | ||
standard library. | ||
|
||
|
||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
Possible documentation of the two new methods of `Iterator` as well as | ||
`[T]::is_sorted_by_key`: | ||
|
||
> ```rust | ||
> fn is_sorted(self) -> bool | ||
> where | ||
> Self::Item: PartialOrd, | ||
> ``` | ||
> Checks if the elements of this iterator are sorted. | ||
> | ||
> That is, for each element `a` and its following element `b`, `a <= b` | ||
> must hold. If the iterator yields exactly zero or one element, `true` | ||
> is returned. | ||
> | ||
> Note that if `Self::Item` is only `PartialOrd`, but not `Ord`, the above | ||
> definition implies that this function returns `false` if any two | ||
> consecutive items are not comparable. | ||
> | ||
> ## Example | ||
> | ||
> ```rust | ||
> assert!([1, 2, 2, 9].iter().is_sorted()); | ||
> assert!(![1, 3, 2, 4).iter().is_sorted()); | ||
> assert!([0].iter().is_sorted()); | ||
> assert!(std::iter::empty::<i32>().is_sorted()); | ||
> assert!(![0.0, 1.0, std::f32::NAN].iter().is_sorted()); | ||
> ``` | ||
> --- | ||
> | ||
> ```rust | ||
> fn is_sorted_by<F>(self, compare: F) -> bool | ||
> where | ||
> F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>, | ||
> ``` | ||
> Checks if the elements of this iterator are sorted using the given | ||
> comparator function. | ||
> | ||
> Instead of using `PartialOrd::partial_cmp`, this function uses the given | ||
> `compare` function to determine the ordering of two elements. Apart from | ||
> that, it's equivalent to `is_sorted`; see its documentation for more | ||
> information. | ||
> | ||
> --- | ||
> | ||
> (*for `[T]`*) | ||
> | ||
> ```rust | ||
> fn is_sorted_by_key<F, K>(&self, f: F) -> bool | ||
> where | ||
> F: FnMut(&T) -> K, | ||
> K: PartialOrd, | ||
> ``` | ||
> Checks if the elements of this slice are sorted using the given | ||
> key extraction function. | ||
> | ||
> Instead of comparing the slice's elements directly, this function | ||
> compares the keys of the elements, as determined by `f`. Apart from | ||
> that, it's equivalent to `is_sorted`; see its documentation for more | ||
> information. | ||
> | ||
> ## Example | ||
> | ||
> ```rust | ||
> assert!(["c", "bb", "aaa"].is_sorted_by_key(|s| s.len())); | ||
> assert!(![-2i32, -1, 0, 3].is_sorted_by_key(|n| n.abs())); | ||
> ``` | ||
The methods `[T]::is_sorted` and `[T]::is_sorted_by` will have analogous | ||
documentations to the ones shown above. | ||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
This RFC proposes to add the following methods to `[T]` (slices) and | ||
`Iterator`: | ||
```rust | ||
impl<T> [T] { | ||
fn is_sorted(&self) -> bool | ||
where | ||
T: PartialOrd, | ||
{ ... } | ||
fn is_sorted_by<F>(&self, compare: F) -> bool | ||
where | ||
F: FnMut(&T, &T) -> Option<Ordering>, | ||
{ ... } | ||
fn is_sorted_by_key<F, K>(&self, f: F) -> bool | ||
where | ||
F: FnMut(&T) -> K, | ||
K: PartialOrd, | ||
{ ... } | ||
} | ||
trait Iterator { | ||
fn is_sorted(self) -> bool | ||
where | ||
Self::Item: PartialOrd, | ||
{ ... } | ||
fn is_sorted_by<F>(mut self, compare: F) -> bool | ||
where | ||
F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>, | ||
{ ... } | ||
} | ||
``` | ||
In addition to the changes shown above, the three methods added to `[T]` should | ||
also be added to `core::slice::SliceExt` as they don't require heap | ||
allocations. | ||
|
||
To repeat the exact semantics from the prior section: the methods return | ||
`true` if and only if for each element `a` and its following element `b`, the | ||
condition `a <= b` holds. For slices/iterators with zero or one element, | ||
`true` is returned. For elements which implement `PartialOrd`, but not `Ord`, | ||
the function returns `false` if any two consecutive elements are not | ||
comparable (this is an implication of the `a <= b` condition from above). | ||
|
||
A sample implementation can be found | ||
[here](https://play.rust-lang.org/?gist=431ff42fe8ba5980fcf9250c8bc4492b&version=stable). | ||
|
||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
It increases the size of the standard library by a tiny bit. | ||
|
||
# Rationale and alternatives | ||
[alternatives]: #alternatives | ||
|
||
### Only add the methods to `Iterator`, but not to `[T]` | ||
Without `is_sorted()` defined for slices directly, one can still fairly easily | ||
test if a slice is sorted by obtaining an iterator via `iter()`. So instead of | ||
`v.is_sorted()`, one would need to write `v.iter().is_sorted()`. | ||
|
||
This always works for `is_sorted()` because of the `PartialOrd` blanket impl | ||
which implements `PartialOrd` for all references to an `PartialOrd` type. For | ||
`is_sorted_by` it would introduce an additional reference to the closures' | ||
arguments (i.e. `v.iter().is_sorted_by(|a, b| ...))` where `a` and `b` are | ||
`&&T`). | ||
|
||
While these two inconveniences are not deal-breakers, being able to call those | ||
three methods on slices (and all `Deref<Target=[T]>` types) directly, could be | ||
favourable for many programmers (especially given the popularity of slice-like | ||
data structures, like `Vec<T>`). Additionally, the `sort` method and friends | ||
are defined for slices, thus one might expect the `is_sorted()` method there, | ||
too. | ||
|
||
|
||
### Add the three methods to additional data structures (like `LinkedList`) as well | ||
Adding these methods to every data structure in the standard libary is a lot of | ||
duplicate code. Optimally, we would have a trait that represents sequential | ||
data structures and would only add `is_sorted` and friends to said trait. We | ||
don't have such a trait as of now; so `Iterator` is the next best thing. Slices | ||
deserve special treatment due to the reasons mentioned above (popularity and | ||
`sort()`). | ||
|
||
|
||
### `Iterator::while_sorted`, `is_sorted_until`, `sorted_prefix`, `num_sorted`, ... | ||
[In the issue on the main repository](https://github.com/rust-lang/rust/issues/44370), | ||
concerns about completely consuming the iterator were raised. Some alternatives, | ||
such as [`while_sorted`](https://github.com/rust-lang/rust/issues/44370#issuecomment-327873139), | ||
were suggested. However, consuming the iterator is neither uncommon nor a | ||
problem. Methods like `count()`, `max()` and many more consume the iterator, | ||
too. [One comment](https://github.com/rust-lang/rust/issues/44370#issuecomment-344516366) mentions: | ||
|
||
> I am a bit skeptical of the equivalent on Iterator just because the return | ||
> value does not seem actionable -- you aren't going to "sort" the iterator | ||
> after you find out it is not already sorted. What are some use cases for this | ||
> in real code that does not involve iterating over a slice? | ||
As mentioned above, `Iterator` is the next best thing to a trait representing | ||
sequential data structures. So to check if a `LinkedList`, `VecDeque` or | ||
another sequential data structure is sorted, one would simply call | ||
`collection.iter().is_sorted()`. It's likely that this is the main usage for | ||
`Iterator`'s `is_sorted` methods. Additionally, code like | ||
`if v.is_sorted() { v.sort(); }` is not very useful: `sort()` already runs in | ||
O(n) for already sorted arrays. | ||
|
||
Suggestions like `is_sorted_until` are not really useful either: one can easily | ||
get a subslice or a part of an iterator (via `.take()`) and call `is_sorted()` | ||
on that part. | ||
|
||
|
||
# Unresolved questions | ||
[unresolved]: #unresolved-questions | ||
|
||
|
||
### Should `Iterator::is_sorted_by_key` be added as well? | ||
|
||
This RFC proposes to add `is_sorted_by_key` only to `[T]` but not to | ||
`Iterator`. The latter addition wouldn't be too useful since once could easily | ||
achieve the same effect as `.is_sorted_by_key(...)` by calling | ||
`.map(...).is_sorted()`. It might still be favourable to include said function | ||
for consistency and ease of use. The standard library already hosts a number of | ||
sorting-related functions all of which come in three flavours: *raw*, `_by` and | ||
`_by_key`. By now, programmers could expect there to be an `is_sorted_by_key` | ||
as well. | ||
|
||
|
||
### Add `std::cmp::is_sorted` instead | ||
|
||
As suggested [here](https://github.com/rust-lang/rust/issues/44370#issuecomment-345495831), | ||
one could also add this free function (plus the `_by` and `_by_key` versions) | ||
to `std::cmp`: | ||
|
||
```rust | ||
fn is_sorted<C>(collection: C) -> bool | ||
where | ||
C: IntoIterator, | ||
C::Item: Ord, | ||
``` | ||
|
||
This can be seen as a better design as it avoids the question about which data | ||
structure should get `is_sorted` methods. However, it might have the | ||
disadvantage of being less discoverable and also less convenient (long path or | ||
import). | ||
|
||
|
||
### Require `Ord` instead of only `PartialOrd` | ||
|
||
As proposed in this RFC, `is_sorted` only requires its elements to be | ||
`PartialOrd`. If two non-comparable elements are encountered, `false` is | ||
returned. This is probably the only useful way to define the function for | ||
partially orderable elements. | ||
|
||
While it's convenient to call `is_sorted()` on slices containing only | ||
partially orderable elements (like floats), we might want to use the stronger | ||
`Ord` bound: | ||
|
||
- Firstly, for most programmers it's probably not *immediately* clear how the | ||
function is defined for partially ordered elements (the documentation should | ||
be sufficient as explanation, though). | ||
- Secondly, being able to call `is_sorted` on something will probably make | ||
most programmers think, that calling `sort` on the same thing is possible, | ||
too. Having different bounds for `is_sorted` and `sort` thus might lead to | ||
confusion. | ||
- Lastly, the `is_sorted_by` function currently uses a closure which returns | ||
`Option<Ordering>`. This differs from the closure for `sort_by` and looks a | ||
bit more complicated than necessary for most cases. |