Add an overflow check in the Iter::next() impl for Range<_> to help with vectorization. #43595

oyvindln · 2017-08-01T17:48:05Z

This helps with vectorization in some cases, such as (0..u16::MAX).collect::<Vec<u16>>(), as LLVM is able to change the loop condition to use equality instead of less-than. It should also help with #43124. (See also my last comment there.) This PR makes collect on ranges of u16, i16, i8, and u8 significantly faster (at least on x86-64 and i686), and pretty close, though not quite equivalent, to a manual unsafe implementation. 32-bit values (and 64-bit values on x86-64) were already vectorized without this change, and they still are. This PR doesn't seem to help with 64-bit values on i686, as they still don't vectorize well compared to doing a manual loop.

I'm a bit unsure whether this was the best way of implementing this. I tried to do it with as few changes as possible and avoided changing the Step trait and the behavior in RangeFrom (I'll leave wider changes to the trait to discussions like #43127). I tried simply changing the comparison to self.start != self.end, but that made the compiler segfault when compiling stage0, so I went with this method instead for now.

As for next_back(), reverse ranges seem to optimise properly already.
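
For illustration, the shape of the change can be sketched on a stand-alone type. `CheckedRange` below is a hypothetical name invented for this example, not the actual `Range<_>` code or the diff in this PR: it fixes the item type to u16 and uses `checked_add` in place of the Step trait's methods.

```rust
/// Hypothetical stand-in for `Range<u16>` (an assumption for this sketch,
/// not the standard library type).
pub struct CheckedRange {
    pub start: u16,
    pub end: u16,
}

impl Iterator for CheckedRange {
    type Item = u16;

    fn next(&mut self) -> Option<u16> {
        if self.start < self.end {
            // The addition cannot actually overflow here, because
            // start < end <= u16::MAX. The check is spelled out anyway,
            // since the PR description reports that an explicit overflow
            // check lets LLVM rewrite the loop condition as an equality.
            match self.start.checked_add(1) {
                // Store the incremented value and return the old start.
                Some(n) => Some(std::mem::replace(&mut self.start, n)),
                None => None,
            }
        } else {
            None
        }
    }
}
```

With this sketch, `CheckedRange { start: 0, end: 4 }.collect::<Vec<u16>>()` yields the same items as `(0..4u16)`.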

rust-highfive · 2017-08-01T17:48:17Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @BurntSushi (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

oyvindln · 2017-08-01T18:16:31Z

A slight downside to this change is that it makes the function slightly slower in non-optimized builds due to the extra check.

scottmcm · 2017-08-03T07:47:30Z

If the problem here is that external iteration is slow, as it needs to recheck things, should vec's SpecExtend switch to using the new Iterator::for_each? Then the ranges could override that to have one reachability check, then the != loop...
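
A minimal sketch of that idea, assuming a hypothetical `ForEachRange` type over u16 (the name and the fixed item type are assumptions for this example, not code from the standard library or from this PR): `for_each` is overridden so that reachability is checked once, and the loop itself uses `!=`.

```rust
/// Hypothetical range type used only to illustrate the suggestion.
pub struct ForEachRange {
    pub start: u16,
    pub end: u16,
}

impl Iterator for ForEachRange {
    type Item = u16;

    // External iteration: the `<` comparison is repeated on every call.
    fn next(&mut self) -> Option<u16> {
        if self.start < self.end {
            let n = self.start;
            self.start += 1;
            Some(n)
        } else {
            None
        }
    }

    // Internal iteration: one reachability check, then a `!=` loop.
    fn for_each<F>(self, mut f: F)
    where
        F: FnMut(u16),
    {
        if self.start < self.end {
            let mut i = self.start;
            while i != self.end {
                f(i);
                i += 1; // cannot overflow: i < end inside the loop
            }
        }
    }
}
```

The up-front `start < end` check is what makes the `!=` loop safe even though the fields are public and `start` could have been set above `end`.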

oyvindln · 2017-08-03T21:07:12Z

@scottmcm I wasn't aware of Iterator::for_each, but that sounds like a good idea in any case. Maybe it will make it possible to add a check outside the loop that would help LLVM vectorize, or maybe a != loop would work in this case. I'm not entirely certain that's what causes trouble for LLVM, but it was the main difference between u32 and u16 in the LLVM output with vectorization disabled, at least. (The segmentation fault that I got when trying that in .next is something else that may be worth investigating...)

I suppose using for_each may also be helpful for performance in non-optimized builds, depending on how closures work before optimizations. The change suggested in your PR (#43351) may also help a bit on this front, given how swap does a bunch of stuff that's not needed for the small types used in ranges, but I haven't done any tests on that yet. If we want somewhat fast iteration over ranges in non-optimized builds, I think having some specializations for ranges of primitive integers that avoid all the indirection would make sense. Anyhow, all this may be better suited for a later PR.

As for this PR, I still think it makes sense to have this change in .next for now, as it may help with vectorization in other cases as well, and I think a 20x or greater speedup in release mode is worth a 10-20% slowdown in non-optimized builds (which have a fair bit of potential for improvement later anyhow). I might have a look at the other suggested improvements later, to vector extends in particular; maybe we can also find out what causes the remaining difference between this and the unsafe implementation. It's going to be a few days until I get home and can put a larger amount of work into this anyhow.

oyvindln · 2017-08-03T22:30:46Z

I did a small test by creating a new Iterator type with the relevant parts of the Range struct. Using != as the condition does indeed do the same job as an overflow check, while avoiding the extra check in non-optimized mode. This held provided the struct implemented TrustedLen, like Range does, so that the specialization for those types is used. (Interestingly, when using <, not implementing TrustedLen made the collect function twice as fast.) So that approach could be used for a for_each or other specialization on Range where one can be sure start hasn't been set higher than end (which is otherwise possible, as the fields are public).
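
A rough sketch of what such a test type might look like follows. `NeRange` and its constructor are hypothetical names for this example, not the code used in the test. Two assumptions differ from the description above: the fields are private, so that `start <= end` always holds, and TrustedLen is not implemented, because that trait is unstable outside the standard library.

```rust
/// Hypothetical iterator mirroring the relevant parts of `Range<u16>`.
/// Fields are private so the `start <= end` invariant cannot be broken.
pub struct NeRange {
    start: u16,
    end: u16,
}

impl NeRange {
    /// Clamps `start` to `end`, so an inverted range is simply empty.
    pub fn new(start: u16, end: u16) -> Self {
        NeRange { start: start.min(end), end }
    }
}

impl Iterator for NeRange {
    type Item = u16;

    fn next(&mut self) -> Option<u16> {
        // `!=` is only correct because `start <= end` is guaranteed.
        if self.start != self.end {
            let n = self.start;
            self.start += 1; // cannot overflow: start < end here
            Some(n)
        } else {
            None
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // Exact length; cannot underflow thanks to the invariant.
        let len = (self.end - self.start) as usize;
        (len, Some(len))
    }
}
```

If the fields were public, as they are on Range, a caller could set `start` above `end`, and the `!=` loop would then run past `end` instead of stopping.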

arielb1 · 2017-08-08T10:28:46Z

r? @aturon

aturon · 2017-08-08T22:58:11Z

@bors: r+

Thanks for this PR!

bors · 2017-08-08T22:58:12Z

📌 Commit 4bb9a8b has been approved by aturon

bors · 2017-08-09T01:30:08Z

⌛ Testing commit 4bb9a8b with merge 0f9317d...

Add an overflow check in the Iter::next() impl for Range<_> to help with vectorization.

This helps with vectorization in some cases, such as (0..u16::MAX).collect::<Vec<u16>>(), as LLVM is able to change the loop condition to use equality instead of less-than. It should also help with #43124. (See also my last comment there.) This PR makes collect on ranges of u16, i16, i8, and u8 **significantly** faster (at least on x86-64 and i686), and pretty close, though not quite equivalent, to a [manual unsafe implementation](https://is.gd/nkoecB). 32-bit values (and 64-bit values on x86-64) were already vectorized without this change, and they still are. This PR doesn't seem to help with 64-bit values on i686, as they still don't vectorize well compared to doing a manual loop.

I'm a bit unsure whether this was the best way of implementing this. I tried to do it with as few changes as possible and avoided changing the Step trait and the behavior in RangeFrom (I'll leave wider changes to the trait to discussions like #43127). I tried simply changing the comparison to `self.start != self.end`, but that made the compiler segfault when compiling stage0, so I went with this method instead for now.

As for `next_back()`, reverse ranges seem to optimise properly already.

bors · 2017-08-09T04:03:43Z

☀️ Test successful - status-appveyor, status-travis
Approved by: aturon
Pushing 0f9317d to master...

Add an overflow check in the Iter::next() impl for Range<_>

4bb9a8b

This helps with vectorization in some cases, such as (0..u16::MAX).collect::<Vec<u16>>(), as LLVM is able to change the loop condition to use equality instead of less than

rust-highfive assigned BurntSushi Aug 1, 2017

oyvindln changed the title ~~Add an overflow check in the Iter::next() impl for Range<_>~~ Add an overflow check in the Iter::next() impl for Range<_> to help with vectorization. Aug 1, 2017

aidanhs added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) Aug 3, 2017

arielb1 assigned aturon and unassigned BurntSushi Aug 8, 2017

bors mentioned this pull request Aug 9, 2017

Put intrinsics::unreachable on a possible path to stabilization #43750

Merged

bors merged commit 4bb9a8b into rust-lang:master Aug 9, 2017

arielb1 mentioned this pull request Nov 9, 2017

Can't optimize loop assignment into memset #45466

Closed
