Improve the copying code for slices and Vec #13539

Aatch · 2014-04-15T23:48:14Z

LLVM wasn't recognising the loops as memcpy loops and was therefore failing to optimise them properly. While improving LLVM is the "proper" way to fix this, I think that these cases are important enough to warrant a little low-level optimisation.

Fixes #13472

r? @thestinger

Benchmark Results:

--- Before ---
test clone_owned          ... bench:   6126104 ns/iter (+/- 285962) = 170 MB/s
test clone_owned_to_owned ... bench:   6125054 ns/iter (+/- 271197) = 170 MB/s
test clone_str            ... bench:     80586 ns/iter (+/- 11489) = 13011 MB/s
test clone_vec            ... bench:   3903220 ns/iter (+/- 658556) = 268 MB/s
test test_memcpy          ... bench:     69401 ns/iter (+/- 2168) = 15108 MB/s

--- After ---
test clone_owned          ... bench:     70839 ns/iter (+/- 4931) = 14801 MB/s
test clone_owned_to_owned ... bench:     70286 ns/iter (+/- 4836) = 14918 MB/s
test clone_str            ... bench:     78519 ns/iter (+/- 5511) = 13353 MB/s
test clone_vec            ... bench:     71415 ns/iter (+/- 1999) = 14682 MB/s
test test_memcpy          ... bench:     70980 ns/iter (+/- 2126) = 14772 MB/s
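For reference, the `test_memcpy` row is the raw-copy baseline that the other rows approach after the change. The benchmark source is not shown in this thread, so the following is only a minimal sketch of what a direct bulk copy into a fresh `Vec` can look like for `Copy` element types. The function name `copy_to_vec` is hypothetical, and this is not the diff from this PR.

```rust
use std::ptr;

/// Hypothetical helper (not from this PR): copy a slice of `Copy` values
/// into a new `Vec` with a single bulk copy instead of a per-element loop.
pub fn copy_to_vec<T: Copy>(src: &[T]) -> Vec<T> {
    let mut out: Vec<T> = Vec::with_capacity(src.len());
    unsafe {
        // SAFETY: `out` has capacity for `src.len()` elements, the two
        // buffers cannot overlap because `out` was freshly allocated, and
        // `T: Copy` means a bitwise copy produces valid, independent values.
        ptr::copy_nonoverlapping(src.as_ptr(), out.as_mut_ptr(), src.len());
        // Every slot in 0..src.len() is initialised at this point.
        out.set_len(src.len());
    }
    out
}
```

Because nothing between the allocation and `set_len` can panic, the length can be set once at the end here. That does not hold once a user-defined `clone` is involved.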

huonw · 2014-04-15T23:54:50Z

What happens if clone fails? It seems that destructors would run on uninit memory due to the early set_len call.

Aatch · 2014-04-16T00:08:29Z

@huonw heh, @thestinger was just telling me that on IRC. I'm fixing it now.
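The issue raised above can be illustrated with a minimal sketch, written in current Rust rather than the 2014 dialect and not taken from the actual commit. The idea is to bump the length only after each element has been written, so that a failing `clone` never leaves the vector claiming uninitialised slots. The name `clone_slice_failure_safe` is hypothetical.

```rust
use std::ptr;

/// Hypothetical helper (not the code from this PR): clone a slice into a
/// new `Vec`, keeping `len` in step with the number of initialised slots.
pub fn clone_slice_failure_safe<T: Clone>(src: &[T]) -> Vec<T> {
    let mut out: Vec<T> = Vec::with_capacity(src.len());
    for (i, item) in src.iter().enumerate() {
        // `clone` may panic. At this point `out.len() == i`, so unwinding
        // drops only the `i` elements that were actually written.
        let value = item.clone();
        unsafe {
            // SAFETY: `i < src.len() <= out.capacity()`, and slot `i` is
            // still uninitialised, so writing without dropping is correct.
            ptr::write(out.as_mut_ptr().add(i), value);
            // Publish the new element only after it has been written.
            out.set_len(i + 1);
        }
    }
    out
}
```

Calling `set_len(src.len())` once before the loop would be the unsafe variant: a panic partway through would make the destructor of `out` run over slots that were never written.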

Aatch · 2014-04-16T04:12:42Z

Interestingly, with these changes, Vec::clone for primitive types (u8, i32 etc.) lets LLVM do vectorisation and produces 128-bit loads/stores normally and 256-bit loads/stores if you pass -C target-cpu=core-avx2 to rustc.

huonw · 2014-04-16T09:12:01Z

I wonder if we could take a more general approach and apply this to the FromIterator impl, benefiting all .collect() -> Vec calls. (e.g. reserve the lower bound of the size_hint, use the fast loop until that many elements have been written, then .push any additional values in a separate loop.)

(@eddyb and I discussed this on IRC, but neither of us wrote any code to experiment with it yet.)
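The suggestion above was not implemented in this thread, so the following is only a sketch of how it might look in current Rust. The function name `collect_with_hint` is hypothetical. It treats the lower bound of `size_hint` as a capacity to reserve, not as a promise, and stops early if the iterator ends before reaching it.

```rust
use std::ptr;

/// Hypothetical helper (an illustration of the idea, not std's code):
/// fill the reserved prefix with unchecked writes, then `push` the rest.
pub fn collect_with_hint<T, I: Iterator<Item = T>>(mut iter: I) -> Vec<T> {
    let (lower, _) = iter.size_hint();
    let mut out: Vec<T> = Vec::with_capacity(lower);

    // Fast loop: no capacity check per element for the first `lower` items.
    while out.len() < lower {
        match iter.next() {
            Some(value) => unsafe {
                let len = out.len();
                // SAFETY: `len < lower <= out.capacity()`, and slot `len`
                // is uninitialised, so a plain write is correct.
                ptr::write(out.as_mut_ptr().add(len), value);
                // Bump the length only after the write.
                out.set_len(len + 1);
            },
            // The hint overestimated; return what was collected so far.
            None => return out,
        }
    }

    // Slow loop: anything beyond the lower bound goes through `push`,
    // which grows the allocation as needed.
    for value in iter {
        out.push(value);
    }
    out
}
```

For an iterator such as a range, the whole collection goes through the fast loop. For something like `filter`, whose lower bound is 0, everything falls through to the `push` loop.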

Improve the copying code for slices and Vec

42b3992

Make Vec::clone and slice::to_owned failure-safe

be334d5

bors closed this Apr 16, 2014

bors merged commit be334d5 into rust-lang:master Apr 16, 2014

Aatch deleted the vector-copy-faster branch April 16, 2014 22:29

alexcrichton mentioned this pull request Apr 17, 2014

&[u8].to_owned() does not compile to a memcpy #11015

Closed

schomatis mentioned this pull request Feb 7, 2019

zigzag: cache optimizations filecoin-project/rust-fil-proofs#465

Merged
