Improve the copying code for slices and Vec #13539

Merged 2 commits into rust-lang:master on Apr 16, 2014

Conversation

Aatch (Contributor) commented Apr 15, 2014

LLVM wasn't recognising the loops as memcpy loops and was therefore failing to optimise them properly. While improving LLVM is the "proper" way to fix this, I think that these cases are important enough to warrant a little low-level optimisation.
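For readers without the diff at hand, the following is a minimal sketch of the kind of rewrite described above. It is not the code from this PR: the function names `clone_with_push` and `clone_with_raw_writes` are hypothetical, and it uses present-day Rust rather than the 2014 syntax. The idea is to allocate once and write clones straight into the reserved space, so the loop body has no capacity check and is easier for LLVM to turn into a bulk copy for primitive element types.

```rust
use std::ptr;

/// Baseline: every `push` carries a capacity check and a possible reallocation.
fn clone_with_push<T: Clone>(src: &[T]) -> Vec<T> {
    let mut out = Vec::new();
    for x in src {
        out.push(x.clone());
    }
    out
}

/// Sketch (hypothetical helper): reserve once, then write each clone
/// directly into the spare capacity through a raw pointer.
fn clone_with_raw_writes<T: Clone>(src: &[T]) -> Vec<T> {
    let mut out: Vec<T> = Vec::with_capacity(src.len());
    let dst = out.as_mut_ptr();
    for (i, x) in src.iter().enumerate() {
        // SAFETY: `i < src.len() <= capacity`, so the slot is in bounds
        // and currently uninitialised; `ptr::write` does not drop the old value.
        unsafe {
            ptr::write(dst.add(i), x.clone());
            // The length only grows after the slot is initialised, so a
            // panicking `clone` never exposes uninitialised memory.
            out.set_len(i + 1);
        }
    }
    out
}
```

Both functions return the same vector for the same input; whether the second one actually compiles down to a `memcpy` depends on the element type and the compiler version, so it is worth checking the generated code rather than assuming it.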

Fixes #13472

r? @thestinger


Benchmark Results:

--- Before ---
test clone_owned          ... bench:   6126104 ns/iter (+/- 285962) = 170 MB/s
test clone_owned_to_owned ... bench:   6125054 ns/iter (+/- 271197) = 170 MB/s
test clone_str            ... bench:     80586 ns/iter (+/- 11489) = 13011 MB/s
test clone_vec            ... bench:   3903220 ns/iter (+/- 658556) = 268 MB/s
test test_memcpy          ... bench:     69401 ns/iter (+/- 2168) = 15108 MB/s

--- After ---
test clone_owned          ... bench:     70839 ns/iter (+/- 4931) = 14801 MB/s
test clone_owned_to_owned ... bench:     70286 ns/iter (+/- 4836) = 14918 MB/s
test clone_str            ... bench:     78519 ns/iter (+/- 5511) = 13353 MB/s
test clone_vec            ... bench:     71415 ns/iter (+/- 1999) = 14682 MB/s
test test_memcpy          ... bench:     70980 ns/iter (+/- 2126) = 14772 MB/s

huonw (Member) commented Apr 15, 2014

What happens if clone fails? It seems that destructors would run on uninit memory due to the early set_len call.
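To make the concern concrete: if the length is set to its final value up front and a later `clone` panics, the vector's destructor would drop slots that were never written. One common way to avoid that, sketched here under the assumption of present-day Rust, is a drop guard that commits only the number of initialised elements. The names `LenGuard` and `clone_panic_safe` are hypothetical and are not taken from this PR.

```rust
use std::ptr;

/// Hypothetical guard: commits the number of initialised elements when it
/// is dropped, which also happens while unwinding from a panic.
struct LenGuard<'a, T> {
    vec: &'a mut Vec<T>,
    initialised: usize,
}

impl<'a, T> Drop for LenGuard<'a, T> {
    fn drop(&mut self) {
        // SAFETY: exactly `initialised` slots have been written, and
        // `initialised` never exceeds the reserved capacity.
        unsafe { self.vec.set_len(self.initialised) };
    }
}

fn clone_panic_safe<T: Clone>(src: &[T]) -> Vec<T> {
    let mut out: Vec<T> = Vec::with_capacity(src.len());
    {
        let mut guard = LenGuard { vec: &mut out, initialised: 0 };
        for x in src {
            // If this panics, the guard is dropped first and records how
            // many elements exist; `out` is then dropped with that length.
            let value = x.clone();
            // SAFETY: `initialised < src.len() <= capacity`.
            unsafe {
                let dst = guard.vec.as_mut_ptr().add(guard.initialised);
                ptr::write(dst, value);
            }
            guard.initialised += 1;
        }
    } // guard dropped here: the length becomes `src.len()`
    out
}
```

With this shape, a panic in the middle of the loop drops only the clones that were already written, and the untouched tail of the allocation is simply freed.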

Aatch (Contributor, Author) commented Apr 16, 2014

@huonw heh, @thestinger was just telling me that on IRC. I'm fixing it now.

Aatch (Contributor, Author) commented Apr 16, 2014

Interestingly, with these changes, Vec::clone for primitive types (u8, i32 etc.) lets LLVM do vectorisation and produces 128-bit loads/stores normally and 256-bit loads/stores if you pass -C target-cpu=core-avx2 to rustc.

huonw (Member) commented Apr 16, 2014

I wonder if we could take a more general approach and apply this to the FromIterator impl, benefiting all .collect() -> Vec calls. (e.g. reserve the bottom end of the size_hint and then use the fast loop up until that, .pushing any additional values in a separate loop.)

(@eddyb and I discussed this on IRC, but neither of us wrote any code to experiment with it yet.)
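The `size_hint` idea above could be sketched roughly as follows. This is an illustration only, written in present-day Rust; `collect_with_hint` is a hypothetical free function, not the standard library's `FromIterator` impl, and no benchmark here confirms that it is faster.

```rust
use std::ptr;

/// Hypothetical helper: reserve the lower bound of `size_hint`, fill that
/// many slots with raw writes, then `push` whatever the iterator still yields.
fn collect_with_hint<I: Iterator>(mut iter: I) -> Vec<I::Item> {
    let (lower, _) = iter.size_hint();
    let mut out: Vec<I::Item> = Vec::with_capacity(lower);

    // Fast loop: the first `lower` items are known to fit, so no
    // capacity check is needed per element.
    let mut len = 0;
    while len < lower {
        match iter.next() {
            Some(item) => unsafe {
                // SAFETY: `len < lower <= capacity`, slot is uninitialised.
                ptr::write(out.as_mut_ptr().add(len), item);
                len += 1;
                // Commit after each write so a panicking `next` is harmless.
                out.set_len(len);
            },
            // `size_hint` is only a hint; tolerate an iterator that ends early.
            None => return out,
        }
    }

    // Slow path: anything beyond the lower bound goes through `push`.
    for item in iter {
        out.push(item);
    }
    out
}
```

For an iterator such as `(0..3).chain((0..10).filter(|x| x % 2 == 0))`, the lower bound is 3, so the first three items take the fast loop and the rest are pushed. An iterator that reports a huge lower bound (an infinite one, for example) would make `with_capacity` panic, so a real implementation would need to decide how to handle that case.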

bors added a commit that referenced this pull request Apr 16, 2014
bors closed this Apr 16, 2014
bors merged commit be334d5 into rust-lang:master Apr 16, 2014
Aatch deleted the vector-copy-faster branch April 16, 2014 22:29
Successfully merging this pull request may close these issues.

Cloning a 1MB vector is 30x slower than cloning a 1MB ~str
4 participants