Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance #48994

scurest · 2018-03-13T21:18:07Z

Couldn't find an issue for this and don't know if it counts but filing anyway.

If you have

fn foo(s: &[i8]) -> Vec<u8> {
    s.iter()
    .map(|&x| x as u8)
    .collect()
}

the SpecExtend machinery ensures that the vector has s.len() space reserved in advance. However if you change it to return a result

fn foo(s: &[i8]) -> Result<Vec<u8>> {
    s.iter()
    .map(|&x| if x < 0 { Err(...) } else { Ok(x as u8) } )
    .collect()
}

then (based on examining the LLVM IR and heaptracker's "Temporary" measurements) that optimization has quietly been lost.

This is technically correct in the sense that the first element yielded could be an Err of course (the size hint for the Adapter in Result's FromIterator impl has a lower bound of 0). But this pessimizes the good path to take more memory and be slower in favor of possibly saving memory on the bad one, which seems backwards.

Is there a specialization that could be added to fix this?

The text was updated successfully, but these errors were encountered:

scottmcm · 2018-03-14T03:50:22Z

I wonder if there's a more general tweak. It looks like collect only looks at the low end of the hint (https://doc.rust-lang.org/src/alloc/vec.rs.html#1804), so it feels like there ought to be a way to get some use out of the high end as well.

Strawman proposal:

Do the same as today if high is None
- Mostly to avoid trying to do anything with the default (0, None) "hint"
Otherwise reserve a rough approximation of √(low × high)
- So (0, 1GiB) pre-reserves 32KiB, for example
- Very cheap with leading_zeros and shifting
- Hopefully never so big as to be a problem (and could try_reserve to never be, I suppose)
At the end, shrink_to_fit if len() <= cap()/2
- So if there really is only a handful of elements, it doesn't waste too much space

Probably needs the benchmark set to decide whether it's constructive, though...

This shaves off about 20% of our maximum heap usage in my tests when loading many animations. There are other places where we collect into a Result, but since those vectors are smaller they make less of a difference. See rust-lang/rust#48994. Also did some readability edits nearby.

Calculate capacity when collecting into Option and Result I was browsing the [perf page](http://perf.rust-lang.org) to see the impact of my recent changes (e.g. rust-lang#52697) and I was surprised that some of the results were not as awesome as I expected. I dug some more and found an issue that is the probable culprit: [Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance](rust-lang#48994). Collecting into `Option` or `Result` might result in an empty collection, but there is no reason why we shouldn't provide a non-zero lower bound when we know the `Iterator` we are collecting from doesn't contain any `None` or `Err`. We know this, because the `Adapter` iterator used in the `FromIterator` implementations for `Option` and `Result` registers if any `None` or `Err` are present in the `Iterator` in question; we can use this information and return a more accurate lower bound in case we know it won't be equal to zero. I [have benchmarked](https://gist.github.com/ljedrz/c2fcc19f6260976ae7a46ae47aa71fb5) collecting into `Option` and `Result` using the current implementation and one with the proposed changes; I have also benchmarked a push loop with a known capacity as a reference that should be slower than using `FromIterator` (i.e. `collect()`). The results are quite promising: ``` test bench_collect_to_option_new ... bench: 246 ns/iter (+/- 23) test bench_collect_to_option_old ... bench: 954 ns/iter (+/- 54) test bench_collect_to_result_new ... bench: 250 ns/iter (+/- 25) test bench_collect_to_result_old ... bench: 939 ns/iter (+/- 104) test bench_push_loop_to_option ... bench: 294 ns/iter (+/- 21) test bench_push_loop_to_result ... bench: 303 ns/iter (+/- 29) ``` Fixes rust-lang#48994.

@ljedrz

Try to guess a smarter initial capacity in Vec::from_iter > Another possibility is collect could look at the upper bound and be smarter about what capacity to use? ~ #45840 (comment) This is obviously good for hints like `(60, Some(61))` where today we allocate space for 60, then double it should the additional element show up, and it'd be much better to just always allocate 61. More nuanced are hints like `(0, Some(150))`, where today the code uses just the `0`, and thus starts at a capacity of `1`, but with this change will start at `10` instead. This can undeniably increase memory pressure over where it is today, so I expect at least some controversy 🙂 It does use `try_reserve` for the allocation that's more than the lower bound, and thus shouldn't introduce new aborts, at least. And the starting point grows by the root, keeping things fairly contained: even with an hint of `(0, Some(1_000_000_000))` it'll only start at `30_517`. cc @ljedrz cc #48994

[WIP] Calculate capacity when collecting into Option and Result I was browsing the [perf page](http://perf.rust-lang.org) to see the impact of my recent changes (e.g. #52697) and I was surprised that some of the results were not as awesome as I expected. I dug some more and found an issue that is the probable culprit: [Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance](#48994). Collecting into `Option` or `Result` might result in an empty collection, but there is no reason why we shouldn't provide a non-zero lower bound when we know the `Iterator` we are collecting from doesn't contain any `None` or `Err`. We know this, because the `Adapter` iterator used in the `FromIterator` implementations for `Option` and `Result` registers if any `None` or `Err` are present in the `Iterator` in question; we can use this information and return a more accurate lower bound in case we know it won't be equal to zero. I [have benchmarked](https://gist.github.com/ljedrz/c2fcc19f6260976ae7a46ae47aa71fb5) collecting into `Option` and `Result` using the current implementation and one with the proposed changes; I have also benchmarked a push loop with a known capacity as a reference that should be slower than using `FromIterator` (i.e. `collect()`). The results are quite promising: ``` test bench_collect_to_option_new ... bench: 246 ns/iter (+/- 23) test bench_collect_to_option_old ... bench: 954 ns/iter (+/- 54) test bench_collect_to_result_new ... bench: 250 ns/iter (+/- 25) test bench_collect_to_result_old ... bench: 939 ns/iter (+/- 104) test bench_push_loop_to_option ... bench: 294 ns/iter (+/- 21) test bench_push_loop_to_result ... bench: 303 ns/iter (+/- 29) ``` Fixes #48994.

See rust-lang/rust#48994 for rationale.

JulianKnodt · 2020-08-02T03:50:19Z

I don't know if this issue is dead or not, but I was wondering if this could be solved by creating a newtype that optimistically allocates capacity for the vec, such as this?

…rom_iter_48994_2, r=Mark-Simulacrum Improve capacity estimation in Vec::from_iter Iterates on the attempt made in rust-lang#53086. Closes rust-lang#48994

dtolnay · 2022-04-16T02:44:13Z

This is not closed by #92138. That PR involves a totally different impl.

dylni · 2022-04-18T22:01:29Z

Thanks @dtolnay!

TennyZhuang · 2022-10-18T23:15:34Z

Paste my potential solution from
https://internals.rust-lang.org/t/an-optimistic-version-of-collect-result-t-e/17584/4, do we need an RFC for that?

I’d prefer to implement two versions, collect_optimistic and collect_pessimistic, which will use size_hint.0 and size_hint.1, and leave collect as an opaque implementation (even leave for PGO?). When in the critical path, users can choose the version manually.

The collect can use a specialized implementation for Result<I, E> that forward to optimistic version, and keep the original behavior for other types.

We can also introduce a new unsafe trait TrustedUpperBound for optimization.

scottmcm · 2022-10-19T00:56:23Z

My instinct is that the code doing the collect isn't the one that knows whether to call optimistic or pessimistic. It feels to me like filter_optimistic or filter_pessimistic is the place that's more likely to know whether it'd be better to use a small or large reservation in a later collect. And that way it can be passed to other things too.

stepancheg · 2023-02-16T02:20:14Z

Can someone advice how it can be implemented in Rust?

There are two possibilities:

Easy: add `expected_size_lower_bound` to `Iterator` trait

trait Iterator {
  // unsable, doc-hidden etc
  fn expected_size_lower_bound(&self) -> usize {
    self.size_hint().0
  }
}

And then patch GenericShunt Iterator implementation to override estimated_capacity_lower_bound:

rust/library/core/src/iter/adapters/mod.rs

Line 148 in c528357

pub(crate) struct GenericShunt<'a, I, R> {

And finally, patch Vec::from_iter to use expected_size_lower_bound instead of size_hint to reserve capacity.

Hard: use some specialization

Introduce a trait like:

trait ExpectedSize: Iterator {
  fn expected_size_lower_bound(&self) -> usize;
}

And then implement it for GenericShunt.

My specialization-fu is not good enough to implement a function like this:

trait ExpectedSize {
  fn expected_size_lower_bound(&self) -> usize;
}

impl<I: Iterator> ExpectedSize for I {
  fn expected_size_lower_bound(&self) -> usize {
    self.size_hint().0
  }
}

// And here I get compilation errors.
impl<I, R> ExpectedSize for GenericShunt<I, R> { ... }

Full example on playground, where I get either "conflicting implementations of trait" or "cannot specialize on trait" error depending on whether I add I: Iterator constraint on implementation.

Centril added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Mar 16, 2018

XAMPPRocky added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label May 21, 2018

ljedrz mentioned this issue Jul 31, 2018

[WIP] Calculate capacity when collecting into Option and Result #52910

Closed

scottmcm mentioned this issue Aug 5, 2018

Try to guess a smarter initial capacity in Vec::from_iter #53086

Closed

ljedrz mentioned this issue Nov 5, 2018

Preallocate the vector containing predicates in decode_predicates #55534

Closed

kennytm added a commit to kennytm/dbgen that referenced this issue Jan 25, 2020

replace a collect<Result> call by direct loop.

fe590fa

See rust-lang/rust#48994 for rationale.

Dushistov mentioned this issue Mar 24, 2020

Use own proc macro to generate code to deal with packets ublox-rs/ublox#7

Merged

AngelicosPhosphoros mentioned this issue Dec 20, 2021

Improve capacity estimation in Vec::from_iter #92138

Merged

bors closed this as completed in ea570c6 Jan 20, 2022

Serial-ATA mentioned this issue Apr 9, 2022

Simplify String::from_utf16 #95853

Closed

dtolnay reopened this Apr 16, 2022

dtolnay added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Apr 16, 2022

This was referenced Aug 15, 2022

Implement from_utf16le, from_utf16be, from_utf16ne ParkMyCar/compact_str#207

Merged

Add explicit-endian String::from_utf16 variants #95967

Merged

waruto210 mentioned this issue Oct 18, 2022

perf: improve deserialization of value encoding for struct/list risingwavelabs/risingwave#5897

Merged

1 task

CAD97 mentioned this issue Sep 29, 2023

Tracking Issue for explicit-endian String::from_utf16 #116258

Open

6 tasks

workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023

ryan-johnson-databricks mentioned this issue Jan 29, 2024

Leverage more engine capabilities in data skipping 2/n delta-io/delta-kernel-rs#83

Merged

BugenZhao mentioned this issue Jun 10, 2024

perf(source): reserve capacity when building struct fields risingwavelabs/risingwave#17170

Merged

9 tasks

dragonJACson mentioned this issue Sep 22, 2024

feat: introduce PostSendGuard for extended QP with basic support for polling extended CQ RDMA-Rust/sideway#22

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance #48994

Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance #48994

scurest commented Mar 13, 2018 •

edited

Loading

scottmcm commented Mar 14, 2018 •

edited

Loading

JulianKnodt commented Aug 2, 2020

dtolnay commented Apr 16, 2022

dylni commented Apr 18, 2022

TennyZhuang commented Oct 18, 2022 •

edited

Loading

scottmcm commented Oct 19, 2022

stepancheg commented Feb 16, 2023

Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance #48994

Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance #48994

Comments

scurest commented Mar 13, 2018 • edited Loading

scottmcm commented Mar 14, 2018 • edited Loading

JulianKnodt commented Aug 2, 2020

dtolnay commented Apr 16, 2022

dylni commented Apr 18, 2022

TennyZhuang commented Oct 18, 2022 • edited Loading

scottmcm commented Oct 19, 2022

stepancheg commented Feb 16, 2023

Easy: add expected_size_lower_bound to Iterator trait

Hard: use some specialization

scurest commented Mar 13, 2018 •

edited

Loading

scottmcm commented Mar 14, 2018 •

edited

Loading

TennyZhuang commented Oct 18, 2022 •

edited

Loading

Easy: add `expected_size_lower_bound` to `Iterator` trait