Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add with_exact_size and with_size_hint to Iterator #68995

Open
LukasKalbertodt opened this issue Feb 9, 2020 · 5 comments
Open

Add with_exact_size and with_size_hint to Iterator #68995

LukasKalbertodt opened this issue Feb 9, 2020 · 5 comments
Labels
A-iterators Area: Iterators C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@LukasKalbertodt
Copy link
Member

Usually iterator adaptors do a fine job at keeping information about the length of an iterator. But once you throw something like filter in your iterator chain, the lower bound is 0, meaning that collecting into a collection can't properly pre-allocate. Also, some iterator adaptors (like chain) cannot implemented ExactSizeIterator although in many real world cases, the size is perfectly known.

Currently, working around those limitations is often tedious and requires quite a bit of additional code. I think it would be a lot nicer to just add a .with_exact_size(27) somewhere into your chain. So I'd like to propose adding the following to Iterator:

  • fn with_size_hint(self, lower: usize, upper: Option<usize>) -> SizeHint
    • SizeHint would implement Iterator::size_hint with the given values, forwarding all other methods to the inner iterator (and implementing the same traits)
  • fn with_exact_size(self, size: usize) -> ExactSize
    • ExactSize would implement ExactSizeIterator with the given value and override Iterator::size_hint. All other methods are forwarded to the inner iterator (and implementing the same traits, plus ExactSizeIterator)

I would have created this as a PR, but I'm short on time and wanted to hear some opinions first.

@LukasKalbertodt LukasKalbertodt added C-feature-request Category: A feature request, i.e: not implemented / a PR. A-iterators Area: Iterators labels Feb 9, 2020
@jonas-schievink jonas-schievink added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Feb 9, 2020
@hexane360
Copy link

Should ExactSize panic if the underlying iterator fails to yield exactly size values? Should SizeHint panic if the underlying iterator fails to yield at least lower values?

@Centril
Copy link
Contributor

Centril commented Feb 10, 2020

I recall suggesting this to @scottmcm, who was not a fan, in a private setting at one point.
cc also @jswrenn re. rust-itertools/itertools#341.

@LukasKalbertodt
Copy link
Member Author

Should ExactSize panic if the underlying iterator fails to yield exactly size values? Should SizeHint panic if the underlying iterator fails to yield at least lower values?

I initially would have said: no, they should not panic. len() says it has the same safety guarantees as size_hint and size_hint stresses that it's basically just for optimization and nothing bad should happen if the returned size is incorrect.

But thinking about it again... maybe checking this might not actually incur any notable performance overhead and be useful. Lets consider the implementation for ExactSize:

struct ExactSize<I> {
    inner: I,
    size: usize,
}

impl<I: Iterator> Iterator for ExactSize<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self.inner.next() {
            None => None,
            Some(x) => {
                self.size -= 1;
                Some(x)
            }
        }
    }
    
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.size, Some(self.size))
    }
}

This implementation does not contain those checks. Well, in debug mode, the integer underflow is checked. Which is good, as an underflow would usually result in "no you can't allocate a gazillion GB". This would be next() with those checks:

match self.inner.next() {
    None => {
        assert_eq!(self.size, 0);
        None
    }
    Some(x) => {
        self.size = self.size.checked_sub(1).unwrap();
        Some(x)
    }
}

The None case is a cold path, so the additional check there probably doesn't matter. The checked_sub in the Some branch might be fine, too.

Well, this is certainly something we should benchmark and test before stabilization.

@the8472
Copy link
Member

the8472 commented Feb 10, 2020

For maximal optimization you'd need some sort of unsafe fn set_trusted_len() that provides a TrustedLen wrapper.

@scottmcm
Copy link
Member

scottmcm commented Feb 20, 2020

size_hint is pretty situational, and it turns out that it's half-unused anyway. And while .with_size_hint(low, high) is of course safe, it's also really easy to get into a garbage-in-garbage-out situation with because an iterator doesn't need to work if the size_hint is wrong -- which is easy to happen if you do something like "the predicate in my filter is almost always true so I'll just hint that it's the same size as the input". Additionally, SizeHint<I> cannot propagate TrustedLen, so something like (0..100).with_exact_size(100) looks like you're being helpful but actually makes things worse.

So I've been more and more thinking that what's really needed is a different iterator API that's allowed to be wrong, but where being wrong just over- or under-allocates, rather than permitting incorrect behaviour. Then things like ResultShunt can say "you should allocate enough for everything", even though it might return fewer things if there's an Err. And if it's just one number, it could just be usize::saturating_added for Chain or whatever, rather than the current more-complicated dance. It has a useful default implementation of self.size_hint().0 as well, so we could move collect to using it immediately without regressing third-party iterators.

(I also still like .collect_into(Vec::with_capacity(100)), or similar, from #45840.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Area: Iterators C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants