Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core::iter::Fuse Does Not Fuse Iterators #100518

Open
asquared31415 opened this issue Aug 14, 2022 · 22 comments
Open

core::iter::Fuse Does Not Fuse Iterators #100518

asquared31415 opened this issue Aug 14, 2022 · 22 comments
Labels
C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@asquared31415
Copy link
Contributor

asquared31415 commented Aug 14, 2022

This is acknowledged in Iterator::fuse with

fuse() may therefore behave incorrectly if the FusedIterator trait is improperly implemented.

But I believe this should not be the case. Fuse still incurs a memory overhead for all iterators, whether or not it specializes, and it's useless because it currently doesn't actually do anything in the general case.

Example code:

struct UwU<'a>(&'a (), bool);

impl<'a> Iterator for UwU<'a> {
    type Item = bool;
    fn next(&mut self) -> Option<Self::Item> {
        self.1 = !self.1;
        match self.1 {
            true => Some(true),
            false => None,
        }
    }
}

impl<'a> std::iter::FusedIterator for UwU<'a> { }

fn main() {
    let a = ();
    let mut owo = UwU(&a, true).fuse();
    assert_eq!(owo.next(), None);

    // this should ideally hold because we called `fuse`, and 
    // the first call was `None` but it returns `Some`
    assert_eq!(owo.next(), None);
}

It fails with

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Some(true)`,
 right: `None`', /app/example.rs:23:5

Since rustc 1.26.0, when this type was added, through current nightly (2022-08-13).

@asquared31415 asquared31415 added the C-bug Category: This is a bug. label Aug 14, 2022
@scottmcm scottmcm added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Aug 14, 2022
@scottmcm
Copy link
Member

and it's useless because it currently doesn't actually do anything in the general case.

This is a massive overstatement. Agreed, you can't depend on it for the soundness of your unsafe { code }.

But that's normal. You can't depend on Eq for the soundness of your unsafe { code } either, but that doesn't make Eq useless. See this IRLO thread for a recent discussion: https://internals.rust-lang.org/t/documentation-must-vs-should/16967?u=scottmcm

And, practically, I don't think copy-pasting a big "you can't rely on this for soundness" section onto every single non-unsafe trait is productive. That's a general rule of Rust, so it belongs in the book or similar, not on every trait definition.

@asquared31415
Copy link
Contributor Author

This issue is specifically about the type core::iter::Fuse not behaving as it claims it does in its docs:

An iterator that yields None forever after the underlying iterator yields None once.

It would be like if Vec::push didn't actually push sometimes for certain T, or Rc not properly decrementing the reference count sometimes. These types would not be behaving as documented, and that's clearly a bug either in the docs or the implementation.

The only reason the FusedIterator trait comes up is because Fuse specializes on it, and when it does, it loses all guarantees about behavior.

@the8472
Copy link
Member

the8472 commented Aug 14, 2022

That simply means Fuse relies on/delegates to the API contract of FusedIterator in some cases. APIs build on each other all the time.

It would be like if Vec::push didn't actually push sometimes for certain T

I mean if you have a Vec<T, A> with a magic allocator that returns FUSE-backed memory-mapped memory that occasionally makes your values vanish (i.e. breaks the Alloc API contract), then yeah Vec will misbehave.

and it's useless because it currently doesn't actually do anything in the general case.

Fuse is useful because it gives generic code that needs a fused iterator to take an iterator that already is self-fusing or fuse it through that API. It glues FusedIterator behavior onto the iterator. With the same guarantees you always (or never) had from FusedIterator in the first place.

@asquared31415
Copy link
Contributor Author

asquared31415 commented Aug 14, 2022

Fuse is useful because it gives generic code that needs a fused iterator to take an iterator that already is self-fusing or fuse it through that API.

This is not true, the API doesn't actually do anything for generic code that needs a fused iterator, despite Fuse being a std type that claims it does. The best case is when an iterator does not implement FusedIterator or is a cooperative impl, when it actually fuses, but the worst case is that it doesn't fuse at all when FusedIterator is implemented when it does not hold.

The fact that not implementing the marker trait can be more useful than implementing it badly is a bug, is extremely unintuitive behavior, and because generic code needs to be able to handle the worst case always, makes this type useless.

I don't see why Fuse can't be changed to actually work 100% of the time, like it's docs say? Surely that makes it actually useful, since it's always a memory overhead, but as is it can't work for generic code.

The main problem is that Fuse does not behave as its docs claim. Either the docs are incorrect, and they need to say "do not use Fuse if you actually need a fused iterator in generic code", or the implementation needs to be fixed.

@the8472
Copy link
Member

the8472 commented Aug 14, 2022

This is not true, the API doesn't actually do anything for generic code that needs a fused iterator,

The generic code takes any iter: impl Iterator and then calls iter.fuse(). It now has an iterator that implements FusedIterator. That is all it does. You get the FusedIterator guarantees. And those guarantees always come with the caveat "well, someone might have implemented this incorrectly". We call those bugs and they need to be fixed. This caveat also applies to unsafe traits.

safe trait: someone implements this incorrectly -> it'll OOM, make your rockets explode mid-flight, irradiate people, drain your bank account or make planes fall out of the sky
unsafe trait: someone implements this incorrectly -> it'll OOM, make your rockets explode mid-flight, irradiate people, drain your bank account, make planes fall out of the sky, cause miscompilations or memory corruption

API contract violations are forbidden. The only difference is which sets of consequences a violation may have if someone does the forbidden thing.

@asquared31415
Copy link
Contributor Author

asquared31415 commented Aug 14, 2022

The return type is a core::iter::Fuse, which claims it properly fuses, not an opaque impl FusedIterator. Fuse has additional logic to not actually fuse when the wrapped type implements FusedIterator. Fuse is the std type that is behaving incorrectly, because it does not require FusedIterator nor does it say that it "might sometimes fuse", it says

An iterator that yields None forever after the underlying iterator yields None once.

which is strictly not true. No caveats for bad impls are talked about, no exceptions are made, it is not a generic API taking FusedIterator it is a concrete type that behaves poorly. There is no API contract on Fuse for user code to violate, it claims to always work.

@asquared31415
Copy link
Contributor Author

I mean if you have a Vec<T, A> with a magic allocator that returns FUSE-backed memory-mapped memory that occasionally makes your values vanish (i.e. breaks the Alloc API contract), then yeah Vec will misbehave.

Additionally this example does not apply, because there is a contract on the allocator for Vec, violating that would be wrong, obviously. But there's no such contract for Fuse, no generic traits in its API, nothing. It's a direct result of specialization that is hidden from the user.

@asquared31415
Copy link
Contributor Author

Also even if Fuse is technically correct, why would we want it to be intentionally malicious? As far as I am aware, it very well could be correct always, and it would be a simpler implementation, and incur no major additional costs. Being as well behaved as possible under sub-optimal circumstances is a positive. There's many places library types or the compiler could do a lot more, but choose to be benevolent because why would a user want code to actively fight them for at best a minor improvement.

@the8472
Copy link
Member

the8472 commented Aug 14, 2022

No caveats for bad impls are talked about

APIs are not required to state "does misbehave if traits aren't upheld". Adding those notes is a courtesy, usually added when multiple people tripped over a requirement. The absence of such statements is not a guarantee in itself. Otherwise we would be forced to document implementation details (this calls X, Y and Z and relies on their behavior) on even the most trivial things.

E.g. iterator adapters don't tell you either which methods they forward. The default behavior is phrased in terms of next but often fold uses the inner's fold. So even if the inner's next is correct but the fold is buggy then so will be the outer fold. But we don't document the exact behavior because that might change.

If something is buggy somewhere then bugs will cascade until it hits something that is more defensive than whatever they cascaded through. GIGO, garbage in - garbage out.

why would we want it to be intentionally malicious?

Why are you assuming malice? This is a performance optimization that eliminates a branch inside loops which in turn can be vital for auto-vectorization.

The return type is a core::iter::Fuse, which claims it properly fuses, not an opaque impl FusedIterator.

They offer the same guarantees, so one can be substituted for the other.

@asquared31415
Copy link
Contributor Author

asquared31415 commented Aug 14, 2022

They offer the same guarantees, so one can be substituted for the other.

If they offer the same guarantees, at least one has incorrect docs, because a safe marker trait offers no guarantees, but Fuse's docs claim to have some specified behavior.

@asquared31415
Copy link
Contributor Author

Additionally, after some (admittedly limited) testing, I have not found a case where the specialization on FusedIterator improved codegen. It either was equivalent when the iterator was actually fused, or regressed codegen because it did not actually fuse.

@the8472
Copy link
Member

the8472 commented Aug 14, 2022

std::iter::FusedIterator
An iterator that always continues to yield None when exhausted.

fn fuse(self) -> Fuse
Creates an iterator which ends after the first None.

Struct std::iter::Fuse
An iterator that yields None forever after the underlying iterator yields None once.

They all promise the same thing, phrased slightly differently.

@the8472
Copy link
Member

the8472 commented Aug 15, 2022

Ah, I realize I was arguing from the perspective of safe code. For unsafe code the situation changes somewhat. At least the standard library knows its own implementation and so knows which safe parts unsafe code can trust. The situation is less clear for 3rd party unsafe code because it's not allowed to rely on implementation details.

@5225225
Copy link
Contributor

5225225 commented Aug 15, 2022

Iterator::fuse() does say

Note that the Fuse wrapper is a no-op on iterators that implement the FusedIterator trait. fuse() may therefore behave incorrectly if the FusedIterator trait is improperly implemented.

(And by extension, Fuse acts like that too, sine fuse() is the way to get a Fuse)

Which means if you actually need a fused I: Iterator for safety, .fuse() is not useful to you, you'll presumably need to just write your own Fuse. There's no .fuse_but_usable_for_safety().

If we did want .fuse() on Iterator to always fuse, then we'd presumably need to get rid of the specialisation on FusedIterator, and possibly deprecate FusedIterator, introduce an (internal?) unsafe trait TrustedFusedIterator and implement that for Fuse<T>, and maybe our other iterators.

To be honest, it is somewhat surprising to me that you can't trust .fuse(), since that does return a stdlib type. I don't know if any unsafe code is relying on .fuse() to actually fuse (on a generic iterator), but I wouldn't be surprised if there was, I just don't know a good way to find it.


Do we have a good way to measure how much this specialisation is saving us? Is it helpful only in the case of .fuse().fuse(), only in the case of stdlib iterators (where we can specialise on a private unsafe trait), or does it need to exist even for external types that implement FusedIterator?

We have perf runs, but that won't help measure the cost of a change on The Ecosystem.

@the8472
Copy link
Member

the8472 commented Aug 15, 2022

Flatten and FlatMap rely on Fused for correctness (but not for safety afaict). So changing it could affect performance of any flattening iterator.

@scottmcm
Copy link
Member

Note that if you need an iterator to be forced fused for certain, you can always use

struct DefinitelyNotFused<I>(I);
impl<I: Iterator> Iterator for DefinitelyNotFused<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<Self::Item> { self.0.next() }
}

And do DefinitelyNotFused(it).fuse() instead of just it.fuse().

That's really short because it doesn't forward anything but next, but if you can't trust FusedIterator implementations, I don't see why you'd be able to trust its any or fold impls either -- they might be wrong/inconsistent too.

Because of that, this isn't really a problem specific to .fuse(). You can't be sure that .skip(n) will skip n things either, since it currently uses nth, which might be implemented incorrectly.

@scottmcm scottmcm linked a pull request Sep 19, 2022 that will close this issue
@scottmcm scottmcm added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Sep 19, 2022
@scottmcm
Copy link
Member

Hello libs-api folks!

I had #100517 on me about this, which was closed, and now there's #102006 open for something different. But I think it needs an API decision for what, if anything, should be done, before I can do a review.

A couple of possibilities:

  • There's no reason to do anything here without a concrete example of this causing non-contrived problems in the wild.
  • Things are fine as they are, because of the general rule that unsafe code can't trust safe trait implementations.
  • There should be a separate .fuse_but_my_unsafe_code_relies_on_it_for_soundness() call for those that need it.
  • The optimization to pass-through for Fused should switch to depending on a new unsafe trait instead.
  • Something else.

@asquared31415
Copy link
Contributor Author

Sorry about that previous PR, that was not a productive solution.

What motivated me to make a PR actually making the behavior consistent was yet another complaint from someone that they found a case where they wanted a fused iterator from a generic iterator that may or may not actually fuse, but couldn’t use Iterator::fuse because it can’t be trusted, despite returning a type that is impossible to construct or modify outside of core. I don’t have a concrete example of code, but I believe they were working on some actual code as a library that would consume an iterator.

The fact that you must use a workaround like the above to get the desired behavior to me seems like it is a sign of a poorly designed API, and this is one of the rare cases where an std API can be fixed.

@asquared31415
Copy link
Contributor Author

In relation to the possible changes listed, #102006 currently uses a new unsafe trait for specialization that only iter::Fuse implements, to avoid setting the inner iterator to None at each level of nesting, since that seemed to be the primary complaint that the FusedIterator RFC aimed to solve.

@the8472
Copy link
Member

the8472 commented Sep 19, 2022

they found a case where they wanted a fused iterator from a generic iterator that may or may not actually fuse, but couldn’t use Iterator::fuse because it can’t be trusted

Were they writing unsafe code?

@m-ou-se
Copy link
Member

m-ou-se commented Sep 20, 2022

We discussed this in the libs-api meeting just now. Those of us present think we shouldn't change anything about this method and trait right now. The behaviour as expected (and documented), similar to how an invalid Hash implementation would break a HashMap.

However, we'd be open to seeing concrete examples that show the value of an unsafe 'TrustedFusedIterator' trait.

@BoxyUwU
Copy link
Member

BoxyUwU commented Sep 20, 2022

an unsafe TrustedFusedIterator is a sketchy idea, unless #102006 is merged it would make the existing unsound specialization even "more" unsound since now you'd be incorrectly assuming a type implements an unsafe trait instead of a safe one

@dtolnay dtolnay removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants