Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: efficient product iterator #14596

Merged
merged 1 commit into from
Jan 26, 2016
Merged

RFC: efficient product iterator #14596

merged 1 commit into from
Jan 26, 2016

Conversation

JeffBezanson
Copy link
Member

This uses the same general trick as the zip iterator to get performance without generated functions. I'm sure it could be better, but it's at least reasonably fast:

julia> function f()
         @time for k=1:10;[ i+j for i=1:1000, j=1:1000 ];end
         @time for k=1:10;[ i+j for (i,j) in product(1:1000,1:1000) ];end
       end
f (generic function with 1 method)

julia> f();
  0.011721 seconds (20 allocations: 76.295 MB, 18.53% gc time)
  0.022262 seconds (20 allocations: 76.295 MB, 4.50% gc time)

julia> function f()
         @time for k=1:10;[ i+j+l for i=1:1000, j=1:100, l=1:10 ];end
         @time for k=1:10;[ i+j+l for (i,j,l) in product(1:1000,1:100,1:10) ];end
       end
f (generic function with 1 method)

julia> f();
  0.013009 seconds (20 allocations: 76.295 MB, 11.70% gc time)
  0.046552 seconds (20 allocations: 76.295 MB, 4.16% gc time)

julia> f();
  0.020044 seconds (20 allocations: 76.295 MB, 11.20% gc time)
  0.029772 seconds (20 allocations: 76.295 MB, 6.82% gc time)

The times are pretty variable but this looks something like a factor of 2.

This implements the same iteration order as comprehensions (first iterator is the innermost loop). Would anybody want lexicographic order in addition or instead?

@JeffBezanson
Copy link
Member Author

Also, I tried to be careful to make the same start/done/next calls that nested loops would. I think it's pretty close but it makes the code slightly more awkward than it could be. It's really quite amazing how many ways there are to write prod_next; I went through quite a few.

@timholy
Copy link
Member

timholy commented Jan 7, 2016

I suspect this would be easier, and probably faster, with #9182---you almost surely wouldn't need the whole Nullable business at all.

With the current architecture, is there anyway to avoid the Nullable? I haven't thought it through in detail, but I wonder about a Prod1 "inner" iterator that gets used by a Prod2 "outer" iterator.

s1, s2, Nullable{eltype(p.b)}(), (done(p.a,s1) || done(p.b,s2))
end

@inline function prod_next(p, st)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename as next and delete the definition below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used by both Prod and Prod2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that.

@JeffBezanson
Copy link
Member Author

I'm not sure that's true --- to avoid calling next repeatedly with the same state, you need to save the previous value in the state. But initially there is no value, so what do you put in the state returned by start?

@timholy
Copy link
Member

timholy commented Jan 7, 2016

You could call nextval repeatedly without calling nextstate. Then you're re-fetching the next value, and don't have to hold on to it.

@JeffBezanson
Copy link
Member Author

True, that would work for iterators where you want to avoid extra state changes, but it still depends on the iterator being able to return the same value again, so somebody has to store it.

@timholy
Copy link
Member

timholy commented Jan 8, 2016

I'm afraid I don't follow. Currently next has to be able to fetch a value and also increment a state. If you separate the two, I don't think you lose the ability to fetch the value. But I'm probably not thinking generally enough about different types of containers. Are you worried about a "streaming" container where, once fetched, the object might vanish? I suspect there's a way around that if one designs nextval to receive & pass a "value state" variable---and since you only ever call nextval if you know there is a value waiting, there's no need for a Nullable. But like I said, I haven't thought this through carefully.

It seems to me that with #9182, this would reduce to

start(p::Prod2) = (start(p.a), start(p.b))
nextval(p::Prod2, s) = nextval(p.a, s[1]), nextval(p.b, s[2])
nextstate(p::Prod2, s) = isdone(p.a, s[1]) ? (start(p.a), nextstate(p.b, s[2])) : (nextstate(p.a, s[1]), s[2])

Or, if nextval also needs to pass back a value-state, the middle one would become something like

nextval(p::Prod2, s, vs) = (v1,vs1 = nextval(p.a, s[1], vs[1]); v2,vs2 = nextval(p.b, s[2], vs[2]); (v1,v2), (vs1,vs2)```

@JeffBezanson
Copy link
Member Author

Are you worried about a "streaming" container where, once fetched, the object might vanish?

Yes.

If you allow repeated calls to next with the same state, then prod_next becomes basically as simple as those nextval+nextstate definitions. Uncooperative iterators could potentially support that by remembering one value and state and returning the value as long as the state is the same. Less elegant than nextval+nextstate probably, but approximately the same work as implementing the value-state argument.

@timholy
Copy link
Member

timholy commented Jan 8, 2016

approximately the same work as implementing the value-state argument

Wouldn't it just be

nextval(iter, state, valstate) = fetch_value(iter, state), 0

when you can look up a value (which is probably >90% of all cases?), and

function nextval(iter, state, valstate)
    state == valstate.state && return get(valstate.value), valstate
    nv = fetch_value(iter, state)
    nv, ValState(Nullable(nv), state)
end

for a streaming container? This restricts usage of Nullable to the cases where you need it.

The main point being that, once you've implemented "persistence" for any iterators that need it, building more complex iterators becomes a more composable operation.

@timholy
Copy link
Member

timholy commented Jan 8, 2016

But you're probably right that it's not much harder for an "uncooperative" iterator with the current architecture (EDIT: if we required that iterators support repeated calls to next).

@blakejohnson
Copy link
Contributor

Regarding ordering, @marcusps had a PR JuliaCollections/Iterators.jl#40 to implement Lexicographic ordering. So, there are use cases.

@mschauer
Copy link
Contributor

mschauer commented Jan 8, 2016

[(i,j) for i in 1:3, j in 1:4][:] == collect(product(1:3,1:4)) is a good justification for the initial choice.

@mschauer
Copy link
Contributor

mschauer commented Jan 8, 2016

It would be quite natural to also define

size(p::AbstractProdIterator) = (size(p.a)..., size(p.b)...)

and declare size an optional iterator method.

@tkelman
Copy link
Contributor

tkelman commented Jan 8, 2016

I don't love exporting the name for this, could be confusing. Why can't this be/stay in Iterators.jl?

@StefanKarpinski
Copy link
Member

@JeffBezanson's policy has been to move commonly useful iterators into Base if they have really efficient implementations (i.e. as fast as what you would write out by hand).

@tkelman
Copy link
Contributor

tkelman commented Jan 8, 2016

Why should performance determine whether something is in base or a package? Without context, exporting product with no type specification at all doesn't seem to be intuitively referencing a (outer) iterator product. prod for product of elements, product for outer iterator product, max for elementwise max and maximum for max element is inconsistent and confusing.

We could use a bit more namespacing and modularization in base for this kind of thing.

@StefanKarpinski
Copy link
Member

Obviously not everything that is fast should be in base, but certainly things that are slow should generally not be. This is not my argument, I'm just telling you what the policy has been.

@JeffBezanson
Copy link
Member Author

Why should performance determine whether something is in base or a package?

Stuff in Base has a stronger implied warranty because you didn't choose to install it. So it's silly to put things in there that we then have to tell people not to use because they're too slow. The next question is whether the functionality is important enough. I suspect it is; I'm considering basing N-d comprehensions on this.

I can understand the naming concern; the term "product" is quite overloaded. The same applies to iterators that might have both a lazy and eager version. So maybe a module makes sense.

@StefanKarpinski
Copy link
Member

+1 to having these in a Base.Iterators module that needs to be explicitly used.

@mschauer
Copy link
Contributor

mschauer commented Jan 8, 2016

I'm considering basing N-d comprehensions on this.

If you are thinking about it currently: a shape/size-preserving splat operator, say .., a shaped tuple type or a tensor type inclusive literal notation and a shaped iterator protocol for N-d comprehensions and a array concatenation syntax [A.. B..] based on shaped splatting would give a new perspective on quite a number of open issues.

@JeffBezanson
Copy link
Member Author

How should we name Base.Iterators given the Iterators package? Base.Iter?

@tkelman
Copy link
Contributor

tkelman commented Jan 14, 2016

Base.Iter sounds okay. Thinking to export or not export the Iter module?

@yuyichao
Copy link
Contributor

🚲 🏠 Should probably be Base.Iters?

@kmsquire
Copy link
Member

Not loving it. I would propose not exporting the module.

If we had #1255, I would prefer leaving them both with the name Iterators, and requiring at least one to be renamed on import:

import Base.Iterators
import Iterators as Iterators2

or

using Base.Iterators
using Iteratos as Iterators2  # since "using" imports the module name into the current scope

@StefanKarpinski
Copy link
Member

I'm with @kmsquire on this one.

@iamed2
Copy link
Contributor

iamed2 commented Jan 14, 2016

Alternatively, Base.Iteration?

@JeffBezanson
Copy link
Member Author

@kmsquire I agree that's nicer, but it seems odd to introduce a deliberate name conflict with a package.

@StefanKarpinski
Copy link
Member

What if the Iterators package just imported the Base.Iterators module into Main and added stuff to it?

@JeffBezanson
Copy link
Member Author

I suppose that could make sense, but we don't currently allow that. It would especially complicate package precompilation.

@JeffBezanson
Copy link
Member Author

Python and Rust use the name itertools.

@JeffBezanson
Copy link
Member Author

How about this: for now, I will add this functionality without exporting product. We can introduce a namespace for iterators separately. As part of that change, we can fully separate functions that currently have both lazy and eager methods, like filter (so filter(itr) would be deprecated to Base.Iterator.filter).

@tkelman
Copy link
Contributor

tkelman commented Jan 26, 2016

Sounds great.

note: not exported yet
JeffBezanson added a commit that referenced this pull request Jan 26, 2016
RFC: efficient product iterator
@JeffBezanson JeffBezanson merged commit c5704ed into master Jan 26, 2016
@tkelman tkelman deleted the jb/product branch January 29, 2016 06:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants