RFC: efficient product iterator #14596

JeffBezanson · 2016-01-07T23:11:20Z

This uses the same general trick as the zip iterator to get performance without generated functions. I'm sure it could be better, but it's at least reasonably fast:

julia> function f()
         @time for k=1:10;[ i+j for i=1:1000, j=1:1000 ];end
         @time for k=1:10;[ i+j for (i,j) in product(1:1000,1:1000) ];end
       end
f (generic function with 1 method)

julia> f();
  0.011721 seconds (20 allocations: 76.295 MB, 18.53% gc time)
  0.022262 seconds (20 allocations: 76.295 MB, 4.50% gc time)

julia> function f()
         @time for k=1:10;[ i+j+l for i=1:1000, j=1:100, l=1:10 ];end
         @time for k=1:10;[ i+j+l for (i,j,l) in product(1:1000,1:100,1:10) ];end
       end
f (generic function with 1 method)

julia> f();
  0.013009 seconds (20 allocations: 76.295 MB, 11.70% gc time)
  0.046552 seconds (20 allocations: 76.295 MB, 4.16% gc time)

julia> f();
  0.020044 seconds (20 allocations: 76.295 MB, 11.20% gc time)
  0.029772 seconds (20 allocations: 76.295 MB, 6.82% gc time)

The times are pretty variable but this looks something like a factor of 2.

This implements the same iteration order as comprehensions (first iterator is the innermost loop). Would anybody want lexicographic order in addition or instead?

JeffBezanson · 2016-01-07T23:15:29Z

Also, I tried to be careful to make the same start/done/next calls that nested loops would. I think it's pretty close but it makes the code slightly more awkward than it could be. It's really quite amazing how many ways there are to write prod_next; I went through quite a few.

timholy · 2016-01-07T23:44:49Z

I suspect this would be easier, and probably faster, with #9182---you almost surely wouldn't need the whole Nullable business at all.

With the current architecture, is there anyway to avoid the Nullable? I haven't thought it through in detail, but I wonder about a Prod1 "inner" iterator that gets used by a Prod2 "outer" iterator.

timholy · 2016-01-07T23:45:08Z

base/iterator.jl

+    s1, s2, Nullable{eltype(p.b)}(), (done(p.a,s1) || done(p.b,s2))
+end
+
+@inline function prod_next(p, st)


rename as next and delete the definition below?

It's used by both Prod and Prod2.

Ah, I missed that.

JeffBezanson · 2016-01-07T23:51:48Z

I'm not sure that's true --- to avoid calling next repeatedly with the same state, you need to save the previous value in the state. But initially there is no value, so what do you put in the state returned by start?

timholy · 2016-01-07T23:54:43Z

You could call nextval repeatedly without calling nextstate. Then you're re-fetching the next value, and don't have to hold on to it.

JeffBezanson · 2016-01-07T23:57:54Z

True, that would work for iterators where you want to avoid extra state changes, but it still depends on the iterator being able to return the same value again, so somebody has to store it.

timholy · 2016-01-08T00:17:06Z

I'm afraid I don't follow. Currently next has to be able to fetch a value and also increment a state. If you separate the two, I don't think you lose the ability to fetch the value. But I'm probably not thinking generally enough about different types of containers. Are you worried about a "streaming" container where, once fetched, the object might vanish? I suspect there's a way around that if one designs nextval to receive & pass a "value state" variable---and since you only ever call nextval if you know there is a value waiting, there's no need for a Nullable. But like I said, I haven't thought this through carefully.

It seems to me that with #9182, this would reduce to

start(p::Prod2) = (start(p.a), start(p.b))
nextval(p::Prod2, s) = nextval(p.a, s[1]), nextval(p.b, s[2])
nextstate(p::Prod2, s) = isdone(p.a, s[1]) ? (start(p.a), nextstate(p.b, s[2])) : (nextstate(p.a, s[1]), s[2])

Or, if nextval also needs to pass back a value-state, the middle one would become something like

nextval(p::Prod2, s, vs) = (v1,vs1 = nextval(p.a, s[1], vs[1]); v2,vs2 = nextval(p.b, s[2], vs[2]); (v1,v2), (vs1,vs2)```

JeffBezanson · 2016-01-08T00:31:40Z

Are you worried about a "streaming" container where, once fetched, the object might vanish?

Yes.

If you allow repeated calls to next with the same state, then prod_next becomes basically as simple as those nextval+nextstate definitions. Uncooperative iterators could potentially support that by remembering one value and state and returning the value as long as the state is the same. Less elegant than nextval+nextstate probably, but approximately the same work as implementing the value-state argument.

timholy · 2016-01-08T01:18:34Z

approximately the same work as implementing the value-state argument

Wouldn't it just be

nextval(iter, state, valstate) = fetch_value(iter, state), 0

when you can look up a value (which is probably >90% of all cases?), and

function nextval(iter, state, valstate)
    state == valstate.state && return get(valstate.value), valstate
    nv = fetch_value(iter, state)
    nv, ValState(Nullable(nv), state)
end

for a streaming container? This restricts usage of Nullable to the cases where you need it.

The main point being that, once you've implemented "persistence" for any iterators that need it, building more complex iterators becomes a more composable operation.

timholy · 2016-01-08T01:26:49Z

But you're probably right that it's not much harder for an "uncooperative" iterator with the current architecture (EDIT: if we required that iterators support repeated calls to next).

blakejohnson · 2016-01-08T03:06:46Z

Regarding ordering, @marcusps had a PR JuliaCollections/Iterators.jl#40 to implement Lexicographic ordering. So, there are use cases.

mschauer · 2016-01-08T14:00:29Z

[(i,j) for i in 1:3, j in 1:4][:] == collect(product(1:3,1:4)) is a good justification for the initial choice.

mschauer · 2016-01-08T14:13:48Z

It would be quite natural to also define

size(p::AbstractProdIterator) = (size(p.a)..., size(p.b)...)

and declare size an optional iterator method.

tkelman · 2016-01-08T15:11:24Z

I don't love exporting the name for this, could be confusing. Why can't this be/stay in Iterators.jl?

StefanKarpinski · 2016-01-08T15:27:25Z

@JeffBezanson's policy has been to move commonly useful iterators into Base if they have really efficient implementations (i.e. as fast as what you would write out by hand).

tkelman · 2016-01-08T15:46:12Z

Why should performance determine whether something is in base or a package? Without context, exporting product with no type specification at all doesn't seem to be intuitively referencing a (outer) iterator product. prod for product of elements, product for outer iterator product, max for elementwise max and maximum for max element is inconsistent and confusing.

We could use a bit more namespacing and modularization in base for this kind of thing.

StefanKarpinski · 2016-01-08T15:51:29Z

Obviously not everything that is fast should be in base, but certainly things that are slow should generally not be. This is not my argument, I'm just telling you what the policy has been.

JeffBezanson · 2016-01-08T15:53:36Z

Why should performance determine whether something is in base or a package?

Stuff in Base has a stronger implied warranty because you didn't choose to install it. So it's silly to put things in there that we then have to tell people not to use because they're too slow. The next question is whether the functionality is important enough. I suspect it is; I'm considering basing N-d comprehensions on this.

I can understand the naming concern; the term "product" is quite overloaded. The same applies to iterators that might have both a lazy and eager version. So maybe a module makes sense.

StefanKarpinski · 2016-01-08T15:54:36Z

+1 to having these in a Base.Iterators module that needs to be explicitly used.

mschauer · 2016-01-08T16:38:26Z

I'm considering basing N-d comprehensions on this.

If you are thinking about it currently: a shape/size-preserving splat operator, say .., a shaped tuple type or a tensor type inclusive literal notation and a shaped iterator protocol for N-d comprehensions and a array concatenation syntax [A.. B..] based on shaped splatting would give a new perspective on quite a number of open issues.

JeffBezanson · 2016-01-13T20:28:06Z

How should we name Base.Iterators given the Iterators package? Base.Iter?

tkelman · 2016-01-14T00:51:08Z

Base.Iter sounds okay. Thinking to export or not export the Iter module?

yuyichao · 2016-01-14T00:57:01Z

🚲 🏠 Should probably be Base.Iters?

kmsquire · 2016-01-14T04:08:07Z

Not loving it. I would propose not exporting the module.

If we had #1255, I would prefer leaving them both with the name Iterators, and requiring at least one to be renamed on import:

import Base.Iterators
import Iterators as Iterators2

or

using Base.Iterators
using Iteratos as Iterators2  # since "using" imports the module name into the current scope

StefanKarpinski · 2016-01-14T06:09:57Z

I'm with @kmsquire on this one.

iamed2 · 2016-01-14T18:19:47Z

Alternatively, Base.Iteration?

JeffBezanson · 2016-01-14T19:35:08Z

@kmsquire I agree that's nicer, but it seems odd to introduce a deliberate name conflict with a package.

StefanKarpinski · 2016-01-14T20:48:00Z

What if the Iterators package just imported the Base.Iterators module into Main and added stuff to it?

JeffBezanson · 2016-01-14T20:51:23Z

I suppose that could make sense, but we don't currently allow that. It would especially complicate package precompilation.

JeffBezanson · 2016-01-22T21:08:36Z

Python and Rust use the name itertools.

JeffBezanson · 2016-01-26T04:48:47Z

How about this: for now, I will add this functionality without exporting product. We can introduce a namespace for iterators separately. As part of that change, we can fully separate functions that currently have both lazy and eager methods, like filter (so filter(itr) would be deprecated to Base.Iterator.filter).

tkelman · 2016-01-26T04:50:51Z

Sounds great.

note: not exported yet

RFC: efficient product iterator

timholy reviewed Jan 7, 2016
View reviewed changes

add efficient product iterator

c4c826a

note: not exported yet

JeffBezanson force-pushed the jb/product branch from 5bb4184 to c4c826a Compare January 26, 2016 04:59

JeffBezanson added a commit that referenced this pull request Jan 26, 2016

Merge pull request #14596 from JuliaLang/jb/product

c5704ed

RFC: efficient product iterator

JeffBezanson merged commit c5704ed into master Jan 26, 2016

tkelman deleted the jb/product branch January 29, 2016 06:25

tkelman mentioned this pull request May 18, 2016

RFC: Change iteratorsize trait of product(itr1, itr2) #16437

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: efficient product iterator #14596

RFC: efficient product iterator #14596

JeffBezanson commented Jan 7, 2016

JeffBezanson commented Jan 7, 2016

timholy commented Jan 7, 2016

timholy Jan 7, 2016

JeffBezanson Jan 7, 2016

timholy Jan 8, 2016

JeffBezanson commented Jan 7, 2016

timholy commented Jan 7, 2016

JeffBezanson commented Jan 7, 2016

timholy commented Jan 8, 2016

JeffBezanson commented Jan 8, 2016

timholy commented Jan 8, 2016

timholy commented Jan 8, 2016

blakejohnson commented Jan 8, 2016

mschauer commented Jan 8, 2016

mschauer commented Jan 8, 2016

tkelman commented Jan 8, 2016

StefanKarpinski commented Jan 8, 2016

tkelman commented Jan 8, 2016

StefanKarpinski commented Jan 8, 2016

JeffBezanson commented Jan 8, 2016

StefanKarpinski commented Jan 8, 2016

mschauer commented Jan 8, 2016

JeffBezanson commented Jan 13, 2016

tkelman commented Jan 14, 2016

yuyichao commented Jan 14, 2016

kmsquire commented Jan 14, 2016

StefanKarpinski commented Jan 14, 2016

iamed2 commented Jan 14, 2016

JeffBezanson commented Jan 14, 2016

StefanKarpinski commented Jan 14, 2016

JeffBezanson commented Jan 14, 2016

JeffBezanson commented Jan 22, 2016

JeffBezanson commented Jan 26, 2016

tkelman commented Jan 26, 2016

RFC: efficient product iterator #14596

RFC: efficient product iterator #14596

Conversation

JeffBezanson commented Jan 7, 2016

JeffBezanson commented Jan 7, 2016

timholy commented Jan 7, 2016

timholy Jan 7, 2016

Choose a reason for hiding this comment

JeffBezanson Jan 7, 2016

Choose a reason for hiding this comment

timholy Jan 8, 2016

Choose a reason for hiding this comment

JeffBezanson commented Jan 7, 2016

timholy commented Jan 7, 2016

JeffBezanson commented Jan 7, 2016

timholy commented Jan 8, 2016

JeffBezanson commented Jan 8, 2016

timholy commented Jan 8, 2016

timholy commented Jan 8, 2016

blakejohnson commented Jan 8, 2016

mschauer commented Jan 8, 2016

mschauer commented Jan 8, 2016

tkelman commented Jan 8, 2016

StefanKarpinski commented Jan 8, 2016

tkelman commented Jan 8, 2016

StefanKarpinski commented Jan 8, 2016

JeffBezanson commented Jan 8, 2016

StefanKarpinski commented Jan 8, 2016

mschauer commented Jan 8, 2016

JeffBezanson commented Jan 13, 2016

tkelman commented Jan 14, 2016

yuyichao commented Jan 14, 2016

kmsquire commented Jan 14, 2016

StefanKarpinski commented Jan 14, 2016

iamed2 commented Jan 14, 2016

JeffBezanson commented Jan 14, 2016

StefanKarpinski commented Jan 14, 2016

JeffBezanson commented Jan 14, 2016

JeffBezanson commented Jan 22, 2016

JeffBezanson commented Jan 26, 2016

tkelman commented Jan 26, 2016