improved product iterator #22989
Conversation
Also, fewer lines of code. And this supports
Awesome! What a nice surprise. A better product iterator is always on my wish list.
base/iterators.jl (Outdated)

```julia
end
size(P::ProductIterator) = _prod_size(P.iterators)
_prod_size(::Tuple{}) = ()
_prod_size(t::Tuple) = tuple(_prod_size1(t[1], iteratorsize(t[1]))..., _prod_size(tail(t))...)
```
`tuple` is not necessary; you can just use parens.
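For concreteness, the reviewer's suggestion applied to the line quoted above would look roughly like this (a sketch of the point being made, not a quote from the final commit):

```julia
# Splatting into a parenthesized tuple literal; the explicit `tuple` call is redundant.
_prod_size(t::Tuple) = (_prod_size1(t[1], iteratorsize(t[1]))..., _prod_size(tail(t))...)
```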
base/iterators.jl (Outdated)

```julia
    return (state, tailstates...), (val, tailvals...)
end
_prod_next(iterators::Tuple{}, states::Tuple{}, values::Tuple{}) = true, (), ()
function _prod_next(iterators, states, values)
```
I assume this is getting inlined? Kind of amazing that it doesn't need `@inline`. I might add it just to be safe.
I will test again because it was quite late yesterday evening, but I think I benchmarked with and without `@inline` and didn't see a difference.
This is one of the differences between the new inliner and the old one: yes, this is a lot of statements, but each one of them essentially boils down to a couple of intrinsics/primitives taking very few CPU cycles and hence is "cheap" by the metric used in the new inliner.
We've long had a special inlining bonus for functions like `next`; I kept that and it may be contributing here, but I'm not actually sure it's necessary (or even helpful) anymore.
Oh, duh, it's definitely not helping here because this function is not named `next`. This is pretty good evidence we could take that bonus out and use the "pure" algorithm.
I don't know the details of how the inliner works, but is there a limit to the inlining of recursive definitions? In this case, the code of `next` for the `product` of 14 iterators is already gigantic, though it still seems more efficient than with the explicit `@noinline`, as indicated by my follow-up benchmark in the main discussion.

However, in my own project I have another iterator, somewhat similar to `product` but where the different iterators are coupled in some tree structure, such that the actual `n`th iterator depends on the state of the previous `n-1`. In that case, I noticed that with explicit `@inline` the compilation time became huge for `N` larger than 10 or so; so I decided to get rid of these explicit `@inline`s and trust the inliner. That's why I also didn't use explicit `@inline` in this PR.
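As background for this inlining discussion, here is a minimal, self-contained illustration (not the PR's code) of the recursive tuple-peeling style being talked about: each method handles the first element and recurses on the tail, and full inlining of that recursion is what keeps the result type stable for many iterators. The `@inline` annotation is optional, which is exactly the trade-off discussed above.

```julia
using Base: tail  # peel off the first element of a tuple

# Sum the first element of each iterator in a tuple, recursively.
sumfirsts(::Tuple{}) = 0
@inline sumfirsts(t::Tuple) = first(t[1]) + sumfirsts(tail(t))

sumfirsts((1:3, 4:6, 7:9))  # 1 + 4 + 7 == 12
```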
Interesting 32-bit AV failure.
cc @yuyichao
Awesome! I think this will make
base/iterators.jl (Outdated)

```diff
@@ -4,7 +4,7 @@ module Iterators

 import Base: start, done, next, isempty, length, size, eltype, iteratorsize, iteratoreltype, indices, ndims

-using Base: tuple_type_cons, SizeUnknown, HasLength, HasShape, IsInfinite, EltypeUnknown, HasEltype, OneTo, @propagate_inbounds
+using Base: tail, tuple_type_head, tuple_type_tail, tuple_type_cons, SizeUnknown, HasLength, HasShape, IsInfinite, EltypeUnknown, HasEltype, OneTo, @propagate_inbounds
```
line wrap
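The wrap the reviewer asks for could look like this (one possible layout of the same import list; purely cosmetic):

```julia
using Base: tail, tuple_type_head, tuple_type_tail, tuple_type_cons,
            SizeUnknown, HasLength, HasShape, IsInfinite, EltypeUnknown,
            HasEltype, OneTo, @propagate_inbounds
```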
Thanks, I will fix it in the next commit, after having received advice on the two questions I have just posted.
@JeffBezanson,

Two questions:

With

Not quite what you write, but that's already the case on master: `collect(Base.Iterators.product((1,2),(3,)))` ->

```
2×1 Array{Tuple{Int64,Int64},2}:
 (1, 3)
 (2, 3)
```

Might as well @nanosoldier
Thanks @JeffBezanson for the response. Implementing your response to question 2 actually amounts to code reduction, namely removing line 609 (

Another question before preparing a next commit: Currently, I use

```julia
iteratorsize(::Type{ProductIterator{T}}) where {T<:Tuple} =
    prod_iteratorsize( iteratorsize(tuple_type_head(T)), iteratorsize(ProductIterator{tuple_type_tail(T)}) )
```

Could this also be written using the `@pure` macro, e.g. something like

```julia
@pure iteratorsize(::Type{ProductIterator{T}}) where {T<:Tuple} =
    mapreduce(iteratorsize, prod_iteratorsize, T.parameters)
```

Though this does not seem to make
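For readers unfamiliar with the helper referenced above: `prod_iteratorsize` combines the size traits of two factors into the trait of their product. The following is only a hedged sketch of the kind of rules such a helper might encode (0.6-era trait names without a shape dimension parameter), not necessarily the PR's exact definitions:

```julia
using Base: HasLength, HasShape, IsInfinite, SizeUnknown

# Illustrative trait-combination rules for a product of two iterators.
prod_iteratorsize(::Union{HasLength,HasShape}, ::Union{HasLength,HasShape}) = HasShape()
prod_iteratorsize(::IsInfinite, ::IsInfinite) = IsInfinite()  # disambiguates the two methods below
prod_iteratorsize(a, ::IsInfinite) = IsInfinite()             # any infinite factor makes the product infinite
prod_iteratorsize(::IsInfinite, b) = IsInfinite()
prod_iteratorsize(a, b) = SizeUnknown()                       # anything else: size unknown
```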
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Could be worth looking into the
Yes, there are indeed more allocations in that test, also when I run the benchmarks locally. I will try to understand how it is affected by
@nanosoldier
Thanks for restarting the benchmarks. The implementation has changed quite a bit still, and is not fully finalized / cleaned up. In particular:
And then another comment. I think I came across a bug in Julia (or LLVM) code generation/optimization. I refer to my comment in the code for further info. I will test some more and will file a separate issue if I am convinced of my case.
base/iterators.jl (Outdated)

```julia
iter1 = first(iterators)
value1, state1 = next(iter1, states[1])
tailstates = tail(states)
values = (value1, map(unsafe_get, state[3])...) # safe if not done(P, state)
```
`state[3]` should be equal to `nvalues` here. However, I came across a bug where, if I write `nvalues` here, it seems like it is taking the "value" of `nvalues` after it has been overwritten in the next block of code. In particular, in the final state before being done, none of the elements in `nvalues` will be `isnull`, but then, after the next block of code, they are.

Put differently, this function produced a different answer when I just call it versus when I run it line by line in the REPL. Or yet differently, if you replace `state[3]` by `nvalues` in this place, which should be a valid replacement, the tests fail outside of an actual test. I will analyse further and file a more complete bug description if I am convinced of my case.
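As background for the `unsafe_get` call in the snippet above: in 0.6-era Julia the pending values are held as `Nullable`s, and `unsafe_get` unwraps one without a null check, which is only safe while the iterator is not done (hence the `# safe if not done(P, state)` comment). A minimal illustration, not the PR's code:

```julia
x = Nullable(3)      # a value that is present
isnull(x)            # false
unsafe_get(x)        # 3; skips the null check, so only valid when !isnull(x)

y = Nullable{Int}()  # an empty Nullable
isnull(y)            # true; unsafe_get(y) would read an undefined value
```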
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
I think the benchmarks are ok now? The ["scalar","cos"] benchmark seems to be an anomaly. I cannot reproduce it locally. Sometimes another test out of that group differs, but these are elementary

The ["random","ranges",...] benchmark does indeed allocate 4 more bytes (on a total of 420 bytes). I don't see immediately where

All the other differences seem to be improvements.
LGTM, but I'll leave it to @JeffBezanson to pull the trigger on this one.
That's great, but hold off merging until I have further investigated the issue about which I wrote a comment in the code.
OK, I am not able to reproduce it on my desktop computer; maybe it was something in my environment. I've pushed the last commit restoring the code to how it was before the error (also removing the superfluous use of
I don't know what happened with appveyor. On Travis, 64-bit linux errored on an Arpack failure with a test of `eigs`.

Regarding the Arpack failure, the test

```julia
(eigs(speye(50), nev=10))[1] ≈ ones(10)
```

seems prone to give errors. Trying to build a Krylov subspace with an identity matrix is asking for trouble, as every new vector created will be identical to the previous one and therefore become zero after orthogonalization. Not saying that Arpack shouldn't account for that, but it can be rather buggy software and this test will basically trigger that behaviour. I would recommend using another diagonal matrix as a test.
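A hedged sketch of the replacement this comment suggests: test `eigs` against a diagonal matrix with distinct entries, so successive Krylov vectors do not collapse under orthogonalization. The matrix, size, and expected values below are illustrative choices (0.6-era syntax), not taken from the PR.

```julia
using Base.Test

# Sparse diagonal matrix with distinct eigenvalues 1, 2, ..., 50,
# instead of speye(50), whose repeated eigenvalue trips up ARPACK.
D = spdiagm(collect(1.0:50.0))

# The 10 largest-magnitude eigenvalues should be 41, ..., 50.
λ, _ = eigs(D, nev=10, which=:LM)
@test sort(λ) ≈ collect(41.0:50.0)
```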
How does this version do on the original benchmarks in #22989 (comment)? No need to post numbers, just want to double-check if the story is roughly the same.
Roughly the same; maybe a bit of a regression with respect to
Unfortunately, trying to run the above benchmark on latest master yields:
which I guess is an issue with BenchmarkTools.jl
Thanks!
This is basically a complete replacement of the implementation of the `product` iterator in `Base.Iterators`. It is more efficient and remains type stable for a larger number of iterators. The current product iterator starts allocating for 6 or more iterators, whereas the implementation in this PR remains type stable and therefore allocation free up to 14 iterators.

In the benchmark below, I also compare to the more specialised but therefore also more efficient `CartesianRange` iterator. The current implementation seems almost as efficient for the case where all iterators are ranges, so that `CartesianRange` could maybe become a disguised `product` iterator that wraps the output tuple in a `CartesianIndex`.

I think @JeffBezanson implemented the current `product` iterator in master, and @timholy did most of the `CartesianRange` work.

Benchmark:

[benchmark code not recovered]

and results in

[results table comparing `product`, `CartesianRange`, and the new `product` not recovered]
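Since the original benchmark code was not preserved above, here is a minimal sketch of the kind of comparison the description refers to. The function names, problem sizes, and the use of BenchmarkTools are illustrative choices of this sketch, and `CartesianRange` is the 0.6-era name used in this thread (later renamed `CartesianIndices`).

```julia
using BenchmarkTools

# Walk a product of ranges, consuming each tuple, to exercise the iterator itself.
function sumproduct(iters...)
    s = 0
    for t in Base.Iterators.product(iters...)
        s += sum(t)
    end
    return s
end

# The same traversal via CartesianRange.
function sumcartesian(dims::Int...)
    s = 0
    for I in CartesianRange(dims)
        s += sum(I.I)   # I.I is the underlying index tuple
    end
    return s
end

@btime sumproduct(1:10, 1:10, 1:10)
@btime sumcartesian(10, 10, 10)
```

On the PR's premise, the `product`-based loop should stay allocation free and come close to the `CartesianRange` loop when all arguments are ranges.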