julep: a return-able `break` statement (improving iteration performance) #16035

timholy · 2016-04-25T09:02:57Z

I looked into #9080 again, and as pointed out by @simonster, it seems obvious that the core problem is that our current iteration framework seems to require two branches per iteration rather than one. It seems this could be solved if we could, in effect, return break from a function.

The performance difference is illustrated by these functions:

function sumcart_iter_good(A)
    s = 0.0
    R = CartesianRange(size(A))
    I = start(R)
    @inbounds while true
        s += A[I]
        i1, i2 = I.I[1], I.I[2]
        i1 += 1
        if i1 > size(A, 1)
            i1 = 1
            i2 += 1
            if i2 > size(A, 2)  # note we test this only when i1 > size(A, 1)
                break
            end
        end
        I = CartesianIndex((i1, i2))
    end
    s
end

function sumcart_iter_bad(A)
    s = 0.0
    R = CartesianRange(size(A))
    I = start(R)
    @inbounds while true
        s += A[I]
        i1, i2 = I.I[1], I.I[2]
        i1 += 1
        if i1 > size(A, 1)
            i1 = 1
            i2 += 1
        end
        if i2 > size(A, 2)   # here we test it on every iteration
            break
        end
        I = CartesianIndex((i1, i2))
    end
    s
end

The good one typically has only one branch per iteration, whereas the bad one always has two. The good one has the same performance as writing the loops out manually, and the bad one is about 70% worse.

In this gist I present some code which would generate the good version, if we were able to "call" break from inside a function. (CR = CartesianRange, CI = CartesianIndex.) Note this is not the same as effectively testing done inside next, and just passing back the boolean; the key problem is the branch, not the comparison.

Note the gist is also free of @generated functions; we can do that regardless, of course.

The text was updated successfully, but these errors were encountered:

timholy · 2016-04-25T09:28:30Z

Hmm, I just thought of another alternative: define a 3-argument, 2-return done, and have the scheme for-loop translation code call isdone, state = done(iter, state, true) rather than the 2-argument version. It would have a fallback definition:

done(iter, state, extra) = done(iter, state), state

This would allow one, as needed, to override the 3-argument version and use done not only to test for termination, but also to handle the "wrapping" of cartesian indexes---in essence, "cleaning up" after next. (In this proposal, next is almost trivial and only increments I[1], all the wrapping is handled in done.)

This appears to be completely non-breaking and would seemingly solve the performance problem. In a sense this sort of gives us #9182, except that #9182 doesn't solve this particular performance problem.

timholy · 2016-04-25T09:52:04Z

This latter proposal also seems likely to solve #8149: just set up the iterator so the item is tucked into state (and start fills in a dummy item). next just retrieves the item and returns it.

timholy · 2016-04-25T12:06:35Z

Continuing my debate with myself...now I'm skeptical that the 3-argument, 2-return done function would solve the performance problem. You still have to test twice, once during the increment and once to decide about while termination. I'm back to thinking that the "returnable" break is the only way to go here.

Might be good for #8149, though.

timholy · 2016-04-25T19:06:43Z

Looks like we use an LLVM optimization pass, jump threading, that might possibly help here. But the performance suggests it's not achieving this particular goal.

Test code:

function sumcart_while_maybe(A::AbstractMatrix)
    @assert !isempty(A)
    s = 0.0
    i = j = 1
    @inbounds while true
        s += A[i,j]
        mightbreak = false
        i += 1
        if i > size(A, 1)
            i = 1
            j += 1
            mightbreak = true
        end
        if mightbreak
            if j > size(A, 2)
                break
            end
        end
    end
    s
end

Ideally LLVM would take that mightbreak conditional block and plant it in place of the mightbreak=true statement above it. Yet the performance is notably worse than doing that manually:

function sumcart_while_good(A::AbstractMatrix)
    @assert !isempty(A)
    s = 0.0
    i = j = 1
    @inbounds while true
        s += A[i,j]
        i += 1
        if i > size(A, 1)
            i = 1
            j += 1
            if j > size(A, 2)
                break
            end
        end
    end
    s
end

Anyone know of any other passes that might help?

timholy · 2016-06-10T13:25:16Z

Reflecting on #16035 (comment) further, if the increment in next is dumb (doesn't check for wrapping or anything), then I'm back to thinking that the 3-argument, 2-return form of done would solve #9080. Relevant for #16128.

timholy · 2016-06-11T11:38:49Z

Closed in favor of an explicit solution, #16878.

timholy changed the title ~~julep: a return-able break statement (improving iteration performance)~~ julep: 3-argument, 2-return done function Apr 25, 2016

timholy changed the title ~~julep: 3-argument, 2-return done function~~ julep: a return-able break statement (improving iteration performance) Apr 25, 2016

timholy mentioned this issue Jun 11, 2016

julep: extended iteration API, with proof-of-principle for fixing performance of Cartesian iteration #16878

Closed

timholy closed this as completed Jun 11, 2016

timholy mentioned this issue Oct 25, 2016

Code generation and cartesian iteration #9080

Closed

timholy mentioned this issue Mar 8, 2017

Speed up copy!(dest, Rdest, src, Rsrc) by splitting out first dimension #20944

Merged

Keno mentioned this issue Mar 8, 2017

Julep: Revised iteration protocol #18823

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

julep: a return-able `break` statement (improving iteration performance) #16035

julep: a return-able `break` statement (improving iteration performance) #16035

timholy commented Apr 25, 2016 •

edited

Loading

timholy commented Apr 25, 2016

timholy commented Apr 25, 2016

timholy commented Apr 25, 2016

timholy commented Apr 25, 2016

timholy commented Jun 10, 2016

timholy commented Jun 11, 2016

julep: a return-able break statement (improving iteration performance) #16035

julep: a return-able break statement (improving iteration performance) #16035

Comments

timholy commented Apr 25, 2016 • edited Loading

timholy commented Apr 25, 2016

timholy commented Apr 25, 2016

timholy commented Apr 25, 2016

timholy commented Apr 25, 2016

timholy commented Jun 10, 2016

timholy commented Jun 11, 2016

julep: a return-able `break` statement (improving iteration performance) #16035

julep: a return-able `break` statement (improving iteration performance) #16035

timholy commented Apr 25, 2016 •

edited

Loading