Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

julep: a return-able break statement (improving iteration performance) #16035

Closed
timholy opened this issue Apr 25, 2016 · 6 comments
Closed

julep: a return-able break statement (improving iteration performance) #16035

timholy opened this issue Apr 25, 2016 · 6 comments

Comments

@timholy
Copy link
Member

timholy commented Apr 25, 2016

I looked into #9080 again, and as pointed out by @simonster, it seems obvious that the core problem is that our current iteration framework seems to require two branches per iteration rather than one. It seems this could be solved if we could, in effect, return break from a function.

The performance difference is illustrated by these functions:

function sumcart_iter_good(A)
    s = 0.0
    R = CartesianRange(size(A))
    I = start(R)
    @inbounds while true
        s += A[I]
        i1, i2 = I.I[1], I.I[2]
        i1 += 1
        if i1 > size(A, 1)
            i1 = 1
            i2 += 1
            if i2 > size(A, 2)  # note we test this only when i1 > size(A, 1)
                break
            end
        end
        I = CartesianIndex((i1, i2))
    end
    s
end

function sumcart_iter_bad(A)
    s = 0.0
    R = CartesianRange(size(A))
    I = start(R)
    @inbounds while true
        s += A[I]
        i1, i2 = I.I[1], I.I[2]
        i1 += 1
        if i1 > size(A, 1)
            i1 = 1
            i2 += 1
        end
        if i2 > size(A, 2)   # here we test it on every iteration
            break
        end
        I = CartesianIndex((i1, i2))
    end
    s
end

The good one typically has only one branch per iteration, whereas the bad one always has two. The good one has the same performance as writing the loops out manually, and the bad one is about 70% worse.

In this gist I present some code which would generate the good version, if we were able to "call" break from inside a function. (CR = CartesianRange, CI = CartesianIndex.) Note this is not the same as effectively testing done inside next, and just passing back the boolean; the key problem is the branch, not the comparison.

Note the gist is also free of @generated functions; we can do that regardless, of course.

@timholy
Copy link
Member Author

timholy commented Apr 25, 2016

Hmm, I just thought of another alternative: define a 3-argument, 2-return done, and have the scheme for-loop translation code call isdone, state = done(iter, state, true) rather than the 2-argument version. It would have a fallback definition:

done(iter, state, extra) = done(iter, state), state

This would allow one, as needed, to override the 3-argument version and use done not only to test for termination, but also to handle the "wrapping" of cartesian indexes---in essence, "cleaning up" after next. (In this proposal, next is almost trivial and only increments I[1], all the wrapping is handled in done.)

This appears to be completely non-breaking and would seemingly solve the performance problem. In a sense this sort of gives us #9182, except that #9182 doesn't solve this particular performance problem.

@timholy timholy changed the title julep: a return-able break statement (improving iteration performance) julep: 3-argument, 2-return done function Apr 25, 2016
@timholy
Copy link
Member Author

timholy commented Apr 25, 2016

This latter proposal also seems likely to solve #8149: just set up the iterator so the item is tucked into state (and start fills in a dummy item). next just retrieves the item and returns it.

@timholy timholy changed the title julep: 3-argument, 2-return done function julep: a return-able break statement (improving iteration performance) Apr 25, 2016
@timholy
Copy link
Member Author

timholy commented Apr 25, 2016

Continuing my debate with myself...now I'm skeptical that the 3-argument, 2-return done function would solve the performance problem. You still have to test twice, once during the increment and once to decide about while termination. I'm back to thinking that the "returnable" break is the only way to go here.

Might be good for #8149, though.

@timholy
Copy link
Member Author

timholy commented Apr 25, 2016

Looks like we use an LLVM optimization pass, jump threading, that might possibly help here. But the performance suggests it's not achieving this particular goal.

Test code:

function sumcart_while_maybe(A::AbstractMatrix)
    @assert !isempty(A)
    s = 0.0
    i = j = 1
    @inbounds while true
        s += A[i,j]
        mightbreak = false
        i += 1
        if i > size(A, 1)
            i = 1
            j += 1
            mightbreak = true
        end
        if mightbreak
            if j > size(A, 2)
                break
            end
        end
    end
    s
end

Ideally LLVM would take that mightbreak conditional block and plant it in place of the mightbreak=true statement above it. Yet the performance is notably worse than doing that manually:

function sumcart_while_good(A::AbstractMatrix)
    @assert !isempty(A)
    s = 0.0
    i = j = 1
    @inbounds while true
        s += A[i,j]
        i += 1
        if i > size(A, 1)
            i = 1
            j += 1
            if j > size(A, 2)
                break
            end
        end
    end
    s
end

Anyone know of any other passes that might help?

@timholy
Copy link
Member Author

timholy commented Jun 10, 2016

Reflecting on #16035 (comment) further, if the increment in next is dumb (doesn't check for wrapping or anything), then I'm back to thinking that the 3-argument, 2-return form of done would solve #9080. Relevant for #16128.

@timholy
Copy link
Member Author

timholy commented Jun 11, 2016

Closed in favor of an explicit solution, #16878.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant