Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task advances on done() not next() ? #6125

Closed
filmackay opened this issue Mar 12, 2014 · 14 comments
Closed

Task advances on done() not next() ? #6125

filmackay opened this issue Mar 12, 2014 · 14 comments

Comments

@filmackay
Copy link

Unlike other iterable types, Task seem to advance on done instead of next. Is there a reason for this - it seems to break the iteration contract. eg.

function p()
    produce("A")
    produce("B")
    produce("C")
end

t = Task(p)

println("1")
s = start(t)
println("2")
done(t, s)
println("3")
(v, s) = next(t, s)
println("4 $v")
done(t, s)
println("5")
(v, s) = next(t, s)
println("6 $v")
(v, s) = next(t, s)
println("7 $v")
done(t, s)
println("8")

Produces:

1
2
3
4 A
5
6 B
7 B
8
@JeffBezanson
Copy link
Member

It's because you can't tell whether a task is done without trying to run it, at which point you also get the first value it produces.

@ivarne
Copy link
Member

ivarne commented Mar 12, 2014

You are also kind of breaking the contract by not checking if the iteration is done before getting the next value. Where is the iteration contract that says you are allowed to do that? next(1:3,4) does not give a BoundsError.

@StefanKarpinski
Copy link
Member

This used to be the other way, but there was a complaint about that too. In the current arrangement, if you always call next before done the way you're supposed to, this works as one would expect.

@ivarne
Copy link
Member

ivarne commented Mar 12, 2014

@StefanKarpinski I assume you intended to write "done before next"?

@StefanKarpinski
Copy link
Member

Right, yes that.

@filmackay
Copy link
Author

Thanks guys.

OK - so the contract is to always check done before each next, thereby avoiding that is undefined. Would it make more sense for iterators to have a next that returns a (done, value, state) tuple? At the moment an iterator needs to check for doneness in next too - to throw a bounds error. This would remove any ambiguity by binding the two operations together. value would be () if done is true.

@ivarne
Copy link
Member

ivarne commented Mar 13, 2014

I also wondered why we need two functions (done and next), but I think I managed to find some arguments.

  • Many implementations of next does not check bounds. Only methods that would otherwise segfault does that in order to prevent memory corruption. You are an example to show that it is not safe to put the @inbounds declaration in next because we assume that it will not be called when done is false. Maybe we could have something like a unsafe_next that would be allowed to segfault if not called right after done?
  • If next returned (done, value, state) we would not always have a valid value of the correct type to return for value. That would lead to type unstable, slow code, because we can not raise an exception as we do now. The alternative would be to make the done returned from next tell if there are one more element in the iterable, but that would make you unable to process the result of one task before the next has arrived.
  • Python generators uses exceptions to stop iteration of generators, and while that is convenient in a lot of cases, it is much slower than the done check, and thus not suitable for Julia.

@mbauman
Copy link
Member

mbauman commented Mar 13, 2014

Yes, that makes sense. Thanks for the explanation. It's surprisingly tricky.

It still feels weird to be mutating the task within the done check. Would it make more sense to do this in start? It's even simpler if you store the next result in the state, but that breaks a few other methods. As an aside, are there no tests for Tasks?

start(t::Task) = next(t)[2]
done(t::Task, val) = istaskdone(t)
next(t::Task, val) = (t.result, (t.result = consume(t); nothing))

@JeffBezanson
Copy link
Member

These definitions work fairly well:

start(t::Task) = consume(t)
done(t::Task, val) = istaskdone(t)
next(t::Task, val) = (val, consume(t))

The only downside is that they switch to the task twice before you actually get the first value. It would be nice to be able to compute the second thing that next returns at the end of the loop body rather than at the top. Interestingly, this can lead to better code generation too (as I found in #5469), but fortunately there was a workaround for now.

That seems to ask for nextval and nextstate functions, which would also cut down on tuples. While that would be the most breaking change of all time, it's still interesting to think about.

@mbauman
Copy link
Member

mbauman commented Mar 13, 2014

That's exactly what I played with first… it's wonderfully simple but it seems that some other methods (like schedule) depend upon t.result being used. But that can be fixed.

@JeffBezanson
Copy link
Member

I don't think any other code depends on the iterator setting t.result. Really, that mutating during iteration is just a big nuisance.

@StefanKarpinski
Copy link
Member

The nextval nextstate thing could actually be made to be relatively non-disruptive with these interim definitions:

nextval(itr,state) = next(itr,stat)[1]
nextstate(itr,state) = next(itr,stat)[2]
next(itr,state) = (nextval(itr,state), nextstate(itr,state))

These seem circular, but of course the circularity would be broken when any itr type had one or the other set of definitions – next or nextval and nextstate. I'm not advocating that since I think this is kind of ugly, but it's worth considering alternatives. As I recall, one of the motivations for the way next simultaneously returns a value and computes the next index was iteration over UTF-8 strings, where the character decoding process also naturally gives you the starting index of the next character.

@JeffBezanson
Copy link
Member

Quite true; the other side of the coin is being able to share computation between nextval and nextstate.

@StefanKarpinski
Copy link
Member

It's arguable that you could just do that work twice most of the time and as long as the nextval and nextstate calls get inlined, LLVM would probably be able to eliminate the duplication. There are cases where there's enough work to be done that inlining wouldn't happen, but in such cases maybe some other approach to avoiding the duplicated work would be warranted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants