Fix data races in n_avail(::Channel) to fix isready/isempty #41833
Conversation
I don't think this is necessary (the race is "benign"). If you have a single reader, this function may even be useful. Ideally, we could make this read atomic-relaxed so that TSan tooling would ignore it too.
Yes, I wondered about this. But I worry that declaring data races benign often underestimates the combined "creativity" of users and the optimizer. Consider apparently innocent (if bizarre) user code such as

```julia
function foo(c::Channel)
    s = 0
    for i = 1:typemax(Int)
        !isready(c) || break
        s += i^2 + i^3 - 1 # some computation not involving memory operations
    end
    s
end
```

Here the user expects to break out of the computation when the channel becomes nonempty. But that is not what seems to happen:

```julia
julia> c = Channel(Inf)
Channel{Any}(9223372036854775807) (empty)

julia> t = Threads.@spawn foo(c)
Task (runnable) @0x00007efd21e428c0

julia> put!(c, 1)
1

# Still running!?
julia> t
Task (runnable) @0x00007efd21e428c0
```

My guess is that the compiler has hoisted the load out of the loop in this case? For context, I came across code involving a similar pattern.
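A minimal sketch of why a relaxed atomic read would fix the hoisting problem (the `Flag` struct and `spin` function here are hypothetical stand-ins, not the Channel internals): even a `:monotonic` load must be re-issued on every iteration, so the compiler cannot hoist it out of the loop and the spinning task eventually observes the store.

```julia
# Hypothetical illustration: an atomic field read with :monotonic ordering
# cannot be hoisted out of the loop, unlike the plain (racy) read above.
mutable struct Flag
    @atomic ready::Bool
end

function spin(f::Flag)
    s = 0
    for i = 1:typemax(Int)
        (@atomic :monotonic f.ready) && break  # relaxed load: re-read each iteration
        s += i^2 + i^3 - 1
    end
    return s
end

f = Flag(false)
t = Threads.@spawn spin(f)
@atomic :monotonic f.ready = true
wait(t)  # terminates, because the store becomes visible to the relaxed load
```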
Sure, what's the right way to go about this? Would it prevent the problem above?
If we read/write the underlying data (the vector length) of the channel with relaxed ordering, then in

```julia
ref = Ref(0)
c = Channel(1)
@spawn begin
    ref[] = 1
    put!(c, nothing)
end
if isready(c)
    @assert ref[] == 0
end
```

the assertion can pass (we may observe the channel as ready before observing the write to `ref`), which is rather counterintuitive. Explicitly documenting the lack of ordering may be one option, though.

This can happen with atomics too: CppCon 2016: JF Bastien "No Sane Compiler Would Optimize Atomics" - YouTube (around 45:16)
Yikes. This makes me wonder what guarantees, if any, we can make about isready. I'm a little confused by the talk because of the ordering they use in the progress bar example.
I wonder how other concurrent collection libraries do this. Java documents some memory consistency effects, but only for certain operations. C# also doesn't say anything about it: ConcurrentQueue.IsEmpty Property (System.Collections.Concurrent) | Microsoft Docs. There is a similar discussion, but the answer seems to be no: c# - Does ConcurrentQueue.IsEmpty require a memory barrier? - Stack Overflow.

Overall, monotonic might not be so crazy from these two data points. I commented on the puzzling things in the talk here: https://julialang.zulipchat.com/#narrow/stream/236830-concurrency/topic/Atomics.20progress.20bar.20example
It turns out that they issue a correction later in the talk, saying that they meant to use a different ordering.
Practically, OpenJDK does check this. I also looked at Go's implementation.
I think the real problem is how to specify (i.e., document) the behavior (rather than how to implement it) so that users can reason about their code based on a proper memory model. Also, there are two distinct questions here.

I initially thought yes for both, but now, looking at other channel specifications, no for the second sounds reasonable to me. (But looking at Go is a good idea.)
Definitely. Though looking at the implementation can be a nice way to guess the intent when the documentation is missing :-) If I understand the Go implementation, I think it may simply be that nobody bothered to specify it.
I think so, yes. More generally, it seems useful to be able to read the channel's state without taking the lock.
Maybe not; I don't see what value that would bring if it's inherently racy to use the return of isready anyway.
There are at least three valid (non-racy) uses of isready that I am aware of, and none of them require memory orderings stronger than relaxed. The other example (#41833 (comment)) with Ref above seems fixable by giving this read acquire ordering, but is that useful?
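One such race-free pattern (my guess at the kind of use meant here; the helper name is illustrative) is the single-consumer case mentioned in the first comment: if only one task ever takes from the channel, a true result from isready cannot be invalidated by another consumer, so the following take! will not block.

```julia
# Illustrative single-consumer polling loop: with exactly one consumer task,
# isready(c) == true guarantees the take!(c) below does not block, even
# though isready itself is only a relaxed-style snapshot.
function drain_available!(out::Vector{Int}, c::Channel{Int})
    while isready(c)
        push!(out, take!(c))  # safe: we are the only consumer
    end
    return out
end

c = Channel{Int}(8)
foreach(i -> put!(c, i), 1:3)
out = drain_available!(Int[], c)  # out == [1, 2, 3]
```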
Yeah, I agree relaxed read/write is sufficient (and also increases the classes of code we can write). My initial comment was only looking at the name of the function. A tricky use case I just realized is at lines 344 to 347 in 543386d.

I think this code (+ :monotonic) would have been OK if we could assume something about how all the tasks are scheduled.
Ok, I've completely changed tack here to add a new atomic counter of the available items. This is slightly redundant in that we should "always" be able to derive it from the buffer and wait-queue lengths, but those are `Vector`s whose length fields can't easily be read and written atomically. There's a slight change / improvement of semantics here: for buffered channels, the count now also includes waiting tasks, which makes it consistent with unbuffered channels. Does this sound like a good compromise? If so I'll add test coverage for the new behavior.
This sounds like the best approach ATM to me.
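The approach under discussion can be sketched with a toy channel (the type, field, and function names here are illustrative, not Base's): an atomic counter is maintained alongside the buffer, written with `:monotonic` stores under the lock, so that a predicate like isready can read it race-free without locking.

```julia
# Toy sketch of the "atomic availability counter" idea.
mutable struct ToyChannel{T}
    cond::Threads.Condition   # lock + condition variable
    data::Vector{T}
    @atomic n_avail::Int      # redundant with length(data), but readable without the lock
end
ToyChannel{T}() where {T} = ToyChannel{T}(Threads.Condition(), T[], 0)

function toy_put!(c::ToyChannel, v)
    lock(c.cond)
    try
        push!(c.data, v)
        n = @atomic :monotonic c.n_avail
        @atomic :monotonic c.n_avail = n + 1  # monotonic store; lock serializes writers
        notify(c.cond)
    finally
        unlock(c.cond)
    end
    return v
end

# Data-race-free predicate: a relaxed atomic load, no lock required.
toy_isready(c::ToyChannel) = (@atomic :monotonic c.n_avail) > 0
```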
Ok, I've added tests for the new functionality.

I think the existing use of isready in wait is valid:

```julia
function wait(c::Channel)
    isready(c) && return # OK ??
    lock(c)
    try
        while !isready(c) # OK ??
            check_channel_state(c)
            wait(c.cond_wait)
        end
    finally
        unlock(c)
    end
    nothing
end
```

Why do I think this is valid? Consider the two ways wait can be used.

Racy use outside a lock:

```julia
wait(c)
# At this point, we only know `c` had a value "recently" or "soon"
```

There's really no guarantee here, so the unlocked fast path doesn't make things any worse.

Use inside a lock:

```julia
lock(c)
wait(c)
# At this point we know `c` definitely has a value
unlock(c)
```

This use should be valid independent of the ordering of the unlocked isready check, since the check inside the lock is authoritative.
I agree with the reasoning.
base/channels.jl (outdated)

```julia
    try
        @atomic :monotonic c.length += 1
```

Suggested change:

```julia
        @atomic :monotonic c.length = c.length + 1 # just atomic store, not increment
```

we don't actually want atomic increment (which is often quite slow), but simply a monotonic store
Ah yes, this is inside the lock... so confusing!
So (to check I understand) I think this means:
- We don't need an atomic load — no other thread can be storing to this in parallel
- For the same reasons, we don't need atomic increment
- We do require atomic (monotonic) store, as other threads may do an atomic load outside the lock
I'll go through and update all the places that this pattern occurs.
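The pattern from the bullets above can be sketched as follows (the `Counter` type and helper names are illustrative): the lock serializes all writers, so the read side of the update needs no read-modify-write; only the store must be atomic so that readers outside the lock never race with it.

```julia
# Sketch of the lock + monotonic-store pattern discussed above.
mutable struct Counter
    @atomic n::Int
    lk::ReentrantLock
end
Counter() = Counter(0, ReentrantLock())

function bump!(c::Counter)
    lock(c.lk) do
        n = @atomic :monotonic c.n      # lock holders are the only writers: cheap load
        @atomic :monotonic c.n = n + 1  # atomic store, visible to unlocked readers
    end
end

# Read outside the lock: a relaxed atomic load, race-free by construction.
current(c::Counter) = @atomic :monotonic c.n
```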
```julia
while length(c.data) == c.sz_max
    check_channel_state(c)
    wait(c.cond_put)
end
push!(c.data, v)
did_buffer = true
```
I am slightly confused what this is a count of precisely. Why not do the increment here, so it is always a lower bound on the number of items which are available, rather than being slightly ahead of the items that are available?
This allows us to count the tasks waiting in wait(c.cond_put), for consistency with unbuffered channels. If we don't do this, the count of "available items" is quite inconsistent between (finite) buffered vs unbuffered channels.
By the way, I don't have a strong opinion on the "right" answer here. I just think we should be consistent in the way we count "available" items in buffered/unbuffered cases.
Yeah, I agree it's nice to be consistent.
Another approach to make it consistent is to have length(unbuffered_channel) == 0 and length(buffered_channel) == length(buffered_channel.data), right? Maybe it's reasonable in some sense, since we'd have

```julia
lock(ch)
n = length(ch)
close(ch)
unlock(ch)
@assert length(collect(ch)) == n
```

when there are no takers. This property seems intuitive to me. (Though I guess it makes length rather useless for the internals...)
Yes this is definitely the other option 👍
I think the current system might be more useful as it gives you more options for applying backpressure?
I'm not entirely convinced this is a problem. I feel there's a whole pile of caveats which we could use to argue against many of the standard verbs when applied to concurrent and/or blocking containers.

I guess we should ask the question: what else could length() possibly mean, such that it would make "normal generic container code" work with a Channel?

Another random thought: we now have the notion of closewrite() (currently shutdown()), which is meant for closing the writer side of full duplex IO streams. closewrite(::Channel) could also make sense and would allow the iteration operation you're describing to succeed.
I'd be very happy if you can come up with a concise set of preconditions, postconditions, and invariants of length that encompasses the definition of length in this PR. I just think it's very hard, if not impossible.

Also, I don't think concurrency is relevant to my point. I tried to work around the specification issue by adding the condition "there is no interference from other tasks" (i.e., no receivers and no new senders). My point is that a pre-close channel does not have a well-defined length that is compatible with iterate. So I don't think closewrite fixes the issue, since the point of length in this PR (aka "n_avail") is querying the state of the channel before it's closed (for supporting the backpressure use case).
Yeah, it's hard, which is why I already reverted to n_avail yesterday. I assume you noticed this?
If we take the two cases separately:

- blocking channels: acts as a threading barrier or semaphore, giving the upper bound on the number of items that are currently already blocked waiting to write into this channel
- non-blocking channels: acts as a yield-free predicate, giving the lower bound on the number of items that can be read from the channel without yielding

Perhaps the comparison is that the former condition, in a batch processing system, is relevant for flow control of writers, and the latter is relevant for flow control of readers? Where the readers/writers want to spend their effort working on the queue with the greatest demand / most capacity?
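A sketch of that reader-side flow-control idea (the `busiest` helper is hypothetical; `Base.n_avail` is the internal counter from this PR, not public API): the availability count is only a scheduling hint, so its racy/relaxed nature is acceptable here.

```julia
# Hypothetical reader-side flow control: pick the channel with the most
# available work. n_avail is just a heuristic snapshot, so no lock is needed.
busiest(channels) = argmax(Base.n_avail, channels)

chans = [Channel{Int}(4) for _ in 1:3]
put!(chans[2], 1)
put!(chans[2], 2)
c = busiest(chans)  # chans[2], the one with the most buffered items
```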
> reverted to n_avail

Ah, sorry, I didn't notice.

> the former condition, in a batch processing system, is relevant for flow control of writers, and the latter is relevant for flow control of readers

Hmm... interesting. This feels like an important point. Implementing this idea requires supporting length(c.data) and n_avail separately.
@@ -417,9 +425,12 @@ immediately, does not block.

```julia
For unbuffered channels returns `true` if there are tasks waiting
on a [`put!`](@ref).
"""
isready(c::Channel) = n_avail(c) > 0
n_avail(c::Channel) = isbuffered(c) ? length(c.data) : length(c.cond_put.waitq)
isempty(c::Channel) = isbuffered(c) ? isempty(c.data) : isempty(c.cond_put.waitq)
```
I think length(c.cond_put.waitq)
should also be a non-data-race-y thing to access. Probably not currently implemented as such, but perhaps ideally would be? (rather tricky with the knowledge that it might get removed from the queue and put in another queue, so perhaps not possible though)
Ok, I've reverted the use of length back to n_avail. Overall I still think it would make sense to use length for this eventually, though.
LGTM, other than the naming of the field .length and the related method.

Actually, never mind.
Ok, I added that suggestion + rebased now. This PR has languished; let's get it merged once CI passes.

Needs a rebase to fix CI, I guess.
This removes the data race from isready() and isempty(), which are now implemented in terms of n_avail(). A new atomic `n_avail` field is added to track the "current number of available items" (buffered + waiting tasks). This is separate from the buffer and wait queue because these consist of `Vector`s which cannot easily have their length fields read and written atomically. For buffered channels, the n_avail now includes a count of any waiting tasks in addition to the number of buffered items. This makes it consistent with the computation for unbuffered channels. Co-authored-by: Takafumi Arakaki <[email protected]>
The buildkite failure looks unrelated. Let's merge this.
Make it clear that isready() is not threadsafe, as it has a data race during reading of the channel buffer or waitq length.

Alternatively we could make isready threadsafe with lock(), but perhaps this would cause more harm than good? It also wouldn't make uses of isready() non-racy without a lock around the rest of the user's code which interacts with the channel. So for now I've just documented the current state of affairs.

Also remove an unlocked optimistic use of isready() in wait() to avoid the data race.