Version
All supported versions as of the creation of this issue.
Background
I/O resources must handle a concurrent update of readiness while an I/O operation is in progress. An I/O operation looks like this:
```rust
loop {
    let readiness = io.await_readiness(interest).await;
    let res = io.perform_io_op()?;
    if let Ready(ret) = res {
        return ret;
    }
    // Not ready: clear the readiness we observed, then wait again.
    io.clear_readiness(readiness);
}
```
A potential race condition arises if the I/O driver receives a readiness event and updates the I/O resource concurrently while the loop above is between the perform_io_op() and clear_readiness calls. A naive implementation would clear the readiness set by the I/O driver.
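To make the lost-update window concrete, here is a minimal sketch of the naive approach (hypothetical names and a bare readiness word, not Tokio's actual internals), where the driver ORs bits in and the task blindly stores zero:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical shared readiness word; Tokio's real state is more involved.
static READINESS: AtomicUsize = AtomicUsize::new(0);
const READABLE: usize = 0b01;

// I/O driver side: record a new readiness event.
fn driver_set_readiness() {
    READINESS.fetch_or(READABLE, Ordering::AcqRel);
}

// Naive task side: unconditionally clear readiness. If the driver's
// fetch_or lands between perform_io_op() and this store, the new event
// is wiped out and the task can wait forever for readiness that will
// never be redelivered.
fn naive_clear_readiness() {
    READINESS.store(0, Ordering::Release);
}
```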
To prevent this issue, Tokio today tracks an I/O driver "tick": a number that increments each time the I/O driver is polled. When the I/O driver updates an I/O resource's readiness, it includes the tick at which the event was received. The readiness value returned by await_readiness includes the tick, and clear_readiness only clears the readiness if the I/O resource's stored tick matches the one passed as an argument. If the ticks don't match, the I/O resource has more recent readiness that should not be cleared.
However, due to limited space in the I/O resource's atomic state, we don't store the full tick value. Instead, we truncate it to a u8 and store that. This should be sufficient because the tick only advances after a full "reactor loop," which includes many syscalls, while the pseudocode loop above is very tight. It is unlikely that 255 full reactor loops will complete while a task is inside the "clear readiness" critical path.
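As a rough illustration of that scheme (the bit layout, field widths, and names here are assumptions for the sketch, not Tokio's exact representation), the truncated tick can be packed next to the readiness bits and checked on clear:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical packed layout:
//   bits 0..16  -> readiness flags
//   bits 16..24 -> driver tick, truncated to 8 bits
const TICK_SHIFT: u32 = 16;
const READINESS_MASK: u32 = 0xFFFF;

static STATE: AtomicU32 = AtomicU32::new(0);

// Driver side: merge new readiness in and stamp the (truncated) tick.
fn driver_set_readiness(tick: u32, ready: u32) {
    let tick = (tick as u8) as u32; // truncate to 8 bits
    let _ = STATE.fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
        Some((tick << TICK_SHIFT) | (cur & READINESS_MASK) | ready)
    });
}

// Task side: clear only if the stored tick still matches the tick that
// came back from await_readiness(); otherwise the driver has delivered
// newer readiness and the clear is skipped.
fn clear_readiness(observed_tick: u8, ready: u32) {
    let _ = STATE.fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
        if (cur >> TICK_SHIFT) as u8 == observed_tick {
            Some(cur & !ready)
        } else {
            None // tick mismatch: keep the newer readiness
        }
    });
}
```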
Issue
The problem, however, arises when a task holding I/O readiness last updated on some tick (say, tick 0) goes idle. While it is idle, the I/O driver completes 255 more cycles; the task then happens to wake for an unrelated reason and starts the loop above. If, during the critical section, the I/O driver updates the I/O resource with tick 256, which truncated to u8 wraps back to 0, the update is lost for the same reason described in the Background section above.
Note that this issue is not fixed by increasing the tick resolution stored in an I/O resource (to u16 or u32); that would only make the wraparound less likely.
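The arithmetic is easy to check (purely illustrative, using the hypothetical guard above): after 2^8 driver polls a u8 tick collides with tick 0, and widening the field only moves the boundary.

```rust
fn main() {
    // Tick 256 truncated to u8 collides with tick 0, so a stale
    // clear_readiness(observed_tick = 0, ..) passes the tick guard.
    assert_eq!(256u32 as u8, 0u8);

    // A wider field pushes the collision out but does not remove it.
    assert_eq!(65_536u32 as u16, 0u16);
}
```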