
Userland should busy-wait briefly before using usleep() #4424

Closed

Conversation

@ryao (Contributor) commented Mar 16, 2016

The userland code uses the `do { usleep(); } while (unavailable)` pattern
for closing races in asynchronous device initialization. However, in the
normal case, calling `usleep()` introduces an unnecessarily long delay
because the other side typically finishes what it was doing much faster
than the period for which we sleep. The individual `usleep()` periods are
short, but they can add up to a significant amount of time when scripts
trigger the codepaths that use `usleep()` in a loop.

`libzfs_load_module` uses the alternative pattern of `do { sched_yield(); }
while (unavailable)` for 10ms before falling back to the `do { usleep(); }
while (unavailable)` pattern, which is much better for latency. As a
general principle, I would like to encourage contributors to avoid the
`do { usleep(); } while (unavailable)` pattern for closing asynchronous
races. Modifying the places that use it to do what `libzfs_load_module`
does should achieve that; a sketch of the combined pattern follows.
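For illustration, here is a minimal sketch of the combined pattern. It is
not the code from this patch: the helper names, the 10ms intervals, and the
use of plain `stat()` (the actual code uses `stat64()`) are assumptions
made to keep the example self-contained.

```c
#include <errno.h>
#include <sched.h>
#include <time.h>
#include <unistd.h>
#include <sys/stat.h>

#define	BUSY_WAIT_NS	(10LL * 1000 * 1000)	/* busy-wait for the first 10ms */
#define	SLEEP_US	(10 * 1000)		/* then usleep() 10ms at a time */

/* Nanoseconds elapsed since 'start' on a monotonic clock. */
static long long
elapsed_ns(const struct timespec *start)
{
	struct timespec now;

	(void) clock_gettime(CLOCK_MONOTONIC, &now);
	return ((now.tv_sec - start->tv_sec) * 1000000000LL +
	    (now.tv_nsec - start->tv_nsec));
}

/*
 * Wait for 'path' to appear.  Busy-wait with sched_yield() for the
 * first 10ms, then fall back to sleeping.  Returns 0 on success or
 * the errno reported by stat() on timeout.
 */
static int
wait_for_path(const char *path, long long timeout_ns)
{
	struct timespec start;
	struct stat st;

	(void) clock_gettime(CLOCK_MONOTONIC, &start);

	while (stat(path, &st) != 0) {
		int error = errno;

		if (elapsed_ns(&start) >= timeout_ns)
			return (error);

		if (elapsed_ns(&start) < BUSY_WAIT_NS)
			(void) sched_yield();	/* busy-wait phase */
		else
			(void) usleep(SLEEP_US);	/* fallback phase */
	}
	return (0);
}
```

A caller would pass the expected device node path and an overall timeout in
nanoseconds; in the common case this returns after one or two
`sched_yield()` iterations without ever sleeping.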

There are some minor behavior changes worth noting.

One is that reusing the exact code structure from `libzfs_load_module`
meant that `zpool_label_disk_wait()` will return the error code passed
by `stat64()`. I had considered altering it to maintain the current
error behavior, but after looking at its consumers, I felt that they
would benefit from a more accurate error code.

Another is that, while I used the same timeouts, time spent in the
kernel can no longer extend them, so the effective maximum timeouts
should be somewhat shorter. That should be more noticeable on a heavily
loaded system. One might consider the busy-wait to be a cause for
concern there, but the use of `sched_yield()` should lower the number
of iterations in which we busy-wait before falling back to `usleep()`.

The last is that I made a slight improvement to the original algorithm:
if we would time out after returning from `usleep()`, we now skip the
call and time out early (see the fragment below).
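In terms of the sketch above, this refinement adds a deadline check to the
fallback branch (again illustrative, not the literal patch):

```c
/*
 * Fallback phase: if sleeping would carry us past the deadline,
 * skip the usleep() call and time out early instead.
 */
long long t = elapsed_ns(&start);

if (t + (long long)SLEEP_US * 1000 >= timeout_ns)
	return (error);	/* the errno saved from stat() above */
(void) usleep(SLEEP_US);
```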

Signed-off-by: Richard Yao [email protected]

@ryao force-pushed the userland_timeout branch from 77ebd85 to 7032a3b on March 16, 2016 19:21
@ryao changed the title from "Userland should usleep() as a fallback from busywaiting" to "Userland should busy-wait briefly before using usleep()" on Mar 16, 2016
@ryao force-pushed the userland_timeout branch 5 times, most recently from 45bb07f to 3cd86f2, on March 16, 2016 19:52
@ryao (Author) commented Mar 16, 2016

This functionality probably should be made a library function somewhere, but I am not sure where to put it. Maybe lib/libuutil/?

@ryao force-pushed the userland_timeout branch from 3cd86f2 to 8b9bcc2 on March 16, 2016 20:00
@ryao force-pushed the userland_timeout branch from 8b9bcc2 to 6373f1a on March 16, 2016 20:08
@behlendorf (Contributor) commented

I was going to suggest the same thing, since this is, I believe, the fourth place where we've needed to do this. The `zpool_label_disk_wait()` function is effectively already this library function; it's just not easily available everywhere. I'm reluctant to add it to libuutil, since I believe we mostly depend on that only for the `uu_avl_*` and `uu_list_*` functions. Plus, libefi, where we'd need it, doesn't already depend on libuutil, and we should avoid adding dependencies. Moving it to libspl might be the best place, even if it doesn't exactly fit with the other compatibility functions.

@behlendorf (Contributor) commented

@ryao, this functionality can be moved to a library function at some point in the future. If you're happy with the current patch, I'll get it applied.

@behlendorf (Contributor) commented

Closing in favor of the patches in #4523.

@behlendorf closed this Apr 21, 2016