Delay injection can cause indefinitely hung zios #8404
@tonyhutter I noticed that you ported this feature from Illumos. I was wondering if you had thoughts on the use of …
Nice find, thanks!
If we hit the (NSEC_TO_TICK(diff) == 0) condition in zio_delay_interrupt, zio_interrupt is never called and the zio does not progress.
Signed-off-by: sara hartse <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #8404 +/- ##
==========================================
- Coverage 78.57% 78.56% -0.02%
==========================================
Files 380 380
Lines 115839 115840 +1
==========================================
- Hits 91019 91007 -12
- Misses 24820 24833 +13
I've gone ahead and merged this to resolve the outstanding issue. Let's tackle possible improvements in a separate PR.
Motivation and Context
I ran into this while doing some performance testing using zinject: sometimes a zio would just never complete.
Looking at how the delay injection is implemented, we can see that if we hit the (NSEC_TO_TICK(diff) == 0) condition in zio_delay_interrupt (i.e. there's very little left to delay, so just spin), then after waiting the function returns immediately and zio_interrupt is never called. Therefore, the zio cannot progress.
Description
I added a call to zio_interrupt after zfs_sleep_until completes.
How Has This Been Tested?
I re-ran my performance tests on the VM that had been consistently hitting this issue and was able to complete them without incident. I verified that I was hitting the condition in question by adding some temporary logging.