Soft-deadlock observed on swap operations #1526
@ryao You're right, the stacks aren't very clear about what happened here. I actually suspect the …
I just got something similar today when putting my system under severe stress without swap.
I was running … That being said, cv_wait_common() is declared static, which causes us to see __cv_destroy() in the stacks where we should see cv_wait_common(), because the compiler is not emitting symbol information for use by dump_stack(). The mainline kernel does seem to get this information, and it is not clear to me what difference causes Linux mainline to have it while we lack it. Something appears to be making us wait an exorbitant amount of time, but I was not prepared for it, and the partial system freeze that lasted until it cleared prevented me from capturing information. I will do what I can to catch it.
@behlendorf I was able to capture complete stacks of this phenomenon today and there is NOTHING out of the ordinary. I forcibly restarted my system, and pool import hung until ZIL replay finished, which took noticeably long. What appears to be happening is that a large amount of data accumulates in the ZIL, someone calls syncfs(), and then everything stops on a global ZIL write-out. This is related to issue #2190. I think resolving this requires looking at how Illumos handles it. Offhand, I know that Illumos has a per-filesystem sync, which could partially explain the difference.
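Linux exposes per-filesystem sync as syncfs(2), which flushes only the filesystem containing a given file descriptor rather than every mounted filesystem as sync(2) does. To check whether a single syncfs() call is what stalls, one could time it directly on the affected dataset. The following is a minimal sketch in C, assuming glibc 2.14 or later (where syncfs() is declared under _GNU_SOURCE); the helper name time_syncfs() is hypothetical and nothing in it is ZFS-specific.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: time one syncfs() call on the filesystem that
 * contains `path`. Returns elapsed seconds, or -1.0 on error. */
double time_syncfs(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);   /* a directory fd is fine for syncfs() */
    if (fd < 0) {
        perror("open");
        return -1.0;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    int rc = syncfs(fd);             /* flushes only fd's filesystem */
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    if (rc != 0) {
        perror("syncfs");
        return -1.0;
    }
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

For example, calling time_syncfs() on a path inside the affected dataset while the ZIL holds a large amount of data would show whether the syncfs() call itself accounts for the stall.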
Disregard my previous (now deleted) note about stack traces captured using …
This report is too vague to be actionable. I have since corrected the stack traces on my system, and there are plenty of other, more actionable reports, so I am going to close this.
My system hung momentarily a few times before finally hanging for good when doing operations on swap backed by a zvol. I was able to get dmesg output before it hung for good:
The memory allocation in zil_add_block() appears to hang while we hold the mutex, which prevents all other zvol threads from making forward progress. This also prevents kswapd from making progress.
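The failure mode described above is a general one: if a thread sleeps in the allocator while holding a lock, and the allocator can only make progress once other I/O completes (here, swap writes to the same zvol), every thread queued on that lock stalls as well. A common way to avoid this is to perform the allocation before taking the lock, so that only cheap pointer updates happen inside the critical section. The sketch below shows that pattern with POSIX threads; struct block_list, block_list_add() and block_list_free() are hypothetical names and do not reflect the actual zil_add_block() code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical singly linked list of block IDs protected by one mutex. */
struct block_node {
    uint64_t blkid;
    struct block_node *next;
};

struct block_list {
    pthread_mutex_t lock;
    struct block_node *head;
    size_t count;
};

/* Allocate outside the critical section, so a slow or blocking
 * allocation never stalls other threads waiting on list->lock. */
int block_list_add(struct block_list *list, uint64_t blkid)
{
    assert(list != NULL);

    struct block_node *node = malloc(sizeof(*node));
    if (node == NULL)
        return -1;
    node->blkid = blkid;

    pthread_mutex_lock(&list->lock);
    node->next = list->head;   /* only pointer updates happen under the lock */
    list->head = node;
    list->count++;
    pthread_mutex_unlock(&list->lock);
    return 0;
}

/* Free every node and reset the list to empty. */
void block_list_free(struct block_list *list)
{
    pthread_mutex_lock(&list->lock);
    struct block_node *n = list->head;
    while (n != NULL) {
        struct block_node *next = n->next;
        free(n);
        n = next;
    }
    list->head = NULL;
    list->count = 0;
    pthread_mutex_unlock(&list->lock);
}
```

This only illustrates the lock ordering. In the kernel, the allocation flags also determine whether an allocation can recurse into reclaim and swap, which this userspace sketch does not model.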
This is unusual because swap operations are supposed to be synchronous and should therefore bypass the ZIL, but that does not appear to be the case. Another unusual aspect of these stack traces is that zrl_owner() is on the stack even though nothing in the code calls it, so it should be dead code.
Swap operations involve only read, write and discard requests, so something appears to be wrong with the backtraces. However, they are fairly consistent. I have reproduced this issue twice: the first time three months ago with Linux 3.6.8, and the second time today with Linux 3.9.6.