txg_sync hung task and deadlock when trying to shrink ARC #4319
Comments
@perfinion it appears that the I/O pipeline is stalled waiting on a kernel thread to start in order to handle an outstanding I/O. Presumably that's because you've done an excellent job of exhausting memory on your system, although that's not 100% clear from the stack traces. Here are the key bits:
If you still have the system in this state, it would be useful to see whether setting spl_taskq_thread_dynamic=0 helps.
I still have the VM unchanged, so I can get any more info that would help. Once it locks up, the machine is completely unusable: not only things that hit disk, but absolutely everything hangs, and then the hung tasks panic reboots the machine. I am also attaching the full dmesg output from the hung task timeout because it has a z_* thread too. The earlier stacks were mostly from sysrq+w to print stacks for all processes in the D state; in future I could do sysrq+t to dump everything if that helps. Other than magic sysrq, I can only really do things from outside of qemu because nothing inside works once it starts dying.
I will try to trigger it later with spl_taskq_thread_dynamic=0 to verify that fixes it; right now I have absolutely nothing set, so everything is default. kthreadd's stacktrace:
This is the full output from the point when the hung_tasks_timeout kicks in:
According to gdb, it looks like they are stuck trying to allocate memory. I assume the OOM killer should kick in to free up memory, but it isn't. When I try to trigger the hang, about half the time the OOM killer will kill my eatram processes and things will immediately return to normal, so it takes a few tries.
It seems kthreadd is doing direct reclaim into ZFS, causing the deadlock.
Yes indeed. Now, what, if anything, can we do about it?
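For background on why this particular recursion is hard to break: ZFS/SPL normally protects its own threads from re-entering the filesystem during a sleeping allocation by marking the task first, roughly like the sketch below (using SPL's spl_fstrans_mark()/spl_fstrans_unmark() helpers; a simplified illustration, not the code path at fault here). The catch in this trace is that the direct reclaim is happening from kthreadd itself, which is not a ZFS thread and so never gets marked.

```c
/*
 * Minimal sketch: mark the current task so that any direct reclaim
 * triggered by this allocation will not recurse back into the
 * filesystem (the SPL analogue of allocating in a GFP_NOFS context).
 * Header and helper names are from the SPL of this era; illustrative only.
 */
#include <sys/kmem.h>

static void *
example_alloc_nofs(size_t size)			/* hypothetical helper */
{
	fstrans_cookie_t cookie;
	void *ptr;

	cookie = spl_fstrans_mark();		/* mark current: reclaim won't re-enter the FS */
	ptr = kmem_alloc(size, KM_SLEEP);	/* may sleep and enter direct reclaim */
	spl_fstrans_unmark(cookie);		/* restore the previous task state */

	return (ptr);
}
```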
I just tried with spl_taskq_thread_dynamic=0 and was still able to hit this deadlock, unfortunately :(. I dumped the stacks with sysrq+w and sysrq+t. Would it be useful to see the stacktraces for the taskq_dynamic=0 case as well? I can post them if they help; if not, they are quite long so I don't want to spam the bug.
@perfinion yes, I'd definitely like to see the stacks; please add them to the issue. It's going to have to be something related to, but slightly different from, the previous deadlock, which depended on the dynamic taskq behavior.
@behlendorf this is the entire output: first sysrq+w, then sysrq+t, then the hung task panic output.
@behlendorf This is all the output; I didn't realize I could attach files instead.
It looks like disabling the dynamic taskqs solved the initial problem as hoped. However, you quickly uncovered another way to hit a deadlock.
@tuxoko you raise a good point about the atime update; this is something we inherited from Illumos and may not need at all on Linux. On Linux, inode updates are supposed to be written back through the VFS. There's an excellent chance that this atime update is redundant and can be disabled. That would be nice, since it would not only help performance but also remove one of the most likely deadlock locations.
atime updating is really a mess. We have 3 places for atime: inode->i_atime, znode->z_atime and SA.
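For readers following along, this is roughly the Linux-native model being referred to: the VFS updates inode->i_atime itself (e.g. via touch_atime()) and then calls the filesystem's .dirty_inode hook, so the filesystem only has to copy the in-core value into its on-disk metadata and never needs a second in-memory copy. A minimal sketch with hypothetical names (example_inode_info, EXAMPLE_I, example_dirty_inode), not ZFS code:

```c
#include <linux/kernel.h>
#include <linux/fs.h>

/* hypothetical per-filesystem inode, embedded around the VFS inode */
struct example_inode_info {
	struct timespec	disk_atime;	/* what actually gets written out; timespec64 in newer kernels */
	struct inode	vfs_inode;
};

static inline struct example_inode_info *EXAMPLE_I(struct inode *ip)
{
	return container_of(ip, struct example_inode_info, vfs_inode);
}

/*
 * .dirty_inode from struct super_operations: by the time this runs the VFS
 * has already updated ip->i_atime, so we just capture it for writeback.
 */
static void example_dirty_inode(struct inode *ip, int flags)
{
	EXAMPLE_I(ip)->disk_atime = ip->i_atime;
}
```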
@behlendorf @tuxoko would mounting with noatime or lazytime completely remove all the atime paths? If it does, I'll see if I can trigger it again; then we know that's a good way to proceed and could work on removing the atime updates from zinactive. I'll also test out the ABD patches soon when I get time and see how things go.
@tuxoko for a long time now I've wanted to remove the atime from the znode along with all those other redundant fields. The values in the inode are always the authoritative values as far as the VFS is concerned. Take a look at the comment above … Relatime should work if you set the property on the dataset; it may not honor the mount option, though.
@perfinion setting relatime won't remove all the possible deadlock paths, but it may help. It would be an interesting thing to test.
@behlendorf Also, now if you do stat(2), …
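For anyone reproducing the atime checks here, a trivial userspace probe of the atime that stat(2) reports (a hypothetical check_atime.c, not part of this thread) could look like the following; running it before and after a remount is one way to see whether the in-core value was ever persisted.

```c
/* check_atime.c - print the access time the kernel reports for a file */
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(int argc, char **argv)
{
	struct stat st;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	if (stat(argv[1], &st) != 0) {
		perror("stat");
		return 1;
	}
	/* ctime() appends its own newline */
	printf("%s atime: %s", argv[1], ctime(&st.st_atime));
	return 0;
}
```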
Turns out the original tests were with relatime on. I did atime=off relatime=off now and got a hang too. @tuxoko @behlendorf I tried ABD and also got the deadlock. Both of these tests were with dynamic taskqs disabled.
Interestingly, without ABD, when I first run a few eatram processes the OOM killer gets them and it is a little more difficult to hit the deadlock. With ABD I hit the deadlock pretty fast. Either it was a fluke and I am just practiced at hitting it, or allocating in a different way makes it easier to hit? I am not so sure anymore that atimes are as important; initially I could trigger it with just reads if I was lucky, but with taskq_dynamic=0, having a lot of writes makes hitting it a lot easier. I am attaching both sets of stacktraces.
One thing that might help is to set …
I tested this again a few weeks ago at commit 5c27b29 and it was still very easy to trigger; I just tested at commit dfbc863 and could no longer trigger it. It would slow down a lot when I ran a lot of things, but it would always recover after a few seconds, which is great! I have stacktraces from these tests which I can upload if they are helpful for verifying. I was also looking through the code before and, if I understand it correctly, it seems like dmu_tx_assign is designed to fail and deals with it correctly, but txg_wait_open can never exit and dmu_tx_wait returns void too.
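For context, the error handling being described is the standard DMU transaction loop: dmu_tx_assign() may return ERESTART when the open txg can't take the dirty data, the caller backs off in dmu_tx_wait() (which indeed returns void and has no failure path) and retries. A sketch of that canonical pattern, with a hypothetical caller name, not the exact call site in question:

```c
#include <sys/dmu.h>
#include <sys/dmu_tx.h>

static int
example_write_begin(objset_t *os, uint64_t object, uint64_t off, int len)
{
	dmu_tx_t *tx;
	int error;

top:
	tx = dmu_tx_create(os);
	dmu_tx_hold_write(tx, object, off, len);
	error = dmu_tx_assign(tx, TXG_NOWAIT);
	if (error) {
		if (error == ERESTART) {
			dmu_tx_wait(tx);	/* may block until the next txg opens */
			dmu_tx_abort(tx);
			goto top;		/* retry the assignment */
		}
		dmu_tx_abort(tx);
		return (error);
	}

	/* ... modify the object under this tx ... */

	dmu_tx_commit(tx);
	return (0);
}
```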
@perfinion The most interesting commits in the range you mention are ddae16a and 31b6111, assuming you're using a lot of xattrs. It would be interesting to see the arcstats after running your memory-hogging program. Speaking of which, given the test system only has 1GiB of RAM, how much memory is the "a lot of ram" it's allocating?
@dweeezil yeah, I suspect those are the commits that helped the most, but I have not tested specifically around them. Do you want me to?
After a long time, the programs finish and the system hasn't locked up. restorecon re-sets SELinux labels, so it touches all the xattrs. Here is arcstats:
Here is eatram.c; I needed to mlockall and actually write to all the RAM, otherwise overcommit kept handing out more RAM.
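The original attachment isn't reproduced here; a minimal sketch matching the description above (lock everything, then dirty every page so overcommit can't hand out more than physically exists) might look like this:

```c
/* eatram.c (reconstruction, not the original attachment) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
	size_t mb = (argc > 1) ? strtoul(argv[1], NULL, 10) : 256;
	size_t len = mb << 20;
	char *buf;

	/* keep current and future pages resident */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		perror("mlockall");	/* may need CAP_IPC_LOCK or a higher ulimit -l */

	buf = malloc(len);
	if (buf == NULL) {
		perror("malloc");
		return 1;
	}

	/* actually touch every page so the allocation is backed by real RAM */
	memset(buf, 0xa5, len);

	printf("holding %zu MiB, press enter to exit\n", mb);
	getchar();
	return 0;
}
```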
I have intermittently been hitting a deadlock and finally set up a VM to reproduce it and get real stacktraces. This is built from master for SPL and ZFS as of this weekend, so it is completely up to date. The VM has 1GB of RAM, the root drive is on ext4, and I set up a mirror zpool on two 1GB disks with a couple of random datasets in it. I then rsync'd the portage tree (lots of little files) into /tank/.
The rough steps to trigger it are
I have been following #4106 and #4166 but am still hitting this :(.
I have the Gentoo hardened 4.3.3-r4 kernel, but I don't think the kernel version matters since I have been hitting it for a while.
Most of the other processes are like this (kthreadd, init, python, find, htop ...):
How can I help debugging?
-- Jason