spl_kmem_cache spinning problem (again) combined with txg_sync hung task #247
This was reported to the ZFS issue tracker in openzfs/zfs#1454. Your back trace is much better than the one in that report. It should be easier to identify the cause of this bug using it. That report inspired me to write ryao/spl@5717902. The commit message describes an additional issue that could cause what you have observed, but it would be hard to speculate on what is happening on your system without first testing this patch. Would you test my patch and report whether it had any effect? Also, would you elaborate more on your system configuration? Specifically, where is the rootfs stored, is there swap, and if you have swap, is it on a zvol?
I'll give it a try. It sometimes takes a while (days) for it to happen; I'll let you know if it happens again with this patch. Just to be sure I got the right thing: it is just editing one line and adding __GFP_IO | in? To answer your questions: if I understand your rootfs question correctly, the OS is installed on a pair of mirrored (md) 120 GB Intel 520 SSDs. The root file system is ext4. There is 4 GB of swap space, but it is not on a zvol; it is also on the mirrored SSDs. Right now it shows no swap being used, which isn't surprising with this much RAM. Other information: it is a Supermicro system with two Xeon E5-2620 CPUs (HT is on) and 128 GB of RAM. Thanks. Let me know if you need anything else. Steve
That patch is a 1-line change and from what you told me, I think it is unlikely to resolve your problem. Feedback from testing would still be slightly useful, although it is probably not necessary. A fix for the other issue that I spotted would have a better chance of fixing your issue. I will post an update when I have it, although I do not expect to find time in the next few days.
OK. I'll leave it as is then and wait for your next fix. Thanks very much. Steve
This system is now having some hardware problems that I am diagnosing with the vendor. I believe one of the HBA's is bad. I'll get back to you on this once that is resolved. |
The deadman thread that was added to zfsonlinux/zfs head by openzfs/zfs@cc92e9d would force a reboot if one of the controllers stopped responding. It is not in 0.6.1, and conceivably things could start spinning indefinitely if a controller stops working. However, it is not clear to me how the spinning would happen without the pool also ceasing to respond to IO.
I did a couple of sizable rsyncs and all seemed OK, but now I see that kswapd0 and kswapd1 are at 100% (the system is otherwise idle):

top - 14:36:10 up 2 days, 19:21, 2 users, load average: 2.00, 2.00, 2.00
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

Now for some reason my shell is also using 100% of one core:

top - 17:44:46 up 2 days, 22:29, 4 users, load average: 3.01, 3.02, 3.04
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

Looking at perf top I see:

Samples: 26K of event 'cycles', Event count (approx.): 14163818214

Does this relate to the spl_kmem_cache spinning problem at all? I see no entries in dmesg or /var/log/messages pertaining to much of anything (a couple of packages added, and an attempt at clearing the kswapd spinning by turning swap off and then back on). Any ideas or things to check? Thanks, Steve
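For anyone trying the same swap cycle, here is a minimal sketch: a non-destructive check of current swap usage, with the actual off/on cycle left commented since it requires root and briefly leaves the box with no swap. This assumes a standard Linux `free` from procps.

```shell
# Check current swap usage before trying the swapoff/swapon cycle
# mentioned above (the cycle itself needs root and briefly leaves the
# system without any swap, so it is left commented here).
free -m | awk '/^Swap:/ {print "swap total:", $2, "MB, used:", $3, "MB"}'
# swapoff -a && swapon -a   # the actual cycle, run as root
```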
Any closer on this? It has happened again, with spl_kmem_cache up around 100%. top shows:

top - 10:10:46 up 2 days, 14:38, 3 users, load average: 2.04, 6.57, 10.98
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

Looking at each core (HT is active) I see:

Cpu0 : 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

perf top shows (is this helpful?):

This happened while rsyncing again. The directory tree it was working on has a lot of files in one of its directories: 285224 files, totaling about 250 GB. rsync was building the list of files to copy when the spinning occurred. Any help would be greatly appreciated. Thanks, Steve
@cousins The first issue you reported, the pool hang, would be completely explained by the bad HBA; it was clearly waiting on IO. The second (different) issue regarding the spinning shows that ZFS is trying very hard to reclaim memory. When it's in this state could you issue a
Hi Brian, after going over the hardware in detail and finally finding a flaky SAS cable connection, we have been up and running for about four weeks without any hardware issues, and other than the performance problem you're helping with (du performance on TBs of data with millions of files) it has been very solid. Until today.

Last night I had the two rsyncs running and then added another smaller rsync from another server. All of these were going to /pool2. At the same time I started deleting files on another pool, /pool2/asl, which was set up to test the hardware. This morning I found spl_kmem_cache spinning and the load up at 48.00. I can log in but I can't do anything ZFS-related. df shows the volumes OK (it doesn't hang) but I can't cd to them. zpool status hangs. /var/log/messages shows that it happened shortly after midnight last night (trace below). I don't see any messages about hardware problems, and multipath -l shows all 81 drives active, so all looks good in that regard. Let me know if you need any more information; I'll leave the system in this state for a while in case you do. Here is the trace:

Jul 18 00:23:51 nfs1 kernel: INFO: task z_wr_int/0:10333 blocked for more than 600 seconds.

Thanks for your help. Steve
FWIW, here is what "perf top" shows:

Samples: 212K of event 'cycles', Event count (approx.): 35451406588

and just plain top:

top - 13:01:24 up 26 days, 21:38, 8 users, load average: 51.01, 51.02, 51.00
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
@cousins Go ahead and reboot the machine, and thanks for the additional debugging. As it happens this looks like exactly the issue I'm working on right now. Others have reported the same problem occurring after many weeks of uptime. This is one of the few issues still blocking the 0.6.2 release. |
Hi Brian, OK, thanks. Any rough estimate on how long this might take to fix? Steve
Something somewhat similar seems to have happened today, after an uptime of only 17 hours. Nothing seems to have crashed yet (no backtrace in the logs or on the console), but the zfs volume has hung and kswapd0, kswapd1 and spl_kmem_cache are all at 100%, along with one rsync process and makewhatis (started by cron). The load is up around 10.0 and I can't do anything with that zfs volume; however, zpool status does respond and shows everything in good shape. What I did after starting the system up again was:
and then two rsyncs to the pool2 pool. I have another smaller system (20 TB, with a 13-drive raidz2 pool plus a spare) that hasn't had any of this problem. It is on Ubuntu 12.04.2 LTS with the 3.2.0-29-generic kernel, at zfs/spl 0.6.1. The larger system (the one I'm having trouble with) is on CentOS 6.4 with the 2.6.32-358.11.1.el6.x86_64 kernel. Are there kernel-related issues too? As I use this more, more and more commands are locking up. My guess is that I'll get a trace soon... sure enough, I just checked the console and got:

nfs1.localdomain login: INFO: task khugepaged:261 blocked for more than 600 sec.

Any ideas? Thanks, Steve
@behlendorf Just checking to see what the status is. Any idea on when a fix may be out? I have reduced zfs_arc_max to 30 GB, which seems to be helping in general, but I don't know if it will do anything about this long-term problem. This system has 128 GB of RAM. Would I be better off setting zfs_arc_max even lower, or might it help performance to increase it a bit? The default value seemed to play a part in making the system unstable. Thanks, Steve
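For reference, a minimal sketch of how a cap like that is usually expressed: zfs_arc_max takes a value in bytes, so a 30 GiB limit becomes a large integer, and the option line printed below is the kind of entry that goes in /etc/modprobe.d/zfs.conf. Nothing is applied here; the commented runtime path assumes the zfs module is loaded.

```shell
# Compute a 30 GiB ARC cap in bytes and print the module-option line
# that would go in /etc/modprobe.d/zfs.conf (nothing is applied here).
arc_max=$((30 * 1024 * 1024 * 1024))
echo "options zfs zfs_arc_max=${arc_max}"
# To change it at runtime instead (as root, with the module loaded):
#   echo ${arc_max} > /sys/module/zfs/parameters/zfs_arc_max
```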
Several fixes which may help were merged into master just in the last few days. Can you pull the latest SPL/ZFS source and use that? I've also heard reports of issues with transparent huge pages on RHEL 6.3.
Thanks @behlendorf and @ryao. I have the latest code installed and it is running now. We'll see how it does over the weekend and (hopefully) beyond. Have a great weekend. Steve
Since we haven't heard anything back I'm going to assume things are better. We'll open a new issue if that's not the case. |
I have 0.6.1 running on a CentOS 6.4 system. I have a large pool made up of 60 4 TB drives in six raidz2 sets of 10 drives. To create this I did:
zpool create pool2 -f -o ashift=12 raidz2 dev1 dev2 dev3 ... dev10 raidz2 dev11 dev12 ... dev20 raidz2 ...
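A back-of-envelope check on that layout (a sketch of the arithmetic only; this is raw usable capacity before ZFS metadata and allocation overhead): each 10-disk raidz2 vdev gives up 2 disks to parity.

```shell
# Six raidz2 vdevs of 10 x 4 TB drives; raidz2 costs 2 parity disks
# per vdev, leaving 8 data disks each.
vdevs=6; disks_per_vdev=10; parity=2; tb_per_disk=4
total_disks=$(( vdevs * disks_per_vdev ))
usable_tb=$(( vdevs * (disks_per_vdev - parity) * tb_per_disk ))
echo "total disks: ${total_disks}, approx usable: ${usable_tb} TB"
```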
It has worked fine mostly, but once in a while when I rsync to it, it hangs and the spl_kmem_cache process sits at 100%. dmesg shows a series of:
INFO: task txg_sync:790 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync D 0000000000000010 0 790 2 0x00000080
ffff88201e751b60 0000000000000046 ffff88201e751b20 ffffffff8106317e
ffff88201e751b90 ffff882000000000 00000000fee6cd10 0000000000000001
ffff88201f5465f8 ffff88201e751fd8 000000000000fb88 ffff88201f5465f8
Call Trace:
[] ? try_to_wake_up+0x24e/0x3e0
[] ? ktime_get_ts+0xa9/0xe0
[] io_schedule+0x73/0xc0
[] cv_wait_common+0xac/0x1c0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait_io+0x18/0x20 [spl]
[] zio_wait+0xfb/0x190 [zfs]
[] dsl_pool_sync+0x2f5/0x540 [zfs]
[] spa_sync+0x39e/0xa00 [zfs]
[] ? read_tsc+0x9/0x20
[] txg_sync_thread+0x307/0x590 [zfs]
[] ? set_user_nice+0xc9/0x130
[] ? txg_sync_thread+0x0/0x590 [zfs]
[] thread_generic_wrapper+0x68/0x80 [spl]
[] ? thread_generic_wrapper+0x0/0x80 [spl]
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
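The warning in the trace above fires when a task sits in uninterruptible sleep longer than a kernel threshold. A quick way to inspect that threshold is sketched below; the /proc path exists only on kernels built with hung-task detection, hence the fallback to the common 120-second default.

```shell
# Show the hung-task warning threshold in seconds. The kernel message
# in the trace above mentions writing 0 here to silence the warnings.
cat /proc/sys/kernel/hung_task_timeout_secs 2>/dev/null || echo 120
# To disable the warning (as root), as the kernel message itself says:
#   echo 0 > /proc/sys/kernel/hung_task_timeout_secs
```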
I have searched through other spl_kmem_cache issues on this list but I haven't seen any that match this so I wanted to report it. If I can do anything to help resolve this I will.
Thanks.
Steve