yay more zfs deadlock #1415
I must admit I feel a bit the same. I do not use rsync a lot on my zfs system, but I use tgtd to export zvols to run KVM machines, and I do have a lot of performance issues, since RC14 at least (maybe it's a misconfiguration or a hardware problem, I don't know yet, but it's really painful).

David
@spacelama I suspect that you are affected by the bug in issue #1342. It is a regression introduced in Linux 3.6. The current solutions are to patch the kernel or use an older kernel. If you decide to patch the kernel, there is a patch available from the Gentoo bug tracker: https://bugs.gentoo.org/show_bug.cgi?id=462066

People on recent kernels have reported plenty of issues that appear to be unrelated to ZFS, so this patch alone might not be sufficient to have a good experience. My recommendation is to run only LTS kernels, such as Linux 2.6.32.y, Linux 3.2.y or Linux 3.4.y, on production systems. This applies regardless of whether or not a system is using ZFS.
Nope, this or a very similar bug affects Debian wheezy 3.2 kernels. I will try to get a stacktrace next time it happens while I'm home and before the hardware watchdog kicks in.
Commit 294f680, which was recently merged into master, may address this deadlock.
I see shrink_slab in the backtraces. #1627 could help with that.
@spacelama Any update? Have recent changes, such as those mentioned above, improved things?
Closing this issue; the patches referenced above should have improved things, and I haven't heard otherwise.
I'm getting regular lockups trying to rsync to a filesystem with the current nightly beta.
Ok. I've reopened the issue.
@jwiegley -- We experienced lockups during rsyncs that we eventually avoided by dialing back zfs_arc_max (originally dialed up because the box's only usage is storage; see #1365 (comment)). Fragmentation in the cache leads to higher total memory usage. Are you adjusting any of the ZFS tunables? What does your memory situation look like pre-lockup?
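In case it helps anyone trying the same adjustment, here is a minimal sketch of the two usual ways to cap the ARC, assuming a stock ZoL build that exposes the module parameter under /sys; the 4 GiB value is purely illustrative:

echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max            # runtime change (a lowered value may only take effect as the ARC next adjusts itself)
echo "options zfs zfs_arc_max=4294967296" >> /etc/modprobe.d/zfs.conf   # persistent across reboots, picked up when the zfs module loads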
Let me describe my scenario in a bit more detail: I've got a home-built file server here, with 16GB of ECC RAM and 12 drives in a RAID-10 array. The OS is Ubuntu 12.04.3, installed on a USB flash drive. I use git-annex heavily for maintaining my archives, and one thing that it does is create a LOT of files (it's the biggest inode consumer I've ever seen). These all get populated within the

I have seen it happen, maybe 4-5 times in the past, that trying to run

I worked around this by destroying the filesystem (which had no snapshots, btw) and recreating it from a backup copy. The reason I don't have the tarpit filesystem anymore, but I wonder if this could be reproduced on your side by just artificially manufacturing maybe a million files in, say, 2000 directories at maximum depth 4, and then trying to delete them all at once.

Lastly, I have not modified any ZFS variables in the kernel; I don't have an L2ARC or a ZIL device. Pre-lockup I don't know what the memory situation was.
Regarding the soft lockup problem: I noticed frequent occurrences with zfs on linux 0.6.2 on
On Sat, 16 Nov 2013, wphilips wrote:
As the original reporter of this copy of this bug, I note that I've had no

What I still get is some slowdowns from when linux's cache is reduced down

module options current:

Tim Connors
This has happened to me again. While trying to rsync from my Mac Pro to my ZOL mount, it started giving me a bunch of errors like:
Then the filesystem completely locked up in the kernel; nothing can be done with it. After a hard reset, any attempt to mount this filesystem results in a deadlock, necessitating another reset. Is there a way to "clear" this filesystem so that I can mount it again? I don't think the rsync had even made any changes yet; it was just trying to match permissions between files at this point.
I've been following this thread since it was originally reported and have been trying to make some sense of it. At this point, it seems likely that the problems being reported recently are not the same as @spacelama's original issue. His original problem report was seen while using a 0.6.0.97-0ubuntu (very old) build which appears to me to correspond to 0.6.0-rc14 (a997dfb) in the main repository. At this point, with hopefully most of the major 0.6.2 stuff in the tree, it's not clear to me how relevant his original report is, nor is it clear whether he's still seeing the problem with 0.6.1 or with a recent daily build.

This issue was closed but was then re-opened following @jwiegley's report, but that report didn't seem to include enough details to relate it to the original issue. It's also not clear that @eborisch's or @wphilips' problems are the same thing (as was originally reported in this issue).

@spacelama Are you still seeing your original problem with recent code? If not, then I would suggest that this issue be closed.

As to the others who reported various lockups, it seems a common thread is massive metadata-walking in the form of an rsync receive to ZFS. There's good evidence that under some circumstances, the metadata can overwhelm the ARC and can far exceed the arc_meta_limit. I'd urge anyone who is having lockups or other problems that are triggered when performing an rsync receive to include the contents of their arcstats kstat file, their kernel version and their ZoL version in a new trouble report. I've personally seen this problem but it seems to be difficult to reproduce. It may be dependent on kernel version or on other factors.

I'd also suggest that anyone having a related problem try recent ZoL code and, if they're having a problem with their metadata size, refer to #1871 or one of the other more relevant issues. If the lockups being reported by others are not metadata-related, and are on a substantially newer version than 0.6.0-rc14 (I would hope so), I'd suggest opening a new issue.
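For anyone putting together such a report, a minimal sketch of gathering the three pieces of information being asked for (the kstat path is the standard ZoL location; modinfo is just one way to get the module version):

cat /proc/spl/kstat/zfs/arcstats    # ARC statistics, including arc_meta_used and arc_meta_limit
uname -r                            # kernel version
modinfo zfs | grep -i '^version'    # ZoL module version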
I did #1901. I'll be sure to grab kstat next time. I seem to be able to reproduce a hang (but it takes a few hours of letting it run). I'll try again tomorrow.
And yes, this is an rsync destination. The lock-ups I'm currently getting are during scrubs or zfs sends. Failures during rsync haven't happened since I stopped bumping the zfs_meta_max tunable.
Last night we encountered, I think, the same problem. Running Ubuntu 12.04.3 on a USB stick with kernel 3.5.0-42-generic and ZFS 0.6.2-1 (Ubuntu launchpad ZOL repo) on a dual quad-core Intel box, 12G mem and a zpool with two 6-disk raidz1's (total ~20T). The machine is used for backups with heavy rsync and samba load during the night. The smb traffic is high io (VMWare VeeAM backup target).

Almost all mem is in use, not by processes; the cache is exhausted: rsyslog was killed by the oom-killer during the backup window.

Following is the arcstats from kstat:

4 1 0x01 84 4032 13734614807 878327691108390

The system is up for 10 days. During the day since startup a lot of zfs receives were done via netcat to migrate data from another (older) zfs box to this one. I hope this helps you guys find the source of the problem. I will reboot the box in a couple of hours before the upcoming backup window.
Any suggestion on how to unbork a filesystem that will no longer mount after experiencing this problem? I'd like to get some of the data back out of it.
@jwiegley How long have you left it trying to mount? Especially if your removed files had xattrs, and your zfs isn't set to xattr=sa, you could find you're simply not giving it enough time (due to #457) to work through its 'deleted' queue (which is processed on mount: zfs_domount->zfs_sb_setup->zfs_unlinked_drain). E.g. I've had systems in this state for 10h+ (!!!) before finally mounting.
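To check whether the xattr=sa suggestion applies to your datasets, a quick sketch (tank/fs is a placeholder for the affected file system):

zfs get xattr tank/fs     # 'on' means directory-based xattrs, the slow path described above
zfs set xattr=sa tank/fs  # note: only affects xattrs written after the change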
@jwiegley What happens when you try to mount the file system? Does the pool import? Does the problem only occur when trying to mount the file system? You certainly could be facing a large delete queue. Actually, these days, the files-with-xattr-doesn't-free-space problem is likely #1548.

You can confirm whether your file systems have a large delete queue with zdb prior to importing. The delete queue is typically object 3 but you can figure out which object it is in a particular file system with
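A sketch of the zdb inspection being described; tank/fs is a placeholder and object 3 is only the typical location of the delete queue:

zdb -dddd tank/fs 1       # dump the master node; its DELETE_QUEUE entry names the actual object
zdb -dddd tank/fs 3       # dump the (typical) delete queue object; a huge ZAP here means a long drain at mount time
zdb -e -dddd tank/fs 3    # same, but for a pool that is not currently imported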
The pool seems to import fine, and if I
OK. A huge delete queue does sound like it may be your problem if it imports OK but sits for a very long time when trying to mount it. Again, see #1548 for details. Filesystems in which most (all) files have xattrs can be deadly unless you're using xattr=sa.
@smitb Quick summary of your ARC: Your arc_max is about 6GiB but its size is only about 3GiB. The metadata, however, is taking up about 1.5GiB of it and that's also the metadata limit. You might want to increase zfs_arc_meta_limit.

I don't think either of you, @jwiegley or @smitb, are having problems that are related to the originally reported issue #1415.
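For reference, a minimal sketch of bumping the limit at runtime, assuming your build exposes the parameter as writable under /sys; 3 GiB is only an illustrative value:

echo 3221225472 > /sys/module/zfs/parameters/zfs_arc_meta_limit
grep arc_meta /proc/spl/kstat/zfs/arcstats    # watch arc_meta_used against the new arc_meta_limit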
@dweeezil There is more! :-) We also experience lockups, some short (a few secs), some longer that get logged to syslog. It started to happen when we were at ~80% of a migration to this box (from another box with zfs send/receive). The first event happened on November 22nd. No zfs send/receives were running; a couple of rsyncs and sftp transfers were:

Nov 22 02:07:42 [] kernel: [488337.747821] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

We've rebooted the box yesterday afternoon. No hung messages in syslog last night, but the box is experiencing short hangs of 3 - 8 seconds. During these short periods the machine is unresponsive.

zpool status:
errors: No known data errors

zpool list:

We use Seagate 2T enterprise disks connected to a 3ware 9750 controller. The disks are configured as 12 separate units with a single disk each. This configuration has the benefit that the on-card BBU-backed RAID cache is still used.

I will tune zfs_arc_meta_limit and will reboot the box in about 4 hours. (But this won't be the culprit, as you said.) If I can assist by supplying more information, please ask.

Regards, Bas
@dweeezil Here is the zdb dump of my delete queue on the affected filesystem:
@jwiegley Your delete queue is empty. My next suggestion would be to try mounting the file systems individually rather than using
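A sketch of the one-at-a-time approach, with tank as a placeholder pool name:

zpool import -N tank           # import the pool without mounting any of its file systems
zfs list -H -o name -r tank    # list the datasets
zfs mount tank/some/dataset    # mount them individually, noting which one stalls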
@dweeezil I've tried both of the

Also, this issue of filesystems becoming unmountable is far too easy to trigger. I just had it happen again by running "chown btsync .*" in a nearly unpopulated filesystem. Now I have two filesystems that I cannot access anymore. Is there a way to "reset" the filesystem so that I can mount it and recover the data?
@behlendorf I believe you're correct, no objection on my part.
No objection.
Er, there's been a heck of a lot of scope creep since my original submission (why? Someone made a huge mental leap up there somewhere). I can assure you this bug as originally reported on my systems has absolutely nothing to do with mounting, so issue 1415 shouldn't be closed, since it still exists and has nothing to do with issue 1911.

I have actively tried not to trigger my bug for quite a few months (I have deliberately changed my workload to deal with more data and less metadata), but I know it still exists, because 0.6.2 doesn't have any of the metadata fixes that have been talked about on the mailing list. Once 0.6.3 or whatever with the fixes comes out (I couldn't convince zfs/spl HEAD to build dkms deb images, otherwise I'd test sooner), I will try to reproduce the original bug again.
I forgot to mention, I tried 0.6.2 with my old workload, and yes, it was still crashing every week or so. It is my changed workload that renders it more stable. I'll try with the old workload when I have the metadata fixes that are apparently in HEAD.
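In case it helps with getting HEAD built, a sketch of the non-dkms route that I believe works for building Debian packages from a git checkout (the make deb target converts RPMs with alien, so alien and the kernel headers need to be installed; treat this as an untested outline rather than the supported procedure):

git clone https://github.com/zfsonlinux/spl.git && cd spl
./autogen.sh && ./configure && make deb && sudo dpkg -i *.deb && cd ..
git clone https://github.com/zfsonlinux/zfs.git && cd zfs
./autogen.sh && ./configure && make deb && sudo dpkg -i *.deb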
@spacelama sorry about your thread getting hijacked. Let me circle back to the original issue you reported with rsync. On Friday I merged some significant improvements for the ARC which are designed to improve metadata heavy workloads like yours. Now they don't directly address the lockup you described but they may make it far less likely. We've seen a dramatic reduction in ARC misses for workloads similar to yours. If it's not too inconvenient it would be very useful to see what impact they have on your systems. It would be ideal if you could gather some timing data from before and after you update so we can measure the impact. Another easy metric to grab would be from the
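A sketch of the kind of before/after measurement being asked for, assuming the workload is an rsync pass and that the arcstats hit/miss counters are the metric of interest (the paths are placeholders):

grep -E '^(hits|misses|arc_meta_used) ' /proc/spl/kstat/zfs/arcstats > arcstats.before
time rsync -aH /source/ /tank/backup/    # your usual backup job
grep -E '^(hits|misses|arc_meta_used) ' /proc/spl/kstat/zfs/arcstats > arcstats.after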
I'll have to report back after my 2 weeks holiday - right now, a

Tim Connors
Uptime 25 days, and I have been using the old backup scheme for that entire time, which up until 0.6.2 always caused deadlocks after a week or two. I don't interact much with the server on the commandline, but I've never noticed it become debilitatingly slow since the upgrade to HEAD. It never seems to have overly used RAM, and linux's cache is allowed to be larger than I've seen it in ages (150MB cached, which is probably pretty much the entire working set of my ext4 root filesystem). Looking good!

arcstats, if it helps:
Er, and since git versions aren't useful by themselves when merging private branches:

zfs> git log --pretty=oneline | head
Thanks for the update. That's encouraging.
44 days uptime, and back to a load of 30 and minute-long stalls for all local and NFS serving traffic. The kernel's cache is dropping down to 8MB(!) usage out of 4GB in the machine, which is what stalls everything else on the machine - it's got to serve / and my Maildirs out of that tiny cache. After a round of drop_caches and leaving it to continue with backups for a few minutes, this is arcstats:
780MB free in the machine; I was able to do a dummy malloc/zero of 512MB, dropping free memory down to 390MB, without stressing the machine in the process, so I don't know why the kernel would be deciding it can't cache anything. The machine hasn't locked up yet, but I'll probably reboot within a few days.
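For reference, the drop_caches round mentioned above is presumably the usual procedure; a sketch, run as root:

sync                                 # flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches    # 1 = page cache, 2 = dentries/inodes, 3 = both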
Run
On Sun, 13 Apr 2014, Richard Yao wrote:
Machine is currently back to coping with the load - the load of 6 is

sync hasn't fixed it in the past (was doing it with the drop_cache

But a few days ago when it started having troubles and continuing

Tim Connors
These changes have been merged and we have some additional improvements planned. I'm closing this issue since it sounds like things have improved. We can definitely open a new issue if there is still an issue.
(updated with kernel and zfs version info - dangers of composing something late at night)
About once a week for the past year, on average, ZFS locks up on me - usually rsync related (only ever a maximum of 2 running, talking potentially from a single disk to a 3 disk raidz), on a filesystem that has 15M inodes (mostly hardlinks with perhaps an average link count of 16, most not touched in the working set), and always running something near git HEAD. I finally got a trace because my home directory was still writable, dmesg was still working (I can't log in as root though, since /var is on ZFS and su/sudo/ssh all need to access stuff in /var) and a watchdog had not reset the machine yet.
Yes, I've had all the symptoms everyone else has mentioned over the years - arc climbing well in excess of arc_max etc etc.
I'm currently using the following module options on a box with 4G of RAM and 4 Atom cores:
options zfs zfs_arc_max=1310720000 zfs_arc_meta_limit=1048576000
but of course I've had lockups on any sensible variation (and default) of those params.
Currently running a Debian kernel and zfs-dkms 0.6.0.97-0ubuntu1~oneiric1 from the zfs-native-stable Ubuntu PPA:
but of course these lockups have happened on every kernel I've ever tried since 2.6 days and from zfs-rc8 or so.
[300120.936061] INFO: task kswapd0:39 blocked for more than 120 seconds.
[300120.936070] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300120.936076] kswapd0 D ffff88013fd93b80 0 39 2 0x00000000
[300120.936085] ffff88013afd71c0 0000000000000046 0000000000013b80 ffff88013aea41c0
[300120.936096] 0000000000013b80 ffff880137a9ffd8 ffff880137a9ffd8 ffff88013afd71c0
[300120.936105] ffff88013afd71c0 ffff8801248c8fb8 ffff88013afd71c0 ffff8801248c8fbc
[300120.936114] Call Trace:
[300120.936133] [] ? __mutex_lock_common.isra.6+0x149/0x165
[300120.936142] [] ? mutex_lock+0x1a/0x2d
[300120.936150] [] ? mutex_lock+0xd/0x2d
[300120.936256] [] ? zfs_zinactive+0x4a/0xc6 [zfs]
[300120.936341] [] ? zfs_inactive+0x103/0x13c [zfs]
[300120.936352] [] ? evict+0xa3/0x153
[300120.936359] [] ? dispose_list+0x27/0x31
[300120.936367] [] ? prune_icache_sb+0x2aa/0x2c5
[300120.936376] [] ? prune_super+0xc3/0x12d
[300120.936385] [] ? shrink_slab+0x19d/0x289
[300120.936393] [] ? mem_cgroup_iter+0x24/0x181
[300120.936402] [] ? balance_pgdat+0x2e1/0x552
[300120.936410] [] ? _raw_spin_lock_irqsave+0x14/0x35
[300120.936418] [] ? kswapd+0x2b4/0x325
[300120.936428] [] ? abort_exclusive_wait+0x79/0x79
[300120.936436] [] ? balance_pgdat+0x552/0x552
[300120.936443] [] ? balance_pgdat+0x552/0x552
[300120.936451] [] ? kthread+0x81/0x89
[300120.936459] [] ? __kthread_parkme+0x5c/0x5c
[300120.936468] [] ? ret_from_fork+0x7c/0xb0
[300120.936476] [] ? __kthread_parkme+0x5c/0x5c
[300120.936498] INFO: task arc_adapt:1128 blocked for more than 120 seconds.
[300120.936502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300120.936506] arc_adapt D ffff88013fd93b80 0 1128 2 0x00000000
[300120.936513] ffff880138547800 0000000000000046 ffff880139c950c8 ffff88013aea41c0
[300120.936522] 0000000000013b80 ffff8801366c7fd8 ffff8801366c7fd8 ffff880138547800
[300120.936531] ffff88013aea41c0 ffffffffa0564768 ffff880138547800 ffffffffa056476c
[300120.936540] Call Trace:
[300120.936554] [] ? __mutex_lock_common.isra.6+0x149/0x165
[300120.936578] [] ? __thread_create+0x2d2/0x2d2 [spl]
[300120.936586] [] ? mutex_lock+0x1a/0x2d
[300120.936638] [] ? arc_evict+0x6c/0x49b [zfs]
[300120.936647] [] ? _raw_spin_lock_irqsave+0x14/0x35
[300120.936672] [] ? spl_kmem_cache_free+0x298/0x30a [spl]
[300120.936691] [] ? __thread_create+0x2d2/0x2d2 [spl]
[300120.936740] [] ? arc_adjust+0x66/0x172 [zfs]
[300120.936791] [] ? arc_adjust_meta+0x176/0x176 [zfs]
[300120.936842] [] ? arc_adapt_thread+0x57/0xb6 [zfs]
[300120.936861] [] ? thread_generic_wrapper+0x6a/0x75 [spl]
[300120.936869] [] ? kthread+0x81/0x89
[300120.936877] [] ? __kthread_parkme+0x5c/0x5c
[300120.936885] [] ? ret_from_fork+0x7c/0xb0
[300120.936893] [] ? __kthread_parkme+0x5c/0x5c
[300120.936971] INFO: task txg_quiesce:1604 blocked for more than 120 seconds.
[300120.936975] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300120.936978] txg_quiesce D ffff88013fc13b80 0 1604 2 0x00000000
[300120.936986] ffff88012aa1a840 0000000000000046 ffffffffa0541c00 ffff880137025840
[300120.936995] 0000000000013b80 ffff880124fcbfd8 ffff880124fcbfd8 ffff88012aa1a840
[300120.937004] 0000000000000202 ffffc900110fe068 ffffc900110fe000 0000000000000000
[300120.937012] Call Trace:
[300120.937035] [] ? cv_wait_common+0xdc/0x171 [spl]
[300120.937044] [] ? should_resched+0x5/0x23
[300120.937053] [] ? _cond_resched+0x6/0x1b
[300120.937061] [] ? abort_exclusive_wait+0x79/0x79
[300120.937068] [] ? should_resched+0x5/0x23
[300120.937144] [] ? txg_quiesce_thread+0x201/0x2cc [zfs]
[300120.937221] [] ? txg_init+0x180/0x180 [zfs]
[300120.937240] [] ? __thread_create+0x2d2/0x2d2 [spl]
[300120.937258] [] ? thread_generic_wrapper+0x6a/0x75 [spl]
[300120.937266] [] ? kthread+0x81/0x89
[300120.937274] [] ? __kthread_parkme+0x5c/0x5c
[300120.937282] [] ? ret_from_fork+0x7c/0xb0
[300120.937290] [] ? __kthread_parkme+0x5c/0x5c
[300120.937334] INFO: task ntpd:4382 blocked for more than 120 seconds.
[300120.937338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300120.937342] ntpd D ffff88013fd93b80 0 4382 1 0x00000004
[300120.937349] ffff88009c963140 0000000000000082 0000000000000000 ffff88013aea41c0
[300120.937358] 0000000000013b80 ffff88006e2b7fd8 ffff88006e2b7fd8 ffff88009c963140
[300120.937367] 0000000000000001 ffffffffa0564768 ffff88009c963140 ffffffffa056476c
[300120.937375] Call Trace:
[300120.937390] [] ? __mutex_lock_common.isra.6+0x149/0x165
[300120.937403] [] ? mutex_lock+0x1a/0x2d
[300120.937453] [] ? add_reference.isra.10+0x62/0xd1 [zfs]
[300120.937506] [] ? arc_buf_add_ref+0x69/0xce [zfs]
[300120.937560] [] ? __dbuf_hold_impl+0x27f/0x39b [zfs]
[300120.937614] [] ? dbuf_hold_impl+0x74/0x98 [zfs]
[300120.937668] [] ? dbuf_hold_level+0x19/0x2b [zfs]
[300120.937727] [] ? dmu_tx_check_ioerr+0x31/0x93 [zfs]
[300120.937779] [] ? RW_WRITE_HELD+0x4c/0x7e [zfs]
[300120.937839] [] ? dmu_tx_count_write+0x88/0x540 [zfs]
[300120.937891] [] ? __dbuf_hold_impl+0x371/0x39b [zfs]
[300120.937900] [] ? __kmalloc+0xde/0xf0
[300120.937908] [] ? should_resched+0x5/0x23
[300120.937928] [] ? kmem_alloc_debug+0x14b/0x2d7 [spl]
[300120.937989] [] ? dnode_hold_impl+0x362/0x37c [zfs]
[300120.938050] [] ? dmu_tx_hold_write+0x2c/0x3d [zfs]
[300120.938133] [] ? zfs_write+0x4c7/0x9cf [zfs]
[300120.938141] [] ? should_resched+0x5/0x23
[300120.938149] [] ? _cond_resched+0x6/0x1b
[300120.938158] [] ? page_add_file_rmap+0x60/0x79
[300120.938166] [] ? handle_pte_fault+0x2c6/0x7dd
[300120.938253] [] ? zpl_write_common+0x52/0x6a [zfs]
[300120.938338] [] ? zpl_write+0x5a/0x7e [zfs]
[300120.938347] [] ? vfs_write+0x96/0xf8
[300120.938354] [] ? sys_write+0x51/0x80
[300120.938362] [] ? system_call_fastpath+0x16/0x1b
[300120.938371] INFO: task squid:4509 blocked for more than 120 seconds.
[300120.938375] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300120.938379] squid D ffff88013fc93b80 0 4509 4502 0x00000000
[300120.938387] ffff880139c79840 0000000000000082 0000000000000000 ffff88013aea2180
[300120.938395] 0000000000013b80 ffff880068a43fd8 ffff880068a43fd8 ffff880139c79840
[300120.938404] 0000000000000400 ffffffffa0564768 ffff880139c79840 ffffffffa056476c
[300120.938413] Call Trace:
[300120.938427] [] ? __mutex_lock_common.isra.6+0x149/0x165
[300120.938440] [] ? mutex_lock+0x1a/0x2d
[300120.938489] [] ? arc_change_state.isra.7+0x121/0x21c [zfs]
[300120.938541] [] ? arc_access.isra.8+0xe7/0x17e [zfs]
[300120.938593] [] ? arc_read_nolock+0x374/0x695 [zfs]
[300120.938601] [] ? __cache_free.isra.39+0x18e/0x19d
[300120.938609] [] ? should_resched+0x5/0x23
[300120.938616] [] ? _cond_resched+0x6/0x1b
[300120.938669] [] ? arc_read+0xbe/0x122 [zfs]
[300120.938723] [] ? dbuf_prefetch+0x20c/0x235 [zfs]
[300120.938784] [] ? dmu_zfetch_dofetch.isra.4+0xcf/0x104 [zfs]
[300120.938792] [] ? should_resched+0x5/0x23
[300120.938852] [] ? dmu_zfetch+0x42b/0x937 [zfs]
[300120.938861] [] ? should_resched+0x5/0x23
[300120.938913] [] ? dbuf_read+0xd6/0x4b6 [zfs]
[300120.938966] [] ? __dbuf_hold_impl+0x371/0x39b [zfs]
[300120.938973] [] ? _raw_spin_lock_irqsave+0x14/0x35
[300120.938984] [] ? _raw_spin_lock_irqsave+0x14/0x35
[300120.939044] [] ? dnode_hold_impl+0x138/0x37c [zfs]
[300120.939094] [] ? remove_reference.isra.9+0x7b/0x87 [zfs]
[300120.939102] [] ? should_resched+0x5/0x23
[300120.939110] [] ? _cond_resched+0x6/0x1b
[300120.939166] [] ? dmu_bonus_hold+0x1c/0x177 [zfs]
[300120.939174] [] ? mutex_lock+0xd/0x2d
[300120.939258] [] ? zfs_zget+0x5b/0x191 [zfs]
[300120.939337] [] ? zfs_dirent_lock+0x3f4/0x425 [zfs]
[300120.939417] [] ? zfs_zaccess_aces_check+0x1ca/0x1f5 [zfs]
[300120.939496] [] ? zfs_dirlook+0x1bc/0x215 [zfs]
[300120.939575] [] ? zfs_zaccess+0x1cf/0x255 [zfs]
[300120.939583] [] ? should_resched+0x5/0x23
[300120.939666] [] ? zfs_lookup+0x263/0x2b0 [zfs]
[300120.939751] [] ? zpl_lookup+0x47/0x95 [zfs]
[300120.939760] [] ? d_alloc+0x61/0x69
[300120.939769] [] ? lookup_real+0x24/0x3f
[300120.939776] [] ? do_last+0x505/0xa0f
[300120.939783] [] ? __inode_permission+0x57/0x95
[300120.939791] [] ? path_openat+0xc1/0x33b
[300120.939799] [] ? do_filp_open+0x2a/0x6e
[300120.939807] [] ? __alloc_fd+0xdc/0xed
[300120.939816] [] ? do_sys_open+0x5c/0xe0
[300120.939825] [] ? system_call_fastpath+0x16/0x1b
[300120.939913] INFO: task BackupPC_dump:31833 blocked for more than 120 seconds.
[300120.939917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300120.939920] BackupPC_dump D ffff88013fd93b80 0 31833 4692 0x00000000
[300120.939929] ffff880056161140 0000000000000086 ffff88008ca23c58 ffff88013765f0c0
[300120.939937] 0000000000013b80 ffff88008ca23fd8 ffff88008ca23fd8 ffff880056161140
[300120.939946] ffffc90049402640 ffff88013fd93b80 ffff880056161140 0000000000000001
[300120.939955] Call Trace:
[300120.939964] [] ? io_schedule+0x56/0x6d
[300120.939987] [] ? cv_wait_common+0xd5/0x171 [spl]
[300120.939995] [] ? abort_exclusive_wait+0x79/0x79
[300120.940053] [] ? should_resched+0x5/0x23
[300120.940176] [] ? zio_wait+0x107/0x139 [zfs]
[300120.940260] [] ? dmu_buf_hold_array_by_dnode+0x242/0x329 [zfs]
[300120.940344] [] ? dmu_buf_hold_array+0x59/0x7b [zfs]
[300120.940429] [] ? dmu_read_uio+0x3d/0xbd [zfs]
[300120.940543] [] ? zfs_read+0x344/0x3ec [zfs]
[300120.940660] [] ? zpl_read_common+0x52/0x6a [zfs]
[300120.940776] [] ? zpl_read+0x5a/0x7e [zfs]
[300120.940802] [] ? vfs_read+0x93/0xf5
[300120.940815] [] ? sys_read+0x51/0x80
[300120.940828] [] ? system_call_fastpath+0x16/0x1b
[300120.940841] INFO: task BackupPC_dump:2324 blocked for more than 120 seconds.
[300120.940847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300120.940853] BackupPC_dump D ffff88013fd13b80 0 2324 31833 0x00000000
[300120.940862] ffff880119f5b800 0000000000000082 ffff880086071c58 ffff8801376ec780
[300120.940874] 0000000000013b80 ffff880086071fd8 ffff880086071fd8 ffff880119f5b800
[300120.940887] ffffc90053cd4440 ffff88013fd13b80 ffff880119f5b800 0000000000000001
[300120.940900] Call Trace:
[300120.940913] [] ? io_schedule+0x56/0x6d
[300120.940944] [] ? cv_wait_common+0xd5/0x171 [spl]
[300120.940957] [] ? abort_exclusive_wait+0x79/0x79
[300120.940968] [] ? should_resched+0x5/0x23
[300120.941060] [] ? zio_wait+0x107/0x139 [zfs]
[300120.941117] [] ? dmu_buf_hold_array_by_dnode+0x242/0x329 [zfs]
[300120.941215] [] ? dmu_buf_hold_array+0x59/0x7b [zfs]
[300120.941274] [] ? dmu_read_uio+0x3d/0xbd [zfs]
[300120.941358] [] ? zfs_read+0x344/0x3ec [zfs]
[300120.941444] [] ? zpl_read_common+0x52/0x6a [zfs]
[300120.941529] [] ? zpl_read+0x5a/0x7e [zfs]
[300120.941538] [] ? vfs_read+0x93/0xf5
[300120.941545] [] ? sys_read+0x51/0x80
[300120.941554] [] ? system_call_fastpath+0x16/0x1b
[300120.941564] INFO: task BackupPC_dump:31174 blocked for more than 120 seconds.
[300120.941568] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300120.941572] BackupPC_dump D ffff88013fc13b80 0 31174 31167 0x00000000
[300120.941580] ffff88012491b8c0 0000000000000086 ffff88008d8051a8 ffff880137025840
[300120.941589] 0000000000013b80 ffff88001a5f7fd8 ffff88001a5f7fd8 ffff88012491b8c0
[300120.941598] 0000000000000202 ffff88013fc13b80 ffff88012491b8c0 0000000000000001
[300120.941607] Call Trace:
[300120.941616] [] ? io_schedule+0x56/0x6d
[300120.941637] [] ? cv_wait_common+0xd5/0x171 [spl]
[300120.941646] [] ? abort_exclusive_wait+0x79/0x79
[300120.941653] [] ? should_resched+0x5/0x23
[300120.941738] [] ? zio_wait+0x107/0x139 [zfs]
[300120.941791] [] ? dbuf_read+0x3fe/0x4b6 [zfs]
[300120.941802] [] ? _raw_spin_lock_irqsave+0x14/0x35
[300120.941864] [] ? dnode_hold_impl+0x138/0x37c [zfs]
[300120.941913] [] ? remove_reference.isra.9+0x7b/0x87 [zfs]
[300120.941921] [] ? should_resched+0x5/0x23
[300120.941929] [] ? _cond_resched+0x6/0x1b
[300120.941985] [] ? dmu_bonus_hold+0x1c/0x177 [zfs]
[300120.941993] [] ? mutex_lock+0xd/0x2d
[300120.942076] [] ? zfs_zget+0x5b/0x191 [zfs]
[300120.942157] [] ? zfs_dirent_lock+0x3f4/0x425 [zfs]
[300120.942236] [] ? zfs_zaccess_aces_check+0x1ca/0x1f5 [zfs]
[300120.942315] [] ? zfs_dirlook+0x1bc/0x215 [zfs]
[300120.942397] [] ? zfs_zaccess+0x1cf/0x255 [zfs]
[300120.942408] [] ? should_resched+0x5/0x23
[300120.942506] [] ? zfs_lookup+0x263/0x2b0 [zfs]
[300120.942595] [] ? zpl_lookup+0x47/0x95 [zfs]
[300120.942604] [] ? d_alloc+0x61/0x69
[300120.942613] [] ? lookup_real+0x24/0x3f
[300120.942620] [] ? do_last+0x505/0xa0f
[300120.942628] [] ? __inode_permission+0x57/0x95
[300120.942636] [] ? path_openat+0xc1/0x33b
[300120.942644] [] ? do_filp_open+0x2a/0x6e
[300120.942652] [] ? __alloc_fd+0xdc/0xed
[300120.942661] [] ? do_sys_open+0x5c/0xe0
[300120.942670] [] ? system_call_fastpath+0x16/0x1b
[300240.940064] INFO: task kswapd0:39 blocked for more than 120 seconds.
[300240.940075] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300240.940082] kswapd0 D ffff88013fd93b80 0 39 2 0x00000000
[300240.940096] ffff88013afd71c0 0000000000000046 0000000000013b80 ffff88013aea41c0
[300240.940110] 0000000000013b80 ffff880137a9ffd8 ffff880137a9ffd8 ffff88013afd71c0
[300240.940124] ffff88013afd71c0 ffff8801248c8fb8 ffff88013afd71c0 ffff8801248c8fbc
[300240.940138] Call Trace:
[300240.940160] [] ? __mutex_lock_common.isra.6+0x149/0x165
[300240.940169] [] ? mutex_lock+0x1a/0x2d
[300240.940177] [] ? mutex_lock+0xd/0x2d
[300240.940281] [] ? zfs_zinactive+0x4a/0xc6 [zfs]
[300240.940366] [] ? zfs_inactive+0x103/0x13c [zfs]
[300240.940377] [] ? evict+0xa3/0x153
[300240.940384] [] ? dispose_list+0x27/0x31
[300240.940391] [] ? prune_icache_sb+0x2aa/0x2c5
[300240.940400] [] ? prune_super+0xc3/0x12d
[300240.940410] [] ? shrink_slab+0x19d/0x289
[300240.940418] [] ? mem_cgroup_iter+0x24/0x181
[300240.940427] [] ? balance_pgdat+0x2e1/0x552
[300240.940436] [] ? _raw_spin_lock_irqsave+0x14/0x35
[300240.940443] [] ? kswapd+0x2b4/0x325
[300240.940453] [] ? abort_exclusive_wait+0x79/0x79
[300240.940461] [] ? balance_pgdat+0x552/0x552
[300240.940468] [] ? balance_pgdat+0x552/0x552
[300240.940476] [] ? kthread+0x81/0x89
[300240.940484] [] ? __kthread_parkme+0x5c/0x5c
[300240.940493] [] ? ret_from_fork+0x7c/0xb0
[300240.940501] [] ? __kthread_parkme+0x5c/0x5c
[300240.940524] INFO: task arc_adapt:1128 blocked for more than 120 seconds.
[300240.940528] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300240.940532] arc_adapt D ffff88013fd93b80 0 1128 2 0x00000000
[300240.940539] ffff880138547800 0000000000000046 ffff880139c950c8 ffff88013aea41c0
[300240.940548] 0000000000013b80 ffff8801366c7fd8 ffff8801366c7fd8 ffff880138547800
[300240.940557] ffff88013aea41c0 ffffffffa0564768 ffff880138547800 ffffffffa056476c
[300240.940565] Call Trace:
[300240.940580] [] ? __mutex_lock_common.isra.6+0x149/0x165
[300240.940604] [] ? __thread_create+0x2d2/0x2d2 [spl]
[300240.940612] [] ? mutex_lock+0x1a/0x2d
[300240.940663] [] ? arc_evict+0x6c/0x49b [zfs]
[300240.940674] [] ? _raw_spin_lock_irqsave+0x14/0x35
[300240.940698] [] ? spl_kmem_cache_free+0x298/0x30a [spl]
[300240.940717] [] ? __thread_create+0x2d2/0x2d2 [spl]
[300240.940766] [] ? arc_adjust+0x66/0x172 [zfs]
[300240.940817] [] ? arc_adjust_meta+0x176/0x176 [zfs]
[300240.940868] [] ? arc_adapt_thread+0x57/0xb6 [zfs]
[300240.940887] [] ? thread_generic_wrapper+0x6a/0x75 [spl]
[300240.940895] [] ? kthread+0x81/0x89
[300240.940903] [] ? __kthread_parkme+0x5c/0x5c
[300240.940911] [] ? ret_from_fork+0x7c/0xb0
[300240.940919] [] ? __kthread_parkme+0x5c/0x5c
The pools currently look like this, but the lockups happened when the pools were less full too:
(I can't believe the mailing list is full of people saying ZFS is production ready. Not if you use rsync. It's been no end of trouble for me, but unfortunately it's a 3 week job to migrate back to something sensible like mdadm+ext3)