zfs crash, followed by lockup on boot (volumes mount successfully in ZEVO) #1702

Closed
satmandu opened this issue Sep 6, 2013 · 7 comments

satmandu commented Sep 6, 2013

I have 3 volumes on 6 disks; 5 of the disks are connected via eSATA, and one via FireWire.

The volumes were created in ZEVO, and have mounted successfully in Ubuntu on kernel 3.11 with zfsonlinux 0.6.2.

While I was doing a chown on a git directory, zfsonlinux hit an error, locking up all ZFS volumes.

Rebooting works until the volumes get mounted; attempting to mount the volumes fails and locks up the system.

This is what was in my dmesg prior to having to reboot:

[   11.084831] SPL: Loaded module v0.6.2-1~raring
[   11.092885] znvpair: module license 'CDDL' taints kernel.
[   11.095207] Disabling lock debugging due to kernel taint
[   11.170847] ZFS: Loaded module v0.6.2-1~raring, ZFS pool version 5000, ZFS filesystem version 5

[  240.628496] txg_quiesce     D ffff8801bfd13280     0  1259      2 0x00000000
[  240.628507]  ffff8801aabdfd68 0000000000000046 ffff8801ac0c45f0 ffff8801aabdffd8
[  240.628517]  ffff8801aabdffd8 ffff8801aabdffd8 ffff8801b6918000 ffff8801ac0c45f0
[  240.628526]  ffffc900057fd2b0 ffffc900057fd1f8 ffffc900057fd2b8 0000000000000000
[  240.628534] Call Trace:
[  240.628552]  [<ffffffff816a8ef9>] schedule+0x29/0x70
[  240.628591]  [<ffffffffa061a31d>] cv_wait_common+0xfd/0x1b0 [spl]
[  240.628602]  [<ffffffff8106ffe0>] ? finish_wait+0x80/0x80
[  240.628626]  [<ffffffffa061a425>] __cv_wait+0x15/0x20 [spl]
[  240.628724]  [<ffffffffa075f22b>] txg_quiesce_thread+0x22b/0x380 [zfs]
[  240.628811]  [<ffffffffa075f000>] ? txg_fini+0x280/0x280 [zfs]
[  240.628834]  [<ffffffffa0612c78>] thread_generic_wrapper+0x78/0x90 [spl]
[  240.628856]  [<ffffffffa0612c00>] ? __thread_create+0x340/0x340 [spl]
[  240.628864]  [<ffffffff8106f970>] kthread+0xc0/0xd0
[  240.628873]  [<ffffffff8106f8b0>] ? kthread_create_on_node+0x120/0x120
[  240.628881]  [<ffffffff816b295c>] ret_from_fork+0x7c/0xb0
[  240.628890]  [<ffffffff8106f8b0>] ? kthread_create_on_node+0x120/0x120
[  240.628912] INFO: task mount.zfs:1771 blocked for more than 120 seconds.
[  240.629007] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.629112] mount.zfs       D ffff8801bfd13280     0  1771   1770 0x00000000
[  240.629121]  ffff8801b323b9c8 0000000000000086 ffff8801b4aec5f0 ffff8801b323bfd8
[  240.629130]  ffff8801b323bfd8 ffff8801b323bfd8 ffff8801b6918000 ffff8801b4aec5f0
[  240.629138]  ffff8801ac585ac0 ffff8801ac585a20 ffff8801ac585ac8 0000000000000000
[  240.629146] Call Trace:
[  240.629155]  [<ffffffff816a8ef9>] schedule+0x29/0x70
[  240.629180]  [<ffffffffa061a31d>] cv_wait_common+0xfd/0x1b0 [spl]
[  240.629188]  [<ffffffff8106ffe0>] ? finish_wait+0x80/0x80
[  240.629212]  [<ffffffffa061a425>] __cv_wait+0x15/0x20 [spl]
[  240.629298]  [<ffffffffa075e8c3>] txg_wait_synced+0xb3/0x190 [zfs]
[  240.629385]  [<ffffffffa079e12d>] zil_replay_log_record+0xed/0x1b0 [zfs]
[  240.629469]  [<ffffffffa079eb27>] zil_parse+0x437/0x820 [zfs]
[  240.629552]  [<ffffffffa079d700>] ? zil_aitx_compare+0x20/0x20 [zfs]
[  240.629634]  [<ffffffffa079e040>] ? zil_replay_error.isra.5+0xc0/0xc0 [zfs]
[  240.629720]  [<ffffffffa07a13f1>] zil_replay+0xb1/0x100 [zfs]
[  240.629805]  [<ffffffffa078f4d8>] zfs_sb_setup+0x138/0x150 [zfs]
[  240.629889]  [<ffffffffa07902e8>] zfs_domount+0x228/0x270 [zfs]
[  240.629969]  [<ffffffffa07ab610>] ? zpl_mount+0x30/0x30 [zfs]
[  240.630049]  [<ffffffffa07ab61e>] zpl_fill_super+0xe/0x20 [zfs]
[  240.630058]  [<ffffffff81191b96>] mount_nodev+0x56/0xb0
[  240.630138]  [<ffffffffa07ab605>] zpl_mount+0x25/0x30 [zfs]
[  240.630147]  [<ffffffff81192a63>] mount_fs+0x43/0x1b0
[  240.630158]  [<ffffffff811abfa4>] vfs_kern_mount+0x74/0x110
[  240.630167]  [<ffffffff811ae519>] do_mount+0x259/0xa20
[  240.630177]  [<ffffffff8112c55e>] ? __get_free_pages+0xe/0x50
[  240.630185]  [<ffffffff811ae146>] ? copy_mount_options+0x36/0x170
[  240.630193]  [<ffffffff811aed6e>] SyS_mount+0x8e/0xe0
[  240.630202]  [<ffffffff816b2a06>] system_call_fastpath+0x1a/0x1f

My guess is that the eSATA bus was overwhelmed and throttled, causing zfsonlinux to get confused. That's all I have.

The volumes mount successfully when the system is rebooted into OS X, and I'm running a scrub on them there now.

@behlendorf

The stacks indicate that the mount was blocked on I/O while it was replaying the ZIL. That makes me suspect a disk problem. If you were able to mount the FS on your Mac then the ZIL will have been replayed and you won't have any issues importing the pool on Linux.
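
As an aside, if you want to look at the pool without triggering that ZIL replay, a read-only import is one option. A minimal sketch, assuming a hypothetical pool name "tank":

# Hypothetical pool name "tank"; substitute your own. A read-only
# import should not replay the intent log, which is the step that
# was blocking in the stacks above.
zpool import -o readonly=on tank

# Check whether any device is reporting read/write/checksum errors.
zpool status -v tank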


satmandu commented Sep 7, 2013

I suspect this is happening because I'm hitting my eSATA bandwidth cap on this device (I have multiple drives connected to this eSATA port) and not because of a drive failure.

So this is likely a system block i/o issue on the drive and not an i/o timeout intrinsic to zfs?

So just tuning the system via something like this for my device should help alleviate the issue, right?

echo 180 > /sys/block/sda/device/timeout

Or would it make sense to reduce the timeout from 30 seconds to something like 5 seconds and let the device retry? I'm not sure what would be most efficient here. Scrubs on the disks show all the data is fine. Marking a disk as bad because of a failed read, when the failure was really due to an I/O limitation, seems less than ideal.
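
For reference, the current per-device value can be checked before changing anything. A quick sketch (the device names sda through sdh are just examples):

# Print the current SCSI command timeout (in seconds) for each disk.
for t in /sys/block/sd[a-h]/device/timeout; do
    printf '%s: %s\n' "$t" "$(cat "$t")"
done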

Per the discussion in #471, I'm not sure about the best way to proceed here.

Regardless, if the mount is blocked on I/O, the zfs module shouldn't crash, right? (When this happens in ZEVO the pool is forcibly exported, though it occasionally takes the system down there too.)

@behlendorf

Nothing should crash, and in fact nothing crashed under Linux. The console messages (while scary looking) are simply warnings. They indicate that a task is taking far longer to complete than it should; in this case it was due to an I/O which just wasn't completing. But nothing failed.

INFO: task mount.zfs:1771 blocked for more than 120 seconds

Increasing the device timeout may help if for some reason things are just backed up. However, if you're overloading this eSATA port it may be wiser to decrease the timeout and increase the retry count; this will allow you to attempt more retries in the same amount of time. The best thing to do would be to move the disks onto additional controllers to avoid this entirely.
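
If it helps, applying a lower timeout across all of the pool's disks at runtime is a short loop. A sketch, assuming the disks are sda through sdh (a persistent udev variant appears later in this thread):

# Lower the per-command timeout so a stalled I/O errors out and is
# retried sooner. Runtime-only; it does not survive a reboot.
for d in /sys/block/sd[a-h]/device/timeout; do
    echo 10 > "$d"
done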


satmandu commented Sep 7, 2013

Thanks again. I've adjusted the timeout to 10 seconds to see if that helps.

I was able to reboot the system by SSHing in, but the system then hung at the zfs mount on the next boot. I had to reboot into OS X, where ZEVO mounted the volumes, before rebooting into Linux, where I had to export and then reimport the pools.

Is there maybe an issue with mounting pools at the next boot after this sort of thing happens? (The failure to mount the zpools on the next boot into Linux blocked the startup process, so I couldn't connect in and see what was going on.)

If I/O doesn't complete within the timeout, and the mount becomes unresponsive, do the pools not export cleanly during shutdown? Is there a way to force umount/export at shutdown?

Since the device timeout was 30 seconds and mount.zfs was complaining after 120 seconds, I imagine ZFS should have complained and marked the disk(s) as bad. Would that have caused the lockup at the next boot?
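
One way to force the export at shutdown (not something I've verified) would be a small hook that exports every imported pool. A rough sketch, with the install location (sysvinit/upstart script) left open:

#!/bin/sh
# Hypothetical shutdown hook: force-export every imported pool so the
# next boot starts from a cleanly exported state.
zpool list -H -o name | while read -r pool; do
    zpool export -f "$pool"
done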


satmandu commented Sep 7, 2013

I added this to /etc/udev/rules.d/81-zfs-timeout.rules on my Ubuntu system to drop the timeout to 10 seconds on all of my potential devices.

ACTION=="add", KERNEL=="sda", SUBSYSTEM=="block", RUN+="/bin/sh -c 'echo 10 > /sys$devpath/device/timeout'"
ACTION=="add", KERNEL=="sdb", SUBSYSTEM=="block", RUN+="/bin/sh -c 'echo 10 > /sys$devpath/device/timeout'"
ACTION=="add", KERNEL=="sdc", SUBSYSTEM=="block", RUN+="/bin/sh -c 'echo 10 > /sys$devpath/device/timeout'"
ACTION=="add", KERNEL=="sdd", SUBSYSTEM=="block", RUN+="/bin/sh -c 'echo 10 > /sys$devpath/device/timeout'"
ACTION=="add", KERNEL=="sde", SUBSYSTEM=="block", RUN+="/bin/sh -c 'echo 10 > /sys$devpath/device/timeout'"
ACTION=="add", KERNEL=="sdf", SUBSYSTEM=="block", RUN+="/bin/sh -c 'echo 10 > /sys$devpath/device/timeout'"
ACTION=="add", KERNEL=="sdg", SUBSYSTEM=="block", RUN+="/bin/sh -c 'echo 10 > /sys$devpath/device/timeout'"
ACTION=="add", KERNEL=="sdh", SUBSYSTEM=="block", RUN+="/bin/sh -c 'echo 10 > /sys$devpath/device/timeout'"

@behlendorf

@satmandu Was your proposed tuning enough to resolve the issue?


satmandu commented Nov 3, 2013

Sadly, no. 50% of boots were hanging. (I eventually had the timeout set to 40 seconds, then I gave up.)

Reverting the change allowed me to reliably boot up... which I consider a priority. ;-)

satmandu closed this as completed Nov 3, 2013