mount.zfs hangs in D state for a long time before it succeeds #2516

Closed
edamato opened this issue Jul 21, 2014 · 4 comments

edamato commented Jul 21, 2014

I tried to mount a ZFS filesystem after a kernel panic caused by the networking drivers. The zpool was imported successfully, but mounting one of the ZFS filesystems took about 2 hours.

  • The zpool was degraded because the mirror replication was off; mirroring takes place over the network.
  • During the crash the system was running a scrub, which resumed automatically after the reboot.
  • While the mount was taking place I could see very little I/O; the mount.zfs process would occasionally use some CPU and then stop.
  • There wasn't any way (to my knowledge) to know what mount.zfs was doing or how long it would take, nor a progress meter to find out how much work it still had to do.
  • Is it possible to find out what it was doing, and to control its behaviour, if this happens again?

Thanks!

Environment:

  • RHEL 6.4, 256 GB of RAM, 2 x 4-core Xeons
  • ZFS 0.6.2 with SPL 0.6.2, built against the RHEL 6 kernel 2.6.32-358.el6
  • The zpool uses 4 mirrored vdevs totalling about 50 TB

ZFS kernel module options:

options zfs zfs_arc_max=8589934592 zfs_top_maxinflight=1024 zfs_write_limit_shift=9

  • zfs_arc_max is 8 GB to prevent soft lockups on zvol I/O.
  • zfs_write_limit_shift is 9 to force a txg sync every ~512 MB of writes (see the sketch after this list).
  • zfs_top_maxinflight is 1024 to speed up scrubs.
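
The write-limit arithmetic behind the second item, as a hypothetical illustration (my own example, not ZFS source; it assumes the pre-0.6.4 behaviour where the per-txg write limit is roughly physical memory shifted right by zfs_write_limit_shift):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t physmem_bytes = 256ULL << 30;	/* 256 GiB, per this report */
	unsigned int shift = 9;			/* zfs_write_limit_shift above */

	/* physmem >> 9 = 512 MiB, matching the ~512M-per-txg tuning intent */
	printf("approx. per-txg write limit: %llu MiB\n",
	    (unsigned long long)((physmem_bytes >> shift) >> 20));
	return 0;
}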

sysrq-t output from the mount.zfs process:


Jul 20 18:47:49 nodeA kernel: mount.zfs     D 0000000000000001     0 15661      1 0x00000004
Jul 20 18:47:49 nodeA kernel: ffff884019c0f4d8 0000000000000082 0000000000000000 ffffffffa06a6a40
Jul 20 18:47:49 nodeA kernel: ffff881fc84a98f0 00000002d9cf2f18 0000000000000004 00000000a06e04e3
Jul 20 18:47:49 nodeA kernel: ffff88401a1efaf8 ffff884019c0ffd8 000000000000fb88 ffff88401a1efaf8
Jul 20 18:47:49 nodeA kernel: Call Trace:
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06a6a40>] ? vdev_mirror_child_done+0x0/0x30 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffff8150de73>] io_schedule+0x73/0xc0
Jul 20 18:47:49 nodeA kernel: [<ffffffffa032f41c>] cv_wait_common+0xac/0x1c0 [spl]
Jul 20 18:47:49 nodeA kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
Jul 20 18:47:49 nodeA kernel: [<ffffffffa032f548>] __cv_wait_io+0x18/0x20 [spl]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06e1c3b>] zio_wait+0xfb/0x1b0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa064858d>] dbuf_read+0x3fd/0x750 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa0323d0b>] ? kmem_free_debug+0x4b/0x150 [spl]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa065fc98>] dnode_next_offset_level+0x368/0x3f0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06e3091>] ? zio_create+0x461/0x550 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa0661c32>] dnode_next_offset+0x1b2/0x230 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa065de48>] dmu_tx_hold_free+0x168/0x900 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa064f779>] dmu_free_long_range_impl+0x199/0x260 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa064f88b>] dmu_free_long_range+0x4b/0x70 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06bae40>] zfs_rmnode+0x60/0x340 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06d8765>] zfs_zinactive+0xa5/0x110 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06d775f>] zfs_inactive+0x7f/0x220 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06ea1e0>] ? zpl_inode_delete+0x0/0x30 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06ea11e>] zpl_clear_inode+0xe/0x10 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffff8119cf2c>] clear_inode+0xac/0x140
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06ea200>] zpl_inode_delete+0x20/0x30 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffff8119d62e>] generic_delete_inode+0xde/0x1d0
Jul 20 18:47:49 nodeA kernel: [<ffffffff8119d785>] generic_drop_inode+0x65/0x80
Jul 20 18:47:49 nodeA kernel: [<ffffffff8119c5d2>] iput+0x62/0x70
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06bad8c>] zfs_unlinked_drain+0xdc/0x130 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffff8150dd43>] ? thread_return+0x6b1/0x76e
Jul 20 18:47:49 nodeA kernel: [<ffffffff81096e27>] ? finish_wait+0x67/0x80
Jul 20 18:47:49 nodeA kernel: [<ffffffffa03299fa>] ? taskq_create+0x30a/0x510 [spl]
Jul 20 18:47:49 nodeA kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06d4b60>] ? zfs_get_data+0x0/0x240 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06cdfc8>] zfs_sb_setup+0x108/0x140 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06ced96>] zfs_domount+0x2b6/0x340 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffff8127ec5a>] ? strlcpy+0x4a/0x60
Jul 20 18:47:49 nodeA kernel: [<ffffffff81184089>] ? sget+0x429/0x480
Jul 20 18:47:49 nodeA kernel: [<ffffffff811829e0>] ? set_anon_super+0x0/0x100
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06ea080>] ? zpl_fill_super+0x0/0x20 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06ea08e>] zpl_fill_super+0xe/0x20 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffff811841ff>] get_sb_nodev+0x5f/0xa0
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06ea075>] zpl_get_sb+0x25/0x30 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffff8118381b>] vfs_kern_mount+0x7b/0x1b0
Jul 20 18:47:49 nodeA kernel: [<ffffffff811839c2>] do_kern_mount+0x52/0x130
Jul 20 18:47:49 nodeA kernel: [<ffffffff811a3c12>] do_mount+0x2d2/0x8d0
Jul 20 18:47:49 nodeA kernel: [<ffffffff811a42a0>] sys_mount+0x90/0xe0
Jul 20 18:47:49 nodeA kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b


and the same from the txg_sync thread:


Jul 20 18:47:49 nodeA kernel: txg_sync      D 0000000000000004     0 15070      2 0x00000000
Jul 20 18:47:49 nodeA kernel: ffff88401af93020 0000000000000046 0000000000000202 0000003000000001
Jul 20 18:47:49 nodeA kernel: 000000013e8980a8 0000000000000001 ffff88401ac70ae0 0000000000000082
Jul 20 18:47:49 nodeA kernel: ffff88401ac71098 ffff88401af93fd8 000000000000fb88 ffff88401ac71098
Jul 20 18:47:49 nodeA kernel: Call Trace:
Jul 20 18:47:49 nodeA kernel: [<ffffffff81096e8e>] ? prepare_to_wait_exclusive+0x4e/0x80
Jul 20 18:47:49 nodeA kernel: [<ffffffffa032f475>] cv_wait_common+0x105/0x1c0 [spl]
Jul 20 18:47:49 nodeA kernel: [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
Jul 20 18:47:49 nodeA kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
Jul 20 18:47:49 nodeA kernel: [<ffffffffa063d777>] ? buf_hash_find+0x87/0x110 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa032f585>] __cv_wait+0x15/0x20 [spl]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa0643ffb>] arc_read+0xab/0x8b0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa032603a>] ? kmem_alloc_debug+0x13a/0x4c0 [spl]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa06e057a>] ? zio_taskq_dispatch+0x7a/0xa0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa0640c60>] ? arc_getbuf_func+0x0/0x80 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067be45>] dsl_scan_visitbp+0x355/0xba0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067a41c>] ? dsl_scan_prefetch+0x9c/0xb0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067bf6a>] dsl_scan_visitbp+0x47a/0xba0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067c5e7>] dsl_scan_visitbp+0xaf7/0xba0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067bf6a>] dsl_scan_visitbp+0x47a/0xba0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067bf6a>] dsl_scan_visitbp+0x47a/0xba0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067bf6a>] dsl_scan_visitbp+0x47a/0xba0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067bf6a>] dsl_scan_visitbp+0x47a/0xba0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067bf6a>] dsl_scan_visitbp+0x47a/0xba0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067bf6a>] dsl_scan_visitbp+0x47a/0xba0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa0644609>] ? arc_read+0x6b9/0x8b0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067c1ba>] dsl_scan_visitbp+0x6ca/0xba0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa0649739>] ? dbuf_dirty+0x3b9/0x7a0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067c76c>] dsl_scan_visitds+0xdc/0x4f0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa067ccbe>] dsl_scan_sync+0x13e/0xaa0 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa0689817>] spa_sync+0x477/0xa80 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffff81012b69>] ? read_tsc+0x9/0x20
Jul 20 18:47:49 nodeA kernel: [<ffffffffa069c707>] txg_sync_thread+0x307/0x590 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffff81056089>] ? set_user_nice+0xc9/0x130
Jul 20 18:47:49 nodeA kernel: [<ffffffffa069c400>] ? txg_sync_thread+0x0/0x590 [zfs]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa0327478>] thread_generic_wrapper+0x68/0x80 [spl]
Jul 20 18:47:49 nodeA kernel: [<ffffffffa0327410>] ? thread_generic_wrapper+0x0/0x80 [spl]
Jul 20 18:47:49 nodeA kernel: [<ffffffff81096916>] kthread+0x96/0xa0
Jul 20 18:47:49 nodeA kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Jul 20 18:47:49 nodeA kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
Jul 20 18:47:49 nodeA kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

zpool events showed:

# zpool events
TIME                           CLASS
Jul 20 2014 19:40:14.712296140 resource.fs.zfs.statechange
Jul 20 2014 19:40:14.712296140 resource.fs.zfs.statechange
Jul 20 2014 19:40:14.712296140 resource.fs.zfs.statechange
Jul 20 2014 19:40:14.712296140 resource.fs.zfs.statechange
Jul 20 2014 19:40:14.712296140 resource.fs.zfs.statechange
Jul 20 2014 19:40:14.712296140 resource.fs.zfs.statechange
Jul 20 2014 19:40:14.712296140 resource.fs.zfs.statechange
Jul 20 2014 19:40:14.712296140 resource.fs.zfs.statechange
Jul 20 2014 19:40:14.893292755 ereport.fs.zfs.checksum
Jul 20 2014 19:40:14.917292316 ereport.fs.zfs.checksum
Jul 20 2014 19:40:14.939291891 ereport.fs.zfs.checksum
Jul 20 2014 19:40:14.976291202 ereport.fs.zfs.checksum
Jul 20 2014 19:40:15.017290435 ereport.fs.zfs.checksum
Jul 20 2014 19:40:15.026290266 ereport.fs.zfs.checksum
Jul 20 2014 19:40:15.169287591 ereport.fs.zfs.checksum
Jul 20 2014 19:40:15.181287367 ereport.fs.zfs.checksum

(mount succeeded)


Jul 20 2014 19:40:28.031046839 ereport.fs.zfs.config.sync
Jul 20 2014 19:40:45.191725050 ereport.fs.zfs.config.sync
Jul 20 2014 19:40:47.132688615 ereport.fs.zfs.resilver.start
Jul 21 2014 05:08:37.487704826 ereport.fs.zfs.resilver.finish
behlendorf added this to the 0.7.0 milestone Jul 21, 2014
behlendorf added the Bug label Jul 21, 2014

avg-I commented Jul 31, 2014

I observed a similar problem, but without a scrub being involved. Mounting the dataset was taking so long that I didn't have the patience to wait for it to finish.

To me it looked like both mount.zfs and zfs_iput_taskq were doing the same thing, possibly interfering with each other. Perhaps the problem is also related to issue #2240.
Both of the mentioned threads were looping (not stuck) in zfs_unlinked_drain(), producing a lot of read I/O; a simplified sketch of that loop follows the trace below. The captured stack traces always contained the following calls:
zfs_rmnode
zfs_zinactive
zfs_inactive
zpl_evict_inode
evict
iput
zfs_unlinked_drain
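
A simplified reconstruction of that loop (my own sketch using approximate 0.6.x-era names, not the verbatim ZFS source): zfs_unlinked_drain() walks the ZAP object that records unlinked-but-still-allocated znodes, re-instantiates each one, and drops the last hold on it, which is what triggers the zfs_inactive()/zfs_rmnode() path and the read I/O described above.

/*
 * Approximate sketch of the 0.6.x-era zfs_unlinked_drain() loop
 * (names approximate, not the verbatim source).
 */
void
zfs_unlinked_drain(zfs_sb_t *zsb)
{
	zap_cursor_t zc;
	zap_attribute_t zap;
	znode_t *zp;

	for (zap_cursor_init(&zc, zsb->z_os, zsb->z_unlinkedobj);
	    zap_cursor_retrieve(&zc, &zap) == 0;
	    zap_cursor_advance(&zc)) {
		/* look up the orphaned znode; these lookups generate the read I/O */
		if (zfs_zget(zsb, zap.za_first_integer, &zp) != 0)
			continue;

		zp->z_unlinked = B_TRUE;
		iput(ZTOI(zp));		/* drop the last hold -> zfs_rmnode() */
	}
	zap_cursor_fini(&zc);
}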

avg-I commented Aug 5, 2014

I can reproduce this almost every time I crash a system or forcefully reboot it (e.g. power cycle).

prometheanfire commented

I think I can as well, but I can't tell what's active during the mount (ZFS is the rootfs).

behlendorf added the Type: Performance label and removed the Bug label Mar 25, 2016
behlendorf commented

The delay here is caused by the need to process the unlinked set after a crash. Currently this is done synchronously during the mount, but in principle it could be done asynchronously after the mount completes. This is a duplicate of #3814, which proposes to implement that functionality. Unfortunately, for the moment you'll need to wait it out.
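
A minimal sketch of the asynchronous approach (my own illustration of the idea behind #3814, using approximate 0.6.x-era names, not an actual patch): instead of calling zfs_unlinked_drain() inline from zfs_sb_setup() during mount, dispatch it to a taskq so the mount can return while the unlinked set is drained in the background.

/* Hypothetical sketch only; names approximate, not a real patch. */
static void
zfs_unlinked_drain_task(void *arg)
{
	zfs_sb_t *zsb = arg;

	zfs_unlinked_drain(zsb);	/* the existing synchronous drain */
}

/* In zfs_sb_setup(), instead of calling zfs_unlinked_drain(zsb) directly: */
(void) taskq_dispatch(dsl_pool_iput_taskq(dmu_objset_pool(zsb->z_os)),
    zfs_unlinked_drain_task, zsb, TQ_SLEEP);

The trade-off is that unlinked objects are freed after the filesystem is already mounted, so their space becomes available slightly later, but the mount itself no longer blocks for hours.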
