zfs and zpool commands hang when znapzend runs backups, impacting the cluster #7331

Closed
arturpzol opened this issue Mar 22, 2018 · 10 comments
Labels
Type: Performance (Performance improvement or performance problem)

Comments

@arturpzol

System information

Type Version/Name
Distribution Name Debian
Distribution Version Jessie
Linux Kernel 4.4
Architecture 64
ZFS Version 0.7.7 and master
SPL Version 0.7.7 and master

Describe the problem you're observing

When znapzend runs several backups at the same time, the zfs and zpool commands sometimes hang for a few seconds or minutes, depending on the system load. This behaviour causes a failover in a cluster environment.

For example:

In the ZFS OCF RA, the Pacemaker resource monitor checks the health of the pool every 5 seconds:

<action name="monitor" depth="0" timeout="30s" interval="5s" />

zpool_monitor function:

zpool_monitor () {
    # If the pool is not imported, then we can't monitor its health
    if ! zpool_is_imported; then
        return $OCF_NOT_RUNNING
    fi

    # Check the pool status
    HEALTH=$(zpool list -H -o health "$OCF_RESKEY_pool")
    case "$HEALTH" in
        ONLINE|DEGRADED) return $OCF_SUCCESS;;
        FAULTED)         return $OCF_NOT_RUNNING;;
        *)               return $OCF_ERR_GENERIC;;
    esac
}

When zpool list -H -o health hangs for 30 seconds, a failover is performed.

It looks like txg_sync needs to finish its work before the commands are unblocked:


Mar 21 16:07:12 [kern.info] [ 1976.537474] txg_sync        D 0000000000000000     0 42138      2 0x00000000
Mar 21 16:07:12 [kern.warning] [ 1976.537475]  ffff880080713200 ffff88007ef64000 ffff88007ef63c10 ffff88023686dc40
Mar 21 16:07:12 [kern.warning] [ 1976.537477]  ffff88007ef63bc0 ffff88023686dc40 ffff880086d96eb0 ffffffff8170d18c
Mar 21 16:07:12 [kern.warning] [ 1976.537478]  000000010003038a ffffffff8170f3a8 ffffffffa0703b6c 0000000000000282
Mar 21 16:07:12 [kern.warning] [ 1976.537479] Call Trace:
Mar 21 16:07:12 [kern.warning] [ 1976.537481]  [<ffffffff8170d18c>] ? schedule+0x2c/0x70
Mar 21 16:07:12 [kern.warning] [ 1976.537488]  [<ffffffff8170f3a8>] ? schedule_timeout+0x108/0x1a0
Mar 21 16:07:12 [kern.warning] [ 1976.537518]  [<ffffffffa0703b6c>] ? zio_taskq_dispatch+0x8c/0xa0 [zfs]
Mar 21 16:07:12 [kern.warning] [ 1976.537520]  [<ffffffff810a6bf0>] ? cascade+0x60/0x60
Mar 21 16:07:12 [kern.warning] [ 1976.537522]  [<ffffffff8170c9af>] ? io_schedule_timeout+0xaf/0x130
Mar 21 16:07:12 [kern.warning] [ 1976.537528]  [<ffffffffa0523906>] ? __cv_timedwait_common+0xf6/0x130 [spl]
Mar 21 16:07:12 [kern.warning] [ 1976.537529]  [<ffffffff81093230>] ? wait_woken+0x80/0x80
Mar 21 16:07:12 [kern.warning] [ 1976.537559]  [<ffffffffa07085db>] ? zio_wait+0x11b/0x240 [zfs]
Mar 21 16:07:12 [kern.warning] [ 1976.537583]  [<ffffffffa067e34b>] ? dsl_pool_sync+0x39b/0x3f0 [zfs]
Mar 21 16:07:12 [kern.warning] [ 1976.537609]  [<ffffffffa069e259>] ? spa_sync+0x3d9/0xcb0 [zfs]
Mar 21 16:07:12 [kern.warning] [ 1976.537637]  [<ffffffffa06b03a1>] ? txg_sync_thread+0x301/0x4c0 [zfs]
Mar 21 16:07:12 [kern.warning] [ 1976.537664]  [<ffffffffa06b00a0>] ? txg_delay+0x150/0x150 [zfs]
Mar 21 16:07:12 [kern.warning] [ 1976.537667]  [<ffffffffa051f120>] ? thread_generic_wrapper+0x70/0x80 [spl]
Mar 21 16:07:12 [kern.warning] [ 1976.537670]  [<ffffffffa051f0b0>] ? __thread_exit+0x10/0x10 [spl]
Mar 21 16:07:12 [kern.warning] [ 1976.537671]  [<ffffffff81077b7a>] ? kthread+0xca/0xe0
Mar 21 16:07:12 [kern.warning] [ 1976.537672]  [<ffffffff81077ab0>] ? kthread_create_on_node+0x170/0x170
Mar 21 16:07:12 [kern.warning] [ 1976.537673]  [<ffffffff8171034f>] ? ret_from_fork+0x3f/0x70
Mar 21 16:07:12 [kern.warning] [ 1976.537675]  [<ffffffff81077ab0>] ? kthread_create_on_node+0x170/0x170

strace shows that the commands are waiting on the open of /dev/zfs:


brk(0)                                  = 0x11d8000
brk(0x11f9000)                          = 0x11f9000
access("/sys/module/zfs", F_OK)         = 0
open("/dev/zfs", O_RDWR)

Is it possible to run zpool list or zpool status in an asynchronous mode, or is there any other way to get the status of the pool? Or is the only solution to increase the timeout?
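
One way to at least bound the check from the shell side is to run the probe in the background and stop waiting for it after a deadline. The sketch below is only an illustration, assuming it is placed inside zpool_monitor in place of the direct HEALTH=$(zpool list ...) assignment; a process stuck in uninterruptible sleep cannot be killed, but the monitor itself stops blocking and returns within its deadline:

probe=$(mktemp)
# Write the result to a temporary file and rename it when finished, so the
# parent can detect completion without calling wait on a possibly stuck child.
( zpool list -H -o health "$OCF_RESKEY_pool" > "$probe" 2>/dev/null
  mv "$probe" "$probe.done" ) &

deadline=10
while [ "$deadline" -gt 0 ] && [ ! -e "$probe.done" ]; do
    sleep 1
    deadline=$((deadline - 1))
done

if [ ! -e "$probe.done" ]; then
    # Probe is still hung; give up on this monitor run without blocking further.
    return $OCF_ERR_GENERIC
fi

HEALTH=$(cat "$probe.done")
rm -f "$probe.done"

Whether a hung probe should be reported as an error (as above) or optimistically treated as "still healthy" is a policy decision: returning success instead would trade spurious failovers for slower detection of real failures.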

@loli10K
Contributor

loli10K commented Mar 22, 2018

Depending on how "znapzend backups" work under the hood, this could be a duplicate of #3104.

@behlendorf
Contributor

@arturpzol let's start by seeing if we can determine exactly where it's waiting. I would have expected to see zpool list waiting in ioctl(2), not open(2), which shouldn't take long under most circumstances.
When it's waiting, can you cat /proc/<pid>/stack to dump the kernel stack and post it?

@arturpzol
Author

@behlendorf below is the stack output:

ps axf|grep "zpool list"
117241 pts/0    D+     0:00  |       \_ zpool list -H -o health Pool-0

cat /proc/117241/stack 
[<ffffffffa06eca65>] zfsdev_open+0x15/0x120 [zfs]
[<ffffffff81404afd>] misc_open+0x9d/0x140
[<ffffffff8114482c>] chrdev_open+0x9c/0x170
[<ffffffff81144790>] chrdev_open+0x0/0x170
[<ffffffff8113ec28>] do_dentry_open+0x1b8/0x2d0
[<ffffffff8114d2e6>] path_openat+0x596/0x1170
[<ffffffff81149b00>] terminate_walk+0xa0/0xb0
[<ffffffff8114f2b9>] do_filp_open+0x79/0xd0
[<ffffffff8115a417>] __alloc_fd+0x37/0x160
[<ffffffff8113fd9e>] do_sys_open+0x11e/0x200
[<ffffffff8171032e>] entry_SYSCALL_64_fastpath+0x12/0x71
[<ffffffffffffffff>] 0xffffffffffffffff

After a few seconds/minutes (depending on the load) zpool list returns the state of the pool, but the process then remains hung for another few seconds/minutes:

zpool list -H -o health Pool-0
ONLINE

cat /proc/117241/stack 
[<ffffffffa06e67c0>] zfsdev_release+0x10/0x70 [zfs]
[<ffffffff8114229b>] __fput+0x8b/0x1d0
[<ffffffff810763d0>] task_work_run+0x70/0x90
[<ffffffff8100308f>] exit_to_usermode_loop+0x8f/0xa0
[<ffffffff81003565>] syscall_return_slowpath+0x45/0x50
[<ffffffff8171048c>] int_ret_from_sys_call+0x25/0x8f
[<ffffffffffffffff>] 0xffffffffffffffff

When the system load is low, zpool list of course returns its output immediately.

I have tried to use ZFS Channel Programs, but the same hang can be observed there too:

ps axf|grep "zfs program"
8326 pts/5    D+     0:00 zfs program -n

cat /proc/8326/stack 
[<ffffffffa06eca65>] zfsdev_open+0x15/0x120 [zfs]
[<ffffffff81404a7d>] misc_open+0x1d/0x140
[<ffffffff8114482c>] chrdev_open+0x9c/0x170
[<ffffffff81144790>] chrdev_open+0x0/0x170
[<ffffffff8113ec28>] do_dentry_open+0x1b8/0x2d0
[<ffffffff8114d2e6>] path_openat+0x596/0x1170
[<ffffffff81149b00>] terminate_walk+0xa0/0xb0
[<ffffffff8114f2b9>] do_filp_open+0x79/0xd0
[<ffffffff8115a417>] __alloc_fd+0x37/0x160
[<ffffffff8113fd9e>] do_sys_open+0x11e/0x200
[<ffffffff8171032e>] entry_SYSCALL_64_fastpath+0x12/0x71
[<ffffffffffffffff>] 0xffffffffffffffff

behlendorf added the Type: Performance label Mar 27, 2018
@tonyhutter
Contributor

@arturpzol do you happen to have a reliable reproducer? I'd like to look into this.

@tonyhutter
Contributor

I tried to reproduce this on my own by doing a vdev attach while running strace -r on zpool list -H -o health:

0.000104 openat(AT_FDCWD, "/dev/zfs", O_RDWR) = 5
...
0.000347 ioctl(3, _IOC(0, 0x5a, 0x5, 0), 0x7ffdf404c730) = 0
3.415809 ioctl(3, _IOC(0, 0x5a, 0x27, 0), 0x7ffdf404b670) = 0  <--- ZFS_IOC_POOL_GET_PROPS

The openat() went through without delay. The ioctl() took 3.4 seconds, which is what we'd expect.

Can you rerun strace -r and post the full output? What is your pool configuration? Also, do you happen to know what kind of operations znapzend does during a backup (snapshots? zfs send? etc).

We ran into a similar problem back in February, where we had a dying disk that was holding up ZFS. That caused our zpool status heartbeat to hang, which in turn caused Pacemaker to think the node was down and power it off. It would be nice if there was a timeout or an async option for zpool so that this doesn't happen. We first need a reliable way to reproduce the issue before we can make a fix though.

@behlendorf
Contributor

For the Pacemaker case it wouldn't be unreasonable to add a /proc/spl/kstat/zfs/<pool>/health kstat which provides only this information. We'd of course still want to get to the bottom of any unexpected delays described in this issue.

@marcinkuk

marcinkuk commented May 8, 2018

Hello,

I have a similar issue in my environment.

Environment:

  • node1 - rpool1/zvol1
  • node2 - rpool2/zvol2

Scenario:
# zfs send rpool1/zvol1@snap1 | ssh node2 zfs recv rpool2/zvol2 -F
poweroff node2 during send + recv

Behaviour:

  • on node1:
    # zpool list -H -o health (and other zfs get commands) hangs for about 30 seconds

I think it may be related.

@tonyhutter
Contributor

tonyhutter commented May 14, 2018

@marcinkuk I tried to reproduce your zfs send hang between two VMs with:

zpool create mypool1 `pwd`/file1
zfs create -V 2000M mypool1/zvol1
zfs set compression=off mypool1/zvol1
dd if=/dev/urandom of=/dev/zd0 bs=1M count=800
zfs snapshot mypool1/zvol1@snap1
zfs send mypool1/zvol1@snap1 | ssh hutter@fedora27 zfs recv mypool2/zvol2 -F

However, I didn't see the sending VM hang when I paused the receiving VM. This was with 0.7.9. Can you give me some more details on your dataset, and which ZoL version you're running?

@tonyhutter
Contributor

I set up a test where I continuously wrote to a mirrored pool, while simultaneously offlining/onlining disks in the pool. While doing that, I would run zpool status in another window, and sometimes see small hangs. I tracked it down to this call path:

zpool status -> ZFS_IOC_POOL_STATS ioctl() -> spa_get_stats() -> spa_open_common() -> spa_config_generate()

In spa_config_generate(), I noticed that spa_config_enter() would sometimes take up to 4 seconds:

nvlist_t *
spa_config_generate(spa_t *spa, vdev_t *vd, uint64_t txg, int getstats)
{
...
                spa_config_enter(spa, SCL_CONFIG | SCL_STATE, FTAG, RW_READER); 
...

I narrowed it down further, and saw that it was the SCL_CONFIG lock (and not SCL_STATE) that was taking the 4 seconds. I'm not sure how much we can improve that lock, short of major surgery. With that in mind, I think implementing @behlendorf's lockless /proc/spl/kstat/zfs/<pool>/health idea is the way to go for pacemaker.
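
For reference, here is a rough sketch of that kind of stress test (the pool name tank, the mirror member sdb, and the mountpoint /tank are placeholders, not the exact setup used):

#!/bin/bash
# Keep the mirrored pool busy with writes, flushing at the end of each pass.
( while true; do
      dd if=/dev/urandom of=/tank/stressfile bs=1M count=512 conv=fsync 2>/dev/null
  done ) &
writer=$!

# Repeatedly offline/online one side of the mirror while timing status probes.
for i in $(seq 1 20); do
    zpool offline tank sdb
    sleep 2
    zpool online tank sdb
    # Multi-second wall times here correspond to the small hangs observed.
    time zpool status tank > /dev/null
done

kill "$writer"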

tonyhutter added a commit to tonyhutter/zfs that referenced this issue May 25, 2018
1. Add a proc entry to display the pool's health:

$ cat /proc/spl/kstat/zfs/tank/health
ONLINE

This is done without using the spa config locks, so it will
never hang.

2. Fix 'zpool status' and 'zpool list -o health' output to print
"SUSPENDED" instead of "ONLINE" for suspended pools.

Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#7331
Requires-spl: refs/pull/705/head
tonyhutter added a commit to tonyhutter/zfs that referenced this issue Sep 5, 2018
1. Add a proc entry to display the pool's state:

$ cat /proc/spl/kstat/zfs/tank/state
ONLINE

This is done without using the spa config locks, so it will
never hang.

2. Fix 'zpool status' and 'zpool list -o health' output to print
"SUSPENDED" instead of "ONLINE" for suspended pools.

Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#7331
Closes openzfs#7563
@gmelikov
Member

JFYI, a PR using the /proc entry was just merged into the resource-agents master branch: ClusterLabs/resource-agents@2bdeee4
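
For anyone landing here later, a simplified sketch of what consuming the new kstat from the OCF monitor can look like (an illustration with a fallback for older releases, not the exact resource-agents change):

zpool_monitor () {
    # If the pool is not imported, then we can't monitor its health
    if ! zpool_is_imported; then
        return $OCF_NOT_RUNNING
    fi

    # Prefer the lockless kstat added for this issue; reading it never
    # blocks on the spa config locks the way the zpool command can.
    state_file="/proc/spl/kstat/zfs/${OCF_RESKEY_pool}/state"
    if [ -r "$state_file" ]; then
        HEALTH=$(cat "$state_file")
    else
        # Older ZFS without the kstat: fall back to the original command.
        HEALTH=$(zpool list -H -o health "$OCF_RESKEY_pool")
    fi

    case "$HEALTH" in
        ONLINE|DEGRADED) return $OCF_SUCCESS;;
        FAULTED)         return $OCF_NOT_RUNNING;;
        *)               return $OCF_ERR_GENERIC;;
    esac
}

A SUSPENDED pool falls through to the default branch and is reported as a generic error.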
