Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async_destroy can block a pool from doing anything useful #12697

Open
rincebrain opened this issue Oct 27, 2021 · 1 comment
Open

async_destroy can block a pool from doing anything useful #12697

rincebrain opened this issue Oct 27, 2021 · 1 comment
Labels
Type: Performance Performance improvement or performance problem

Comments

@rincebrain
Copy link
Contributor

System information

Type Version/Name
Distribution Name Debian
Distribution Version 11.1
Kernel Version 5.10.0-9-amd64
Architecture amd64
OpenZFS Version 2.0.6

Describe the problem you're observing

(I was reminded of not having reported this when reading #12693, so here. I'll probably try to run it down when I have free time, but dunno when that'll be...)

A week or two ago, I was cleaning up after some experiments in compression and recordsize, and I decided to try doing something of the form cat dslist | while read LINE;do zfs destroy -r $LINE & done;

This was 60 datasets or so, each around 30 GB, with 50+ non-empty snapshots each. I assumed it would spend a bit of time on it and move on.

Instead, after 15 minutes, none of the 60 destroys had returned, attempting to TERM or KILL them all failed because they were all in state D, and zpool or zfs anything on the pool hung the same way. Oops.

Eventually, I concluded this was probably due to async_destroy somehow stalling out normal work, reached out and twiddled zfs_free_min_time_ms first down to 100, then when that made no difference, down to 0.

Upon setting it to 0, all the blocked processes exited ~instantly, and when I twiddled it back to 1000, the system behaved normally again, cleaning up the huge amount of space (comparatively) I just freed.

Describe how to reproduce the problem

As above.

Include any warning/errors/backtraces from the system logs

The system logged only a few of these, around 10, once, about 10 minutes after I initially ran the destroys:

Oct 22 10:56:00 steamer kernel: INFO: task zfs:117215 blocked for more than 120 seconds.
Oct 22 10:56:00 steamer kernel:       Tainted: P           OE     5.10.0-9-amd64 #1 Debian 5.10.70-1
Oct 22 10:56:00 steamer kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 10:56:00 steamer kernel: task:zfs             state:D stack:    0 pid:117215 ppid:     1 flags:0x00004004
Oct 22 10:56:00 steamer kernel: Call Trace:
Oct 22 10:56:00 steamer kernel:  __schedule+0x282/0x870
Oct 22 10:56:00 steamer kernel:  schedule+0x46/0xb0
Oct 22 10:56:00 steamer kernel:  io_schedule+0x42/0x70
Oct 22 10:56:00 steamer kernel:  cv_wait_common+0xed/0x290 [spl]
Oct 22 10:56:00 steamer kernel:  ? add_wait_queue_exclusive+0x70/0x70
Oct 22 10:56:00 steamer kernel:  txg_wait_synced_impl+0xea/0x250 [zfs]
Oct 22 10:56:00 steamer kernel:  txg_wait_synced+0xc/0x40 [zfs]
Oct 22 10:56:00 steamer kernel:  dsl_sync_task_common+0x1ab/0x280 [zfs]
Oct 22 10:56:00 steamer kernel:  ? zcp_eval_impl+0x450/0x450 [zfs]
Oct 22 10:56:00 steamer kernel:  ? dsl_dir_phys+0x10/0x10 [zfs]
Oct 22 10:56:00 steamer kernel:  ? dsl_dir_phys+0x10/0x10 [zfs]
Oct 22 10:56:00 steamer kernel:  ? zcp_eval_impl+0x450/0x450 [zfs]
Oct 22 10:56:00 steamer kernel:  dsl_sync_task_sig+0x11/0x20 [zfs]
Oct 22 10:56:00 steamer kernel:  zcp_eval+0x47d/0x830 [zfs]
Oct 22 10:56:00 steamer kernel:  ? sysvec_apic_timer_interrupt+0x36/0x80
Oct 22 10:56:00 steamer kernel:  ? spl_kmem_alloc_impl+0xb3/0xc0 [spl]
Oct 22 10:56:00 steamer kernel:  ? nv_mem_zalloc+0x2f/0x40 [znvpair]
Oct 22 10:56:00 steamer kernel:  dsl_destroy_snapshots_nvl+0x15b/0x2f0 [zfs]
Oct 22 10:56:00 steamer kernel:  ? mutex_lock+0xe/0x30
Oct 22 10:56:00 steamer kernel:  zfs_ioc_destroy_snaps+0x157/0x160 [zfs]
Oct 22 10:56:00 steamer kernel:  zfsdev_ioctl_common+0x208/0x680 [zfs]
Oct 22 10:56:00 steamer kernel:  ? _copy_from_user+0x28/0x60
Oct 22 10:56:00 steamer kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
Oct 22 10:56:00 steamer kernel:  __x64_sys_ioctl+0x83/0xb0
Oct 22 10:56:00 steamer kernel:  do_syscall_64+0x33/0x80
Oct 22 10:56:00 steamer kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 22 10:56:00 steamer kernel: RIP: 0033:0x7f6682b28cc7
Oct 22 10:56:00 steamer kernel: RSP: 002b:00007ffe44f2cc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Oct 22 10:56:00 steamer kernel: RAX: ffffffffffffffda RBX: 00007ffe44f303e8 RCX: 00007f6682b28cc7
Oct 22 10:56:00 steamer kernel: RDX: 00007ffe44f2cc90 RSI: 0000000000005a3b RDI: 0000000000000005
Oct 22 10:56:00 steamer kernel: RBP: 00007ffe44f30280 R08: 0000556566ee1d20 R09: 00007f6682bf2be0
Oct 22 10:56:00 steamer kernel: R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000005a3b
Oct 22 10:56:00 steamer kernel: R13: 00007ffe44f2cc90 R14: 0000000000005a3b R15: 0000556566eab5e0
Oct 22 10:56:01 steamer kernel: INFO: task zfs:117226 blocked for more than 120 seconds.
Oct 22 10:56:01 steamer kernel:       Tainted: P           OE     5.10.0-9-amd64 #1 Debian 5.10.70-1
Oct 22 10:56:01 steamer kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 10:56:01 steamer kernel: task:zfs             state:D stack:    0 pid:117226 ppid:     1 flags:0x00004000
Oct 22 10:56:01 steamer kernel: Call Trace:
Oct 22 10:56:01 steamer kernel:  __schedule+0x282/0x870
Oct 22 10:56:01 steamer kernel:  schedule+0x46/0xb0
Oct 22 10:56:01 steamer kernel:  io_schedule+0x42/0x70
Oct 22 10:56:01 steamer kernel:  cv_wait_common+0xed/0x290 [spl]
Oct 22 10:56:01 steamer kernel:  ? add_wait_queue_exclusive+0x70/0x70
Oct 22 10:56:01 steamer kernel:  txg_wait_synced_impl+0xea/0x250 [zfs]
Oct 22 10:56:01 steamer kernel:  txg_wait_synced+0xc/0x40 [zfs]
Oct 22 10:56:01 steamer kernel:  zil_close+0x23c/0x390 [zfs]
Oct 22 10:56:01 steamer kernel:  zfsvfs_teardown+0x1c6/0x3a0 [zfs]
Oct 22 10:56:01 steamer kernel:  zfs_umount+0x31/0x120 [zfs]
Oct 22 10:56:01 steamer kernel:  zpl_put_super+0x26/0x70 [zfs]
Oct 22 10:56:01 steamer kernel:  generic_shutdown_super+0x6c/0x100
Oct 22 10:56:01 steamer kernel:  kill_anon_super+0x14/0x30
Oct 22 10:56:01 steamer kernel:  deactivate_locked_super+0x31/0x90
Oct 22 10:56:01 steamer kernel:  cleanup_mnt+0x12d/0x190
Oct 22 10:56:01 steamer kernel:  task_work_run+0x65/0xa0
Oct 22 10:56:01 steamer kernel:  exit_to_user_mode_prepare+0x111/0x120
Oct 22 10:56:01 steamer kernel:  syscall_exit_to_user_mode+0x28/0x140
Oct 22 10:56:01 steamer kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 22 10:56:01 steamer kernel: RIP: 0033:0x7f8518602e27
Oct 22 10:56:01 steamer kernel: RSP: 002b:00007fffa6b31678 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
Oct 22 10:56:01 steamer kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f8518602e27
Oct 22 10:56:01 steamer kernel: RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000560d0f7d3160
Oct 22 10:56:01 steamer kernel: RBP: 00007fffa6b316e0 R08: 0000000000000000 R09: 0000000000000000
Oct 22 10:56:01 steamer kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000560d0f7ba590
Oct 22 10:56:01 steamer kernel: R13: 0000560d0f7d3160 R14: 0000560d0f7b7570 R15: 0000000000000001
@rincebrain rincebrain added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 27, 2021
@stale
Copy link

stale bot commented Oct 28, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Oct 28, 2022
@behlendorf behlendorf added Type: Performance Performance improvement or performance problem and removed Type: Defect Incorrect behavior (e.g. crash, hang) Status: Stale No recent activity for issue labels Oct 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Performance Performance improvement or performance problem
Projects
None yet
Development

No branches or pull requests

2 participants