ZFS deadlock with zfs receive and clamscan #3719

Closed
whatarcane opened this issue Aug 31, 2015 · 10 comments

@whatarcane

Anything that attempts to interact with the ZFS filesystem (df, lsof, etc.) gets stuck in the "D" state. "kill -9" has no effect on these processes; they are not merely passing through the "D" state, they are permanently stuck. This seems to be a different issue from the other reported deadlock, #3687, because the load on the system consists only of processes in the "D" state.

kernel: 3.2.68-1+deb7u3
ZFS is 0.6.4-1.2-1-wheezy from Debian ZoL packages

The deadlock happens when "/sbin/zfs receive -F" operations are attempted while clamscan is running on the filesystem (which has the property readonly=on). Most "zpool" operations still succeed (e.g., "status"), and other filesystems can be accessed, have their properties changed, be unmounted, and so on. However, the affected filesystem can't be unmounted and "zpool export" fails. The rest of the system remains fully operational and responsive. It seems reproducible in that every time I've run clamscan while "zfs receive -F" was being performed every 10 minutes, the filesystem has deadlocked after a few hours. It doesn't deadlock on the first few receive operations. "reboot" and "reboot -f" fail; a power cycle is needed to reboot the server.
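
For reference, the kind of job involved looks roughly like the sketch below, run from cron every 10 minutes. The source host, dataset and snapshot names are hypothetical placeholders; only the "zfs receive -F" into the readonly dataset and the concurrent clamscan match this report.

#!/bin/bash
# Hypothetical 10-minute sync job on the receiving host.
set -e
SNAP="auto-$(date +%Y%m%d-%H%M)"
ssh backup-source "zfs snapshot sourcepool/www@${SNAP} && zfs send sourcepool/www@${SNAP}" \
    | /sbin/zfs receive -F localzfs/www
# Meanwhile clamscan walks the mounted (readonly=on) filesystem:
#   clamscan -r -i /www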

@kernelOfTruth
Contributor

Just throwing in what comes to mind:

Are you running with or without NCQ?
(I'm collecting a few negative experiences with NCQ, but my observations haven't been too conclusive so far.)

Is cfq the I/O scheduler?
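
One way to check both, for what it's worth (a sketch only; sdc is used as an example because the pool members show up later as sdc/sde/sdf in the zdb output):

cat /sys/block/sdc/device/queue_depth   # a depth greater than 1 usually means NCQ is active
cat /sys/block/sdc/queue/scheduler      # the scheduler in use is shown in [brackets]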

@whatarcane
Author

The scheduler is noop. The zpool consists of 3 hardware RAID 1 arrays striped together:

NAME          STATE     READ WRITE CKSUM
localzfs      ONLINE       0     0     0
  scsi-36...  ONLINE       0     0     0
  scsi-36...  ONLINE       0     0     0
  scsi-36...  ONLINE       0     0     0

errors: No known data errors

I know we lost the nice capability to auto-repair with checksums, but the PERC H710 controller doesn't support passthrough (JBOD) mode, and creating a single-drive RAID-0 for each disk is inconvenient when we want to hot-swap drives.
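
For reference, a pool laid out like this would have been created roughly as sketched below. The device IDs are left truncated exactly as in the status output above; ashift=12 matches the zdb output further down.

zpool create -o ashift=12 localzfs \
    /dev/disk/by-id/scsi-36... \
    /dev/disk/by-id/scsi-36... \
    /dev/disk/by-id/scsi-36...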

@whatarcane
Author

I've been trying to narrow down the conditions needed to reproduce this. It turns out that clamscan plus zfs receive alone are not always enough to trigger it; however, it only ever happens while clamscan is running. I'll update this ticket with what I manage to find.

@whatarcane
Author

I've not been able to come up with a deterministic way to trigger this. The symptoms are similar to those in issue #3667, "Processes that access zvols hang in D state".

@whatarcane
Author

This issue seems to appear when using a replication stream (send -R with receive -F). It hasn't happened since I switched to sending/receiving each snapshot individually and destroying expired snapshots individually.
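
A sketch of the difference, with hypothetical snapshot names (the real ones are mangled in the process listings below); the send side actually runs on the source host:

# What deadlocked: a full replication stream, force-received.
zfs send -R sourcepool/www@snap-new | zfs receive -F localzfs/www

# What has been stable so far: one incremental send/receive per snapshot,
# plus explicit destruction of expired snapshots, without -F.
zfs send -i sourcepool/www@snap-old sourcepool/www@snap-new | zfs receive localzfs/www
zfs destroy localzfs/www@snap-expired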

@whatarcane
Author

It still happens with zfs-dkms 0.6.5.1-2-wheezy, running kernel 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u3 x86_64 GNU/Linux. Below is the information I gathered; the load is now beyond 30 and the machine will soon grind to a halt and have to be power cycled. The system was rebooted last evening after installing the latest ZFS packages. Current uptime is 15:42.

CPUs: 12 cores / 24 threads
Memory: 128GB
VM/Hypervisor: no
ECC mem: yes
Distribution: Debian GNU/Linux
Kernel version: 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u3 x86_64 GNU/Linux
SPL/ZFS source: Packages from Debian Wheezy ZoL repository
SPL/ZFS version:
[ 7.859180] SPL: Loaded module v0.6.5-1-wheezy
[ 8.059730] ZFS: Loaded module v0.6.5.1-2-wheezy, ZFS pool version 5000, ZFS filesystem version 5
[ 10.693108] SPL: using hostid 0x2e808513
System services: the system is on the receiving end of zfs send | zfs receive commands, runs clamscan, and is otherwise on standby with services stopped (except basic ones like cron, ntpd, ssh).

Processes in the "D" state:
root 119 0.0 0.0 0 0 ? D Sep20 0:50 [kswapd0]
root 120 0.0 0.0 0 0 ? D Sep20 0:31 [kswapd1]
root 941 0.0 0.0 33152 1560 ? Ds 05:50 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 1322 0.0 0.0 33152 1560 ? Ds 07:40 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 2580 0.0 0.0 4228 112 ? D 07:48 0:00 /usr/bin/lsof -n +c15
root 4636 50.1 0.2 445604 362520 ? D 01:01 206:19 /usr/bin/clamscan -r -i --max-files=99999 --max-scansize=0 --max-dir-recursion=999 --no-summary --block-encrypted --scan-swf=no /www
root 4738 0.0 0.0 33152 1560 ? Ds 07:50 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 5464 0.0 0.0 6456 972 ? D 06:00 0:00 lsof -i UDP:ntp
root 6859 0.0 0.0 33152 1560 ? Ds 06:00 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 9572 0.0 0.0 33152 1556 ? Ds 06:10 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 10672 0.0 0.0 6456 972 ? D 06:20 0:00 lsof -i UDP:ntp
root 12091 0.0 0.0 33152 1556 ? Ds 06:20 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 14851 0.0 0.0 33152 1556 ? Ds 06:30 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 15971 0.0 0.0 6456 968 ? D 06:40 0:00 lsof -i UDP:ntp
root 17393 0.0 0.0 33152 1560 ? Ds 06:40 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 20315 0.0 0.0 33152 1560 ? Ds 06:50 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 20859 0.1 0.0 32928 1392 ? D 05:00 0:11 /sbin/zfs receive localzfs/[email protected]
root 21448 0.0 0.0 6456 968 ? D 07:00 0:00 lsof -i UDP:ntp
root 21997 0.0 0.0 4228 112 ? D 05:09 0:00 /usr/bin/lsof -n +c15
root 22931 0.0 0.0 33152 1560 ? Ds 07:00 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 23416 0.0 0.0 32928 1388 ? D 05:10 0:00 /sbin/zfs receive localzfs/[email protected]
root 25076 0.0 0.0 6448 968 ? D 05:20 0:00 lsof -i UDP:ntp
root 25637 0.0 0.0 33152 1560 ? Ds 07:10 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 25849 0.0 0.0 33152 1556 ? Ds 05:20 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 26738 0.0 0.0 6456 972 ? D 07:20 0:00 lsof -i UDP:ntp
root 28258 0.0 0.0 33152 1564 ? Ds 07:20 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 28365 0.0 0.0 33152 1564 ? Ds 05:30 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 30013 0.0 0.0 6448 968 ? D 05:40 0:00 lsof -i UDP:ntp
root 30831 0.0 0.0 33152 1560 ? Ds 05:40 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 31022 0.0 0.0 33152 1556 ? Ds 07:30 0:00 /sbin/zfs rollback -r localzfs/[email protected]
root 32191 0.0 0.0 6456 976 ? D 07:40 0:00 lsof -i UDP:ntp
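
The D-state listing above and the per-process kernel stacks in stacktraces.log below were presumably gathered with something along these lines (a sketch; assumes /proc/<pid>/stack is available on this kernel and that the commands run as root):

# List D-state processes, then dump each one's command line and kernel stack.
ps auxww | awk '$8 ~ /^D/'
for pid in $(ps -eo pid,stat | awk '$2 ~ /^D/ {print $1}'); do
    echo "$pid:"
    cat /proc/$pid/cmdline; echo
    cat /proc/$pid/stack
    echo
done > stacktraces.log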

zdb

localzfs:
    version: 5000
    name: 'localzfs'
    state: 0
    txg: 246040
    pool_guid: 4444991534874121777
    errata: 0
    hostid: 780174611
    hostname: 'xxxxxxxxxxxx'
    vdev_children: 3
    vdev_tree:
        type: 'root'
        id: 0
        guid: 4444991534874121777
        children[0]:
            type: 'disk'
            id: 0
            guid: 12398468484731398433
            path: '/dev/sdc1'
            whole_disk: 1
            metaslab_array: 39
            metaslab_shift: 34
            ashift: 12
            asize: 3000019451904
            is_log: 0
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 14659634516062223553
            path: '/dev/sde1'
            whole_disk: 1
            metaslab_array: 37
            metaslab_shift: 34
            ashift: 12
            asize: 3000019451904
            is_log: 0
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 13765222038191402172
            path: '/dev/sdf1'
            whole_disk: 1
            metaslab_array: 35
            metaslab_shift: 34
            ashift: 12
            asize: 3000019451904
            is_log: 0
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

grep processor /proc/cpuinfo | wc -l

24

free

             total       used       free     shared    buffers     cached
Mem:     132241672  129544664    2697008          0     447856   64076024
-/+ buffers/cache:   65020784   67220888
Swap:     91272028          0   91272028

for param in /sys/module/{spl,zfs}/parameters/*; do printf "%-50s" "$(basename "$param")"; cat "$param"; done

spl_hostid 780174611
spl_hostid_path /etc/hostid
spl_kmem_alloc_max 1048576
spl_kmem_alloc_warn 32768
spl_kmem_cache_expire 2
spl_kmem_cache_kmem_limit 2048
spl_kmem_cache_kmem_threads 4
spl_kmem_cache_magazine_size 0
spl_kmem_cache_max_size 32
spl_kmem_cache_obj_per_slab 8
spl_kmem_cache_obj_per_slab_min 1
spl_kmem_cache_reclaim 0
spl_kmem_cache_slab_limit 16384
spl_taskq_thread_bind 0
spl_taskq_thread_dynamic 1
spl_taskq_thread_priority 1
spl_taskq_thread_sequential 4
l2arc_feed_again 1
l2arc_feed_min_ms 200
l2arc_feed_secs 1
l2arc_headroom 2
l2arc_headroom_boost 200
l2arc_nocompress 0
l2arc_noprefetch 1
l2arc_norw 0
l2arc_write_boost 8388608
l2arc_write_max 8388608
metaslab_aliquot 524288
metaslab_bias_enabled 1
metaslab_debug_load 0
metaslab_debug_unload 0
metaslab_fragmentation_factor_enabled 1
metaslab_lba_weighting_enabled 1
metaslab_preload_enabled 1
metaslabs_per_vdev 200
spa_asize_inflation 24
spa_config_path /etc/zfs/zpool.cache
spa_load_verify_data 1
spa_load_verify_maxinflight 10000
spa_load_verify_metadata 1
spa_slop_shift 5
zfetch_array_rd_sz 1048576
zfetch_block_cap 256
zfetch_max_streams 8
zfetch_min_sec_reap 2
zfs_admin_snapshot 0
zfs_arc_average_blocksize 8192
zfs_arc_grow_retry 0
zfs_arc_lotsfree_percent 10
zfs_arc_max 0
zfs_arc_meta_adjust_restarts 4096
zfs_arc_meta_limit 0
zfs_arc_meta_min 0
zfs_arc_meta_prune 10000
zfs_arc_meta_strategy 1
zfs_arc_min 0
zfs_arc_min_prefetch_lifespan 0
zfs_arc_num_sublists_per_state 24
zfs_arc_p_aggressive_disable 1
zfs_arc_p_dampener_disable 1
zfs_arc_p_min_shift 0
zfs_arc_shrink_shift 0
zfs_arc_sys_free 0
zfs_autoimport_disable 1
zfs_dbgmsg_enable 0
zfs_dbgmsg_maxsize 4194304
zfs_dbuf_state_index 0
zfs_deadman_enabled 1
zfs_deadman_synctime_ms 1000000
zfs_dedup_prefetch 0
zfs_delay_min_dirty_percent 60
zfs_delay_scale 500000
zfs_dirty_data_max 13541547212
zfs_dirty_data_max_max 33853868032
zfs_dirty_data_max_max_percent 25
zfs_dirty_data_max_percent 10
zfs_dirty_data_sync 67108864
zfs_disable_dup_eviction 0
zfs_expire_snapshot 300
zfs_flags 0
zfs_free_leak_on_eio 0
zfs_free_max_blocks 100000
zfs_free_min_time_ms 1000
zfs_immediate_write_sz 32768
zfs_max_recordsize 1048576
zfs_mdcomp_disable 0
zfs_metaslab_fragmentation_threshold 70
zfs_mg_fragmentation_threshold 85
zfs_mg_noalloc_threshold 0
zfs_no_scrub_io 0
zfs_no_scrub_prefetch 0
zfs_nocacheflush 0
zfs_nopwrite_enabled 1
zfs_pd_bytes_max 52428800
zfs_prefetch_disable 0
zfs_read_chunk_size 1048576
zfs_read_history 0
zfs_read_history_hits 0
zfs_recover 0
zfs_resilver_delay 2
zfs_resilver_min_time_ms 3000
zfs_scan_idle 50
zfs_scan_min_time_ms 1000
zfs_scrub_delay 4
zfs_send_corrupt_data 0
zfs_sync_pass_deferred_free 2
zfs_sync_pass_dont_compress 5
zfs_sync_pass_rewrite 2
zfs_top_maxinflight 32
zfs_txg_history 0
zfs_txg_timeout 5
zfs_vdev_aggregation_limit 131072
zfs_vdev_async_read_max_active 3
zfs_vdev_async_read_min_active 1
zfs_vdev_async_write_active_max_dirty_percent 60
zfs_vdev_async_write_active_min_dirty_percent 30
zfs_vdev_async_write_max_active 10
zfs_vdev_async_write_min_active 1
zfs_vdev_cache_bshift 16
zfs_vdev_cache_max 16384
zfs_vdev_cache_size 0
zfs_vdev_max_active 1000
zfs_vdev_mirror_switch_us 10000
zfs_vdev_read_gap_limit 32768
zfs_vdev_scheduler noop
zfs_vdev_scrub_max_active 2
zfs_vdev_scrub_min_active 1
zfs_vdev_sync_read_max_active 10
zfs_vdev_sync_read_min_active 10
zfs_vdev_sync_write_max_active 10
zfs_vdev_sync_write_min_active 10
zfs_vdev_write_gap_limit 4096
zfs_zevent_cols 80
zfs_zevent_console 0
zfs_zevent_len_max 512
zil_replay_disable 0
zil_slog_limit 1048576
zio_delay_max 30000
zio_requeue_io_start_cut_in_line 1
zvol_inhibit_dev 0
zvol_major 230
zvol_max_discard_blocks 16384
zvol_prefetch_bytes 131072

cat /proc/spl/kstat/zfs/arcstats

6 1 0x01 91 4368 7547810102 55406549897135
name type data
hits 4 22249056
misses 4 2594312
demand_data_hits 4 14394145
demand_data_misses 4 1312542
demand_metadata_hits 4 5333207
demand_metadata_misses 4 607799
prefetch_data_hits 4 4719
prefetch_data_misses 4 609872
prefetch_metadata_hits 4 2516985
prefetch_metadata_misses 4 64099
mru_hits 4 12103219
mru_ghost_hits 4 14760
mfu_hits 4 7638630
mfu_ghost_hits 4 399
deleted 4 669149
mutex_miss 4 37
evict_skip 4 804
evict_not_enough 4 0
evict_l2_cached 4 0
evict_l2_eligible 4 85123566080
evict_l2_ineligible 4 6921351168
evict_l2_skip 4 0
hash_elements 4 1485105
hash_elements_max 4 1829210
hash_collisions 4 124180
hash_chains 4 60784
hash_chain_max 4 4
p 4 3212628498
c 4 30102312152
c_min 4 33554432
c_max 4 67707736064
size 4 30067906896
hdr_size 4 545097008
data_size 4 26700071424
metadata_size 4 1228627968
other_size 4 1594110496
anon_size 4 2637824
anon_evictable_data 4 0
anon_evictable_metadata 4 0
mru_size 4 6222422528
mru_evictable_data 4 6108322304
mru_evictable_metadata 4 107796480
mru_ghost_size 4 23861027328
mru_ghost_evictable_data 4 21916299264
mru_ghost_evictable_metadata 4 1944728064
mfu_size 4 21703639040
mfu_evictable_data 4 20591749120
mfu_evictable_metadata 4 455376896
mfu_ghost_size 4 4649071104
mfu_ghost_evictable_data 4 4611969024
mfu_ghost_evictable_metadata 4 37102080
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_lock_retry 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_evict_l1cached 4 0
l2_free_on_write 4 0
l2_cdata_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_asize 4 0
l2_hdr_size 4 0
l2_compress_successes 4 0
l2_compress_zeros 4 0
l2_compress_failures 4 0
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 90
memory_indirect_count 4 70430
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 3367835472
arc_meta_limit 4 50780802048
arc_meta_max 4 4405720496
arc_meta_min 4 16777216
arc_need_free 4 0
arc_sys_free 4 2115866624

cat /proc/spl/kmem/slab

--------------------- cache ------------------------------------------------------- ----- slab ------ ---- object ----- --- emergency ---
name flags size alloc slabsize objsize total alloc max total alloc max dlock alloc max
spl_vn_cache 0x00020 65536 30576 8192 104 8 7 13 336 294 546 0 0 0
spl_vn_file_cache 0x00020 57344 28224 8192 112 7 6 13 294 252 546 0 0 0
spl_zlib_workspace_cache 0x00240 0 0 2145216 268104 0 0 0 0 0 0 0 0 0
ddt_cache 0x00040 398464 248560 199232 24856 2 2 2 16 10 16 0 0 0
zio_buf_20480 0x00042 13045760 10485760 200704 20480 65 64 91 520 512 728 0 0 0
zio_data_buf_20480 0x00042 734777344 501084160 200704 20480 3661 3661 4345 29288 24467 34760 0 0 0
zio_buf_24576 0x00042 11907072 9830400 233472 24576 51 50 94 408 400 752 0 0 0
zio_data_buf_24576 0x00042 83582976 18554880 233472 24576 358 358 1062 2864 755 8496 0 0 0
zio_buf_28672 0x00042 10649600 8945664 266240 28672 40 39 94 320 312 752 0 0 0
zio_data_buf_28672 0x00042 76943360 16314368 266240 28672 289 289 744 2312 569 5952 0 0 0
zio_buf_32768 0x00042 13455360 11534336 299008 32768 45 44 86 360 352 688 0 0 0
zio_data_buf_32768 0x00042 58306560 15663104 299008 32768 195 195 754 1560 478 6032 0 0 0
zio_buf_40960 0x00042 16769024 14745600 364544 40960 46 45 83 368 360 664 0 0 0
zio_data_buf_40960 0x00042 81657856 17489920 364544 40960 224 224 1143 1792 427 9144 0 0 0
zio_buf_49152 0x00042 16343040 14548992 430080 49152 38 37 79 304 296 632 0 0 0
zio_data_buf_49152 0x00042 84295680 18677760 430080 49152 196 196 1281 1568 380 10248 0 0 0
zio_buf_57344 0x00042 20815872 18808832 495616 57344 42 41 81 336 328 648 0 0 0
zio_data_buf_57344 0x00042 48074752 7397376 495616 57344 97 97 1109 776 129 8872 0 0 0
zio_buf_65536 0x00042 26374144 24117248 561152 65536 47 46 71 376 368 568 0 0 0
zio_data_buf_65536 0x00042 43208704 7929856 561152 65536 77 77 1023 616 121 8184 0 0 0
zio_buf_81920 0x00042 31150080 28835840 692224 81920 45 44 95 360 352 760 0 0 0
zio_data_buf_81920 0x00042 70606848 16547840 692224 81920 102 102 1708 816 202 13664 0 0 0
zio_buf_98304 0x00042 38694912 36175872 823296 98304 47 46 92 376 368 736 0 0 0
zio_data_buf_98304 0x00042 76566528 16121856 823296 98304 93 93 1288 744 164 10304 0 0 0
zio_buf_114688 0x00042 69668864 66060288 954368 114688 73 72 113 584 576 904 0 0 0
zio_data_buf_114688 0x00042 71577600 16629760 954368 114688 75 75 769 600 145 6152 0 0 0
zio_buf_131072 0x00042 72724480 59768832 1085440 131072 67 67 284 536 456 2272 0 0 0
zio_data_buf_131072 0x00042 44856893440 22373728256 1085440 131072 41326 41326 59669 330608 170698 477352 0 0 0
zio_buf_163840 0x00042 0 0 1347584 163840 0 0 0 0 0 0 0 0 0
zio_data_buf_163840 0x00042 0 0 1347584 163840 0 0 0 0 0 0 0 0 0
zio_buf_196608 0x00042 0 0 1609728 196608 0 0 0 0 0 0 0 0 0
zio_data_buf_196608 0x00042 0 0 1609728 196608 0 0 0 0 0 0 0 0 0
zio_buf_229376 0x00042 0 0 1871872 229376 0 0 0 0 0 0 0 0 0
zio_data_buf_229376 0x00042 0 0 1871872 229376 0 0 0 0 0 0 0 0 0
zio_buf_262144 0x00042 0 0 2134016 262144 0 0 0 0 0 0 0 0 0
zio_data_buf_262144 0x00042 0 0 2134016 262144 0 0 0 0 0 0 0 0 0
zio_buf_327680 0x00042 0 0 2658304 327680 0 0 0 0 0 0 0 0 0
zio_data_buf_327680 0x00042 0 0 2658304 327680 0 0 0 0 0 0 0 0 0
zio_buf_393216 0x00042 0 0 3182592 393216 0 0 0 0 0 0 0 0 0
zio_data_buf_393216 0x00042 0 0 3182592 393216 0 0 0 0 0 0 0 0 0
zio_buf_458752 0x00042 0 0 3706880 458752 0 0 0 0 0 0 0 0 0
zio_data_buf_458752 0x00042 0 0 3706880 458752 0 0 0 0 0 0 0 0 0
zio_buf_524288 0x00042 0 0 4231168 524288 0 0 0 0 0 0 0 0 0
zio_data_buf_524288 0x00042 0 0 4231168 524288 0 0 0 0 0 0 0 0 0
zio_buf_655360 0x00042 0 0 5279744 655360 0 0 0 0 0 0 0 0 0
zio_data_buf_655360 0x00042 0 0 5279744 655360 0 0 0 0 0 0 0 0 0
zio_buf_786432 0x00042 0 0 6328320 786432 0 0 0 0 0 0 0 0 0
zio_data_buf_786432 0x00042 0 0 6328320 786432 0 0 0 0 0 0 0 0 0
zio_buf_917504 0x00042 0 0 7376896 917504 0 0 0 0 0 0 0 0 0
zio_data_buf_917504 0x00042 0 0 7376896 917504 0 0 0 0 0 0 0 0 0
zio_buf_1048576 0x00042 0 0 8425472 1048576 0 0 0 0 0 0 0 0 0
zio_data_buf_1048576 0x00042 0 0 8425472 1048576 0 0 0 0 0 0 0 0 0
zio_buf_1310720 0x00042 0 0 10522624 1310720 0 0 0 0 0 0 0 0 0
zio_data_buf_1310720 0x00042 0 0 10522624 1310720 0 0 0 0 0 0 0 0 0
zio_buf_1572864 0x00042 0 0 12619776 1572864 0 0 0 0 0 0 0 0 0
zio_data_buf_1572864 0x00042 0 0 12619776 1572864 0 0 0 0 0 0 0 0 0
zio_buf_1835008 0x00042 0 0 14716928 1835008 0 0 0 0 0 0 0 0 0
zio_data_buf_1835008 0x00042 0 0 14716928 1835008 0 0 0 0 0 0 0 0 0
zio_buf_2097152 0x00042 0 0 16814080 2097152 0 0 0 0 0 0 0 0 0
zio_data_buf_2097152 0x00042 0 0 16814080 2097152 0 0 0 0 0 0 0 0 0
zio_buf_2621440 0x00042 0 0 21008384 2621440 0 0 0 0 0 0 0 0 0
zio_data_buf_2621440 0x00042 0 0 21008384 2621440 0 0 0 0 0 0 0 0 0
zio_buf_3145728 0x00042 0 0 25202688 3145728 0 0 0 0 0 0 0 0 0
zio_data_buf_3145728 0x00042 0 0 25202688 3145728 0 0 0 0 0 0 0 0 0
zio_buf_3670016 0x00042 0 0 29396992 3670016 0 0 0 0 0 0 0 0 0
zio_data_buf_3670016 0x00042 0 0 29396992 3670016 0 0 0 0 0 0 0 0 0
zio_buf_4194304 0x00042 0 0 29392896 4194304 0 0 0 0 0 0 0 0 0
zio_data_buf_4194304 0x00042 0 0 29392896 4194304 0 0 0 0 0 0 0 0 0
zio_buf_5242880 0x00042 0 0 31485952 5242880 0 0 0 0 0 0 0 0 0
zio_data_buf_5242880 0x00042 0 0 31485952 5242880 0 0 0 0 0 0 0 0 0
zio_buf_6291456 0x00042 0 0 31481856 6291456 0 0 0 0 0 0 0 0 0
zio_data_buf_6291456 0x00042 0 0 31481856 6291456 0 0 0 0 0 0 0 0 0
zio_buf_7340032 0x00042 0 0 29380608 7340032 0 0 0 0 0 0 0 0 0
zio_data_buf_7340032 0x00042 0 0 29380608 7340032 0 0 0 0 0 0 0 0 0
zio_buf_8388608 0x00042 0 0 25182208 8388608 0 0 0 0 0 0 0 0 0
zio_data_buf_8388608 0x00042 0 0 25182208 8388608 0 0 0 0 0 0 0 0 0
zio_buf_10485760 0x00042 0 0 31473664 10485760 0 0 0 0 0 0 0 0 0
zio_data_buf_10485760 0x00042 0 0 31473664 10485760 0 0 0 0 0 0 0 0 0
zio_buf_12582912 0x00042 0 0 25178112 12582912 0 0 0 0 0 0 0 0 0
zio_data_buf_12582912 0x00042 0 0 25178112 12582912 0 0 0 0 0 0 0 0 0
zio_buf_14680064 0x00042 0 0 29372416 14680064 0 0 0 0 0 0 0 0 0
zio_data_buf_14680064 0x00042 0 0 29372416 14680064 0 0 0 0 0 0 0 0 0
zio_buf_16777216 0x00042 0 0 16785408 16777216 0 0 0 0 0 0 0 0 0
zio_data_buf_16777216 0x00042 0 0 16785408 16777216 0 0 0 0 0 0 0 0 0

stacktraces.log:

119:

[] mutex_spin_on_owner+0x26/0x39
[] zpl_nr_cached_objects+0x1a/0x5f [zfs]
[] prune_super+0x65/0x149
[] shrink_slab+0x96/0x24d
[] balance_pgdat+0x283/0x4b7
[] kswapd+0x2e0/0x319
[] autoremove_wake_function+0x0/0x2a
[] kswapd+0x0/0x319
[] kthread+0x76/0x7e
[] kernel_thread_helper+0x4/0x10
[] kthread+0x0/0x7e
[] kernel_thread_helper+0x0/0x10

[] 0xffffffffffffffff

120:

[] zpl_nr_cached_objects+0x1a/0x5f [zfs]
[] prune_super+0x65/0x149
[] shrink_slab+0x96/0x24d
[] balance_pgdat+0x283/0x4b7
[] kswapd+0x2e0/0x319
[] autoremove_wake_function+0x0/0x2a
[] kswapd+0x0/0x319
[] kthread+0x76/0x7e
[] kernel_thread_helper+0x4/0x10
[] kthread+0x0/0x7e
[] kernel_thread_helper+0x0/0x10

[] 0xffffffffffffffff

941:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

1322:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

2580:
/usr/bin/lsof^@-n^@+c15^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_getattr_fast+0x33/0x171 [zfs]
[] zpl_getattr+0x34/0x4a [zfs]
[] vfs_fstatat+0x48/0x60
[] sys_newstat+0x12/0x2b
[] vfs_read+0xb8/0xe6
[] sys_read+0x5f/0x6b
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

4636:
/usr/bin/clamscan^@-r^@-i^@--max-files=99999^@--max-scansize=0^@--max-dir-recursion=999^@--no-summary^@--block-encrypted^@--scan-swf=no^@/www^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_close+0x2f/0x9a [zfs]
[] zpl_release+0x63/0x83 [zfs]
[] fput+0xf9/0x1a1
[] filp_close+0x62/0x6a
[] sys_close+0x8e/0xcb
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

4738:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

5464:
lsof^@-i^@udp:ntp^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_getattr_fast+0x33/0x171 [zfs]
[] zpl_getattr+0x34/0x4a [zfs]
[] vfs_fstatat+0x48/0x60
[] sys_newstat+0x12/0x2b
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

6859:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

9572:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

10672:
lsof^@-i^@udp:ntp^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_getattr_fast+0x33/0x171 [zfs]
[] zpl_getattr+0x34/0x4a [zfs]
[] vfs_fstatat+0x48/0x60
[] sys_newstat+0x12/0x2b
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

12091:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

14851:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

15971:
lsof^@-i^@udp:ntp^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_getattr_fast+0x33/0x171 [zfs]
[] zpl_getattr+0x34/0x4a [zfs]
[] vfs_fstatat+0x48/0x60
[] sys_newstat+0x12/0x2b
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

17393:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

20315:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

20859:
/sbin/zfs^@receive^@localzfs/[email protected]^@
[] mutex_spin_on_owner+0x2a/0x39
[] zpl_nr_cached_objects+0x1a/0x5f [zfs]
[] prune_super+0x65/0x149
[] shrink_slab+0x96/0x24d
[] do_try_to_free_pages+0x2a7/0x41c
[] try_to_free_pages+0xa9/0xe9
[] __alloc_pages_nodemask+0x4ee/0x7ab
[] kmem_getpages+0x4c/0xd9
[] fallback_alloc+0x13c/0x1dd
[] kmem_cache_alloc+0x73/0xea
[] spl_kmem_cache_alloc+0x3e/0x654 [spl]
[] arch_local_irq_save+0x11/0x17
[] RW_WRITE_HELD+0x29/0x3f [zfs]
[] dbuf_read+0x143/0x5a0 [zfs]
[] dbuf_hold_impl+0x7b/0x8c [zfs]
[] kfree+0x5b/0x6c
[] dnode_create+0x29/0x1bc [zfs]
[] dnode_hold_impl+0x2b5/0x406 [zfs]
[] dmu_bonus_hold+0x1e/0x194 [zfs]
[] zfs_rezget+0x18b/0x4b1 [zfs]
[] zfs_resume_fs+0x229/0x2f1 [zfs]
[] zfs_ioc_recv+0x926/0xa85 [zfs]
[] __cache_free.isra.41+0x3d/0x195
[] kfree+0x5b/0x6c
[] virt_to_head_page+0x6/0x29
[] virt_to_slab+0x6/0x16
[] __cache_free.isra.41+0x3d/0x195
[] rrw_exit+0xbb/0x11a [zfs]
[] strdup+0x2b/0x3f [spl]
[] strdup+0x2b/0x3f [spl]
[] __kmalloc+0x100/0x112
[] zfsdev_ioctl+0x330/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

21448:
lsof^@-i^@udp:ntp^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_getattr_fast+0x33/0x171 [zfs]
[] zpl_getattr+0x34/0x4a [zfs]
[] vfs_fstatat+0x48/0x60
[] sys_newstat+0x12/0x2b
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

21997:
/usr/bin/lsof^@-n^@+c15^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_getattr_fast+0x33/0x171 [zfs]
[] zpl_getattr+0x34/0x4a [zfs]
[] vfs_fstatat+0x48/0x60
[] sys_newstat+0x12/0x2b
[] vfs_read+0xb8/0xe6
[] sys_read+0x5f/0x6b
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

22931:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

23416:
/sbin/zfs^@receive^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_recv+0x8f4/0xa85 [zfs]
[] __cache_free.isra.41+0x3d/0x195
[] kfree+0x5b/0x6c
[] virt_to_head_page+0x6/0x29
[] virt_to_slab+0x6/0x16
[] __cache_free.isra.41+0x3d/0x195
[] rrw_exit+0xbb/0x11a [zfs]
[] strdup+0x2b/0x3f [spl]
[] strdup+0x2b/0x3f [spl]
[] __kmalloc+0x100/0x112
[] zfsdev_ioctl+0x330/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

25076:
lsof^@-i^@udp:ntp^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_getattr_fast+0x33/0x171 [zfs]
[] zpl_getattr+0x34/0x4a [zfs]
[] vfs_fstatat+0x48/0x60
[] sys_newstat+0x12/0x2b
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

25637:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

25849:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

26738:
lsof^@-i^@udp:ntp^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_getattr_fast+0x33/0x171 [zfs]
[] zpl_getattr+0x34/0x4a [zfs]
[] vfs_fstatat+0x48/0x60
[] sys_newstat+0x12/0x2b
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

28258:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

28365:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

30013:
lsof^@-i^@udp:ntp^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_getattr_fast+0x33/0x171 [zfs]
[] zpl_getattr+0x34/0x4a [zfs]
[] vfs_fstatat+0x48/0x60
[] sys_newstat+0x12/0x2b
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

30831:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

31022:
/sbin/zfs^@Rollback^@-r^@localzfs/[email protected]^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_write+0x34/0x88 [zfs]
[] rrm_enter_write+0x1a/0x24 [zfs]
[] zfs_sb_teardown+0x6d/0x259 [zfs]
[] zfs_ioc_rollback+0x2b/0x7b [zfs]
[] zfsdev_ioctl+0x21e/0x3e2 [zfs]
[] do_vfs_ioctl+0x459/0x49a
[] do_brk+0x22d/0x280
[] remove_vma+0x64/0x6b
[] sys_ioctl+0x4b/0x72
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

32191:
lsof^@-i^@udp:ntp^@
[] cv_wait_common+0xb5/0x117 [spl]
[] autoremove_wake_function+0x0/0x2a
[] should_resched+0x5/0x23
[] rrw_enter_read_impl+0x4a/0x130 [zfs]
[] zfs_getattr_fast+0x33/0x171 [zfs]
[] zpl_getattr+0x34/0x4a [zfs]
[] vfs_fstatat+0x48/0x60
[] sys_newstat+0x12/0x2b
[] system_call_fastpath+0x16/0x1b

[] 0xffffffffffffffff

@whatarcane
Author

As a side note, the zfs rollback commands are something new that I added to the sync script when I stopped using the "-F" option for receive. They are issued as a naive attempt to recover after a failed receive operation.
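
A sketch of that recovery logic, with hypothetical snapshot names (the real ones are mangled above):

# Naive recovery: if an incremental receive fails, roll the dataset back
# to the last known-good snapshot and let the next cron run retry.
if ! ssh backup-source "zfs send -i sourcepool/www@prev sourcepool/www@new" \
        | /sbin/zfs receive localzfs/www; then
    /sbin/zfs rollback -r localzfs/www@prev
fi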

@behlendorf behlendorf added this to the 0.7.0 milestone Sep 22, 2015
@whatarcane
Author

It happened again this morning, even without the rollback commands. We'll fall back on rsync until 0.7.0; thanks for classifying the report.

behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 23, 2015
The zpl_nr_cached_objects() function has been disabled because in the
current code it doesn't provide any critical functionality and it may
result in a deadlock under certain circumstances.  However, because
we expect to need these hooks in the future this code has not been
entirely removed.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3719
@behlendorf
Contributor

@whatarcane I think we can sort this out for 0.6.5.2. I had a chance to look at the stack you posted more carefully and was able to identify the deadlock you're hitting. It goes like this:

[] zpl_nr_cached_objects+0x1a/0x5f [zfs] -> Takes zsb->z_znodes_lock again - deadlocked.
[] prune_super+0x65/0x149
[] shrink_slab+0x96/0x24d
[] do_try_to_free_pages+0x2a7/0x41c
[] try_to_free_pages+0xa9/0xe9 (Direct reclaim entered)
[] __alloc_pages_nodemask+0x4ee/0x7ab
[] kmem_getpages+0x4c/0xd9
[] kmem_cache_alloc+0x73/0xea
[] spl_kmem_cache_alloc+0x3e/0x654 [spl]
[] dbuf_read+0x143/0x5a0 [zfs]
[] dbuf_hold_impl+0x7b/0x8c [zfs]
[] dnode_create+0x29/0x1bc [zfs]
[] dnode_hold_impl+0x2b5/0x406 [zfs]
[] dmu_bonus_hold+0x1e/0x194 [zfs]
[] zfs_rezget+0x18b/0x4b1 [zfs]
[] zfs_resume_fs+0x229/0x2f1 [zfs] -> Takes zsb->z_znodes_lock
[] zfs_ioc_recv+0x926/0xa85 [zfs]

I've proposed a fix in #3825.

@behlendorf behlendorf modified the milestones: 0.6.5.2, 0.7.0 Sep 23, 2015
behlendorf added a commit that referenced this issue Sep 25, 2015
The zpl_nr_cached_objects() function has been disabled because in the
current code it doesn't provide any critical functionality and it may
result in a deadlock under certain circumstances.  However, because
we expect to need these hooks in the future this code has not been
entirely removed.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3719
@behlendorf
Contributor

Resolved by 0487056, which will be cherry-picked into 0.6.5.2.
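
Once the 0.6.5.2 packages land, a quick way to confirm the fixed module is the one actually loaded (a sketch, assuming the same Debian wheezy ZoL packaging as above):

dmesg | grep -E 'SPL: Loaded|ZFS: Loaded'   # should report the 0.6.5.2 module versions
cat /sys/module/zfs/version
dpkg -l zfs-dkms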

behlendorf added a commit that referenced this issue Sep 30, 2015
The zpl_nr_cached_objects() function has been disabled because in the
current code it doesn't provide any critical functionality and it may
result in a deadlock under certain circumstances.  However, because
we expect to need these hooks in the future this code has not been
entirely removed.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3719
MorpheusTeam pushed a commit to Xyratex/lustre-stable that referenced this issue Oct 17, 2015
ZFS/SPL 0.6.5.2

Bug Fixes
* Init script fixes openzfs/zfs#3816
* Fix uioskip crash when skip to end openzfs/zfs#3806
  openzfs/zfs#3850
* Userspace can trigger an assertion openzfs/zfs#3792
* Fix quota userused underflow bug openzfs/zfs#3789
* Fix performance regression from unwanted synchronous I/O
  openzfs/zfs#3780
* Fix deadlock during ARC reclaim openzfs/zfs#3808
  openzfs/zfs#3834
* Fix deadlock with zfs receive and clamscan openzfs/zfs#3719
* Allow NFS activity to defer snapshot unmounts openzfs/zfs#3794
* Linux 4.3 compatibility openzfs/zfs#3799
* Zed reload fixes openzfs/zfs#3773
* Fix PAX Patch/Grsec SLAB_USERCOPY panic openzfs/zfs#3796
* Always remove during dkms uninstall/update openzfs/spl#476

ZFS/SPL 0.6.5.1

Bug Fixes

* Fix zvol corruption with TRIM/discard openzfs/zfs#3798
* Fix NULL as mount(2) syscall data parameter openzfs/zfs#3804
* Fix xattr=sa dataset property not honored openzfs/zfs#3787

ZFS/SPL 0.6.5

Supported Kernels

* Compatible with 2.6.32 - 4.2 Linux kernels.

New Functionality

* Support for temporary mount options.
* Support for accessing the .zfs/snapshot over NFS.
* Support for estimating send stream size when source is a bookmark.
* Administrative commands are allowed to use reserved space improving
  robustness.
* New notify ZEDLETs support email and pushbullet notifications.
* New keyword 'slot' for vdev_id.conf to control what is used for the
  slot number.
* New zpool export -a option unmounts and exports all imported pools.
* New zpool iostat -y omits the first report with statistics since
  boot.
* New zdb can now open the root dataset.
* New zdb can print the numbers of ganged blocks.
* New zdb -ddddd can print details of block pointer objects.
* New zdb -b performance improved.
* New zstreamdump -d prints contents of blocks.

New Feature Flags

* large_blocks - This feature allows the record size on a dataset to
be set larger than 128KB. We currently support block sizes from 512
bytes to 16MB. The benefits of larger blocks, and thus larger IO, need
to be weighed against the cost of COWing a giant block to modify one
byte. Additionally, very large blocks can have an impact on I/O
latency, and also potentially on the memory allocator. Therefore, we
do not allow the record size to be set larger than zfs_max_recordsize
(default 1MB). Larger blocks can be created by changing this tuning;
pools with larger blocks can always be imported and used, regardless
of this setting.

* filesystem_limits - This feature enables filesystem and snapshot
limits. These limits can be used to control how many filesystems
and/or snapshots can be created at the point in the tree on which the
limits are set.

*Performance*

* Improved zvol performance on all kernels (>50% higher throughput,
  >20% lower latency)
* Improved zil performance on Linux 2.6.39 and earlier kernels (10x
  lower latency)
* Improved allocation behavior on mostly full SSD/file pools (5% to
  10% improvement on 90% full pools)
* Improved performance when removing large files.
* Caching improvements (ARC):
** Better cached read performance due to reduced lock contention.
** Smarter heuristics for managing the total size of the cache and the
   distribution of data/metadata.
** Faster release of cached buffers due to unexpected memory pressure.

*Changes in Behavior*

* Default reserved space was increased from 1.6% to 3.3% of total pool
capacity. This default percentage can be controlled through the new
spa_slop_shift module option, setting it to 6 will restore the
previous percentage.

* Loading of the ZFS module stack is now handled by systemd or the
sysv init scripts. Invoking the zfs/zpool commands will not cause the
modules to be automatically loaded. The previous behavior can be
restored by setting the ZFS_MODULE_LOADING=yes environment variable
but this functionality will be removed in a future release.

* Unified SYSV and Gentoo OpenRC initialization scripts. The previous
functionality has been split in to zfs-import, zfs-mount, zfs-share,
and zfs-zed scripts. This allows for independent control of the
services and is consistent with the unit files provided for a systemd
based system. Complete details of the functionality provided by the
updated scripts can be found here.

* Task queues are now dynamic and worker threads will be created and
destroyed as needed. This allows the system to automatically tune
itself to ensure the optimal number of threads are used for the active
workload which can result in a performance improvement.

* Task queue thread priorities were correctly aligned with the default
Linux file system thread priorities. This allows ZFS to compete fairly
with other active Linux file systems when the system is under heavy
load.

* When compression=on the default compression algorithm will be lz4 as
long as the feature is enabled. Otherwise the default remains lzjb.
Similarly lz4 is now the preferred method for compressing meta data
when available.

* The use of mkdir/rmdir/mv in the .zfs/snapshot directory has been
disabled by default both locally and via NFS clients. The
zfs_admin_snapshot module option can be used to re-enable this
functionality.

* LBA weighting is automatically disabled on files and SSDs ensuring
the entire device is used fairly.
* iostat accounting on zvols running on kernels older than Linux 3.19
is no longer supported.

* The known issues preventing swap on zvols for Linux 3.9 and newer
kernels have been resolved. However, deadlocks are still possible for
older kernels.

Module Options

* Changed zfs_arc_c_min default from 4M to 32M to accommodate large
  blocks.
* Added metaslab_aliquot to control how many bytes are written to a
  top-level vdev before moving on to the next one. Increasing this may
  be helpful when using blocks larger than 1M.
* Added spa_slop_shift, see 'reserved space' comment in the 'changes
  to behavior' section.
* Added zfs_admin_snapshot, enable/disable the use of mkdir/rmdir/mv
  in .zfs/snapshot directory.
* Added zfs_arc_lotsfree_percent, throttle I/O when free system
  memory drops below this percentage.
* Added zfs_arc_num_sublists_per_state, used to allow more
  fine-grained locking.
* Added zfs_arc_p_min_shift, used to set a floor on arc_p.
* Added zfs_arc_sys_free, the target number of bytes the ARC should
  leave as free.
* Added zfs_dbgmsg_enable, used to enable the 'dbgmsg' kstat.
* Added zfs_dbgmsg_maxsize, sets the maximum size of the dbgmsg
  buffer.
* Added zfs_max_recordsize, used to control the maximum allowed
  record size.
* Added zfs_arc_meta_strategy, used to select the preferred ARC
  reclaim strategy.
* Removed metaslab_min_alloc_size, it was unused internally due to
  prior changes.
* Removed zfs_arc_memory_throttle_disable, replaced by
  zfs_arc_lotsfree_percent.
* Removed zvol_threads, zvols no longer require a dedicated task
  queue.
* See zfs-module-parameters(5) for complete details on available
  module options.

Bug Fixes

* Improved documentation with many updates, corrections, and
  additions.
* Improved sysv, systemd, initramfs, and dracut support.
* Improved block pointer validation before issuing IO.
* Improved scrub pause heuristics.
* Improved test coverage.
* Improved heuristics for automatic repair when zfs_recover=1 module
  option is set.
* Improved debugging infrastructure via 'dbgmsg' kstat.
* Improved zpool import performance.
* Fixed deadlocks in direct memory reclaim.
* Fixed deadlock on db_mtx and dn_holds.
* Fixed deadlock in dmu_objset_find_dp().
* Fixed deadlock during zfs rollback.
* Fixed kernel panic due to tsd_exit() in ZFS_EXIT.
* Fixed kernel panic when adding a duplicate dbuf to dn_dbufs.
* Fixed kernel panic due to security / ACL creation failure.
* Fixed kernel panic on unmount due to iput taskq.
* Fixed panic due to corrupt nvlist when running utilities.
* Fixed panic on unmount due to not waiting for all znodes to be
  released.
* Fixed panic with zfs clone from different source and target pools.
* Fixed NULL pointer dereference in dsl_prop_get_ds().
* Fixed NULL pointer dereference in dsl_prop_notify_all_cb().
* Fixed NULL pointer dereference in zfsdev_getminor().
* Fixed I/Os are now aggregated across ZIO priority classes.
* Fixed .zfs/snapshot auto-mounting for all supported kernels.
* Fixed 3-digit octal escapes by changing to 4-digit which
  disambiguate the output.
* Fixed hard lockup due to infinite loop in zfs_zget().
* Fixed misreported 'alloc' value for cache devices.
* Fixed spurious hung task watchdog stack traces.
* Fixed direct memory reclaim deadlocks.
* Fixed module loading in zfs import systemd service.
* Fixed intermittent libzfs_init() failure to open /dev/zfs.
* Fixed hot-disk sparing for disk vdevs.
* Fixed system spinning during ARC reclaim.
* Fixed formatting errors in zfs(8).
* Fixed zio pipeline stall by having callers invoke next stage.
* Fixed assertion failed in zrl_tryenter().
* Fixed memory leak in make_root_vdev().
* Fixed memory leak in zpool_in_use().
* Fixed memory leak in libzfs when doing rollback.
* Fixed hold leak in dmu_recv_end_check().
* Fixed refcount leak in bpobj_iterate_impl().
* Fixed misuse of input argument in traverse_visitbp().
* Fixed missing mutex_destroy() calls.
* Fixed integer overflows in dmu_read/dmu_write.
* Fixed verify() failure in zio_done().
* Fixed zio_checksum_error() to only include info for ECKSUM errors.
* Fixed -ESTALE to force lookup on missing NFS file handles.
* Fixed spurious failures from dsl_dataset_hold_obj().
* Fixed zfs compressratio when using with 4k sector size.
* Fixed spurious watchdog warnings in prefetch thread.
* Fixed unfair disk space allocation when vdevs are of unequal size.
* Fixed ashift accounting error writing to cache devices.
* Fixed zdb -d has false positive warning when
  feature@large_blocks=disabled.
* Fixed zdb -h | -i seg fault.
* Fixed force-received full stream into a dataset if it has a
  snapshot.
* Fixed snapshot error handling.
* Fixed 'hangs' while deleting large files.
* Fixed lock contention (rrw_exit) while running a read only load.
* Fixed error message when creating a pool to include all problematic
  devices.
* Fixed Xen virtual block device detection, partitions are now
  created.
* Fixed missing E2BIG error handling in zfs_setprop_error().
* Fixed zpool import assertion in libzfs_import.c.
* Fixed zfs send -nv output to stderr.
* Fixed idle pool potentially running itself out of space.
* Fixed narrow race which allowed read(2) to access beyond fstat(2)'s
  reported end-of-file.
* Fixed support for VPATH builds.
* Fixed double counting of HDR_L2ONLY_SIZE in ARC.
* Fixed 'BUG: Bad page state' warning from kernel due to writeback
  flag.
* Fixed arc_available_memory() to check freemem.
* Fixed arc_memory_throttle() to check pageout.
* Fixed 'zpool create' warning when using zvols in debug builds.
* Fixed loop devices layered on ZFS with 4.1 kernels.
* Fixed zvol contribution to kernel entropy pool.
* Fixed handling of compression flags in arc header.
* Substantial changes to realign code base with illumos.
* Many additional bug fixes.

Signed-off-by: Nathaniel Clark <[email protected]>
Change-Id: I87c012aec9ec581b10a417d699dafc7d415abf63
Reviewed-on: http://review.whamcloud.com/16399
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <[email protected]>
Tested-by: Maloo <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>