ARC grows well past the zfs_arc_max #676

Closed
DeHackEd opened this issue Apr 19, 2012 · 60 comments
Labels: Component: Memory Management (kernel memory management)

@DeHackEd
Contributor

While running a lot of rsync instances (~45 at once... it makes sense in context) I found my ARC expanding beyond its configured limit of 2 gigabytes until ZFS deadlocked: almost all rsync processes, and some kernel threads, hung in the D state. A reboot was necessary. It only took about 5-10 minutes of this kind of pressure to break.

# cat /proc/spl/kstat/zfs/arcstats 
4 1 0x01 77 3696 55271634900 965748228174
name                            type data
hits                            4    5620340
misses                          4    693436
demand_data_hits                4    212193
demand_data_misses              4    6069
demand_metadata_hits            4    3873689
demand_metadata_misses          4    320630
prefetch_data_hits              4    29368
prefetch_data_misses            4    77018
prefetch_metadata_hits          4    1505090
prefetch_metadata_misses        4    289719
mru_hits                        4    872626
mru_ghost_hits                  4    204427
mfu_hits                        4    3213295
mfu_ghost_hits                  4    70138
deleted                         4    221845
recycle_miss                    4    373068
mutex_miss                      4    120515
evict_skip                      4    279067774
evict_l2_cached                 4    0
evict_l2_eligible               4    11435467776
evict_l2_ineligible             4    5027812352
hash_elements                   4    201635
hash_elements_max               4    359811
hash_collisions                 4    211399
hash_chains                     4    47203
hash_chain_max                  4    8
p                               4    97061458
c                               4    753210703
c_min                           4    293601280
c_max                           4    2147483648
size                            4    3514897752
hdr_size                        4    96012384
data_size                       4    2391985664
other_size                      4    1026899704
anon_size                       4    4358144
anon_evict_data                 4    0
anon_evict_metadata             4    0
mru_size                        4    267850240
mru_evict_data                  4    2872320
mru_evict_metadata              4    10894848
mru_ghost_size                  4    465448960
mru_ghost_evict_data            4    8077824
mru_ghost_evict_metadata        4    457371136
mfu_size                        4    2119777280
mfu_evict_data                  4    0
mfu_evict_metadata              4    5223424
mfu_ghost_size                  4    272368640
mfu_ghost_evict_data            4    12076032
mfu_ghost_evict_metadata        4    260292608
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_hdr_miss              4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_hdr_size                     4    0
memory_throttle_count           4    0
memory_direct_count             4    0
memory_indirect_count           4    0
arc_no_grow                     4    1
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    1746
arc_meta_used                   4    3512025432
arc_meta_limit                  4    536870912
arc_meta_max                    4    3518600752

Not too long after, all ZFS-involved processes froze up.

Stack dumps for some hung processes:

SysRq : Show Blocked State
 task                        PC stack   pid father
kswapd0         D ffff8802235d7898     0   407      2 0x00000000
ffff8802235d75f0 0000000000000046 ffffffff8115539c ffff880221d2db20
ffff880216d7d0e0 000000000000ffc0 ffff880221d2dfd8 0000000000004000
ffff880221d2dfd8 000000000000ffc0 ffff8802235d75f0 ffff880221d2c010
Call Trace:
[] ? cpumask_any_but+0x28/0x37
[] ? __pagevec_release+0x19/0x22
[] ? move_active_pages_to_lru+0x130/0x154
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? arc_buf_remove_ref+0xe6/0xf4 [zfs]
[] ? mutex_lock+0x12/0x25
[] ? zfs_zinactive+0x5a/0xd4 [zfs]
[] ? zfs_inactive+0x106/0x19e [zfs]
[] ? evict+0x78/0x117
[] ? dispose_list+0x2c/0x36
[] ? shrink_icache_memory+0x278/0x2a8
[] ? shrink_slab+0xe3/0x153
[] ? balance_pgdat+0x33c/0x650
[] ? calculate_pressure_threshold+0x18/0x3b
[] ? kswapd+0x252/0x26b
[] ? wake_up_bit+0x23/0x23
[] ? balance_pgdat+0x650/0x650
[] ? balance_pgdat+0x650/0x650
[] ? kthread+0x7a/0x82
[] ? kernel_thread_helper+0x4/0x10
[] ? kthread_worker_fn+0x139/0x139
[] ? gs_change+0xb/0xb
snmpd           D ffff88022369a368     0  2158      1 0x00000000
ffff88022369a0c0 0000000000000086 ffffffff8115539c 0000000000000000
ffffffff81435020 000000000000ffc0 ffff88022150dfd8 0000000000004000
ffff88022150dfd8 000000000000ffc0 ffff88022369a0c0 ffff88022150c010
Call Trace:
[] ? cpumask_any_but+0x28/0x37
[] ? __pagevec_release+0x19/0x22
[] ? move_active_pages_to_lru+0x130/0x154
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? shrink_active_list+0x2e1/0x2fa
[] ? mutex_lock+0x12/0x25
[] ? zfs_zinactive+0x5a/0xd4 [zfs]
[] ? zfs_inactive+0x106/0x19e [zfs]
[] ? evict+0x78/0x117
[] ? dispose_list+0x2c/0x36
[] ? shrink_icache_memory+0x278/0x2a8
[] ? shrink_slab+0xe3/0x153
[] ? do_try_to_free_pages+0x253/0x3f0
[] ? try_to_free_pages+0x79/0x7e
[] ? __alloc_pages_nodemask+0x48b/0x6c8
[] ? __pollwait+0xd6/0xd6
[] ? __pollwait+0xd6/0xd6
[] ? __pollwait+0xd6/0xd6
[] ? handle_pte_fault+0x17f/0x7d9
[] ? bit_waitqueue+0x14/0x92
[] ? handle_mm_fault+0x3e/0x2cd
[] ? do_page_fault+0x31a/0x33f
[] ? seq_printf+0x56/0x7b
[] ? kobject_get+0x12/0x17
[] ? disk_part_iter_next+0x19/0xba
[] ? diskstats_show+0x3b5/0x3c9
[] ? page_fault+0x1f/0x30
[] ? copy_user_generic_string+0x2d/0x40
[] ? seq_read+0x2c2/0x339
[] ? proc_reg_read+0x6f/0x88
[] ? vfs_read+0xaa/0x14d
[] ? sys_read+0x45/0x6e
[] ? system_call_fastpath+0x16/0x1b
arc_reclaim     D ffff88022189ec78     0  2446      2 0x00000000
ffff88022189e9d0 0000000000000046 0000000000000202 0000000100000000
ffff88022e8a6750 000000000000ffc0 ffff880218767fd8 0000000000004000
ffff880218767fd8 000000000000ffc0 ffff88022189e9d0 ffff880218766010
Call Trace:
[] ? vsnprintf+0x7e/0x428
[] ? getnstimeofday+0x54/0xa5
[] ? call_function_single_interrupt+0xe/0x20
[] ? apic_timer_interrupt+0xe/0x20
[] ? call_function_single_interrupt+0xe/0x20
[] ? apic_timer_interrupt+0xe/0x20
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? arc_buf_remove_ref+0xe6/0xf4 [zfs]
[] ? mutex_lock+0x12/0x25
[] ? zfs_zinactive+0x5a/0xd4 [zfs]
[] ? zfs_inactive+0x106/0x19e [zfs]
[] ? evict+0x78/0x117
[] ? dispose_list+0x2c/0x36
[] ? shrink_icache_memory+0x278/0x2a8
[] ? zpl_inode_alloc+0x6e/0x6e [zfs]
[] ? zpl_prune_sbs+0x53/0x5e [zfs]
[] ? arc_adjust_meta+0x137/0x1a4 [zfs]
[] ? __thread_create+0x2df/0x2df [spl]
[] ? arc_reclaim_thread+0xb0/0x11c [zfs]
[] ? arc_adjust_meta+0x1a4/0x1a4 [zfs]
[] ? arc_adjust_meta+0x1a4/0x1a4 [zfs]
[] ? thread_generic_wrapper+0x6a/0x75 [spl]
[] ? kthread+0x7a/0x82
[] ? kernel_thread_helper+0x4/0x10
[] ? kthread_worker_fn+0x139/0x139
[] ? gs_change+0xb/0xb
watch           D ffff88021801e978     0  3060   3027 0x00000004
ffff88021801e6d0 0000000000000082 ffffffff8115539c ffff8802078f5a20
ffff88021807aea0 000000000000ffc0 ffff8802078f5fd8 0000000000004000
ffff8802078f5fd8 000000000000ffc0 ffff88021801e6d0 ffff8802078f4010
Call Trace:
[] ? cpumask_any_but+0x28/0x37
[] ? __pagevec_release+0x19/0x22
[] ? move_active_pages_to_lru+0x130/0x154
[] ? apic_timer_interrupt+0xe/0x20
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? arc_buf_remove_ref+0xe6/0xf4 [zfs]
[] ? mutex_lock+0x12/0x25
[] ? zfs_zinactive+0x5a/0xd4 [zfs]
[] ? zfs_inactive+0x106/0x19e [zfs]
[] ? evict+0x78/0x117
[] ? dispose_list+0x2c/0x36
[] ? shrink_icache_memory+0x278/0x2a8
[] ? shrink_slab+0xe3/0x153
[] ? d_instantiate+0x39/0x48
[] ? do_try_to_free_pages+0x253/0x3f0
[] ? try_to_free_pages+0x79/0x7e
[] ? __alloc_pages_nodemask+0x48b/0x6c8
[] ? copy_process+0xd7/0xff3
[] ? get_empty_filp+0x93/0x135
[] ? do_fork+0xf0/0x232
[] ? fd_install+0x27/0x4d
[] ? stub_clone+0x13/0x20
[] ? system_call_fastpath+0x16/0x1b
rsync           D ffff880221873958     0  3265   3264 0x00000000
ffff8802218736b0 0000000000000082 ffffffff8115539c ffff88011f229408
ffff880216d3a240 000000000000ffc0 ffff88011f229fd8 0000000000004000
ffff88011f229fd8 000000000000ffc0 ffff8802218736b0 ffff88011f228010
Call Trace:
[] ? cpumask_any_but+0x28/0x37
[] ? __pagevec_release+0x19/0x22
[] ? move_active_pages_to_lru+0x130/0x154
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? fsnotify_clear_marks_by_inode+0x20/0xbd
[] ? mutex_lock+0x12/0x25
[] ? zfs_zinactive+0x5a/0xd4 [zfs]
[] ? zfs_inactive+0x106/0x19e [zfs]
[] ? evict+0x78/0x117
[] ? dispose_list+0x2c/0x36
[] ? shrink_icache_memory+0x278/0x2a8
[] ? shrink_slab+0xe3/0x153
[] ? do_try_to_free_pages+0x253/0x3f0
[] ? try_to_free_pages+0x79/0x7e
[] ? __alloc_pages_nodemask+0x48b/0x6c8
[] ? __get_free_pages+0x12/0x52
[] ? spl_kmem_cache_alloc+0x236/0x975 [spl]
[] ? cache_alloc_refill+0x86/0x48f
[] ? dnode_create+0x2e/0x144 [zfs]
[] ? dnode_hold_impl+0x2ea/0x43c [zfs]
[] ? dmu_bonus_hold+0x22/0x26e [zfs]
[] ? zfs_zget+0x5c/0x19f [zfs]
[] ? zfs_dirent_lock+0x447/0x48f [zfs]
[] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
[] ? zfs_dirlook+0x20a/0x276 [zfs]
[] ? zfs_lookup+0x26e/0x2b6 [zfs]
[] ? zpl_lookup+0x47/0x80 [zfs]
[] ? d_alloc_and_lookup+0x43/0x60
[] ? do_lookup+0x1c9/0x2bb
[] ? path_lookupat+0xe2/0x5af
[] ? do_path_lookup+0x1d/0x5f
[] ? user_path_at_empty+0x49/0x84
[] ? tsd_exit+0x83/0x18d [spl]
[] ? __schedule+0x727/0x7b0
[] ? cp_new_stat+0xdf/0xf1
[] ? vfs_fstatat+0x43/0x70
[] ? sys_newlstat+0x11/0x2d
[] ? system_call_fastpath+0x16/0x1b
rsync           D ffff880208aa3088     0  3270   3269 0x00000000
ffff880208aa2de0 0000000000000082 000000000000000a ffff880100000000
ffff88022e8d6790 000000000000ffc0 ffff88012f173fd8 0000000000004000
ffff88012f173fd8 000000000000ffc0 ffff880208aa2de0 ffff88012f172010
Call Trace:
[] ? zap_get_leaf_byblk+0x1b5/0x249 [zfs]
[] ? zap_leaf_array_match+0x166/0x197 [zfs]
[] ? remove_reference+0x93/0x9f [zfs]
[] ? arc_buf_remove_ref+0xe6/0xf4 [zfs]
[] ? dbuf_rele_and_unlock+0x12b/0x19a [zfs]
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? mutex_lock+0x12/0x25
[] ? zfs_zget+0x46/0x19f [zfs]
[] ? zfs_dirent_lock+0x447/0x48f [zfs]
[] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
[] ? zfs_dirlook+0x20a/0x276 [zfs]
[] ? zfs_lookup+0x26e/0x2b6 [zfs]
[] ? zpl_lookup+0x47/0x80 [zfs]
[] ? d_alloc_and_lookup+0x43/0x60
[] ? do_lookup+0x1c9/0x2bb
[] ? path_lookupat+0xe2/0x5af
[] ? do_path_lookup+0x1d/0x5f
[] ? user_path_at_empty+0x49/0x84
[] ? tsd_exit+0x83/0x18d [spl]
[] ? mutex_lock+0x12/0x25
[] ? cp_new_stat+0xdf/0xf1
[] ? vfs_fstatat+0x43/0x70
[] ? sys_newlstat+0x11/0x2d
[] ? system_call_fastpath+0x16/0x1b
rsync           D ffff88010ec74b38     0  3497   3496 0x00000000
ffff88010ec74890 0000000000000086 ffffffff8115539c 0000000000000000
ffff88022e8d6790 000000000000ffc0 ffff8800b2fabfd8 0000000000004000
ffff8800b2fabfd8 000000000000ffc0 ffff88010ec74890 ffff8800b2faa010
Call Trace:
[] ? cpumask_any_but+0x28/0x37
[] ? __pagevec_release+0x19/0x22
[] ? move_active_pages_to_lru+0x130/0x154
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? shrink_active_list+0x2e1/0x2fa
[] ? mutex_lock+0x12/0x25
[] ? zfs_zinactive+0x5a/0xd4 [zfs]
[] ? zfs_inactive+0x106/0x19e [zfs]
[] ? evict+0x78/0x117
[] ? dispose_list+0x2c/0x36
[] ? shrink_icache_memory+0x278/0x2a8
[] ? shrink_slab+0xe3/0x153
[] ? do_try_to_free_pages+0x253/0x3f0
[] ? try_to_free_pages+0x79/0x7e
[] ? __alloc_pages_nodemask+0x48b/0x6c8
[] ? __get_free_pages+0x12/0x52
[] ? spl_kmem_cache_alloc+0x236/0x975 [spl]
[] ? __wake_up+0x35/0x46
[] ? cv_wait_common+0xf5/0x141 [spl]
[] ? wake_up_bit+0x23/0x23
[] ? dbuf_create+0x38/0x32e [zfs]
[] ? __dbuf_hold_impl+0x39a/0x3c6 [zfs]
[] ? remove_reference+0x93/0x9f [zfs]
[] ? __dbuf_hold_impl+0x271/0x3c6 [zfs]
[] ? __dbuf_hold_impl+0x17d/0x3c6 [zfs]
[] ? dbuf_hold_impl+0x6e/0x97 [zfs]
[] ? dbuf_prefetch+0x105/0x23d [zfs]
[] ? dmu_zfetch_dofetch+0xd7/0x113 [zfs]
[] ? dmu_zfetch+0x4b1/0xc5e [zfs]
[] ? dbuf_read+0xd1/0x5c0 [zfs]
[] ? dnode_hold_impl+0x1a8/0x43c [zfs]
[] ? dmu_buf_hold+0x33/0x161 [zfs]
[] ? zap_lockdir+0x57/0x5cc [zfs]
[] ? dmu_bonus_hold+0x212/0x26e [zfs]
[] ? zap_lookup_norm+0x40/0x160 [zfs]
[] ? kmem_alloc_debug+0x13c/0x2ba [spl]
[] ? zap_lookup+0x2a/0x30 [zfs]
[] ? zfs_dirent_lock+0x3cc/0x48f [zfs]
[] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
[] ? zfs_dirlook+0x20a/0x276 [zfs]
[] ? zfs_lookup+0x26e/0x2b6 [zfs]
[] ? zpl_lookup+0x47/0x80 [zfs]
[] ? d_alloc_and_lookup+0x43/0x60
[] ? do_lookup+0x1c9/0x2bb
[] ? path_lookupat+0xe2/0x5af
[] ? do_path_lookup+0x1d/0x5f
[] ? user_path_at_empty+0x49/0x84
[] ? tsd_exit+0x83/0x18d [spl]
[] ? cp_new_stat+0xdf/0xf1
[] ? vfs_fstatat+0x43/0x70
[] ? sys_newlstat+0x11/0x2d
[] ? system_call_fastpath+0x16/0x1b

Machine specs:
Single-socket quad-core Xeon
8 Gigs of RAM

SPL version: b29012b
ZFS version: 409dc1a
Kernel version: 3.0.28 vanilla custom build
ZPool version 26 (originally built/run by zfs-fuse)

  pool: tankname1
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
    still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
    pool will no longer be accessible on older software versions.
 scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    tankname1   ONLINE       0     0     0
      sda       ONLINE       0     0     0

errors: No known data errors

I've also tried using the module/zfs/arc.c from #669 for testing, and reduced the ARC size. RAM usage still exceeds the configured limits.

# egrep "^(c_|size)" /proc/spl/kstat/zfs/arcstats
c_min                           4    293601280
c_max                           4    1610612736
size                            4    2602236704

Nevertheless, it's been running reliably for a few hours now.

(Edit: I also raised vm.min_free_kbytes from its default up to 262144 as part of a shotgun attempt to make this more stable.)
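
For anyone trying to reproduce this, a minimal sketch of watching the relevant arcstats fields while the rsyncs run (field names as in the dump above; the interval is arbitrary):

# print ARC size against its target/limit and the metadata usage every 5 seconds
while sleep 5; do
    awk '$1=="size" || $1=="c" || $1=="c_max" || $1=="arc_meta_used" {printf "%-15s %d\n", $1, $3}' /proc/spl/kstat/zfs/arcstats
    echo "---"
done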

@ryao
Contributor

ryao commented Apr 19, 2012

Unfortunately, I seem to have given you the wrong pull request number in IRC. The one that you should have tried was #618, not #669.

@GregorKopka
Contributor

How did you manage to create pool version 26 with zfs-fuse?

@ryao
Contributor

ryao commented Apr 19, 2012

Perhaps it is a typo and he meant pool version 23.

@behlendorf
Contributor

I was going to suggest that you try the patch in issue #618, but you beat me to it. I'm a bit surprised that it doesn't resolve the issue; clearly there's more to be done.

@DeHackEd
Contributor Author

I haven't had the chance to apply #618 yet. I'll try it tomorrow, time permitting.

@DeHackEd
Contributor Author

No love. With #618 applied, the same scenario still deadlocks after ignoring the ARC size limit.

These outputs are all taken after ZFS jammed up.

# free
             total       used       free     shared    buffers     cached
Mem:       8182236    7540728     641508          0      28528      43364
-/+ buffers/cache:    7468836     713400
Swap:      4194300      14492    4179808

# history | grep modprobe
modprobe zfs zfs_vdev_max_pending=20 zfs_arc_max=2147483648
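
(As an aside, a sketch of making those options persistent instead of passing them on the modprobe command line, assuming the usual modprobe.d layout:)

# /etc/modprobe.d/zfs.conf -- read the next time the zfs module is loaded
options zfs zfs_arc_max=2147483648 zfs_vdev_max_pending=20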

    # cat /proc/spl/kstat/zfs/arcstats
    10 1 0x01 77 3696 141145250104682 144360035578750
    name                            type data
    hits                            4    5947062
    misses                          4    939766
    demand_data_hits                4    215008
    demand_data_misses              4    13180
    demand_metadata_hits            4    4365140
    demand_metadata_misses          4    469195
    prefetch_data_hits              4    20650
    prefetch_data_misses            4    53521
    prefetch_metadata_hits          4    1346264
    prefetch_metadata_misses        4    403870
    mru_hits                        4    895846
    mru_ghost_hits                  4    277838
    mfu_hits                        4    3684335
    mfu_ghost_hits                  4    189079
    deleted                         4    248071
    recycle_miss                    4    488895
    mutex_miss                      4    8046
    evict_skip                      4    362916066
    evict_l2_cached                 4    0
    evict_l2_eligible               4    14598742528
    evict_l2_ineligible             4    5304508416
    hash_elements                   4    253442
    hash_elements_max               4    253519
    hash_collisions                 4    240599
    hash_chains                     4    66173
    hash_chain_max                  4    7
    p                               4    134225408
    c                               4    2147483648
    c_min                           4    293601280
    c_max                           4    2147483648
    size                            4    3320870416
    hdr_size                        4    85214688
    data_size                       4    2363930112
    other_size                      4    871725616
    anon_size                       4    27773952
    anon_evict_data                 4    0
    anon_evict_metadata             4    0
    mru_size                        4    161863680
    mru_evict_data                  4    0
    mru_evict_metadata              4    3620864
    mru_ghost_size                  4    1977015808
    mru_ghost_evict_data            4    908544000
    mru_ghost_evict_metadata        4    1068471808
    mfu_size                        4    2174292480
    mfu_evict_data                  4    0
    mfu_evict_metadata              4    1738240
    mfu_ghost_size                  4    166193152
    mfu_ghost_evict_data            4    0
    mfu_ghost_evict_metadata        4    166193152
    l2_hits                         4    0
    l2_misses                       4    0
    l2_feeds                        4    0
    l2_rw_clash                     4    0
    l2_read_bytes                   4    0
    l2_write_bytes                  4    0
    l2_writes_sent                  4    0
    l2_writes_done                  4    0
    l2_writes_error                 4    0
    l2_writes_hdr_miss              4    0
    l2_evict_lock_retry             4    0
    l2_evict_reading                4    0
    l2_free_on_write                4    0
    l2_abort_lowmem                 4    0
    l2_cksum_bad                    4    0
    l2_io_error                     4    0
    l2_size                         4    0
    l2_hdr_size                     4    0
    memory_throttle_count           4    0
    memory_direct_count             4    0
    memory_indirect_count           4    0
    arc_no_grow                     4    0
    arc_tempreserve                 4    475136
    arc_loaned_bytes                4    262144
    arc_prune                       4    2313
    arc_meta_used                   4    3309051920
    arc_meta_limit                  4    536870912
    arc_meta_max                    4    3309157328

Stack traces; some are from additional kernel threads.

z_wr_int/15     D ffff880113a49408     0 18763      2 0x00000000
ffff880113a49160 0000000000000046 ffffffff811ddf2f ffff880100000000
ffff88022e8e67d0 000000000000ffc0 ffff8801a8d05fd8 0000000000004000
ffff8801a8d05fd8 000000000000ffc0 ffff880113a49160 ffff8801a8d04010
Call Trace:
[] ? scsi_request_fn+0x314/0x3ef
[] ? __make_request+0x22a/0x243
[] ? generic_make_request+0x201/0x262
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? mutex_lock+0x12/0x25
[] ? vdev_cache_write+0x57/0x11d [zfs]
[] ? zio_vdev_io_done+0x6f/0x141 [zfs]
[] ? zio_execute+0xad/0xd1 [zfs]
[] ? taskq_thread+0x2c2/0x508 [spl]
[] ? try_to_wake_up+0x1d9/0x1eb
[] ? try_to_wake_up+0x1eb/0x1eb
[] ? spl_taskq_init+0x159/0x159 [spl]
[] ? spl_taskq_init+0x159/0x159 [spl]
[] ? kthread+0x7a/0x82
[] ? kernel_thread_helper+0x4/0x10
[] ? kthread_worker_fn+0x139/0x139
[] ? gs_change+0xb/0xb

txg_sync        D ffff88018865b898     0 18781      2 0x00000000
ffff88018865b5f0 0000000000000046 ffff88022fc10028 ffff88011160c248
ffff88011160c200 000000000000ffc0 ffff88015dd89fd8 0000000000004000
ffff88015dd89fd8 000000000000ffc0 ffff88018865b5f0 ffff88015dd88010
Call Trace:
[] ? check_preempt_curr+0x25/0x62
[] ? ttwu_do_wakeup+0x11/0x83
[] ? try_to_wake_up+0x1d9/0x1eb
[] ? __wake_up_common+0x41/0x78
[] ? cv_wait_common+0xb8/0x141 [spl]
[] ? wake_up_bit+0x23/0x23
[] ? zio_wait+0xe8/0x11c [zfs]
[] ? dsl_pool_sync+0xbf/0x428 [zfs]
[] ? spa_sync+0x47d/0x829 [zfs]
[] ? txg_sync_thread+0x29a/0x3f6 [zfs]
[] ? set_user_nice+0x115/0x139
[] ? txg_thread_exit+0x2b/0x2b [zfs]
[] ? __thread_create+0x2df/0x2df [spl]
[] ? thread_generic_wrapper+0x6a/0x75 [spl]
[] ? kthread+0x7a/0x82
[] ? kernel_thread_helper+0x4/0x10
[] ? kthread_worker_fn+0x139/0x139
[] ? gs_change+0xb/0xb

rsync           D ffff88020b19f008     0 19506  19505 0x00000000
ffff88020b19ed60 0000000000000082 0000000000000001 0000000000000007
ffff880151257930 000000000000ffc0 ffff88015f65dfd8 0000000000004000
ffff88015f65dfd8 000000000000ffc0 ffff88020b19ed60 ffff88015f65c010
Call Trace:
[] ? extract_buf+0x76/0xc2
[] ? invalidate_interrupt1+0xe/0x20
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? mutex_lock+0x12/0x25
[] ? vdev_cache_read+0x92/0x2de [zfs]
[] ? zio_vdev_io_start+0x1c1/0x228 [zfs]
[] ? zio_nowait+0xd0/0xf4 [zfs]
[] ? vdev_mirror_io_start+0x2fa/0x313 [zfs]
[] ? vdev_config_generate+0xac7/0xac7 [zfs]
[] ? zio_nowait+0xd0/0xf4 [zfs]
[] ? arc_read_nolock+0x662/0x673 [zfs]
[] ? arc_read+0xc2/0x146 [zfs]
[] ? dnode_block_freed+0xfe/0x119 [zfs]
[] ? dbuf_fill_done+0x61/0x61 [zfs]
[] ? dbuf_read+0x3ce/0x5c0 [zfs]
[] ? dnode_hold_impl+0x1a8/0x43c [zfs]
[] ? remove_reference+0x93/0x9f [zfs]
[] ? dmu_bonus_hold+0x22/0x26e [zfs]
[] ? zfs_zget+0x5c/0x19f [zfs]
[] ? zfs_dirent_lock+0x447/0x48f [zfs]
[] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
[] ? zfs_dirlook+0x20a/0x276 [zfs]
[] ? zfs_lookup+0x26e/0x2b6 [zfs]
[] ? zpl_lookup+0x47/0x80 [zfs]
[] ? d_alloc_and_lookup+0x43/0x60
[] ? do_lookup+0x1c9/0x2bb
[] ? path_lookupat+0xe2/0x5af
[] ? do_path_lookup+0x1d/0x5f
[] ? user_path_at_empty+0x49/0x84
[] ? tsd_exit+0x83/0x18d [spl]
[] ? cp_new_stat+0xdf/0xf1
[] ? vfs_fstatat+0x43/0x70
[] ? sys_newlstat+0x11/0x2d
[] ? system_call_fastpath+0x16/0x1b

rsync           D ffff8801ba0c6cb8     0 19511  19510 0x00000000
ffff8801ba0c6a10 0000000000000086 ffffffff8115539c 0000000000000001
ffff8801512572a0 000000000000ffc0 ffff8801b1051fd8 0000000000004000
ffff8801b1051fd8 000000000000ffc0 ffff8801ba0c6a10 ffff8801b1050010
Call Trace:
[] ? cpumask_any_but+0x28/0x37
[] ? __schedule+0x727/0x7b0
[] ? rcu_implicit_dynticks_qs+0x3f/0x60
[] ? force_quiescent_state+0x1c3/0x230
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? arc_buf_remove_ref+0xe6/0xf4 [zfs]
[] ? mutex_lock+0x12/0x25
[] ? zfs_zinactive+0x5a/0xd4 [zfs]
[] ? zfs_inactive+0x106/0x19e [zfs]
[] ? evict+0x78/0x117
[] ? dispose_list+0x2c/0x36
[] ? shrink_icache_memory+0x278/0x2a8
[] ? shrink_slab+0xe3/0x153
[] ? do_try_to_free_pages+0x253/0x3f0
[] ? get_page_from_freelist+0x47a/0x4ad
[] ? try_to_free_pages+0x79/0x7e
[] ? __alloc_pages_nodemask+0x48b/0x6c8
[] ? __get_free_pages+0x12/0x52
[] ? spl_kmem_cache_alloc+0x236/0x975 [spl]
[] ? dbuf_create+0x38/0x32e [zfs]
[] ? dnode_hold_impl+0x3f0/0x43c [zfs]
[] ? dbuf_create_bonus+0x16/0x1f [zfs]
[] ? dmu_bonus_hold+0x10b/0x26e [zfs]
[] ? zfs_zget+0x5c/0x19f [zfs]
[] ? zfs_dirent_lock+0x447/0x48f [zfs]
[] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
[] ? zfs_dirlook+0x20a/0x276 [zfs]
[] ? zfs_lookup+0x26e/0x2b6 [zfs]
[] ? zpl_lookup+0x47/0x80 [zfs]
[] ? d_alloc_and_lookup+0x43/0x60
[] ? do_lookup+0x1c9/0x2bb
[] ? path_lookupat+0xe2/0x5af
[] ? strncpy_from_user+0x9/0x4e
[] ? do_path_lookup+0x1d/0x5f
[] ? user_path_at_empty+0x49/0x84
[] ? tsd_exit+0x83/0x18d [spl]
[] ? cp_new_stat+0xdf/0xf1
[] ? vfs_fstatat+0x43/0x70
[] ? sys_newlstat+0x11/0x2d
[] ? system_call_fastpath+0x16/0x1b

rsync           D ffff88022365e728     0 19522  19521 0x00000000
ffff88022365e480 0000000000000082 ffffffff8115539c ffff8801b219b438
ffff8802219bcea0 000000000000ffc0 ffff8801b219bfd8 0000000000004000
ffff8801b219bfd8 000000000000ffc0 ffff88022365e480 ffff8801b219a010
Call Trace:
[] ? cpumask_any_but+0x28/0x37
[] ? __pagevec_release+0x19/0x22
[] ? move_active_pages_to_lru+0x130/0x154
[] ? select_task_rq_fair+0x35e/0x791
[] ? common_interrupt+0xe/0x13
[] ? sched_clock_local+0x13/0x76
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? check_preempt_curr+0x25/0x62
[] ? mutex_lock+0x12/0x25
[] ? zfs_zinactive+0x5a/0xd4 [zfs]
[] ? zfs_inactive+0x106/0x19e [zfs]
[] ? evict+0x78/0x117
[] ? dispose_list+0x2c/0x36
[] ? shrink_icache_memory+0x278/0x2a8
[] ? shrink_slab+0xe3/0x153
[] ? do_try_to_free_pages+0x253/0x3f0
[] ? get_page_from_freelist+0x47a/0x4ad
[] ? try_to_free_pages+0x79/0x7e
[] ? __alloc_pages_nodemask+0x48b/0x6c8
[] ? __get_free_pages+0x12/0x52
[] ? spl_kmem_cache_alloc+0x236/0x975 [spl]
[] ? dbuf_create+0x38/0x32e [zfs]
[] ? dnode_hold_impl+0x3f0/0x43c [zfs]
[] ? dbuf_create_bonus+0x16/0x1f [zfs]
[] ? dmu_bonus_hold+0x10b/0x26e [zfs]
[] ? zfs_zget+0x5c/0x19f [zfs]
[] ? zfs_dirent_lock+0x447/0x48f [zfs]
[] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
[] ? zfs_dirlook+0x20a/0x276 [zfs]
[] ? zfs_lookup+0x26e/0x2b6 [zfs]
[] ? zpl_lookup+0x47/0x80 [zfs]
[] ? d_alloc_and_lookup+0x43/0x60
[] ? do_lookup+0x1c9/0x2bb
[] ? path_lookupat+0xe2/0x5af
[] ? do_path_lookup+0x1d/0x5f
[] ? user_path_at_empty+0x49/0x84
[] ? tsd_exit+0x83/0x18d [spl]
[] ? cp_new_stat+0xdf/0xf1
[] ? vfs_fstatat+0x43/0x70
[] ? sys_newlstat+0x11/0x2d
[] ? system_call_fastpath+0x16/0x1b

rsync           D ffff8802216c93c8     0 19527  19526 0x00000000
ffff8802216c9120 0000000000000082 000000000000000a ffff880000000000
ffff88022e8a6750 000000000000ffc0 ffff88021c0f1fd8 0000000000004000
ffff88021c0f1fd8 000000000000ffc0 ffff8802216c9120 ffff88021c0f0010
Call Trace:
[] ? zap_get_leaf_byblk+0x1b5/0x249 [zfs]
[] ? zap_leaf_array_match+0x166/0x197 [zfs]
[] ? remove_reference+0x93/0x9f [zfs]
[] ? arc_buf_remove_ref+0xe6/0xf4 [zfs]
[] ? dbuf_rele_and_unlock+0x12b/0x19a [zfs]
[] ? __mutex_lock_slowpath+0xe2/0x128
[] ? mutex_lock+0x12/0x25
[] ? zfs_zget+0x46/0x19f [zfs]
[] ? zfs_dirent_lock+0x447/0x48f [zfs]
[] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
[] ? zfs_dirlook+0x20a/0x276 [zfs]
[] ? zfs_lookup+0x26e/0x2b6 [zfs]
[] ? zpl_lookup+0x47/0x80 [zfs]
[] ? d_alloc_and_lookup+0x43/0x60
[] ? do_lookup+0x1c9/0x2bb
[] ? path_lookupat+0xe2/0x5af
[] ? do_path_lookup+0x1d/0x5f
[] ? user_path_at_empty+0x49/0x84
[] ? tsd_exit+0x83/0x18d [spl]
[] ? cp_new_stat+0xdf/0xf1
[] ? vfs_fstatat+0x43/0x70
[] ? sys_newlstat+0x11/0x2d
[] ? system_call_fastpath+0x16/0x1b

zfs-fuse has pool v26 available in the Git repo.

[Edit: full arcstats included]

@behlendorf
Contributor

We're working on some spl kmem improvements which may help with this. Expect patches in the next week or so. Two cases have been identified which can result in memory reclaim from the slab being less than optimal.

@jspiros

jspiros commented Jun 12, 2012

I'm also experiencing this exact issue when attempting an rsync. Has there been any progress; is there anything I should try? I'm running 0.6.0-rc8 on Debian wheezy/sid using the deb instructions on the website.

@behlendorf
Contributor

Several patches which should help with this have been merged into master post-rc8. Please try the latest master source. Additionally, you can try increasing /proc/sys/vm/min_free_kbytes to, say, 262144 (256 MB) or so.
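
A minimal sketch of that tuning; the value is in kilobytes, so 256 MB is 262144 (the sysctl.conf line is only needed to survive a reboot):

echo 262144 > /proc/sys/vm/min_free_kbytes              # takes effect immediately
echo "vm.min_free_kbytes = 262144" >> /etc/sysctl.conf  # optional: persist across reboots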

@jspiros

jspiros commented Jun 12, 2012

I was actually just looking at #154 and it seemed to be related, and was just looking for confirmation that I should indeed try the latest master. So, thanks, will do, and will report back.

@ryao
Contributor

ryao commented Jun 12, 2012

@jspiros Is there anything useful being output to dmesg?

You might want to run sysctl kernel.hung_task_timeout_secs=15 to reduce the time it takes for the hung-task watchdog to report hangs to dmesg from the default 120 seconds. This requires that your kernel is compiled with hung-task detection (CONFIG_DETECT_HUNG_TASK=y).
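
A sketch of the commands involved (the config check assumes /proc/config.gz or a /boot/config-* file is available):

# report hung tasks after 15 seconds instead of the default 120
sysctl kernel.hung_task_timeout_secs=15

# confirm the hung-task detector is built into the running kernel
zgrep DETECT_HUNG_TASK /proc/config.gz 2>/dev/null || grep DETECT_HUNG_TASK /boot/config-$(uname -r)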

@jspiros

jspiros commented Jun 12, 2012

@ryao Indeed, there were many hung tasks, even with the default of 120 seconds. The first three were kswapd0, arc_reclaim, and rsync, though many others hung after that.

I'm currently running the latest master and have set min_free_kbytes to 256 MB as recommended. If you tell me what information you want, should the same problem occur again tonight (I have some rsync backups that run nightly), I'll try to get it here. I was able to run some commands as root, including dmesg, so just tell me what would be helpful and I'll be sure to get as much information as I can.

@ryao
Contributor

ryao commented Jun 13, 2012

The hung task information in the dmesg output would be useful.

@DeHackEd
Contributor Author

Fresh git pulls from this morning of both spl and zfs; versions are 2371321 and ee191e8.

ARC size is capped at 2^31 bytes. It's the same setup as in my first post.

# cat /proc/spl/kstat/zfs/arcstats
10 1 0x01 77 3696 216989312967 3547642914356
name                            type data
hits                            4    4820390
misses                          4    1632473
demand_data_hits                4    2
demand_data_misses              4    58777
demand_metadata_hits            4    4437436
demand_metadata_misses          4    875202
prefetch_data_hits              4    0
prefetch_data_misses            4    0
prefetch_metadata_hits          4    382952
prefetch_metadata_misses        4    698494
mru_hits                        4    676077
mru_ghost_hits                  4    483254
mfu_hits                        4    3761394
mfu_ghost_hits                  4    483721
deleted                         4    418327
recycle_miss                    4    643047
mutex_miss                      4    2093
evict_skip                      4    96315269
evict_l2_cached                 4    0
evict_l2_eligible               4    15802661888
evict_l2_ineligible             4    7292848128
hash_elements                   4    267779
hash_elements_max               4    299201
hash_collisions                 4    328936
hash_chains                     4    71327
hash_chain_max                  4    8
p                               4    135680512
c                               4    2147483648
c_min                           4    293601280
c_max                           4    2147483648
size                            4    2306523480
hdr_size                        4    90425088
data_size                       4    1618198528
other_size                      4    597899864
anon_size                       4    3948544
anon_evict_data                 4    0
anon_evict_metadata             4    0
mru_size                        4    199776256
mru_evict_data                  4    0
mru_evict_metadata              4    0
mru_ghost_size                  4    1946481152
mru_ghost_evict_data            4    16462336
mru_ghost_evict_metadata        4    1930018816
mfu_size                        4    1414473728
mfu_evict_data                  4    0
mfu_evict_metadata              4    0
mfu_ghost_size                  4    200998912
mfu_ghost_evict_data            4    0
mfu_ghost_evict_metadata        4    200998912
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_hdr_miss              4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_hdr_size                     4    0
memory_throttle_count           4    0
memory_direct_count             4    0
memory_indirect_count           4    0
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    1992
arc_meta_used                   4    2306523480
arc_meta_limit                  4    536870912
arc_meta_max                    4    2657959776

I caught it before the OS threatened to lock up and SIGSTOP'd all my rsync processes. While they were paused the ARC 'size' was slowly dropping, and I resumed them once it got below 1.6G. Right now the size is 2,718,915,552 and still climbing unless I stop them again.

@jspiros

jspiros commented Jun 13, 2012

@behlendorf Your suggestions helped me make it through the nightly rsync, and everything seems to be working fine.

@ryao Due to this, I'm afraid I do not currently have any dmesg output for you (yet). If something goes wrong again, I'll provide it, of course.

@behlendorf
Contributor

@jspiros That's good news. The hope is that going forward some of Richard's VM work will remove the need for this sort of tuning, but we're not quite there yet.

@DeHackEd
Contributor Author

I (partially?) retract my previous post. I left the rsync running so it would presumably ruin the system, but it didn't die. That was some hours ago and the ARC is now restraining itself at about 570-580 megabytes. I'm turning up the heat a bit to see if I can coerce a useful stack dump out of it, or find out whether it behaves properly now.

Edit: after turning up the pressure the ARC did eventually grow to the breaking point. It seems to fare better but still cracks under intense pressure.

@jspiros

jspiros commented Jun 14, 2012

Okay, well, things were going alright this morning until just now. It probably doesn't help that I decided to scrub my pool, which will take about three days (18TB in [usable] capacity), overlapping with the nightly rsync.

Here's what dmesg is looking like right now.


[161160.692017] INFO: task kswapd0:33 blocked for more than 120 seconds.
[161160.692055] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.692093] kswapd0         D ffff88042f388240     0    33      2 0x00000000
[161160.692098]  ffff88042f388240 0000000000000046 ffff8801a8c8b600 ffff88042df12830
[161160.692103]  0000000000013540 ffff88042c3affd8 ffff88042c3affd8 ffff88042f388240
[161160.692108]  ffff88042d645800 ffffffffa0e81aa4 00000000000080d0 ffff880412b89c20
[161160.692113] Call Trace:
[161160.692158]  [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.692169]  [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.692172]  [<ffffffff810363c7>] ? should_resched+0x5/0x23
[161160.692175]  [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.692178]  [<ffffffff810ea5e0>] ? kfree+0x5b/0x6c
[161160.692194]  [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.692212]  [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.692216]  [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.692218]  [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.692221]  [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.692224]  [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.692227]  [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.692229]  [<ffffffff810c3cea>] ? balance_pgdat+0x283/0x4b7
[161160.692232]  [<ffffffff810c41e8>] ? kswapd+0x2ca/0x2e6
[161160.692235]  [<ffffffff8105f533>] ? add_wait_queue+0x3c/0x3c
[161160.692237]  [<ffffffff810c3f1e>] ? balance_pgdat+0x4b7/0x4b7
[161160.692239]  [<ffffffff8105eeed>] ? kthread+0x76/0x7e
[161160.692243]  [<ffffffff81350534>] ? kernel_thread_helper+0x4/0x10
[161160.692245]  [<ffffffff8105ee77>] ? kthread_worker_fn+0x139/0x139
[161160.692248]  [<ffffffff81350530>] ? gs_change+0x13/0x13
[161160.692258] INFO: task arc_adapt:954 blocked for more than 120 seconds.
[161160.692281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.692308] arc_adapt       D ffff88042bf18140     0   954      2 0x00000000
[161160.692311]  ffff88042bf18140 0000000000000046 ffff880300000001 ffff88041b5690a0
[161160.692315]  0000000000013540 ffff88040ed4dfd8 ffff88040ed4dfd8 ffff88042bf18140
[161160.692318]  ffff88042d645800 00000001a0e81aa4 ffff880363073680 ffff880412b89c20
[161160.692321] Call Trace:
[161160.692324]  [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.692326]  [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.692345]  [<ffffffffa0e914c2>] ? txg_rele_to_sync+0x48/0x55 [zfs]
[161160.692361]  [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.692377]  [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.692381]  [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.692383]  [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.692385]  [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.692388]  [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.692404]  [<ffffffffa0eb47fa>] ? zfs_sb_prune+0x76/0x9a [zfs]
[161160.692420]  [<ffffffffa0ecab51>] ? zpl_inode_alloc+0x66/0x66 [zfs]
[161160.692434]  [<ffffffffa0ecab6a>] ? zpl_prune_sb+0x19/0x1e [zfs]
[161160.692438]  [<ffffffff810fab4d>] ? iterate_supers_type+0x61/0xc2
[161160.692452]  [<ffffffffa0ecab6f>] ? zpl_prune_sb+0x1e/0x1e [zfs]
[161160.692467]  [<ffffffffa0ecab9d>] ? zpl_prune_sbs+0x2e/0x39 [zfs]
[161160.692479]  [<ffffffffa0e517a3>] ? arc_adjust_meta+0x101/0x171 [zfs]
[161160.692490]  [<ffffffffa0e51813>] ? arc_adjust_meta+0x171/0x171 [zfs]
[161160.692503]  [<ffffffffa0e51864>] ? arc_adapt_thread+0x51/0xb4 [zfs]
[161160.692510]  [<ffffffffa0440848>] ? thread_generic_wrapper+0x6a/0x75 [spl]
[161160.692517]  [<ffffffffa04407de>] ? __thread_create+0x2be/0x2be [spl]
[161160.692519]  [<ffffffff8105eeed>] ? kthread+0x76/0x7e
[161160.692522]  [<ffffffff81350534>] ? kernel_thread_helper+0x4/0x10
[161160.692525]  [<ffffffff8105ee77>] ? kthread_worker_fn+0x139/0x139
[161160.692527]  [<ffffffff81350530>] ? gs_change+0x13/0x13
[161160.692546] INFO: task txg_quiesce:1178 blocked for more than 120 seconds.
[161160.692570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.692597] txg_quiesce     D ffff88043fc13540     0  1178      2 0x00000000
[161160.692599]  ffff8804161a7020 0000000000000046 ffffffff00000000 ffffffff8160d020
[161160.692603]  0000000000013540 ffff88041a32dfd8 ffff88041a32dfd8 ffff8804161a7020
[161160.692606]  0000000000000202 0000000181349424 ffffc900066d12a8 ffffc900066d12a0
[161160.692609] Call Trace:
[161160.692616]  [<ffffffffa0445ec6>] ? cv_wait_common+0xb8/0x141 [spl]
[161160.692619]  [<ffffffff8105f533>] ? add_wait_queue+0x3c/0x3c
[161160.692638]  [<ffffffffa0e92330>] ? txg_quiesce_thread+0x202/0x2ae [zfs]
[161160.692659]  [<ffffffffa0e9212e>] ? txg_sync_thread+0x3c6/0x3c6 [zfs]
[161160.692666]  [<ffffffffa0440848>] ? thread_generic_wrapper+0x6a/0x75 [spl]
[161160.692672]  [<ffffffffa04407de>] ? __thread_create+0x2be/0x2be [spl]
[161160.692675]  [<ffffffff8105eeed>] ? kthread+0x76/0x7e
[161160.692677]  [<ffffffff81350534>] ? kernel_thread_helper+0x4/0x10
[161160.692680]  [<ffffffff8105ee77>] ? kthread_worker_fn+0x139/0x139
[161160.692682]  [<ffffffff81350530>] ? gs_change+0x13/0x13
[161160.692693] INFO: task avahi-daemon:3937 blocked for more than 120 seconds.
[161160.692718] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.692745] avahi-daemon    D ffff88043fc13540     0  3937      1 0x00000000
[161160.692748]  ffff8804247827b0 0000000000000082 ffff88033734d440 ffffffff8160d020
[161160.692751]  0000000000013540 ffff8804236f9fd8 ffff8804236f9fd8 ffff8804247827b0
[161160.692754]  ffff88042d645800 ffffffffa0e81aa4 ffff88033734d440 ffff880412b89c20
[161160.692757] Call Trace:
[161160.692777]  [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.692780]  [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.692782]  [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.692801]  [<ffffffffa0e914c2>] ? txg_rele_to_sync+0x48/0x55 [zfs]
[161160.692816]  [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.692832]  [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.692836]  [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.692838]  [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.692840]  [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.692843]  [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.692845]  [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.692848]  [<ffffffff810c354d>] ? do_try_to_free_pages+0x25f/0x3de
[161160.692850]  [<ffffffff810c3937>] ? try_to_free_pages+0xa9/0xe9
[161160.692853]  [<ffffffff810baa2b>] ? __alloc_pages_nodemask+0x4a9/0x726
[161160.692856]  [<ffffffff810e3dec>] ? alloc_pages_current+0xb2/0xcc
[161160.692859]  [<ffffffff810b7631>] ? __get_free_pages+0x6/0x42
[161160.692862]  [<ffffffff81106ebd>] ? __pollwait+0x56/0xce
[161160.692864]  [<ffffffff810ff741>] ? pipe_poll+0x2a/0x7b
[161160.692866]  [<ffffffff81107b8f>] ? do_sys_poll+0x1dc/0x347
[161160.692869]  [<ffffffff81106e67>] ? poll_freewait+0x97/0x97
[161160.692871]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692873]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692876]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692878]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692880]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692883]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692885]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692887]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692889]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692892]  [<ffffffff8104b197>] ? timespec_add_safe+0x32/0x63
[161160.692895]  [<ffffffff81107d85>] ? sys_poll+0x43/0xab
[161160.692897]  [<ffffffff8134e3d2>] ? system_call_fastpath+0x16/0x1b
[161160.692917] INFO: task ntpd:4648 blocked for more than 120 seconds.
[161160.692939] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.692966] ntpd            D ffff88041d332ee0     0  4648      1 0x00000004
[161160.692969]  ffff88041d332ee0 0000000000000082 ffff880115471cc0 ffff88014956a080
[161160.692972]  0000000000013540 ffff880426907fd8 ffff880426907fd8 ffff88041d332ee0
[161160.692976]  ffff88042d645800 ffffffffa0e81aa4 ffff880115471cc0 ffff880412b89c20
[161160.692979] Call Trace:
[161160.692998]  [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.693001]  [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.693003]  [<ffffffff810363c7>] ? should_resched+0x5/0x23
[161160.693006]  [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.693009]  [<ffffffff8107066d>] ? arch_local_irq_save+0x11/0x17
[161160.693024]  [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.693041]  [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.693045]  [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.693047]  [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.693049]  [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.693052]  [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.693054]  [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.693056]  [<ffffffff810c354d>] ? do_try_to_free_pages+0x25f/0x3de
[161160.693059]  [<ffffffff810c3937>] ? try_to_free_pages+0xa9/0xe9
[161160.693061]  [<ffffffff810baa2b>] ? __alloc_pages_nodemask+0x4a9/0x726
[161160.693064]  [<ffffffff810e3dec>] ? alloc_pages_current+0xb2/0xcc
[161160.693066]  [<ffffffff810b88a9>] ? get_pageblock_flags_group+0x3a/0x6e
[161160.693069]  [<ffffffff810b7631>] ? __get_free_pages+0x6/0x42
[161160.693071]  [<ffffffff81106ebd>] ? __pollwait+0x56/0xce
[161160.693074]  [<ffffffff81283dff>] ? datagram_poll+0x1b/0xd0
[161160.693077]  [<ffffffff812d0741>] ? udp_poll+0xf/0x42
[161160.693079]  [<ffffffff81107394>] ? do_select+0x2b0/0x46a
[161160.693082]  [<ffffffff81106e67>] ? poll_freewait+0x97/0x97
[161160.693084]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693086]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693089]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693091]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693093]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693096]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693098]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693100]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693102]  [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693105]  [<ffffffff811076c3>] ? core_sys_select+0x175/0x21a
[161160.693107]  [<ffffffff810363c7>] ? should_resched+0x5/0x23
[161160.693110]  [<ffffffff81055c48>] ? set_current_blocked+0x2d/0x43
[161160.693113]  [<ffffffff8100e342>] ? do_signal+0x547/0x610
[161160.693116]  [<ffffffff810363c7>] ? should_resched+0x5/0x23
[161160.693118]  [<ffffffff8134810f>] ? _cond_resched+0x7/0x1c
[161160.693121]  [<ffffffff810159d8>] ? check_for_xstate+0x27/0x7c
[161160.693123]  [<ffffffff81015cc5>] ? restore_i387_xstate+0xaa/0x15d
[161160.693125]  [<ffffffff810566b3>] ? do_sigaltstack+0xaa/0x13e
[161160.693128]  [<ffffffff811077f0>] ? sys_select+0x88/0xad
[161160.693130]  [<ffffffff8134e8ec>] ? stub_rt_sigreturn+0x6c/0xc0
[161160.693133]  [<ffffffff8134e3d2>] ? system_call_fastpath+0x16/0x1b
[161160.693135] INFO: task radvd:4719 blocked for more than 120 seconds.
[161160.693157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.693183] radvd           D ffff88041d389610     0  4719      1 0x00000000
[161160.693186]  ffff88041d389610 0000000000000082 ffff880288008f00 ffff880427534080
[161160.693189]  0000000000013540 ffff8804201d3fd8 ffff8804201d3fd8 ffff88041d389610
[161160.693193]  ffff88042d645800 ffffffffa0e81aa4 ffff880288008f00 ffff880412b89c20
[161160.693196] Call Trace:
[161160.693214]  [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.693217]  [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.693220]  [<ffffffff810363c7>] ? should_resched+0x5/0x23
[161160.693222]  [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.693224]  [<ffffffff810ea5e0>] ? kfree+0x5b/0x6c
[161160.693240]  [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.693257]  [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.693261]  [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.693263]  [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.693265]  [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.693268]  [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.693270]  [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.693272]  [<ffffffff810c354d>] ? do_try_to_free_pages+0x25f/0x3de
[161160.693275]  [<ffffffff810c3937>] ? try_to_free_pages+0xa9/0xe9
[161160.693277]  [<ffffffff810baa2b>] ? __alloc_pages_nodemask+0x4a9/0x726
[161160.693281]  [<ffffffff8113e179>] ? proc_alloc_inode+0x96/0x96
[161160.693284]  [<ffffffff810e9b55>] ? kmem_getpages+0x4c/0xd9
[161160.693287]  [<ffffffff810eaef5>] ? fallback_alloc+0x123/0x1c2
[161160.693289]  [<ffffffff810eb4c8>] ? kmem_cache_alloc_trace+0x79/0xce
[161160.693292]  [<ffffffff8111160b>] ? seq_read+0xb5/0x34c
[161160.693294]  [<ffffffff8113e6f3>] ? proc_reg_read+0x78/0x91
[161160.693297]  [<ffffffff810f9321>] ? vfs_read+0x9f/0xe6
[161160.693300]  [<ffffffff810d5933>] ? sys_mmap_pgoff+0x125/0x160
[161160.693302]  [<ffffffff810f93ad>] ? sys_read+0x45/0x6b
[161160.693304]  [<ffffffff8134e3d2>] ? system_call_fastpath+0x16/0x1b
[161160.693308] INFO: task dropbox:4920 blocked for more than 120 seconds.
[161160.693331] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.693357] dropbox         D ffff88043fd13540     0  4920      1 0x00000000
[161160.693360]  ffff880427534080 0000000000000086 ffff88018fc43600 ffff88042d074730
[161160.693363]  0000000000013540 ffff8803f479bfd8 ffff8803f479bfd8 ffff880427534080
[161160.693366]  ffff88042d645800 ffffffffa0e81aa4 ffff88018fc43600 ffff880412b89c20
[161160.693369] Call Trace:
[161160.693388]  [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.693391]  [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.693394]  [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.693413]  [<ffffffffa0e914c2>] ? txg_rele_to_sync+0x48/0x55 [zfs]
[161160.693428]  [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.693444]  [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.693448]  [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.693450]  [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.693452]  [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.693455]  [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.693457]  [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.693459]  [<ffffffff810c354d>] ? do_try_to_free_pages+0x25f/0x3de
[161160.693462]  [<ffffffff810c3937>] ? try_to_free_pages+0xa9/0xe9
[161160.693464]  [<ffffffff810baa2b>] ? __alloc_pages_nodemask+0x4a9/0x726
[161160.693467]  [<ffffffff810e5123>] ? alloc_pages_vma+0xe2/0xe7
[161160.693470]  [<ffffffff810cff31>] ? handle_pte_fault+0x165/0x79f
[161160.693472]  [<ffffffff810cd641>] ? pte_offset_kernel+0x16/0x35
[161160.693475]  [<ffffffff8134c3c3>] ? do_page_fault+0x312/0x337
[161160.693477]  [<ffffffff81037e30>] ? set_next_entity+0x32/0x55
[161160.693480]  [<ffffffff8100d755>] ? __switch_to+0x1e5/0x258
[161160.693482]  [<ffffffff810397ec>] ? finish_task_switch+0x88/0xba
[161160.693485]  [<ffffffff813480b4>] ? __schedule+0x5ac/0x5c3
[161160.693487]  [<ffffffff813499b5>] ? page_fault+0x25/0x30
[161160.693492] INFO: task irssi:7797 blocked for more than 120 seconds.
[161160.693514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.693540] irssi           D ffff88043fc13540     0  7797   4834 0x00000000
[161160.693543]  ffff8804269de300 0000000000000082 ffff880157971800 ffff88042d075510
[161160.693546]  0000000000013540 ffff880366853fd8 ffff880366853fd8 ffff8804269de300
[161160.693549]  ffff88042d645800 ffffffffa0e81aa4 ffff880157971800 ffff880412b89c20
[161160.693553] Call Trace:
[161160.693572]  [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.693575]  [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.693577]  [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.693596]  [<ffffffffa0e914c2>] ? txg_rele_to_sync+0x48/0x55 [zfs]
[161160.693611]  [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.693627]  [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.693631]  [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.693633]  [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.693636]  [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.693638]  [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.693640]  [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.693643]  [<ffffffff810c354d>] ? do_try_to_free_pages+0x25f/0x3de
[161160.693645]  [<ffffffff810c3937>] ? try_to_free_pages+0xa9/0xe9
[161160.693648]  [<ffffffff810baa2b>] ? __alloc_pages_nodemask+0x4a9/0x726
[161160.693651]  [<ffffffff810e9b55>] ? kmem_getpages+0x4c/0xd9
[161160.693653]  [<ffffffff810eaef5>] ? fallback_alloc+0x123/0x1c2
[161160.693656]  [<ffffffff810ebb5c>] ? kmem_cache_alloc+0x73/0xea
[161160.693658]  [<ffffffff811254c6>] ? fsnotify_create_event+0x37/0x141
[161160.693661]  [<ffffffff81124bb5>] ? send_to_group.isra.1+0x130/0x177
[161160.693663]  [<ffffffff81124d56>] ? fsnotify+0x15a/0x20c
[161160.693665]  [<ffffffff81124f9f>] ? __fsnotify_parent+0x8a/0xa3
[161160.693668]  [<ffffffff810f8a36>] ? fsnotify_modify+0x3b/0x5c
[161160.693670]  [<ffffffff810f9254>] ? vfs_write+0xbb/0xe9
[161160.693672]  [<ffffffff810f9418>] ? sys_write+0x45/0x6b
[161160.693674]  [<ffffffff8134e3d2>] ? system_call_fastpath+0x16/0x1b
[161160.693679] INFO: task java:598 blocked for more than 120 seconds.
[161160.693700] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.693727] java            D ffff88043fc93540     0   598      1 0x00000000
[161160.693730]  ffff880425297850 0000000000000082 0000000000000000 ffff88042f1140c0
[161160.693733]  0000000000013540 ffff8801062bdfd8 ffff8801062bdfd8 ffff880425297850
[161160.693736]  0000000000000001 00000001817a8dd0 ffffffff817a8770 ffff880369fe7be0
[161160.693739] Call Trace:
[161160.693742]  [<ffffffff813493b7>] ? rwsem_down_failed_common+0xe0/0x114
[161160.693746]  [<ffffffff811b1b93>] ? call_rwsem_down_write_failed+0x13/0x20
[161160.693748]  [<ffffffff81348d24>] ? down_write+0x25/0x27
[161160.693751]  [<ffffffff810d63dd>] ? sys_mprotect+0xc5/0x1fc
[161160.693753]  [<ffffffff8134e3d2>] ? system_call_fastpath+0x16/0x1b

I will try to avoid rebooting the system, in case anyone has any suggestions that might resolve this problem without a reboot.

Update: Never mind; I realized that with kswapd0 in deadlock it's highly unlikely that there's anything I could do to fix this problem without a reboot, so I rebooted. On an unrelated note, I'm happy to see that the scrub is continuing from where it left off.

@DeHackEd
Contributor Author

While using ryao's "gentoo" branch of his own fork here on GitHub I killed the same machine again. Here are more data dumps.

Edit: it must have been too much for GitHub. Here's a pastebin of what I wanted: http://pastebin.com/303uhMen

@DeHackEd
Contributor Author

After talking with ryao in IRC I've been asked to add a few points to this issue.

The vdev is a single device, /dev/sda, which is actually 13 drives in a hardware RAID-6 array. It can churn out pretty high IOPS for random reads, which is the bulk of the system's job: lots of rsync instances (as a sync target) all scanning what amounts to tens of millions of files. The pool is also compressed at a ratio of about 1.7x, so it grows pretty fast. (No dedup.)

During heavy IO the 'size' field from the ARC kstats just grows past c and c_max, but there also seems to be some kind of background job that is draining the ARC at all times toward around 512 megabytes. When completely idle, the size just shrinks mysteriously. Under relatively light load it may float in place but still stay under c. It seems to stop shrinking at about 512 megabytes whether idle or not. I'd guess that under heavy load the background shrink just isn't keeping up with the foreground fill.

As a hack I can keep the system stable by SIGSTOP'ing all the rsync processes when the size gets dangerously high (my breaking point seems to be around 3.3 billion bytes) and SIGCONT'ing them once it has fallen back down again.
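
A rough sketch of that hack as a watchdog loop, with the thresholds above hard-coded and assuming "pkill -x rsync" only matches the backup rsyncs:

#!/bin/bash
# Pause all rsyncs when the ARC size nears the breaking point, resume once it recovers.
HIGH=3000000000   # bytes, a little below the ~3.3e9 breaking point
LOW=1600000000
while sleep 10; do
    size=$(awk '$1=="size" {print $3}' /proc/spl/kstat/zfs/arcstats)
    if [ "$size" -gt "$HIGH" ]; then
        pkill -STOP -x rsync
    elif [ "$size" -lt "$LOW" ]; then
        pkill -CONT -x rsync
    fi
done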

In my most recent reports I've been setting sysctl vm.min_free_kbytes to 524288.

@jspiros

jspiros commented Jun 20, 2012

After switching vm.min_free_kbytes from 256M to 512M I've been able to survive the nightly rsyncs; however, now I'm running into exactly the same problem, except with CrashPlan (a Java application for remote backups) causing it instead of rsync. It is exactly the same problem in that kswapd0 deadlocks, followed by arc_reclaim and then everything else. I needed to get the system back online and neglected to copy the dmesg output, but I confirmed that it is essentially identical to the previous output from the rsync-caused problem. I am hesitant to turn CrashPlan back on until there's something new to try.

@DeHackEd
Contributor Author

Still trying to collect useful information. I set the arc_max to 256 megabytes just for experimentation. It still deadlocked.

Stack traces and arcstats from kstat in pastebin:
http://pastebin.com/cgKZwaSJ

Two sets of stack traces are provided, taken about 2.5 hours apart. The first is from long after the system had hung; I don't know when that actually happened.

vm.min_free_kbytes = 67584 (bootup default)

# free -m
             total       used       free     shared    buffers     cached
Mem:          7990       7835        155          0         52        111
-/+ buffers/cache:       7671        319
Swap:         4095          1       4094

# history | grep zfs.ko
insmod zfs.ko zfs_arc_max=268435456 zfs_arc_min=134217728 zfs_arc_shrink_shift=4 zfs_vdev_max_pending=20 zfs_vdev_min_pending=8

@DeHackEd
Copy link
Contributor Author

http://pastebin.com/pH7KyBKa

I turned on debugging and turned off a bunch of compiler optimizations and locked it up again the same way. These stack traces look a lot nicer to decode.

@jspiros
Copy link

jspiros commented Jun 26, 2012

After speaking with @DeHackEd on IRC about the cause of this problem, I tried using hdparm to limit the queue depth (from 32 to 16) on the SATA drives that make up my pool. That worked, in the sense that I survived the night with both CrashPlan and the nightly rsyncs running. This isn't a long-term fix, as it does slow everything down, but if someone else is in a situation like mine, it might help until this issue is solved.

@DeHackEd
Copy link
Contributor Author

DeHackEd commented Jul 3, 2012

The crashing/deadlocking side of the issue is apparently fixed by https://bugs.gentoo.org/show_bug.cgi?id=416685 (which I applied and cleaned up for kernel 3.0.x). Now it can run for over an hour with nothing throttling its execution speed. I'll update if anything goes wrong.

While running in this mode the ARC appears to track the c value better. This is only a rough observation, since the ARC size is effectively unlimited and general memory usage keeps it from reaching the maximum. That said, I did see c shrink below c_max with size generally following the new c value. More grinding is required before I'd call it conclusive.

Note: I am currently using ryao's 'gentoo' tree.

@behlendorf
Copy link
Contributor

The memory management and swap improvements in issue #883 have now been merged into master and should remove the need for the kernel vmalloc() patch. If you could verify that these resolve your issue, I think we'll be able to close this issue.

@DeHackEd
Copy link
Contributor Author

DeHackEd commented Sep 1, 2012

I can try installing it probably on Tuesday, but it may be a few days before the result is yay/nay.

@behlendorf
Copy link
Contributor

No great rush, I'm not going anywhere. But it would be nice to know. :)

@aarrpp
Copy link

aarrpp commented Oct 9, 2012

I am experiencing similar problems.
Under heavy load (~60 rsyncs) the ARC grows past its limit.
arc_adapt and kswapd1 consume a lot of CPU.
The system freezes for short periods and performance degrades.
In the end, the system hangs.
Tested on the latest version from git.

load average: 49.08, 48.50, 43.52

#cat /proc/spl/kstat/zfs/arcstats | grep -e arc_meta_used -e "^size " -e c_max -e limit
c_max 4 13421772800
size 4 10599128312
arc_meta_used 4 10474886904
arc_meta_limit 4 3355443200

#free
total used free shared buffers cached
Mem: 24736592 24458768 277824 0 2588 45360
-/+ buffers/cache: 24410820 325772
Swap: 7811064 304 7810760

#top

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

24718 root 0 -20 0 0 0 R 100 0.0 21:24.07 arc_adapt
144 root 20 0 0 0 0 R 75 0.0 0:17.07 kswapd1
143 root 20 0 0 0 0 R 67 0.0 0:30.36 kswapd0

0x7a [zfs]
[4879179.203765] [] ? vfs_write+0xa9/0x102
[4879179.203768] [] ? sys_write+0x45/0x6e
[4879179.203772] [] ? system_call_fastpath+0x16/0x1b
[4879179.203775] rsync D ffff88000c22fa20 0 23594 26441 0x00000000
[4879179.203779] ffff88063e44b880 0000000000000082 ffff88052c40d9d8 ffff88052c40d9d4
[4879179.203783] 0000000000020000 0000000000000006 000000000000f9e0 ffff88052c40dfd8
[4879179.203787] 0000000000015780 0000000000015780 ffff88031c9be2e0 ffff88031c9be5d8
[4879179.203791] Call Trace:
[4879179.203814] [] ? zio_wait_for_children+0x4f/0x59 [zfs]
[4879179.203821] [] ? cv_wait_common+0xb9/0x141 [spl]
[4879179.203824] [] ? autoremove_wake_function+0x0/0x2e
[4879179.203847] [] ? zio_wait+0x129/0x161 [zfs]
[4879179.203866] [] ? dbuf_read+0x3fd/0x4b9 [zfs]
[4879179.203886] [] ? dmu_buf_hold+0xb0/0xe4 [zfs]
[4879179.203912] [] ? zap_lockdir+0x58/0x541 [zfs]
[4879179.203916] [] ? mutex_lock+0xd/0x31
[4879179.203941] [] ? zap_cursor_retrieve+0x53/0x1ff [zfs]
[4879179.203962] [] ? RW_WRITE_HELD+0x67/0x81 [zfs]
[4879179.203983] [] ? dmu_prefetch+0x95/0x169 [zfs]
[4879179.204008] [] ? zfs_readdir+0x1ec/0x3b2 [zfs]
[4879179.204012] [] ? filldir+0x0/0xb7
[4879179.204016] [] ? acl_permission_check+0x5c/0x7e
[4879179.204020] [] ? file_move+0x1d/0x49
[4879179.204023] [] ? __dentry_open+0x1c4/0x2bf
[4879179.204027] [] ? do_filp_open+0x4e4/0x94b
[4879179.204031] [] ? cp_new_stat+0xe9/0xfc
[4879179.204035] [] ? filldir+0x0/0xb7
[4879179.204056] [] ? zpl_readdir+0x44/0x5c [zfs]
[4879179.204060] [] ? filldir+0x0/0xb7
[4879179.204063] [] ? vfs_readdir+0x75/0xa7
[4879179.204067] [] ? sys_getdents+0x7a/0xc7
[4879179.204071] [] ? system_call_fastpath+0x16/0x1b
[4879179.204073] rsync S ffff88000c28fa20 0 23602 23594 0x00000000
[4879179.204078] ffff88063e478000 0000000000000086 0000000000000000 ffffffff81192ddc
[4879179.204082] 0000000000000000 ffff88023e145978 000000000000f9e0 ffff88023e145fd8
[4879179.204086] 0000000000015780 0000000000015780 ffff88033c2d1c40 ffff88033c2d1f38
[4879179.204090] Call Trace:
[4879179.204094] [] ? rb_insert_color+0xba/0xe2
[4879179.204098] [] ? __hrtimer_start_range_ns+0x22f/0x242
[4879179.204102] [] ? hrtimer_try_to_cancel+0x3a/0x43
[4879179.204106] [] ? schedule_hrtimeout_range+0xcd/0x112
[4879179.204110] [] ? hrtimer_wakeup+0x0/0x22
[4879179.204114] [] ? schedule_hrtimeout_range+0xaf/0x112
[4879179.204118] [] ? poll_schedule_timeout+0x31/0x4f
[4879179.204122] [] ? do_select+0x50b/0x554
[4879179.204126] [] ? __pollwait+0x0/0xd6
[4879179.204129] [] ? pollwake+0x0/0x59
[4879179.204134] [] ? __kmalloc_node_track_caller+0xfa/0x11b
[4879179.204139] [] ? sock_alloc_send_pskb+0xbd/0x2d8
[4879179.204142] [] ? __alloc_skb+0x69/0x15a
[4879179.204146] [] ? sock_alloc_send_pskb+0xbd/0x2d8
[4879179.204150] [] ? skb_queue_tail+0x17/0x3e
[4879179.204154] [] ? sock_def_readable+0x10/0x62
[4879179.204158] [] ? unix_stream_sendmsg+0x24a/0x2c8
[4879179.204162] [] ? sock_aio_write+0xb1/0xbc
[4879179.204166] [] ? core_sys_select+0x184/0x21e
[4879179.204171] [] ? autoremove_wake_function+0x0/0x2e
[4879179.204174] [] ? ktime_get_ts+0x68/0xb2
[4879179.204178] [] ? read_tsc+0xa/0x20
[4879179.204182] [] ? ktime_get_ts+0x68/0xb2
[4879179.204186] [] ? sys_select+0x92/0xbb
[4879179.204189] [] ? system_call_fastpath+0x16/0x1b
[4879179.204191] rsync S ffff88034acafa20 0 23632 26441 0x00000000
[4879179.204195] ffff88033e638e20 0000000000000086 0000000000000000 ffffffff81192ddc
[4879179.204199] 0000000000000000 ffff8802a0357978 000000000000f9e0 ffff8802a0357fd8
[4879179.204202] 0000000000015780 0000000000015780 ffff880304548710 ffff880304548a08
[4879179.204206] Call Trace:
[4879179.204209] [] ? rb_insert_color+0xba/0xe2
[4879179.204213] [] ? __hrtimer_start_range_ns+0x22f/0x242
[4879179.204217] [] ? schedule_hrtimeout_range+0xcd/0x112
[4879179.204220] [] ? hrtimer_wakeup+0x0/0x22
[4879179.204223] [] ? schedule_hrtimeout_range+0xaf/0x112
[4879179.204227] [] ? poll_schedule_timeout+0x31/0x4f
[4879179.204230] [] ? do_select+0x50b/0x554
[4879179.204234] [] ? __pollwait+0x0/0xd6
[4879179.204237] [] ? pollwake+0x0/0x59
[4879179.204245] [] ? scsi_dispatch_cmd+0x1cf/0x23c [scsi_mod]
[4879179.204249] [] ? __switch_to+0x285/0x297
[4879179.204253] [] ? finish_task_switch+0x3a/0xaf
[4879179.204257] [] ? thread_return+0x79/0xe0
[4879179.204260] [] ? read_tsc+0xa/0x20
[4879179.204263] [] ? radix_tree_delete+0xbf/0x1ba
[4879179.204266] [] ? ktime_get_ts+0x68/0xb2
[4879179.204270] [] ? core_sys_select+0x184/0x21e
[4879179.204274] [] ? handle_mm_fault+0x3b8/0x80f
[4879179.204277] [] ? ktime_get_ts+0x68/0xb2
[4879179.204281] [] ? read_tsc+0xa/0x20
[4879179.204283] [] ? ktime_get_ts+0x68/0xb2
[4879179.204287] [] ? sys_select+0x92/0xbb
[4879179.204290] [] ? system_call_fastpath+0x16/0x1b
[4879179.204292] rsync D ffff8801b383f100 0 23638 23632 0x00000000
[4879179.204296] ffff8801b383f100 0000000000000082 50749dfa0000000d 0000000000035a34
[4879179.204301] 00005c5600000000 000000000000023d 000000000000f9e0 ffff8804070b7fd8
[4879179.204305] 0000000000015780 0000000000015780 ffff88063cea54c0 ffff88063cea57b8
[4879179.204309] Call Trace:
[4879179.204315] [] ? kmem_free_debug+0xc5/0x10d [spl]
[4879179.204340] [] ? dsl_dir_tempreserve_clear+0xbd/0xf1 [zfs]
[4879179.204346] [] ? cv_wait_common+0xb9/0x141 [spl]
[4879179.204350] [] ? autoremove_wake_function+0x0/0x2e
[4879179.204376] [] ? txg_wait_open+0xd4/0xf3 [zfs]
[4879179.204400] [] ? zfs_create+0x3df/0x699 [zfs]
[4879179.204407] [] ? kmem_alloc_debug+0x142/0x2cd [spl]
[4879179.204429] [] ? zpl_create+0x94/0xc0 [zfs]
[4879179.204434] [] ? generic_permission+0xe/0x8a
[4879179.204437] [] ? vfs_create+0x6d/0x89
[4879179.204441] [] ? do_filp_open+0x31e/0x94b
[4879179.204445] [] ? handle_mm_fault+0x3b8/0x80f
[4879179.204450] [] ? do_sys_open+0x55/0xfc
[4879179.204453] [] ? system_call_fastpath+0x16/0x1b
[4879179.204456] rsync S ffff88000c28fa20 0 23660 26441 0x00000000
[4879179.204461] ffff88063e478000 0000000000000086 0000000000000000 ffffffff81192ddc
[4879179.204465] 0000000000000000 ffff8803370e3978 000000000000f9e0 ffff8803370e3fd8
[4879179.204469] 0000000000015780 0000000000015780 ffff88030454b170 ffff88030454b468
[4879179.204473] Call Trace:
[4879179.204477] [] ? rb_insert_color+0xba/0xe2
[4879179.204481] [] ? __hrtimer_start_range_ns+0x22f/0x242
[4879179.204485] [] ? schedule_hrtimeout_range+0xcd/0x112
[4879179.204489] [] ? hrtimer_wakeup+0x0/0x22
[4879179.204493] [] ? schedule_hrtimeout_range+0xaf/0x112
[4879179.204497] [] ? poll_schedule_timeout+0x31/0x4f
[4879179.204501] [] ? do_select+0x50b/0x554
[4879179.204505] [] ? __pollwait+0x0/0xd6
[4879179.204509] [] ? pollwake+0x0/0x59
[4879179.204513] [] ? __switch_to+0x285/0x297
[4879179.204517] [] ? radix_tree_delete+0xbf/0x1ba
[4879179.204521] [] ? thread_return+0x79/0xe0
[4879179.204525] [] ? blk_peek_request+0x199/0x1ad
[4879179.204529] [] ? read_tsc+0xa/0x20
[4879179.204532] [] ? cpumask_any_but+0x28/0x34
[4879179.204536] [] ? core_sys_select+0x184/0x21e
[4879179.204540] [] ? __lock_page+0x5d/0x63
[4879179.204545] [] ? handle_mm_fault+0x7aa/0x80f
[4879179.204549] [] ? ktime_get_ts+0x68/0xb2
[4879179.204552] [] ? read_tsc+0xa/0x20
[4879179.204556] [] ? ktime_get_ts+0x68/0xb2
[4879179.204559] [] ? sys_select+0x92/0xbb
[4879179.204563] [] ? system_call_fastpath+0x16/0x1b
[4879179.204566] rsync D ffff8802ee90c6a0 0 23661 23660 0x00000000
[4879179.204570] ffff8802ee90c6a0 0000000000000082 50749dfa00000000 0000000000035a6d
[4879179.204574] 00005c6d00000000 000000000000023d 000000000000f9e0 ffff8804e23f7fd8
[4879179.204578] 0000000000015780 0000000000015780 ffff880637011530 ffff880637011828
[4879179.204582] Call Trace:
[4879179.204589] [] ? kmem_free_debug+0xc5/0x10d [spl]
[4879179.204613] [] ? dsl_dir_tempreserve_clear+0xbd/0xf1 [zfs]
[4879179.204620] [] ? cv_wait_common+0xb9/0x141 [spl]
[4879179.204624] [] ? autoremove_wake_function+0x0/0x2e
[4879179.204650] [] ? txg_wait_open+0xd4/0xf3 [zfs]
[4879179.204674] [] ? zfs_setattr+0xad1/0x13d7 [zfs]
[4879179.204681] [] ? kmem_alloc_debug+0x161/0x2cd [spl]
[4879179.204685] [] ? __kmalloc+0x12f/0x141
[4879179.204691] [] ? kmem_alloc_debug+0x142/0x2cd [spl]
[4879179.204698] [] ? kmem_alloc_debug+0x142/0x2cd [spl]
[4879179.204702] [] ? capable+0x22/0x41
[4879179.204725] [] ? zpl_setattr+0xd2/0xf6 [zfs]
[4879179.204729] [] ? notify_change+0x180/0x2c5
[4879179.204733] [] ? chown_common+0x66/0x7c
[4879179.204736] [] ? sys_lchown+0x43/0x61
[4879179.204740] [] ? system_call_fastpath+0x16/0x1b
[4879179.204742] rsync S ffff88034acefb30 0 23749 26441 0x00000000
[4879179.204747] ffff88033e63c6a0 0000000000000082 ffff8801d6057938 ffff8801d6057934
[4879179.204751] 0000000000000000 ffff8801d6057978 000000000000f9e0 ffff8801d6057fd8
[4879179.204755] 0000000000015780 0000000000015780 ffff8801b383bf90 ffff8801b383c288
[4879179.204759] Call Trace:
[4879179.204763] [] ? __hrtimer_start_range_ns+0x22f/0x242
[4879179.204767] [] ? schedule_hrtimeout_range+0xcd/0x112
[4879179.204771] [] ? hrtimer_wakeup+0x0/0x22
[4879179.204775] [] ? schedule_hrtimeout_range+0xaf/0x112
[4879179.204779] [] ? poll_schedule_timeout+0x31/0x4f
[4879179.204783] [] ? do_select+0x50b/0x554
[4879179.204787] [] ? __pollwait+0x0/0xd6
[4879179.204790] [] ? pollwake+0x0/0x59
[4879179.204795] [] ? blk_peek_request+0x199/0x1ad
[4879179.204803] [] ? scsi_dispatch_cmd+0x1cf/0x23c [scsi_mod]
[4879179.204808] [] ? scm_recv+0x33/0xaf
[4879179.204811] [] ? skb_dequeue+0x50/0x58
[4879179.204815] [] ? unix_stream_recvmsg+0x59c/0x5d3
[4879179.204819] [] ? update_curr+0xa6/0x147
[4879179.204823] [] ? thread_return+0x79/0xe0
[4879179.204827] [] ? sock_aio_read+0xb9/0xc4
[4879179.204831] [] ? core_sys_select+0x184/0x21e
[4879179.204835] [] ? autoremove_wake_function+0x0/0x2e
[4879179.204839] [] ? ktime_get_ts+0x68/0xb2
[4879179.204843] [] ? read_tsc+0xa/0x20
[4879179.204846] [] ? ktime_get_ts+0x68/0xb2
[4879179.204850] [] ? sys_select+0x92/0xbb
[4879179.204854] [] ? system_call_fastpath+0x16/0x1b
[4879179.204856] rsync D ffff880304548000 0 23770 23749 0x00000000
[4879179.204861] ffff880304548000 0000000000000086 0000000000000000 00000000810412ee
[4879179.204865] ffff88000c235780 0000000000000001 000000000000f9e0 ffff880588b2bfd8
[4879179.204869] 0000000000015780 0000000000015780 ffff880637013f90 ffff880637014288
[4879179.204873] Call Trace:
[4879179.204876] [] ? __kmalloc+0x12f/0x141
[4879179.204882] [] ? cv_wait_common+0xb9/0x141 [spl]
[4879179.204885] [] ? autoremove_wake_function+0x0/0x2e
[4879179.204909] [] ? txg_wait_open+0xd4/0xf3 [zfs]
[4879179.204931] [] ? dmu_tx_assign+0x108/0x378 [zfs]
[4879179.204957] [] ? zfs_rmnode+0x187/0x2e3 [zfs]
[4879179.204982] [] ? zfs_zinactive+0x9e/0xf7 [zfs]
[4879179.205007] [] ? zfs_inactive+0x122/0x15e [zfs]
[4879179.205029] [] ? zpl_inode_delete+0x0/0x58 [zfs]
[4879179.205052] [] ? zpl_inode_delete+0x0/0x58 [zfs]
[4879179.205056] [] ? clear_inode+0x79/0xd0
[4879179.205059] [] ? generic_delete_inode+0xdc/0x168
[4879179.205063] [] ? d_kill+0x40/0x61
[4879179.205067] [] ? dput+0x152/0x15e
[4879179.205070] [] ? sys_renameat+0x1a8/0x20d
[4879179.205077] [] ? kmem_free_debug+0xc5/0x10d [spl]
[4879179.205080] [] ? notify_change+0x2b3/0x2c5
[4879179.205085] [] ? mntput_no_expire+0x23/0xee
[4879179.205089] [] ? sys_fchmodat+0xa8/0xb6
[4879179.205092] [] ? system_call_fastpath+0x16/0x1b
[4879179.205095] rsync S ffff88000c22fa20 0 23828 26441 0x00000000
[4879179.205100] ffff88063e44b880 0000000000000082 0000000000000000 ffffffff81192ddc
[4879179.205103] 0000000000000000 ffff8802526f9978 000000000000f9e0 ffff8802526f9fd8
[4879179.205107] 0000000000015780 0000000000015780 ffff88030c9d9c40 ffff88030c9d9f38
[4879179.205111] Call Trace:
[4879179.205115] [] ? rb_insert_color+0xba/0xe2
[4879179.205119] [] ? __hrtimer_start_range_ns+0x22f/0x242
[4879179.205123] [] ? hrtimer_try_to_cancel+0x3a/0x43
[4879179.205128] [] ? schedule_hrtimeout_range+0xcd/0x112
[4879179.205131] [] ? hrtimer_wakeup+0x0/0x22
[4879179.205135] [] ? schedule_hrtimeout_range+0xaf/0x112
[4879179.205139] [] ? poll_schedule_timeout+0x31/0x4f
[4879179.205143] [] ? do_select+0x50b/0x554
[4879179.205147] [] ? __pollwait+0x0/0xd6
[4879179.205151] [] ? pollwake+0x0/0x59
[4879179.205155] [] ? __switch_to+0x285/0x297
[4879179.205160] [] ? finish_task_switch+0x3a/0xaf
[4879179.205163] [] ? thread_return+0x79/0xe0
[4879179.205168] [] ? blk_peek_request+0x199/0x1ad
[4879179.205172] [] ? read_tsc+0xa/0x20
[4879179.205175] [] ? ktime_get_ts+0x68/0xb2
[4879179.205179] [] ? core_sys_select+0x184/0x21e
[4879179.205184] [] ? __lock_page+0x5d/0x63
[4879179.205188] [] ? wake_bit_function+0x0/0x23
[4879179.205192] [] ? swap_info_get+0x6a/0xa3
[4879179.205196] [] ? handle_mm_fault+0x6da/0x80f
[4879179.205200] [] ? ktime_get_ts+0x68/0xb2
[4879179.205203] [] ? read_tsc+0xa/0x20
[4879179.205207] [] ? ktime_get_ts+0x68/0xb2
[4879179.205210] [] ? sys_select+0x92/0xbb
[4879179.205214] [] ? system_call_fastpath+0x16/0x1b
[4879179.205217] rsync D ffff88063ca1eaa0 0 23831 23828 0x00000000
[4879179.205221] ffff88033d75a350 0000000000000086 0000000000000000 00000000810412ee
[4879179.205225] ffff88000c295780 0000000000000008 000000000000f9e0 ffff88028b081fd8
[4879179.205230] 0000000000015780 0000000000015780 ffff8802c816dbd0 ffff8802c816dec8
[4879179.205234] Call Trace:
[4879179.205240] [] ? cv_wait_common+0xb9/0x141 [spl]
[4879179.205244] [] ? autoremove_wake_function+0x0/0x2e
[4879179.205270] [] ? txg_wait_open+0xd4/0xf3 [zfs]
[4879179.205294] [] ? zfs_rename+0x94c/0xc11 [zfs]
[4879179.205299] [] ? tsd_hash_search+0xad/0x1a3 [spl]
[4879179.205320] [] ? zpl_rename+0x46/0x5e [zfs]
[4879179.205324] [] ? vfs_rename+0x213/0x3c5
[4879179.205328] [] ? sys_renameat+0x191/0x20d
[4879179.205332] [] ? handle_mm_fault+0x3b8/0x80f
[4879179.205336] [] ? notify_change+0x2b3/0x2c5
[4879179.205340] [] ? mntput_no_expire+0x23/0xee
[4879179.205344] [] ? do_page_fault+0x2e0/0x2fc
[4879179.205348] [] ? system_call_fastpath+0x16/0x1b
[4879179.205350] rsync R running task 0 23855 26441 0x00000000
[4879179.205354] 0000000000000001 0000000000000004 ffff880310a25118 ffffffff810bb279
[4879179.205358] 0000000000000000 ffffffff810bb2f1 0000000000000282 ffff8803fa2a35a0
[4879179.205363] 0000000000000018 ffffffffa036a1b4 ffff88033c2a7100 ffff88033c2a7400
[4879179.205367] Call Trace:
[4879179.205371] [] ? determine_dirtyable_memory+0xd/0x1d
[4879179.205375] [] ? get_dirty_limits+0x1d/0x259
[4879179.205381] [] ? spl_slab_reclaim+0x86/0x347 [spl]
[4879179.205387] [] ? spl_kmem_cache_reap_now+0x114/0x175 [spl]
[4879179.205390] [] ? down_read+0x9/0x19
[4879179.205395] [] ? __spl_kmem_cache_generic_shrinker+0x6f/0xbe [spl]
[4879179.205401] [] ? spl_kmem_cache_generic_shrinker+0x19/0x1e [spl]
[4879179.205405] [] ? shrink_slab+0xe0/0x153
[4879179.205409] [] ? try_to_free_pages+0x26a/0x38e
[4879179.205412] [] ? isolate_pages_global+0x0/0x20f
[4879179.205439] [] ? sa_build_index+0x29a/0x2ae [zfs]
[4879179.205443] [] ? __alloc_pages_nodemask+0x3d4/0x5fc
[4879179.205447] [] ? new_slab+0x42/0x1ca
[4879179.205451] [] ? __slab_alloc+0x1f0/0x39b
[4879179.205458] [] ? kmalloc_nofail+0x2b/0x3b [spl]
[4879179.205477] [] ? dbuf_read+0x10d/0x4b9 [zfs]
[4879179.205481] [] ? __kmalloc+0xf1/0x141
[4879179.205487] [] ? kmalloc_nofail+0x2b/0x3b [spl]
[4879179.205493] [] ? kmalloc_nofail+0x2b/0x3b [spl]
[4879179.205499] [] ? kmem_alloc_debug+0x161/0x2cd [spl]
[4879179.205520] [] ? dmu_buf_hold+0xd4/0xe4 [zfs]
[4879179.205524] [] ? down_read+0x9/0x19
[4879179.205549] [] ? zap_lockdir+0x371/0x541 [zfs]
[4879179.205576] [] ? zap_name_alloc+0x3e/0x119 [zfs]
[4879179.205601] [] ? zap_lookup_norm+0x5a/0x153 [zfs]
[4879179.205627] [] ? zap_lookup+0x2a/0x2f [zfs]
[4879179.205653] [] ? zfs_dirent_lock+0x3b7/0x47a [zfs]
[4879179.205679] [] ? zfs_zaccess_aces_check+0x1ec/0x21c [zfs]
[4879179.205704] [] ? zfs_dirlook+0x1cb/0x228 [zfs]
[4879179.205728] [] ? zfs_lookup+0x27a/0x2c0 [zfs]
[4879179.205732] [] ? d_rehash+0x2a/0x3a
[4879179.205754] [] ? zpl_lookup+0x47/0x80 [zfs]
[4879179.205758] [] ? d_alloc+0x14b/0x186
[4879179.205762] [] ? do_lookup+0xd3/0x15d
[4879179.205765] [] ? __link_path_walk+0x5a5/0x6f5
[4879179.205769] [] ? path_walk+0x66/0xc9
[4879179.205773] [] ? do_path_lookup+0x20/0x77
[4879179.205776] [] ? user_path_at+0x48/0x79
[4879179.205783] [] ? tsd_exit+0x83/0x193 [spl]
[4879179.205787] [] ? cp_new_stat+0xe9/0xfc
[4879179.205791] [] ? vfs_fstatat+0x2c/0x57
[4879179.205795] [] ? sys_newlstat+0x11/0x30
[4879179.205799] [] ? device_not_available+0x1b/0x20
[4879179.205803] [] ? system_call_fastpath+0x16/0x1b
[4879179.205805] rsync S ffff88034ac6fa20 0 23856 23855 0x00000000
[4879179.205810] ffff88033e4ae2e0 0000000000000086 0000000000000000 0000000000000282
[4879179.205814] 0000000000000000 ffff8803c3d13978 000000000000f9e0 ffff8803c3d13fd8
[4879179.205818] 0000000000015780 0000000000015780 ffff880395caa350 ffff880395caa648
[4879179.205822] Call Trace:
[4879179.205826] [

@aarrpp
Copy link

aarrpp commented Oct 19, 2012

I tried to reproduce this bug in a virtual environment, but everything worked correctly there. I think the cause is the sheer amount of metadata. I have several backup servers with pools of 1-4TB, each with ~1000 datasets plus snapshots, and every one of them hits this bug. Running parallel find across all datasets is enough to reproduce it. I have to periodically run echo 3 > /proc/sys/vm/drop_caches to free memory, but it does not always help.
How can I help you resolve this issue?
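
For reference, a rough sketch of that reproduction (assuming the datasets are mounted at their usual mountpoints and the zfs command is in PATH): walk every dataset in parallel to generate a metadata-heavy, stat-only load comparable to the many concurrent rsyncs above.

import os
import subprocess
from multiprocessing import Pool

def walk_dataset(mountpoint):
    # lstat every file, roughly what a scanning rsync or find does
    count = 0
    for root, dirs, files in os.walk(mountpoint):
        for name in files:
            try:
                os.lstat(os.path.join(root, name))
                count += 1
            except OSError:
                pass
    return mountpoint, count

if __name__ == '__main__':
    out = subprocess.check_output(['zfs', 'list', '-H', '-o', 'mountpoint'])
    mounts = [m for m in out.decode().splitlines() if m.startswith('/')]
    pool = Pool(60)     # comparable to the ~60 rsyncs mentioned earlier
    for mountpoint, count in pool.imap_unordered(walk_dataset, mounts):
        print(mountpoint, count)
    pool.close()
    pool.join()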

@ryao
Copy link
Contributor

ryao commented Oct 26, 2012

@behlendorf I had a discussion with @DeHackEd about this in IRC. It seems that he can only reproduce this issue when L2ARC is present. He also reports that my code from a few months ago that depended upon the kernel VM patch is more stable. A difference that comes to mind is the following change to the SPL:

diff --git a/include/sys/kmem.h b/include/sys/kmem.h
index 796af44..633278b 100644
--- a/include/sys/kmem.h
+++ b/include/sys/kmem.h
@@ -41,7 +41,7 @@
 /*
  * Memory allocation interfaces
  */
-#define KM_SLEEP   GFP_KERNEL  /* Can sleep, never fails */
+#define KM_SLEEP   (GFP_NOIO | __GFP_HIGH) /* Can sleep, never fails */
 #define KM_NOSLEEP GFP_ATOMIC  /* Can not sleep, may fail */
 #define KM_PUSHPAGE    (GFP_NOIO | __GFP_HIGH) /* Use reserved memory */
 #define KM_NODEBUG __GFP_NOWARN    /* Suppress warnings */

I have asked him to test it.

@ryao
Copy link
Contributor

ryao commented Oct 26, 2012

Looking at the original arcstats that @DeHackEd posted, it looks like arc_meta_limit is not honored when L2ARC is present. I do not have time to look through the code right now, but I think that this has to do with the L2ARC-only buffers used to keep track of the L2ARC map. Here are some references:

The current version of the ZFS ARC splits the lists above into separate data and metadata lists, and also has a list for anonymous buffers and one for L2ARC only buffers (which I added when I developed the L2ARC).

http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/

Nexenta by default allocates 1/4 of your ARC cache (RAM) to metadata caching.  Your L2ARC map is considered metadata.

http://www.zfsbuild.com/2011/11/18/dedupe-be-careful/

It feels to me like memory management of the L2ARC map is not being done correctly under sustained load conditions.
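
A quick way to sanity-check that theory on a live system is to compare the header accounting against the metadata accounting that arcstats already exposes. A rough sketch, using only field names that appear in the dumps in this thread:

def read_arcstats(path='/proc/spl/kstat/zfs/arcstats'):
    stats = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 3 and fields[1] == '4':   # "name  4  value" rows
                stats[fields[0]] = int(fields[2])
    return stats

s = read_arcstats()
for key in ('size', 'c_max', 'hdr_size', 'l2_hdr_size',
            'arc_meta_used', 'arc_meta_limit'):
    print('%-16s %15d' % (key, s.get(key, 0)))

# If hdr_size plus l2_hdr_size accounts for most of arc_meta_used while
# arc_meta_used sits far above arc_meta_limit, the L2ARC map is the likely culprit.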

@behlendorf
Copy link
Contributor

@ryao @DeHackEd Ahh! That makes perfect sense. Now I'm kicking myself for not exposing the l2c_only arc_state_t in arcstats when I originally did it for the other ARC states. The above patch corrects that issue. Can you please give it a try and verify that the new l2c_only_* entries in arcstats account for most of the ARC metadata?

Exactly what to do about it is a different question. At the moment there is no mechanism for memory pressure on the system to cause these headers to be evicted. Doing so would effectively abandon the data buffers in the L2ARC; that is safe, just not optimal for performance. There is probably a case to be made that it's preferable as a last resort when the system needs memory.

@aarrpp
Copy link

aarrpp commented Oct 26, 2012

In my case the L2 cache is not used.
I did some tracing and found that arc_adjust_meta, called from the arc_adapt thread, does not release memory under heavy load, and the ARC continues to grow.

cat /proc/spl/kstat/zfs/arcstats 
18 1 0x01 77 3696 1385740506293 550072514185777
name                            type data
hits                            4    1776619018
misses                          4    376121117
demand_data_hits                4    1
demand_data_misses              4    3663549
demand_metadata_hits            4    1460578271
demand_metadata_misses          4    223287392
prefetch_data_hits              4    0
prefetch_data_misses            4    1
prefetch_metadata_hits          4    316040746
prefetch_metadata_misses        4    149170175
mru_hits                        4    752453116
mru_ghost_hits                  4    79549979
mfu_hits                        4    708235442
mfu_ghost_hits                  4    38397570
deleted                         4    284816991
recycle_miss                    4    287211321
mutex_miss                      4    23972601
evict_skip                      4    155959865459
evict_l2_cached                 4    0
evict_l2_eligible               4    958429659648
evict_l2_ineligible             4    2156683706880
hash_elements                   4    105127
hash_elements_max               4    359810
hash_collisions                 4    82501288
hash_chains                     4    9173
hash_chain_max                  4    8
p                               4    786415616
c                               4    838860800
c_min                           4    838860800
c_max                           4    1000000000
size                            4    3559781976
hdr_size                        4    51919712
data_size                       4    1377200128
other_size                      4    2130662136
anon_size                       4    209853952
anon_evict_data                 4    0
anon_evict_metadata             4    0
mru_size                        4    853744128
mru_evict_data                  4    0
mru_evict_metadata              4    26845184
mru_ghost_size                  4    122683904
mru_ghost_evict_data            4    92719104
mru_ghost_evict_metadata        4    29964800
mfu_size                        4    313602048
mfu_evict_data                  4    0
mfu_evict_metadata              4    1341440
mfu_ghost_size                  4    158740992
mfu_ghost_evict_data            4    0
mfu_ghost_evict_metadata        4    158740992
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_hdr_miss              4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_hdr_size                     4    0
memory_throttle_count           4    31594
memory_direct_count             4    1527614
memory_indirect_count           4    0
arc_no_grow                     4    1
arc_tempreserve                 4    1441792
arc_loaned_bytes                4    0
arc_prune                       4    476481
arc_meta_used                   4    3392569944
arc_meta_limit                  4    500000000
arc_meta_max                    4    9300442064


# cat /proc/loadavg 
48.54 49.87 42.61 2/1002 19639
# ps ax | grep rsync | wc -l
86
# zfs list | wc -l
438


# zpool list
NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backups  9.06T  1.56T  7.50T    17%  1.00x  ONLINE  -

# zpool status -v
  pool: backups
 state: ONLINE
 scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    backups     ONLINE       0     0     0
      sdb       ONLINE       0     0     0

errors: No known data errors

@behlendorf
Copy link
Contributor

@arp096 You have a different but equally interesting case. In your case the limit was exceeded because very little of your metadata can be safely evicted; it appears that most of it is of type "other". That likely means it's holding dnodes and bonus buffers for lots of active files. I would expect that when you back off the workload the usage quickly comes back under control?

@ryao
Copy link
Contributor

ryao commented Oct 26, 2012

In most of the arcstats being shown here, L2ARC does not appear to be used. It is only the most recent one from @DeHackEd that shows use. This issue could be describing two separate issues that happen to have the same symptoms.

@aarrpp
Copy link

aarrpp commented Oct 27, 2012

@behlendorf Yes, that's right. And the bug does not show up when there are fewer than ten parallel rsync processes.

@DeHackEd
Copy link
Contributor Author

That describes my situation somewhat as well. When the number of rsync processes is relatively low (my best guess at the safe maximum is 10-15) the system is stable and the ARC size is honoured, L2ARC or not. When it's much higher (around 20, maybe?) I do get the memory usage problem.

What I can't be sure about is how the L2ARC factors into things. After @ryao asked me to try stressing it with the L2ARC removed, I wasn't able to trigger the problem. Just now I tried putting medium pressure on (10 rsyncs + send/recv) with the L2ARC present and it sometimes starts tipping over (4300 MB of ARC used out of 4096 allowed, just floating above the line, but clearly more than should be tolerated). Remove the L2ARC yet raise the pressure (15 rsyncs + send/recv) and it behaves much better.

I always assumed the cache was responsible for changing the load from 15 processes in the D state most of the time (waiting on disk IO) to 15 processes in the D state much less of the time (not waiting so much), but now I'm completely confused.

@aarrpp
Copy link

aarrpp commented Oct 27, 2012

It is strange that the metadata cache continues to grow even with caching turned off.

zfs set primarycache=none backups
echo 3 >/proc/sys/vm/drop_caches

And after a few minutes:
cat /proc/spl/kstat/zfs/arcstats
18 1 0x01 77 3696 1385740506293 637293224967258
name type data
hits 4 2067885441
misses 4 442672429
demand_data_hits 4 1
demand_data_misses 4 4061515
demand_metadata_hits 4 1701700132
demand_metadata_misses 4 264854116
prefetch_data_hits 4 0
prefetch_data_misses 4 1
prefetch_metadata_hits 4 366185308
prefetch_metadata_misses 4 173756797
mru_hits 4 883119982
mru_ghost_hits 4 91909389
mfu_hits 4 818691046
mfu_ghost_hits 4 49558553
deleted 4 331921146
recycle_miss 4 338637377
mutex_miss 4 28074479
evict_skip 4 182158175305
evict_l2_cached 4 0
evict_l2_eligible 4 979257182720
evict_l2_ineligible 4 2649794910720
hash_elements 4 471925
hash_elements_max 4 473143
hash_collisions 4 95524572
hash_chains 4 119278
hash_chain_max 4 8
p 4 52428800
c 4 838860800
c_min 4 838860800
c_max 4 1000000000
size 4 4623785504
hdr_size 4 139686000
data_size 4 1556584960
other_size 4 2927519664
anon_size 4 44545536
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 1067032576
mru_evict_data 4 0
mru_evict_metadata 4 1146880
mru_ghost_size 4 688640
mru_ghost_evict_data 4 409088
mru_ghost_evict_metadata 4 279552
mfu_size 4 445050880
mfu_evict_data 4 0
mfu_evict_metadata 4 16384
mfu_ghost_size 4 741977600
mfu_ghost_evict_data 4 0
mfu_ghost_evict_metadata 4 741977600
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_hdr_miss 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_hdr_size 4 0
memory_throttle_count 4 31594
memory_direct_count 4 2304247
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 560316
arc_meta_used 4 4608719952
arc_meta_limit 4 500000000
arc_meta_max 4 9300442064

@behlendorf
Copy link
Contributor

It doesn't help that "metadata" is rather vaguely defined in this context.

@sciclon
Copy link

sciclon commented Nov 2, 2012

I had the same issue: when I ran several rsyncs against a 1TB pool under a KVM/QEMU VM, I could see on the host that the virtual machine was eating all the CPUs assigned to it.
I limited the ARC memory two days ago and everything looks good for now.
options zfs zfs_arc_max=3221225472 ### /etc/modprobe.d/zfs.conf

Besides, I'm using dedup in this pool. I've been reading about it and I understand it can use up to 25% of this memory. I calculated the amount of memory necessary to keep the dedup table within the ARC. My question is: does the DDT actually stay in this memory?
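
For a rough feel for the numbers, here is a back-of-the-envelope sketch using the commonly cited figure of roughly 320 bytes of in-core metadata per unique block (the exact per-entry size varies by version, and zdb can report real table statistics for an existing pool). The pool size and average block size below are placeholders, not values from this system:

pool_data_bytes = 1 * 2**40      # assumption: ~1 TiB of data subject to dedup
avg_block_bytes = 64 * 2**10     # assumption: 64 KiB average block size
ddt_entry_bytes = 320            # rough, commonly cited per-unique-block figure

unique_blocks = pool_data_bytes // avg_block_bytes
ddt_bytes = unique_blocks * ddt_entry_bytes
print('estimated DDT size: %.0f MiB' % (ddt_bytes / 2.0**20))   # ~5120 MiB for this example

# With zfs_arc_max=3221225472 (3 GiB) and metadata capped at roughly a quarter
# of the ARC (the 25% figure mentioned above), only a fraction of such a table
# can stay resident; DDT entries that fall out of the ARC have to be read back
# from disk on dedup writes.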

@DeHackEd
Copy link
Contributor Author

Here's my most recent run of the backup cycle, which wasn't even at full speed ("only" 15 copies of rsync), and it crashed in record time. This time I was graphing a whole bunch of the arcstats from the SPL kstat data.

System information:

  • System: Dell Poweredge 1950
  • CPU: dual-core 2.33 GHz (Xeon 5140)
  • RAM: 16 GB, ECC
  • Disk: 11 TB (hardware RAID-6) for main drive, 80 GB SSD partition for L2ARC (initially empty)
  • OS: RHEL 6.3
  • Kernel 3.4.18 (no other patches) custom build
  • zpool: version=26 (zfs-fuse compatibility), ashift=9
  • zfs: atime=off, compression=gzip-9, secondarycache=metadata

ZFS: 4c837f0
SPL: openzfs/spl@366346c

(Could be more up-to-date at this point)

  pool: poolname1
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
    still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
    pool will no longer be accessible on older software versions.
 scan: scrub repaired 0 in 59h56m with 0 errors on Fri Sep 28 23:48:31 2012
config:
    NAME                                            STATE     READ WRITE CKSUM
    poolname1                                       ONLINE       0     0     0
      scsi-36001e4f022847f0016c30400b49f5666        ONLINE       0     0     0
    cache
      scsi-360000000000000001786fd14bcbe933c-part1  ONLINE       0     0     0
errors: No known data errors

Graphs:
http://dehacked.2y.net/memory-recent.png
http://dehacked.2y.net/arc-recent.png
http://dehacked.2y.net/evict-recent.png

Data polling interval is 10 seconds at the RRD level.

@behlendorf
Copy link
Contributor

@DeHackEd When you say it crashed, do you mean the system panicked or became unstable in some way, or just that the meta limit was exceeded? I believe I now see why that's happening with regard to the meta limit, but I'd like to better understand the instability you're referring to.

@behlendorf
Copy link
Contributor

@DeHackEd Can you please try 6c5207088f732168569d1a0b29f5f949b91bb503, which I believe should address your issue. Basically, under a very heavy metadata workload it was possible to move all the metadata from the MRU/MFU lists onto the MRU/MFU ghost lists, where it would never be freed as long as there was data on the ghost lists that could be reclaimed instead. I'd be interested to hear if this change improves your performance as well.

@DeHackEd
Copy link
Contributor Author

While it does seem to help, it hasn't completely eliminated the issue. At best it just delays the inevitable.

arp096's method of using drop_caches does seem to keep it stable, though. Not a great workaround, but still...

@aarrpp
Copy link

aarrpp commented Nov 17, 2012

I tried the patch, but nothing has changed:

cat /proc/spl/kstat/zfs/arcstats
4 1 0x01 77 3696 2403385908311841 2403855656867260
name type data
hits 4 38524104
misses 4 368775
demand_data_hits 4 0
demand_data_misses 4 22
demand_metadata_hits 4 37250368
demand_metadata_misses 4 159192
prefetch_data_hits 4 0
prefetch_data_misses 4 0
prefetch_metadata_hits 4 1273736
prefetch_metadata_misses 4 209561
mru_hits 4 1358445
mru_ghost_hits 4 47602
mfu_hits 4 36023113
mfu_ghost_hits 4 19495
deleted 4 259553
recycle_miss 4 602959
mutex_miss 4 62668
evict_skip 4 564914143
evict_l2_cached 4 0
evict_l2_eligible 4 19083264
evict_l2_ineligible 4 4364881408
hash_elements 4 42650
hash_elements_max 4 42650
hash_collisions 4 15009
hash_chains 4 1629
hash_chain_max 4 4
p 4 144727296
c 4 1000000000
c_min 4 838860800
c_max 4 1000000000
size 4 4533766376
hdr_size 4 25180480
data_size 4 1830418432
other_size 4 2678167464
anon_size 4 8264704
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 966695424
mru_evict_data 4 0
mru_evict_metadata 4 21205504
mru_ghost_size 4 57323008
mru_ghost_evict_data 4 118272
mru_ghost_evict_metadata 4 57204736
mfu_size 4 855458304
mfu_evict_data 4 0
mfu_evict_metadata 4 9123328
mfu_ghost_size 4 1480704
mfu_ghost_evict_data 4 304640
mfu_ghost_evict_metadata 4 1176064
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_hdr_miss 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_hdr_size 4 0
memory_throttle_count 4 0
memory_direct_count 4 0
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 79
arc_meta_used 4 4533766376
arc_meta_limit 4 500000000
arc_meta_max 4 4533770472

@DeHackEd
Copy link
Contributor Author

Yet more output from arcstats, /proc/meminfo and some IO stats, captured immediately before and after a drop_caches once I started hitting the redline of memory usage.

http://pastebin.com/8Wh0zeKG
http://pastebin.com/777anKPq

@DeHackEd
Copy link
Contributor Author

A workaround for issue #1101 seems to fix this issue as well under my typical workload.

@aarrpp
Copy link

aarrpp commented Nov 21, 2012

Yes, this workaround works for me, but I set ZFS_OBJ_MTX_SZ to 10240. It works fine with ~500 concurrent processes. Thank you!

@behlendorf
Copy link
Contributor

I'm closing this long, long, long issue because a fix was merged for the deadlock; see issue #1101. I'll open a new, more succinct issue to track the remaining memory management concerns.

@durandalTR
Copy link

@behlendorf: Do you have a reference to that new memory management issue?

@behlendorf
Copy link
Contributor

See #1132; it's largely a placeholder.
