RC8 release - rsync crash #642

Closed
tstudios opened this issue Apr 6, 2012 · 28 comments

@tstudios

tstudios commented Apr 6, 2012

Replaced the rc6 code with the patches today. Multiple rsync streams to the pool are the current operations. CentOS 6.0 kernel: Linux tsdpl.turner.com 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux. SPL and ZFS RPMs were built on a like kernel on a like OS, then installed there and here after rpm -e of all old spl and zfs modules. Dmesg output below:
[root@tsdpl ~]# cat /root/dmesg.txt
INFO: task kswapd0:82 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kswapd0 D ffffffffffffffff 0 82 2 0x00000000
ffff8808083e7ab0 0000000000000046 ffff8808083e7a40 ffffffffa0439c24
ffff8808083e7a40 ffff8807b71a1e70 0000000000000000 ffffffff81013c8e
ffff8808083e5a98 ffff8808083e7fd8 0000000000010518 ffff8808083e5a98
Call Trace:
[] ? arc_buf_remove_ref+0xd4/0x120 [zfs]
[] ? apic_timer_interrupt+0xe/0x20
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x2b/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x8f/0x110
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] balance_pgdat+0x54e/0x770
[] ? isolate_pages_global+0x0/0x380
[] kswapd+0x134/0x390
[] ? autoremove_wake_function+0x0/0x40
[] ? kswapd+0x0/0x390
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
INFO: task rsync:3905 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync D ffffffffffffffff 0 3905 3774 0x00000080
ffff88064d80f268 0000000000000086 0000000000000000 ffff8801161ce2d8
ffff88043dca01e8 ffffffffffffff10 ffffffff81013c8e ffff88064d80f268
ffff8808064e3068 ffff88064d80ffd8 0000000000010518 ffff8808064e3068
Call Trace:
[] ? apic_timer_interrupt+0xe/0x20
[] ? mutex_spin_on_owner+0x9b/0xc0
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x2b/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x8f/0x110
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3ee/0x850
[] alloc_pages_current+0x9a/0x100
[] __get_free_pages+0xe/0x50
[] kv_alloc+0x3f/0xc0 [spl]
[] spl_kmem_cache_alloc+0x500/0xb90 [spl]
[] dnode_create+0x42/0x170 [zfs]
[] dnode_hold_impl+0x3ec/0x550 [zfs]
[] dnode_hold+0x19/0x20 [zfs]
[] dmu_bonus_hold+0x34/0x260 [zfs]
[] ? ifind_fast+0x3c/0xb0
[] sa_buf_hold+0xe/0x10 [zfs]
[] zfs_zget+0xca/0x1e0 [zfs]
[] ? kmem_alloc_debug+0x26b/0x350 [spl]
[] zfs_dirent_lock+0x481/0x550 [zfs]
[] zfs_dirlook+0x8b/0x270 [zfs]
[] ? arc_read+0xad/0x150 [zfs]
[] zfs_lookup+0x2ff/0x350 [zfs]
[] zpl_lookup+0x57/0xc0 [zfs]
[] do_lookup+0x18b/0x220
[] __link_path_walk+0x6f5/0x1040
[] ? __link_path_walk+0x729/0x1040
[] path_walk+0x6a/0xe0
[] do_path_lookup+0x5b/0xa0
[] user_path_at+0x57/0xa0
[] ? current_fs_time+0x27/0x30
[] vfs_fstatat+0x3c/0x80
[] vfs_lstat+0x1e/0x20
[] sys_newlstat+0x24/0x50
[] ? audit_syscall_entry+0x272/0x2a0
[] system_call_fastpath+0x16/0x1b
INFO: task rsync:4636 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync D ffff88082f828800 0 4636 4610 0x00000080
ffff88042f04d268 0000000000000086 0000000000000000 ffffea00020b3350
ffffea00020b37e8 ffffea00020b3740 ffffffff81013c8e 000000010068e992
ffff880807a1db18 ffff88042f04dfd8 0000000000010518 ffff880807a1db18
Call Trace:
[] ? apic_timer_interrupt+0xe/0x20
[] ? mutex_spin_on_owner+0x9b/0xc0
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x2b/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x8f/0x110
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] ? wakeup_kswapd+0x1/0x130
[] __alloc_pages_nodemask+0x3ee/0x850
[] alloc_pages_current+0x9a/0x100
[] __get_free_pages+0xe/0x50
[] kv_alloc+0x3f/0xc0 [spl]
[] spl_kmem_cache_alloc+0x500/0xb90 [spl]
[] dnode_create+0x42/0x170 [zfs]
[] dnode_hold_impl+0x3ec/0x550 [zfs]
[] dnode_hold+0x19/0x20 [zfs]
[] dmu_bonus_hold+0x34/0x260 [zfs]
[] ? ifind_fast+0x3c/0xb0
[] sa_buf_hold+0xe/0x10 [zfs]
[] zfs_zget+0xca/0x1e0 [zfs]
[] ? kmem_alloc_debug+0x26b/0x350 [spl]
[] zfs_dirent_lock+0x481/0x550 [zfs]
[] zfs_dirlook+0x8b/0x270 [zfs]
[] ? tsd_exit+0x5f/0x1c0 [spl]
[] zfs_lookup+0x2ff/0x350 [zfs]
[] zpl_lookup+0x57/0xc0 [zfs]
[] do_lookup+0x18b/0x220
[] __link_path_walk+0x6f5/0x1040
[] ? __link_path_walk+0x729/0x1040
[] path_walk+0x6a/0xe0
[] do_path_lookup+0x5b/0xa0
[] user_path_at+0x57/0xa0
[] ? _atomic_dec_and_lock+0x55/0x80
[] ? cp_new_stat+0xe4/0x100
[] vfs_fstatat+0x3c/0x80
[] vfs_lstat+0x1e/0x20
[] sys_newlstat+0x24/0x50
[] ? audit_syscall_entry+0x272/0x2a0
[] system_call_fastpath+0x16/0x1b
INFO: task kswapd0:82 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kswapd0 D ffffffffffffffff 0 82 2 0x00000000
ffff8808083e7ab0 0000000000000046 ffff8808083e7a40 ffffffffa0439c24
ffff8808083e7a40 ffff8807b71a1e70 0000000000000000 ffffffff81013c8e
ffff8808083e5a98 ffff8808083e7fd8 0000000000010518 ffff8808083e5a98
Call Trace:
[] ? arc_buf_remove_ref+0xd4/0x120 [zfs]
[] ? apic_timer_interrupt+0xe/0x20
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x2b/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x8f/0x110
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] balance_pgdat+0x54e/0x770
[] ? isolate_pages_global+0x0/0x380
[] kswapd+0x134/0x390
[] ? autoremove_wake_function+0x0/0x40
[] ? kswapd+0x0/0x390
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
INFO: task khugepaged:84 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
khugepaged D ffff88082f829000 0 84 2 0x00000000
ffff8808083ef8b0 0000000000000046 0000000000000000 ffffea0015a08c80
ffff88080658cb78 ffff8800456d69f0 0000000000000000 000000010069ce87
ffff8808083e4638 ffff8808083effd8 0000000000010518 ffff8808083e4638
Call Trace:
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x2b/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x8f/0x110
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3ee/0x850
[] ? del_timer_sync+0x22/0x30
[] alloc_pages_vma+0x93/0x150
[] ? autoremove_wake_function+0x0/0x40
[] khugepaged+0xa9b/0x1210
[] ? autoremove_wake_function+0x0/0x40
[] ? khugepaged+0x0/0x1210
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
INFO: task txg_quiesce:3508 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_quiesce D ffff88082f829000 0 3508 2 0x00000080
ffff8807bfa25d60 0000000000000046 ffff8807bfa25d28 ffff8807bfa25d24
ffff8807bfa25d30 ffff88082f829000 ffff880045676980 00000001006a26f0
ffff8807eef7b0a8 ffff8807bfa25fd8 0000000000010518 ffff8807eef7b0a8
Call Trace:
[] cv_wait_common+0x9c/0x1a0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] ? __bitmap_weight+0x8c/0xb0
[] __cv_wait+0x13/0x20 [spl]
[] txg_quiesce_thread+0x1eb/0x330 [zfs]
[] ? set_user_nice+0xd7/0x140
[] ? txg_quiesce_thread+0x0/0x330 [zfs]
[] thread_generic_wrapper+0x68/0x80 [spl]
[] ? thread_generic_wrapper+0x0/0x80 [spl]
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
INFO: task rsync:3905 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync D ffffffffffffffff 0 3905 3774 0x00000080
ffff88064d80f268 0000000000000086 0000000000000000 ffff8801161ce2d8
ffff88043dca01e8 ffffffffffffff10 ffffffff81013c8e ffff88064d80f268
ffff8808064e3068 ffff88064d80ffd8 0000000000010518 ffff8808064e3068
Call Trace:
[] ? apic_timer_interrupt+0xe/0x20
[] ? mutex_spin_on_owner+0x9b/0xc0
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x2b/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x8f/0x110
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3ee/0x850
[] alloc_pages_current+0x9a/0x100
[] __get_free_pages+0xe/0x50
[] kv_alloc+0x3f/0xc0 [spl]
[] spl_kmem_cache_alloc+0x500/0xb90 [spl]
[] dnode_create+0x42/0x170 [zfs]
[] dnode_hold_impl+0x3ec/0x550 [zfs]
[] dnode_hold+0x19/0x20 [zfs]
[] dmu_bonus_hold+0x34/0x260 [zfs]
[] ? ifind_fast+0x3c/0xb0
[] sa_buf_hold+0xe/0x10 [zfs]
[] zfs_zget+0xca/0x1e0 [zfs]
[] ? kmem_alloc_debug+0x26b/0x350 [spl]
[] zfs_dirent_lock+0x481/0x550 [zfs]
[] zfs_dirlook+0x8b/0x270 [zfs]
[] ? arc_read+0xad/0x150 [zfs]
[] zfs_lookup+0x2ff/0x350 [zfs]
[] zpl_lookup+0x57/0xc0 [zfs]
[] do_lookup+0x18b/0x220
[] __link_path_walk+0x6f5/0x1040
[] ? __link_path_walk+0x729/0x1040
[] path_walk+0x6a/0xe0
[] do_path_lookup+0x5b/0xa0
[] user_path_at+0x57/0xa0
[] ? current_fs_time+0x27/0x30
[] vfs_fstatat+0x3c/0x80
[] vfs_lstat+0x1e/0x20
[] sys_newlstat+0x24/0x50
[] ? audit_syscall_entry+0x272/0x2a0
[] system_call_fastpath+0x16/0x1b
INFO: task rsync:4496 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync D ffff88082f828e00 0 4496 4495 0x00000080
ffff880074b31a78 0000000000000086 0000000000000000 ffff88038cddd378
ffff880026ce9c00 0000001f00000200 ffff88053a65ce18 00000001006a1954
ffff88080447a6b8 ffff880074b31fd8 0000000000010518 ffff88080447a6b8
Call Trace:
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x2b/0x50
[] zfs_mknode+0x139/0xc70 [zfs]
[] ? txg_rele_to_quiesce+0x11/0x20 [zfs]
[] ? dmu_tx_assign+0x3e1/0x480 [zfs]
[] zfs_create+0x59a/0x6f0 [zfs]
[] zpl_create+0xa7/0xe0 [zfs]
[] ? generic_permission+0x5c/0xb0
[] vfs_create+0xb4/0xe0
[] do_filp_open+0xb70/0xd50
[] ? mntput_no_expire+0x30/0x110
[] ? alloc_fd+0x92/0x160
[] do_sys_open+0x69/0x140
[] sys_open+0x20/0x30
[] system_call_fastpath+0x16/0x1b
INFO: task rsync:4636 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync D ffff88082f828800 0 4636 4610 0x00000080
ffff88042f04d268 0000000000000086 0000000000000000 ffffea00020b3350
ffffea00020b37e8 ffffea00020b3740 ffffffff81013c8e 000000010068e992
ffff880807a1db18 ffff88042f04dfd8 0000000000010518 ffff880807a1db18
Call Trace:
[] ? apic_timer_interrupt+0xe/0x20
[] ? mutex_spin_on_owner+0x9b/0xc0
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x2b/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x8f/0x110
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] ? wakeup_kswapd+0x1/0x130
[] __alloc_pages_nodemask+0x3ee/0x850
[] alloc_pages_current+0x9a/0x100
[] __get_free_pages+0xe/0x50
[] kv_alloc+0x3f/0xc0 [spl]
[] spl_kmem_cache_alloc+0x500/0xb90 [spl]
[] dnode_create+0x42/0x170 [zfs]
[] dnode_hold_impl+0x3ec/0x550 [zfs]
[] dnode_hold+0x19/0x20 [zfs]
[] dmu_bonus_hold+0x34/0x260 [zfs]
[

@ryao
Contributor

ryao commented Apr 6, 2012

Did you compile your kernel with CONFIG_PREEMPT_VOLUNTARY? It looks like this was caused by that.

@tstudios
Author

tstudios commented Apr 6, 2012

Let me elaborate on the reply I just posted.

Everything "kernel" came from a yum update. I am presuming that if CONFIG_PREEMPT_VOLUNTARY=y is set in the development RPM, then it is truly set in the kernel itself. I did not compile the kernel here.

@ryao
Contributor

ryao commented Apr 6, 2012

You might be able to do zcat /proc/config.gz | grep CONFIG_PREEMPT to find out if any sort of kernel preemption is set.
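
For example (a rough sketch; the second command assumes an EL6-style config file under /boot, so adjust the path for your distribution):

zcat /proc/config.gz | grep CONFIG_PREEMPT       # only works if the kernel was built with CONFIG_IKCONFIG_PROC
grep CONFIG_PREEMPT /boot/config-$(uname -r)     # fallback: the config shipped with the kernel package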

@tstudios
Author

tstudios commented Apr 9, 2012

Noted that this reply did not make it into the issue notes last Friday.
Trying again....

-----Original Message-----
From: Allgood, Sam
Sent: Friday, April 06, 2012 6:25 PM
To: Richard Yao
Subject: RE: [zfs] RC8 release - rsync crash (#642)

No such file in /proc but:
[root@tsdpl ~]# grep VOLUNTARY /boot/config-2.6.32-71.29.1.el6.x86_64
CONFIG_PREEMPT_VOLUNTARY=y
This file was installed with just the kernel, regardless of -headers or -development. Correct?

@ryao
Contributor

ryao commented Apr 9, 2012

That file describes how your kernel is configured. It confirms that your kernel is compiled in a way known to cause problems such as the one you posted.

@tstudios
Author

Yes... my thinking is inverted here. I just searched the phrase in all the issues and finally realized that. I should actually force CONFIG_PREEMPT_VOLUNTARY=N as opposed to leaving it 'not set'. Just to make sure I have no more flawed logic, what is the current recommendation for the CONFIG_PREEMPT_XXX flags? Here are the current kernel defaults. I will re-compile.

@ryao
Contributor

ryao commented Apr 10, 2012

Use menuconfig to set "Preemption Model (No Forced Preemption (Server))". Here is what effect that has on your .config file, although I recommend using menuconfig rather than setting this by hand.

zgrep PREEMPT /proc/config.gz
# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

@pyavdr
Contributor

pyavdr commented Apr 10, 2012

I'm not sure if this matters, but did you see commit #1f0d8a5?

@tstudios
Author

For CentOS, my new kernel shows:
[root@tsdpl-bu boot]# grep CONFIG_PREEMPT ./config-2.6.32-71.29.1.el6.LLNL_ZFS.x86_64
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
[root@tsdpl-bu boot]#
This is running on my configuration and backup system and will be running on my production system tomorrow. I will know soon.

@ryao
Contributor

ryao commented Apr 11, 2012

@tstudios make sure that you have the patches that were committed today.

@tstudios
Author

Just had a crash with the new kernel. I did not pick up the patches committed 04/11/12. However, the symptom I'm seeing with rsync was apparently fixed in a patch between -rc6 and -rc7, which was the code I was running. Should I include the last crash trace here?

@ryao
Contributor

ryao commented Apr 12, 2012

It would not hurt to post the crash trace, but I do suggest updating to the latest code. Your issue might have been fixed in it.

@tstudios
Author

The trace is below.

I loaded the system with many rsync streams and watched free -om. Free memory did drop down to the 119 MB range many times but freed itself again; eventually, it did not. I don't know if the rsync streams were writing or doing massive deletes. Currently running about three streams with their children, and a free -om command is holding okay. Any code I pull needs to be a full tarball that I can build an RPM from, or a full release package. It looks like the rc8.tar.gz is 17 days old. What's your best recommendation for code patches and RPM builds?

Just today, I was doing a search and read using the keywords ARC and drop_caches. The drop_caches value is 0, and I do not have any modprobe file (under modprobe.d), so everything is at the defaults.

INFO: task kswapd0:82 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kswapd0 D ffff88082f828c00 0 82 2 0x00000008
ffff880807ca3ab0 0000000000000046 ffff880807ca3a78 ffff880807ca3a74
ffffea000a1047c0 ffff88082f828c00 ffff880045676980 00000001000848ad
ffff880807ca1a98 ffff880807ca3fd8 0000000000010518 ffff880807ca1a98
Call Trace:
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x23/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x7e/0x100
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] balance_pgdat+0x54e/0x770
[] ? isolate_pages_global+0x0/0x380
[] kswapd+0x134/0x390
[] ? autoremove_wake_function+0x0/0x40
[] ? kswapd+0x0/0x390
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
INFO: task khugepaged:84 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
khugepaged D ffff88082f828600 0 84 2 0x00000000
ffff880807cab8b0 0000000000000046 0000000000000000 ffffea000295ef20
ffff880807ca00c8 ffff8800456369f0 0000000000000000 0000000100085cfc
ffff880807ca0638 ffff880807cabfd8 0000000000010518 ffff880807ca0638
Call Trace:
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x23/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x7e/0x100
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3cb/0x820
[] ? try_to_del_timer_sync+0x7b/0xe0
[] alloc_pages_vma+0x93/0x150
[] khugepaged+0xa9b/0x1210
[] ? autoremove_wake_function+0x0/0x40
[] ? khugepaged+0x0/0x1210
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
INFO: task snmpd:6088 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
snmpd D ffffffffffffffff 0 6088 1 0x00000080
ffff8807f6d4d718 0000000000000082 ffffea0002963ef8 ffffea0002963ec0
ffffea0002963e88 ffffea0002963e50 0000000000000000 ffff880806df2100
ffff880806df26b8 ffff8807f6d4dfd8 0000000000010518 ffff880806df26b8
Call Trace:
[] ? dmu_buf_rele+0x30/0x40 [zfs]
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x23/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x7e/0x100
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3cb/0x820
[] ? put_dec+0x10c/0x110
[] alloc_pages_vma+0x93/0x150
[] handle_pte_fault+0x761/0xad0
[] ? kobject_put+0x27/0x60
[] handle_mm_fault+0x1ed/0x2b0
[] do_page_fault+0x11e/0x3a0
[] page_fault+0x25/0x30
[] ? copy_user_generic_string+0x2d/0x40
[] ? seq_read+0x2ae/0x3f0
[] proc_reg_read+0x7e/0xc0
[] vfs_read+0xb5/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_read+0x51/0x90
[] system_call_fastpath+0x16/0x1b
INFO: task smbd:7288 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
smbd D ffffffffffffffff 0 7288 7245 0x00000080
ffff8807e1189a28 0000000000000086 ffff880258468d40 0000000000000009
ffff8800cbd43238 0000000000000000 ffff8807e11899d8 ffffffffa0444e91
ffff88080667fa58 ffff8807e1189fd8 0000000000010518 ffff88080667fa58
Call Trace:
[] ? dmu_buf_will_dirty+0x81/0xd0 [zfs]
[] ? dmu_buf_rele+0x30/0x40 [zfs]
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x23/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x7e/0x100
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3cb/0x820
[] alloc_pages_current+0x9a/0x100
[] __get_free_pages+0xe/0x50
[] sys_getcwd+0x33/0x1c0
[] system_call_fastpath+0x16/0x1b
INFO: task smbd:7290 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
smbd D ffffffffffffffff 0 7290 7245 0x00000080
ffff8807d7657a28 0000000000000082 ffffea0002965078 0000000002965040
ffffea0002965008 ffffea00015fc1c0 0000000000000000 0000000000000001
ffff88080524da58 ffff8807d7657fd8 0000000000010518 ffff88080524da58
Call Trace:
[] ? dmu_buf_rele+0x30/0x40 [zfs]
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x23/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x7e/0x100
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3cb/0x820
[] alloc_pages_current+0x9a/0x100
[] __get_free_pages+0xe/0x50
[] sys_getcwd+0x33/0x1c0
[] system_call_fastpath+0x16/0x1b
INFO: task smbd:7293 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
smbd D ffffffffffffffff 0 7293 7245 0x00000080
ffff8807d764fa28 0000000000000082 0000000000000000 00000000000041fd
0000000000000014 0000000000000066 0000000000000000 0006000100000030
ffff8807e832e678 ffff8807d764ffd8 0000000000010518 ffff8807e832e678
Call Trace:
[] ? dmu_buf_rele+0x30/0x40 [zfs]
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x23/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x7e/0x100
[] dispose_list+0x40/0x120
[] ? _spin_lock+0x1e/0x30
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3cb/0x820
[] alloc_pages_current+0x9a/0x100
[] __get_free_pages+0xe/0x50
[] sys_getcwd+0x33/0x1c0
[] system_call_fastpath+0x16/0x1b
INFO: task rsync_tsefx-sys:7551 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
rsync_tsefx-s D ffff88082f828600 0 7551 7550 0x00000080
ffff8807ead672c8 0000000000000086 0000000000000000 0000000000000000
ffffea0001b6abe8 0000000000000000 ffffffff81013c4e 000000010008489b
ffff8807e81a7a98 ffff8807ead67fd8 0000000000010518 ffff8807e81a7a98
Call Trace:
[] ? apic_timer_interrupt+0xe/0x20
[] ? mutex_spin_on_owner+0x8b/0xc0
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x23/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x7e/0x100
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3cb/0x820
[] alloc_pages_current+0x9a/0x100
[] __get_free_pages+0xe/0x50
[] kv_alloc+0x3f/0xc0 [spl]
[] spl_kmem_cache_alloc+0x4fb/0xb90 [spl]
[] ? dbuf_read+0x24b/0x700 [zfs]
[] ? dnode_create+0x139/0x170 [zfs]
[] ? dbuf_rele_and_unlock+0x159/0x210 [zfs]
[] dbuf_create+0x43/0x370 [zfs]
[] ? remove_reference+0xa0/0xc0 [zfs]
[] dbuf_create_bonus+0x26/0x40 [zfs]
[] dmu_bonus_hold+0x1e5/0x260 [zfs]
[] sa_buf_hold+0xe/0x10 [zfs]
[] zfs_zget+0xca/0x1e0 [zfs]
[] ? kmem_alloc_debug+0x26b/0x350 [spl]
[] zfs_dirent_lock+0x481/0x550 [zfs]
[] zfs_dirlook+0x8b/0x270 [zfs]
[] ? tsd_exit+0x5f/0x1c0 [spl]
[] zfs_lookup+0x2fe/0x350 [zfs]
[] zpl_lookup+0x57/0xc0 [zfs]
[] do_lookup+0x18b/0x220
[] __link_path_walk+0x6f5/0x1040
[] ? __link_path_walk+0x729/0x1040
[] path_walk+0x6a/0xe0
[] do_path_lookup+0x5b/0xa0
[] user_path_at+0x57/0xa0
[] ? _atomic_dec_and_lock+0x55/0x80
[] ? cp_new_stat+0xe4/0x100
[] vfs_fstatat+0x3c/0x80
[] vfs_lstat+0x1e/0x20
[] sys_newlstat+0x24/0x50
[] ? audit_syscall_entry+0x272/0x2a0
[] system_call_fastpath+0x16/0x1b
INFO: task rsync_tsefx-hom:7781 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
rsync_tsefx-h D ffff88082f829000 0 7781 7779 0x00000080
ffff8806c1487688 0000000000000082 0000000000000000 00000000000041fd
0000000000000014 0000000000000066 0000000000000000 00000001000849a1
ffff8808066e1068 ffff8806c1487fd8 0000000000010518 ffff8808066e1068
Call Trace:
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x23/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x7e/0x100
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3cb/0x820
[] alloc_pages_current+0x9a/0x100
[] __page_cache_alloc+0x87/0x90
[] __do_page_cache_readahead+0xdb/0x210
[] ra_submit+0x21/0x30
[] ondemand_readahead+0x115/0x240
[] page_cache_sync_readahead+0x33/0x50
[] generic_file_aio_read+0x549/0x720
[] nfs_file_read+0xca/0x130 [nfs]
[] do_sync_read+0xfa/0x140
[] ? autoremove_wake_function+0x0/0x40
[] ? security_file_permission+0x16/0x20
[] vfs_read+0xb5/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_read+0x51/0x90
[] system_call_fastpath+0x16/0x1b
INFO: task rsync_tsefx67-i:8128 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
rsync_tsefx67 D ffff88082f828800 0 8128 8126 0x00000080
ffff8801acc93928 0000000000000086 0000000000000000 0000000000000000
ffff8801acc93980 000000000000000a ffff88036ebb9930 0000000100084952
ffff8808051ea638 ffff8801acc93fd8 0000000000010518 ffff8808051ea638
Call Trace:
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x23/0x50
[] zfs_zget+0xb5/0x1e0 [zfs]
[] ? kmem_alloc_debug+0x26b/0x350 [spl]
[] zfs_dirent_lock+0x481/0x550 [zfs]
[] zfs_dirlook+0x8b/0x270 [zfs]
[] zfs_lookup+0x2fe/0x350 [zfs]
[] zpl_lookup+0x57/0xc0 [zfs]
[] do_lookup+0x18b/0x220
[] __link_path_walk+0x6f5/0x1040
[] ? __link_path_walk+0x729/0x1040
[] path_walk+0x6a/0xe0
[] do_path_lookup+0x5b/0xa0
[] user_path_at+0x57/0xa0
[] ? list_move+0x1f/0x30
[] ? __mark_inode_dirty+0x13f/0x160
[] vfs_fstatat+0x3c/0x80
[] vfs_lstat+0x1e/0x20
[] sys_newlstat+0x24/0x50
[] ? audit_syscall_entry+0x272/0x2a0
[] system_call_fastpath+0x16/0x1b
INFO: task rsync_tscomp291:8361 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
rsync_tscomp2 D ffffffffffffffff 0 8361 8359 0x00000080
ffff8805a48a9268 0000000000000082 ffffea000296bcf8 000000000296bd30
ffffea000296bc88 ffffea000b40abc0 ffffffff81013c4e ffff8805a48a9268
ffff8807e7c3da58 ffff8805a48a9fd8 0000000000010518 ffff8807e7c3da58
Call Trace:
[] ? apic_timer_interrupt+0xe/0x20
[] ? mutex_spin_on_owner+0x9b/0xc0
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x23/0x50
[] zfs_zinactive+0x7e/0x110 [zfs]
[] zfs_inactive+0x87/0x200 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0x7e/0x100
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] shrink_slab+0x13a/0x1a0
[] do_try_to_free_pages+0x2d6/0x500
[] ? get_page_from_freelist+0x15c/0x820
[] try_to_free_pages+0x9f/0x130
[] ? isolate_pages_global+0x0/0x380
[] __alloc_pages_nodemask+0x3cb/0x820
[] ? kmem_alloc_debug+0x26b/0x350 [spl]
[] alloc_pages_current+0x9a/0x100
[] __get_free_pages+0xe/0x50
[] kv_alloc+0x3f/0xc0 [spl]
[] spl_kmem_cache_alloc+0x4fb/0xb90 [spl]
[] dnode_create+0x42/0x170 [zfs]
[] dnode_hold_impl+0x3ec/0x550 [zfs]
[] ? remove_reference+0xa0/0xc0 [zfs]
[] dnode_hold+0x19/0x20 [zfs]
[] dmu_bonus_hold+0x34/0x260 [zfs]
[] ? ifind_fast+0x3c/0xa0
[

@tstudios
Author

Ahhh... I see zfsonlinux-zfs-0.6.0-rc8-15-gcf81b00.tar.gz when I click
on the download links. I see the download follows the latest commit. I
had apparently forgotten this difference between the web display and
what you actually download. I'll get this and the latest spl.

I do appreciate your looking at the trace. Seems like rsync is one of
the biggest "user app caused" issues.

@ryao
Contributor

ryao commented Apr 13, 2012

@tstudios I had a similar issue with crashes when doing large rsyncs on my server that had 16GB of RAM. It only occurred with the patch in issue #618, but the patch in issue #660 appears to have fixed it.

You should be able to achieve the same result with the current code by setting zfs_arc_max when the module is loaded. I suggest trying that. You could use a size of 1/4 system memory for the sake of using a round number.
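
For example, on a 16 GB machine that could look like the following (a sketch only; the file name under /etc/modprobe.d is arbitrary and the value is given in bytes):

# /etc/modprobe.d/zfs.conf - cap the ARC at 1/4 of system memory (4 GiB here)
options zfs zfs_arc_max=4294967296

For a one-off test, modprobe zfs zfs_arc_max=4294967296 before importing the pool should have the same effect.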

@tstudios
Author

Yes, thanks. I'm running 4 streams right now on a job that needs to complete, so I'll probably set it Monday morning.
I chose to let a self-written init script import my pool, start smbd, and start nfs, then do the reverse at shutdown. This was because of a dual boot with Solaris early on. The zpool call in the script probably loads spl and zfs. Now I suppose a command-line modprobe of spl and zfs is in order, with syntax like modprobe spl; modprobe zfs zfs_arc_max=XXX. Is the XXX value given in bytes or megabytes? I guess modprobe zfs would load spl anyway, though.
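
A quick way to sanity-check what actually got applied, assuming the parameter is exposed under /sys/module as in current SPL/ZFS builds:

modprobe zfs zfs_arc_max=8589934592          # spl is pulled in automatically as a dependency
cat /sys/module/zfs/parameters/zfs_arc_max   # should echo the value back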

@tstudios
Author

The setting: options zfs zfs_arc_max=8589934592 zfs_arc_min=0 in
/etc/modprobe.d/fs.conf appears to be working very well. Let me ask you
about 'tuning' this setting. System has 32G memory and the basic
function for the server is rsync backups. I don't mind committing most
resources to zfs and rsync.

@ryao
Contributor

ryao commented Apr 16, 2012

@tstudios It probably would be okay to set zfs_arc_max to a few hundred megabytes below half of your system memory. The internal fragmentation issue that I describe in issue #660 means that anything higher than this has the potential to cause problems.
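
As a rough worked example for a 32 GB box (an illustration, not an exact prescription), half of 32 GiB minus roughly 512 MiB would be:

echo $(( 32*1024*1024*1024/2 - 512*1024*1024 ))   # 16642998272 bytes for zfs_arc_max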

@tstudios
Author

All my rsyncs ran just fine this morning. However, there was not much "churn" within the data itself... additions, deletions, etc. It would seem okay by me to reference your zfs_arc_max settings and the two previous issues and close this, since the rsyncs have now run for two days.

@ryao
Contributor

ryao commented Apr 17, 2012

@tstudios Please keep this open until a commit has been made to the GIT repository to address this issue.

Also, would you try testing with zfs_arc_max set to exactly 1/2 of your system RAM? @behlendorf plans to merge the fix for this into zfsonlinux HEAD, but he would prefer to use 1/2 rather than 1/3. I will not have time to test that until next week.

@tstudios
Author

I will do the test. It may be early tomorrow. I still have 1 large new
rsync backup stream running. MemTotal from /proc/meminfo shows
32875948kB. So set it to 32,875,948,000 Bytes / 2 = 16,437,974,000
Bytes. With the value being bytes and not kB, I just want to be sure I
have the value you are seeking.

@ryao
Contributor

ryao commented Apr 18, 2012

The value is in bytes.

@tstudios
Author

I was looking to see if you wanted the mathematical 1024*1024*1024*16 = 17,179,869,184 bytes, or 1/2 of what /proc/meminfo MemTotal reports: 32875948 kB. That would be 32,875,948,000 bytes / 2 = 16,437,974,000 bytes. For 32 GB of RAM, those numbers diverge quite a bit.
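
One way to sidestep the ambiguity is to derive the value straight from what the kernel reports (a sketch; the kB in /proc/meminfo are 1024-byte units):

awk '/^MemTotal/ { printf "%.0f\n", $2 * 1024 / 2 }' /proc/meminfo   # half of MemTotal, in bytes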

@ryao
Contributor

ryao commented Apr 18, 2012

One half of what /proc/meminfo reports.

@tstudios
Author

I was not able to reboot this morning after the modprobe.d/zfs.conf change. A Samba user had files open and actively changing. Then it became too late in the business day. Perhaps Friday morning.

@tstudios
Author

Set zfs_arc_max=16437974000, which is 1/2 of what MemTotal reports in
/proc/meminfo. Kicked off all rsyncs. No hangs. Below are several
outputs from free -m commands. These are from random times, not any
planned interval.
[root@tsdpl modules]# free -m
             total       used       free     shared    buffers     cached
Mem:         32105      31906        199          0         10       8988
-/+ buffers/cache:      22907       9197
Swap:         8197          0       8197
[root@tsdpl modules]# free -m
             total       used       free     shared    buffers     cached
Mem:         32105      27814       4290          0         10      11018
-/+ buffers/cache:      16785      15319
Swap:         8197          0       8197
[root@tsdpl modules]# free -m
             total       used       free     shared    buffers     cached
Mem:         32105      29435       2669          0         10      12432
-/+ buffers/cache:      16992      15112
Swap:         8197          0       8197
[root@tsdpl modules]# free -m
             total       used       free     shared    buffers     cached
Mem:         32105      30867       1237          0         10      11996
-/+ buffers/cache:      18859      13245
Swap:         8197          0       8197
[root@tsdpl modules]# free -m
             total       used       free     shared    buffers     cached
Mem:         32105      30752       1352          0         10      12007
-/+ buffers/cache:      18734      13370
Swap:         8197          0       8197
[root@tsdpl modules]# free -m
             total       used       free     shared    buffers     cached
Mem:         32105      31805        299          0         10       9056
-/+ buffers/cache:      22738       9367
Swap:         8197          0       8197
[root@tsdpl modules]# free -m
             total       used       free     shared    buffers     cached
Mem:         32105      31787        317          0         10       9054
-/+ buffers/cache:      22721       9383
Swap:         8197          0       8197
[root@tsdpl modules]# free -m
             total       used       free     shared    buffers     cached
Mem:         32105      23435       8670          0         11        139
-/+ buffers/cache:      23283       8821
Swap:         8197          0       8197
[root@tsdpl modules]#

@behlendorf
Contributor

Sounds promising. Thanks for the update; unless I hear otherwise, I'm planning to change the default zfs_arc_max value to 1/2 of total system memory when I merge the other VM changes.

@tstudios
Author

Excellent! Glad to provide a test bed, as much as I can on a "production" system.

I should have some 'data churn' over the weekend. That will give my rsyncs something to "chew" on Monday morning.

The simultaneous launch of 39 streams this morning to back up "client" systems to my "primary" server worked flawlessly.

Right now the "backup" server is rsyncing the "primary" server with 4 streams... one for each file system in the 233 TB zpool.

behlendorf pushed a commit to behlendorf/zfs that referenced this issue May 21, 2018
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Closes openzfs#642
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
…4ab21b6-28e4-4ed2-81df-1ca843e377ee

QA-37846 zpool_import_014_pos creates poolA but don't destory in cleanup.