NFS based temporary lockups under load #1060

Closed
msmitherdc opened this issue Oct 21, 2012 · 16 comments
Labels
Type: Performance Performance improvement or performance problem

Comments

@msmitherdc

On rc11.

When this happens, NFS is unresponsive but eventually recovers after 5-15 minutes. When it does happen, I see messages like the ones below:

INFO: task nfsd:7576 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd          D 0000000000000006     0  7576      2 0x00000080
 ffff880c281c5b70 0000000000000046 ffff880c2fb99d80 000000012fb99d80
 ffff880c3fc01500 000412d02fb99d80 ffff880c1b70b600 ffffffffa0330500
 ffff880c201685f8 ffff880c281c5fd8 000000000000f4e8 ffff880c201685f8
Call Trace:
 [<ffffffff81090dfe>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa01c453c>] cv_wait_common+0x9c/0x1a0 [spl]
 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa01c4673>] __cv_wait+0x13/0x20 [spl]
 [<ffffffffa031b704>] zil_commit+0x74/0x6a0 [zfs]
 [<ffffffffa01c668d>] ? tsd_set+0x32d/0x750 [spl]
 [<ffffffffa03259f0>] ? zpl_open+0x0/0x90 [zfs]
 [<ffffffff814edf3e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0310557>] zfs_fsync+0x97/0x120 [zfs]
 [<ffffffffa0325906>] zpl_fsync+0x46/0x70 [zfs]
 [<ffffffff811a5431>] vfs_fsync_range+0xa1/0xe0
 [<ffffffff811a54dd>] vfs_fsync+0x1d/0x20
 [<ffffffffa04d7dcb>] nfsd_commit+0x6b/0xa0 [nfsd]
 [<ffffffffa04de15d>] nfsd3_proc_commit+0x9d/0x100 [nfsd]
 [<ffffffffa04d043e>] nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa046b5a4>] svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff8105fa50>] ? default_wake_function+0x0/0x20
 [<ffffffffa046bbe0>] svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa04d0b62>] nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa04d0aa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff81090886>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff810907f0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
INFO: task nfsd:7580 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd          D 0000000000000009     0  7580      2 0x00000080
 ffff880c2e8697c0 0000000000000046 ffff880619fd8ee0 ffff880619fd8ed8
 00000000d3dd7692 0000000000000001 0000000000000000 0000000000000000
 ffff880c2b3630f8 ffff880c2e869fd8 000000000000f4e8 ffff880c2b3630f8
Call Trace:
 [<ffffffff81090dfe>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa01c453c>] cv_wait_common+0x9c/0x1a0 [spl]
 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa01c4673>] __cv_wait+0x13/0x20 [spl]
 [<ffffffffa032151b>] zio_wait+0xeb/0x160 [zfs]
 [<ffffffffa0293c87>] dmu_buf_hold_array_by_dnode+0x237/0x570 [zfs]
 [<ffffffffa0294245>] dmu_buf_hold_array+0x65/0x90 [zfs]
 [<ffffffffa00c29c8>] ? avl_add+0x38/0x50 [zavl]
 [<ffffffffa02942b1>] dmu_read_uio+0x41/0xd0 [zfs]
 [<ffffffff814edf3e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa03124c8>] zfs_read+0x178/0x4c0 [zfs]
 [<ffffffffa0325e32>] zpl_read_common+0x52/0x70 [zfs]
 [<ffffffffa0325eb8>] zpl_read+0x68/0xa0 [zfs]
 [<ffffffff81099945>] ? set_groups+0x25/0x190
 [<ffffffffa0325e50>] ? zpl_read+0x0/0xa0 [zfs]
 [<ffffffff81175259>] do_loop_readv_writev+0x59/0x90
 [<ffffffff8117733e>] do_readv_writev+0x1de/0x1f0
 [<ffffffffa03259f0>] ? zpl_open+0x0/0x90 [zfs]
 [<ffffffffa03259f0>] ? zpl_open+0x0/0x90 [zfs]
 [<ffffffff81173f2f>] ? __dentry_open+0x23f/0x360
 [<ffffffff81177563>] vfs_readv+0x43/0x60
 [<ffffffffa04d65e9>] nfsd_vfs_read+0x89/0x1c0 [nfsd]
 [<ffffffffa04d7a5e>] ? nfsd_open+0x13e/0x210 [nfsd]
 [<ffffffffa04d80c7>] nfsd_read+0x1c7/0x2e0 [nfsd]
 [<ffffffffa04df965>] nfsd3_proc_read+0xd5/0x180 [nfsd]
 [<ffffffffa04d043e>] nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa046b5a4>] svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff8105fa50>] ? default_wake_function+0x0/0x20
 [<ffffffffa046bbe0>] svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa04d0b62>] nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa04d0aa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff81090886>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff810907f0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
INFO: task nfsd:7581 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd          D 0000000000000003     0  7581      2 0x00000080
 ffff880c2bd19c60 0000000000000046 ffff880c2bd19c00 ffffffffa04dba46
 0000000000000000 ffffffff814edf3e ffff880c2bf44290 ffffffff8100bc0e
 ffff880c2b229ab8 ffff880c2bd19fd8 000000000000f4e8 ffff880c2b229ab8
Call Trace:
 [<ffffffffa04dba46>] ? nfsd_setuser+0x126/0x2c0 [nfsd]
 [<ffffffff814edf3e>] ? mutex_lock+0x1e/0x50
 [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff814ee0ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814edf4b>] mutex_lock+0x2b/0x50
 [<ffffffff811a5420>] vfs_fsync_range+0x90/0xe0
 [<ffffffff811a54dd>] vfs_fsync+0x1d/0x20
 [<ffffffffa04d7dcb>] nfsd_commit+0x6b/0xa0 [nfsd]
 [<ffffffffa04de15d>] nfsd3_proc_commit+0x9d/0x100 [nfsd]
 [<ffffffffa04d043e>] nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa046b5a4>] svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff8105fa50>] ? default_wake_function+0x0/0x20
 [<ffffffffa046bbe0>] svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa04d0b62>] nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa04d0aa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff81090886>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff810907f0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
INFO: task nfsd:7582 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd          D 000000000000000c     0  7582      2 0x00000080
 ffff880c203fb9c0 0000000000000046 ffff88061b2fc000 0000000000000202
 ffff880c203fb990 ffffffffa02883fa 0000000000000040 ffff880c203fba4c
 ffff880c2af310b8 ffff880c203fbfd8 000000000000f4e8 ffff880c2af310b8
Call Trace:
 [<ffffffffa02883fa>] ? arc_read+0xca/0x170 [zfs]
 [<ffffffff81090dfe>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa01c453c>] cv_wait_common+0x9c/0x1a0 [spl]
 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa01c4673>] __cv_wait+0x13/0x20 [spl]
 [<ffffffffa032151b>] zio_wait+0xeb/0x160 [zfs]
 [<ffffffffa0293c87>] dmu_buf_hold_array_by_dnode+0x237/0x570 [zfs]
 [<ffffffffa0294be7>] dmu_read+0x97/0x180 [zfs]
 [<ffffffffa0312334>] zfs_get_data+0x224/0x240 [zfs]
 [<ffffffffa031bb4b>] zil_commit+0x4bb/0x6a0 [zfs]
 [<ffffffff814edf3e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0310557>] zfs_fsync+0x97/0x120 [zfs]
 [<ffffffffa0325906>] zpl_fsync+0x46/0x70 [zfs]
 [<ffffffff811a5431>] vfs_fsync_range+0xa1/0xe0
 [<ffffffff811a54dd>] vfs_fsync+0x1d/0x20
 [<ffffffffa04d7dcb>] nfsd_commit+0x6b/0xa0 [nfsd]
 [<ffffffffa04de15d>] nfsd3_proc_commit+0x9d/0x100 [nfsd]
 [<ffffffffa04d043e>] nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa046b5a4>] svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff8105fa50>] ? default_wake_function+0x0/0x20
 [<ffffffffa046bbe0>] svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa04d0b62>] nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa04d0aa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff81090886>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff810907f0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
INFO: task nfsd:7583 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd          D 000000000000000c     0  7583      2 0x00000080
 ffff880c1f76fb70 0000000000000046 0000000000000000 ffff880c2fb99d80
 ffff880c2c7f5008 ffff880c2fb99d80 ffff880c30acecc0 ffffffffa0330500
 ffff880c2a761038 ffff880c1f76ffd8 000000000000f4e8 ffff880c2a761038
Call Trace:
 [<ffffffffa01c453c>] cv_wait_common+0x9c/0x1a0 [spl]
 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa01c4673>] __cv_wait+0x13/0x20 [spl]
 [<ffffffffa031b704>] zil_commit+0x74/0x6a0 [zfs]
 [<ffffffffa01c668d>] ? tsd_set+0x32d/0x750 [spl]
 [<ffffffffa03259f0>] ? zpl_open+0x0/0x90 [zfs]
 [<ffffffff814edf3e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0310557>] zfs_fsync+0x97/0x120 [zfs]
 [<ffffffffa0325906>] zpl_fsync+0x46/0x70 [zfs]
 [<ffffffff811a5431>] vfs_fsync_range+0xa1/0xe0
 [<ffffffff811a54dd>] vfs_fsync+0x1d/0x20
 [<ffffffffa04d7dcb>] nfsd_commit+0x6b/0xa0 [nfsd]
 [<ffffffffa04de15d>] nfsd3_proc_commit+0x9d/0x100 [nfsd]
 [<ffffffffa04d043e>] nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa046b5a4>] svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff8105fa50>] ? default_wake_function+0x0/0x20
 [<ffffffffa046bbe0>] svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa04d0b62>] nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa04d0aa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff81090886>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff810907f0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
INFO: task txg_sync:5682 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync      D 0000000000000007     0  5682      2 0x00000000
 ffff880629da9bc0 0000000000000046 ffff880c1cd8eb58 0000000000000086
 0000000000000008 ffff88062dc4a208 0000000000000001 ffff88062dc4a220
 ffff880627c61af8 ffff880629da9fd8 000000000000f4e8 ffff880627c61af8
Call Trace:
 [<ffffffff81090dfe>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa01c453c>] cv_wait_common+0x9c/0x1a0 [spl]
 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa01c4673>] __cv_wait+0x13/0x20 [spl]
 [<ffffffffa032151b>] zio_wait+0xeb/0x160 [zfs]
 [<ffffffffa02b7629>] dsl_pool_sync+0xc9/0x480 [zfs]
 [<ffffffffa02ca147>] spa_sync+0x397/0x9a0 [zfs]
 [<ffffffffa02dbd21>] txg_sync_thread+0x2c1/0x490 [zfs]
 [<ffffffff810593d9>] ? set_user_nice+0xc9/0x130
 [<ffffffffa02dba60>] ? txg_sync_thread+0x0/0x490 [zfs]
 [<ffffffffa01bd5b8>] thread_generic_wrapper+0x68/0x80 [spl]
 [<ffffffffa01bd550>] ? thread_generic_wrapper+0x0/0x80 [spl]
 [<ffffffff81090886>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff810907f0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
__ratelimit: 55 callbacks suppressed

During the lockup, top looks like this:

top - 21:12:47 up  3:16,  1 user,  load average: 38.77, 35.33, 23.58
Tasks: 699 total,  29 running, 670 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  49416904k total, 38768812k used, 10648092k free,    30744k buffers
Swap: 10239992k total,        0k used, 10239992k free,   102060k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                
 5290 root      39  19     0    0    0 R 99.4  0.0  22:27.36 z_wr_int/1                                                                              
 5291 root      39  19     0    0    0 R 99.4  0.0  22:32.44 z_wr_int/2                                                                              
 5292 root      39  19     0    0    0 R 99.4  0.0  22:28.59 z_wr_int/3                                                                              
 5293 root      39  19     0    0    0 R 99.4  0.0  22:16.92 z_wr_int/4                                                                              
 5294 root      39  19     0    0    0 R 99.4  0.0  22:48.00 z_wr_int/5                                                                              
 5295 root      39  19     0    0    0 R 99.4  0.0  22:48.67 z_wr_int/6                                                                              
 5296 root      39  19     0    0    0 R 99.4  0.0  22:44.84 z_wr_int/7                                                                              
 5298 root      39  19     0    0    0 R 99.4  0.0  23:00.46 z_wr_int/9                                                                              
 5299 root      39  19     0    0    0 R 99.4  0.0  23:07.29 z_wr_int/10                                                                             
 5300 root      39  19     0    0    0 R 99.4  0.0  22:56.46 z_wr_int/11                                                                             
 5301 root      39  19     0    0    0 R 99.4  0.0  22:54.59 z_wr_int/12                                                                             
 5302 root      39  19     0    0    0 R 99.4  0.0  22:39.59 z_wr_int/13                                                                             
 5304 root      39  19     0    0    0 R 99.4  0.0  22:44.86 z_wr_int/15                                                                             
 5303 root      39  19     0    0    0 R 98.8  0.0  22:45.56 z_wr_int/14                                                                             
 5297 root      39  19     0    0    0 R 96.1  0.0  22:32.09 z_wr_int/8                                                                              
 5243 root      39  19     0    0    0 R 48.6  0.0   9:16.65 z_null_int/0                                                                            
 5289 root      39  19     0    0    0 R 47.9  0.0  14:55.80 z_wr_int/0                                                                              
 4810 root       0 -20     0    0    0 S  2.3  0.0   0:22.69 txg_sync                                                                                
 5908 root       0 -20     0    0    0 S  2.3  0.0   0:44.57 txg_sync  

This is Red Hat EL 6.2 (kernel 2.6.32-220.el6.x86_64).

@behlendorf
Contributor

Can you describe the NFS load which is causing the problem? It appears to be issuing lots of fsync()s, which will probably be slow without a dedicated ZIL device. The only thing that makes me think there might be more going on here is your huge CPU usage; if you were just IOPS-bound I wouldn't expect to see that.

We could look into this more closely if you have a reproducer for the problem.
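For what it's worth, a minimal sketch of what adding a dedicated log (SLOG) device would look like; the pool name and device path below are only placeholders:

# sketch only -- substitute your own pool name and a fast, low-latency device
zpool add tank log /dev/disk/by-id/nvme-FAST-LOG-DEVICE
zpool status tank    # the new "logs" vdev should show up here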

@msmitherdc
Author

The load is an Oracle database using NFS storage. An 8K recordsize is used for the zvol created.

We have a large number of cores on these systems (80) and a large amount of data (LiDAR point clouds) being written into the database.

When the system comes out of this phase (which seems to last about 5-15 min at a time), load drops back down to a normal level of about 3-5.
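For reference, an 8K block size is set roughly like this (pool and dataset names are placeholders; on a zvol the corresponding property is volblocksize, which has to be set when the volume is created):

# placeholders -- substitute your own pool and dataset names
zfs set recordsize=8K tank/oradata                  # filesystem dataset
zfs create -V 200G -o volblocksize=8K tank/oravol   # zvol; block size fixed at creation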

@cwedgwood
Contributor

#922 again?

@ryao
Contributor

ryao commented Nov 1, 2012

What is your pool configuration? Also, which version of ZFSOnLinux is this? Have you tried HEAD?

@behlendorf
Contributor

This could be related to #922. @msmitherdc, if you get a chance, you could try the following patches and see if they improve things.

https://github.com/behlendorf/spl/branches/stats
https://github.com/behlendorf/zfs/branches/stats
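If it helps, a rough sketch of building and installing from those branches (the autotools steps and the --with-spl path are assumptions and may need adjusting for your distribution):

# rough sketch only -- adjust paths and options for your environment
git clone -b stats https://github.com/behlendorf/spl.git
cd spl && ./autogen.sh && ./configure && make && sudo make install && cd ..
git clone -b stats https://github.com/behlendorf/zfs.git
cd zfs && ./autogen.sh && ./configure --with-spl=../spl && make && sudo make install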

@behlendorf
Contributor

@msmitherdc The suggested patches have been merged into master. If you get a chance, please update to the latest master source and let us know if it helps.

@msmitherdc
Author

The patches seem to have made a difference, though the load has also been lower. I'll rebuild with master and try to stress it this weekend, but it does look promising.


@behlendorf
Contributor

Just make sure you use the latest SPL and ZFS.

@msmitherdc
Author

On SHA 4c837f0, under high load (Oracle database with lots of large-object write activity):

INFO: task txg_sync:5783 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync D 000000000000000e 0 5783 2 0x00000080
ffff880bf509bbc0 0000000000000046 ffff880c30415358 0000000000000086
0000000000000008 ffff880c2d88f508 0000000000000001 ffff880c2d88f520
ffff880c202710b8 ffff880bf509bfd8 000000000000f4e8 ffff880c202710b8
Call Trace:
[] ? prepare_to_wait_exclusive+0x4e/0x80
[] cv_wait_common+0x9c/0x1a0 [spl]
[] ? zio_execute+0x0/0xf0 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] zio_wait+0xeb/0x160 [zfs]
[] dsl_pool_sync+0xc9/0x480 [zfs]
[] spa_sync+0x397/0x9a0 [zfs]
[] txg_sync_thread+0x2c1/0x490 [zfs]
[] ? set_user_nice+0xc9/0x130
[] ? txg_sync_thread+0x0/0x490 [zfs]
[] thread_generic_wrapper+0x68/0x80 [spl]
[] ? thread_generic_wrapper+0x0/0x80 [spl]
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20



@behlendorf
Contributor

@msmitherdc The stack shows the txg_sync thread hung on what is likely a write IO. Is it possible it's being starved by a huge number of read IOs in your workload? I assume that eventually the txg sync completes?

@msmitherdc
Author

That is correct, it eventually does complete. Are we just oversaturating the IO? We were able to maintain this volume under Solaris.
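As a sanity check (just a sketch; the pool name is a placeholder), watching per-vdev IO while a stall is in progress should show whether the pool is actually saturated:

# sketch -- substitute your own pool name
zpool iostat -v tank 5    # per-vdev bandwidth/IOPS every 5 seconds
iostat -x 5               # what the Linux block layer sees for the same disks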


@behlendorf
Contributor

It's hard to say. The fact that you're hung in zio_wait() just indicates that you've submitted an IO to the pipeline and are waiting for it to complete. Exactly why it's taking so long would require more digging.

Recently I've seen something very similar to this at LLNL (very, very rarely) where ZFS clearly submits the IO to the block layer but it takes hundreds of seconds for ZFS to be notified that it completed. In this case I strongly suspect a problem in the driver, the transport layer, or the SCSI mid-layer. Unfortunately, I haven't run it to ground yet.

You could check if something similar is happening on your systems by running 'zpool events -v'. You should see a delay event which was generated for this really slow IO. It will include the total delay time (in milliseconds) that the IO took after it was handed off to the block layer.
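For example, something like this (just a sketch; the exact event class name may differ between versions) would pull any delay reports out of the event log:

# sketch -- filter the verbose event log for delay reports
zpool events -v | grep -B 2 -A 20 delay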

@msmitherdc
Author

I do see wait events showing up after a large number of statechange events:

Nov 10 2012 17:35:16.795457000 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0xd25168befe23cdb7
vdev_guid = 0xcad01e6094bcaff4
vdev_state = 0x7
time = 0x509ed6a4 0x2f69b5e8

Nov 10 2012 17:35:17.362036000 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0xd25168befe23cdb7
vdev_guid = 0x3a665eec41de9994
vdev_state = 0x7
time = 0x509ed6a5 0x15943b20
...

Nov 10 2012 17:35:17.591634000 ereport.fs.zfs.vdev.unknown
class = "ereport.fs.zfs.vdev.unknown"
ena = 0x244a2095aa603c01
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0xd25168befe23cdb7
vdev = 0xe39497c7e7c674f8
(end detector)
pool = "redo"
pool_guid = 0xd25168befe23cdb7
pool_context = 0x1
pool_failmode = "wait"
vdev_guid = 0xe39497c7e7c674f8
vdev_type = "disk"
vdev_path = "/dev/dsk/c0t5000C50034BB8507d0s0"
vdev_devid = "id1,sd@n5000c50034bb8507/a"
prev_state = 0x1
time = 0x509ed6a5 0x23439e50


@behlendorf
Contributor

No good; we're looking for an ereport.fs.zfs.vdev.delay event.

@FransUrbo
Contributor

@msmitherdc @behlendorf Considering this issue is old (almost a year and a half) and against an ancient version of ZoL and the kernel, should we keep it open or close it as stale?

@msmitherdc
Author

Close it as stale.

@behlendorf modified the milestone: 0.6.6 on Nov 8, 2014