
__cv_destroy()) ASSERTION(cvp->cv_mutex == NULL) #943

Closed
smallma opened this issue Sep 7, 2012 · 11 comments

smallma commented Sep 7, 2012

Hello all,

I got an SPLError (shown below) when there was a lot of I/O. Please help me out. Thank you.

Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.319641] SPLError: 15192:0:(spl-condvar.c:84:__cv_destroy()) ASSERTION(cvp->cv_mutex == NULL) failed
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.319749] SPLError: 15192:0:(spl-condvar.c:84:__cv_destroy()) SPL PANIC
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.319827] SPL: Showing stack for process 15192
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.319830] Pid: 15192, comm: glusterfsd Tainted: P            3.1.10-st0315-1 #2
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.319832] Call Trace:
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.319886]  [] spl_debug_dumpstack+0x27/0x40 [spl]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.319894]  [] spl_debug_bug+0x82/0xe0 [spl]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.319903]  [] __cv_destroy+0x31d/0x330 [spl]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.319912]  [] ? _raw_spin_lock+0xe/0x20
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.319958]  [] zfs_dirent_unlock+0x96/0x260 [zfs]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320077]  [] zfs_get_xattrdir+0x150/0x180 [zfs]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320096]  [] ? mutex_lock+0x1d/0x50
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320099]  [] ? mutex_lock+0x1d/0x50
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320128]  [] ? refcount_add_many+0x78/0x110 [zfs]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320163]  [] ? refcount_add+0x16/0x20 [zfs]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320205]  [] ? rrw_enter+0xa5/0x340 [zfs]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320239]  [] zfs_lookup+0x1d5/0x330 [zfs]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320268]  [] __zpl_xattr_get+0xb5/0x380 [zfs]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320330]  [] zpl_xattr_get+0x68/0x220 [zfs]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320354]  [] zpl_xattr_trusted_get+0x76/0xa0 [zfs]
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320358]  [] generic_getxattr+0x53/0x80
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320372]  [] vfs_getxattr+0xa8/0xd0
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320374]  [] getxattr+0xb1/0x170
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320377]  [] ? cp_new_stat+0xf8/0x110
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320400]  [] ? sys_newfstat+0x2a/0x40
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320402]  [] sys_fgetxattr+0x60/0xa0
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320406]  [] system_call_fastpath+0x16/0x1b
Sep  4 11:46:49 Qmulus-60eb69d22812 kernel: : [505970.320559] SPL: Dumping log to /tmp/spl-log.1346730409.15192
@behlendorf (Contributor)

I've seen this error a few times now, and I thought we already had an issue open on it, but I can't find it now. According to the ASSERT, a process is still using the condition variable while it is being destroyed. This is very similar to issue #47, which was fixed.

If you're able to reproduce this consistently, we can put together a debugging patch. I suspect there is just a very tight race here in the cv implementation that we need to fix.
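
For context, here is a minimal userspace sketch of the kind of race being described (hypothetical pthread code for illustration, not the SPL source): the waiter records the mutex in the condvar and only clears it after returning from the wait, so a destroyer running right after the broadcast can still observe a non-NULL cv_mutex, analogous to the ASSERT in __cv_destroy().

```c
/*
 * Hypothetical pthread illustration of the suspected race; the names and
 * structure are invented for clarity and do not mirror the SPL code.
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct fake_cv {
    pthread_cond_t   cond;
    pthread_mutex_t *cv_mutex;   /* last mutex a waiter used */
    int              signaled;
};

static struct fake_cv  cv  = { PTHREAD_COND_INITIALIZER, NULL, 0 };
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    cv.cv_mutex = &mtx;               /* record the mutex, as cv_wait() would */
    while (!cv.signaled)
        pthread_cond_wait(&cv.cond, &mtx);
    pthread_mutex_unlock(&mtx);
    /* Race window: the other thread may already be inside "cv_destroy". */
    cv.cv_mutex = NULL;               /* cleared too late */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);

    pthread_mutex_lock(&mtx);
    cv.signaled = 1;
    pthread_cond_broadcast(&cv.cond); /* the "cv_broadcast()" */
    pthread_mutex_unlock(&mtx);

    /*
     * The "cv_destroy()": it can run before the waiter clears cv_mutex,
     * so this assertion can fire intermittently, like the SPL ASSERT.
     */
    assert(cv.cv_mutex == NULL);

    pthread_join(t, NULL);
    return 0;
}
```

Because the window is tiny, an assertion like this would only fire occasionally under heavy load, which is consistent with how intermittently the SPL failure is reported.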

@behlendorf (Contributor)

@smallma Can you please apply the above debugging patch to the SPL? It should give us a little more insight into what's going wrong. It also closes one small race, so it might fix the issue. We'll have to see.

@smallma (Author) commented Sep 10, 2012

Thanks, I'll try it :)

behlendorf added a commit to openzfs/spl that referenced this issue Sep 11, 2012
There still appears to be a race in the condition variables where
->cv_mutex is set after we are woken from the cv_destroy wait queue.
This might be possible when cv_destroy() is called immediately after
cv_broadcast().  We had some troubles with this previously but
there may still be a small race, see commit d599e4f.

The following patch closes one small race and improves the ASSERTs
such that they log the offending value.

Signed-off-by: Brian Behlendorf <[email protected]>
openzfs/zfs#943
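
As a rough illustration of what "improves the ASSERTs such that they log the offending value" could look like, here is a hypothetical value-logging assertion macro in the spirit of the ASSERT3P style used in the ZFS code base; it is a sketch for illustration, not the actual SPL change.

```c
/* Hypothetical value-logging assertion, sketched for illustration only. */
#include <stdio.h>
#include <stdlib.h>

#define ASSERT3P_SKETCH(left, op, right)                                  \
    do {                                                                  \
        const void *l_ = (const void *)(left);                           \
        const void *r_ = (const void *)(right);                          \
        if (!(l_ op r_)) {                                                \
            fprintf(stderr,                                               \
                "ASSERTION(%s %s %s) failed: left=%p right=%p\n",        \
                #left, #op, #right, l_, r_);                              \
            abort();  /* stands in for the SPL PANIC */                   \
        }                                                                 \
    } while (0)

struct cv_example { void *cv_mutex; };

int main(void)
{
    struct cv_example cvp = { (void *)0x1234 };

    /*
     * With the value logged, the failure message shows which mutex was
     * still attached instead of just "cv_mutex == NULL failed".
     */
    ASSERT3P_SKETCH(cvp.cv_mutex, ==, NULL);
    return 0;
}
```

Logging the pointer that was actually seen makes the panic message much more useful for working out which mutex was still attached when the condvar was destroyed.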
@smallma (Author) commented Sep 24, 2012

Hello behlendorf,
After running with this patch, my host got the following error. Please help clarify it. Thanks.

Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.654854] INFO: task zfs:27935 blocked for more than 120 seconds.
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.654909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.654993] zfs D ffff880a3622c890 0 27935 27934 0x00000000
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.654998] ffff880a1cb89c28 0000000000000082 00000000cbcb0f38 0000000000000000
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655003] ffff880a3622c4d0 ffff880a1cb89fd8 ffff880a1cb89fd8 ffff880a1cb89fd8
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655008] ffff880c14330000 ffff880a3622c4d0 0000000000000000 ffff880c16905100
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655012] Call Trace:
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655022] [] schedule+0x3f/0x60
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655027] [] rwsem_down_failed_common+0xcd/0x170
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655031] [] rwsem_down_write_failed+0x13/0x20
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655037] [] call_rwsem_down_write_failed+0x13/0x20
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655041] [] ? down_write+0x32/0x40
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655079] [] dsl_dataset_make_exclusive+0xdb/0x130 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655110] [] ? RW_READ_HELD+0x63/0xa0 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655140] [] dsl_dataset_destroy+0x31b/0x750 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655166] [] dmu_objset_destroy+0x3b/0x50 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655205] [] zfs_ioc_destroy+0x38/0x80 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655244] [] zfsdev_ioctl+0xdc/0x1b0 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655249] [] do_vfs_ioctl+0x98/0x550
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655253] [] ? sys_wait4+0xab/0xf0
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655257] [] sys_ioctl+0x91/0xa0
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655261] [] ? do_device_not_available+0xe/0x10
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655266] [] system_call_fastpath+0x16/0x1b
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655270] INFO: task zfs:27999 blocked for more than 120 seconds.
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655322] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655405] zfs D ffff8806c9f0df80 0 27999 27998 0x00000000
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655410] ffff8809aec65ad8 0000000000000082 ffff880900000000 ffffffff8103ba59
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655414] ffff8806c9f0dbc0 ffff8809aec65fd8 ffff8809aec65fd8 ffff8809aec65fd8
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655419] ffff8806186744d0 ffff8806c9f0dbc0 0000000000000282 00000001d92939f0
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655423] Call Trace:
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655428] [] ? default_spin_lock_flags+0x9/0x10
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655432] [] schedule+0x3f/0x60
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655444] [] cv_wait_common+0xd5/0x330 [spl]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655449] [] ? add_wait_queue+0x60/0x60
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655458] [] __cv_wait+0x13/0x20 [spl]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655488] [] dsl_dataset_hold_ref+0x1c2/0x3d0 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655518] [] dsl_dataset_hold_obj+0x2f/0x40 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655548] [] dsl_dataset_space_written+0x110/0x300 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655579] [] dsl_dataset_stats+0x368/0x3d0 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655605] [] dmu_objset_stats+0x26/0xc0 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655643] [] zfs_ioc_objset_stats_impl+0x6e/0x120 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655669] [] ? dmu_objset_hold+0x44/0x70 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655707] [] zfs_ioc_objset_stats+0x39/0x60 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655744] [] zfs_ioc_dataset_list_next+0x155/0x200 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655783] [] zfsdev_ioctl+0xdc/0x1b0 [zfs]
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655787] [] do_vfs_ioctl+0x98/0x550
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655791] [] sys_ioctl+0x91/0xa0
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655795] [] ? do_device_not_available+0xe/0x10
Sep 20 18:03:57 Qmulus-60eb69d22812 kernel: : [248650.655799] [] system_call_fastpath+0x16/0x1b

@behlendorf (Contributor)

This is just an advisory warning about a thread which was blocked for longer than usual. It's not harmful.

@smallma (Author) commented Sep 25, 2012

Thanks. :)

I used "ps -eo wchan:30,comm,s|grep -w D" command to check why zpool hang up and found many in D as below. Do you know how to solve it? Thanks.

cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common iscsi_trx D
cv_wait_common zfs D
cv_wait_common swift-object-au D
cv_wait_common swift-object-au D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zpool D
cv_wait_common LIO_fileio D
cv_wait_common zpool D
cv_wait_common LIO_fileio D
cv_wait_common LIO_fileio D
cv_wait_common LIO_fileio D
cv_wait_common txg_sync D
cv_wait_common zfs D
cv_wait_common zfs D
call_rwsem_down_write_failed zfs D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common iscsi_trx D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common iscsi_trx D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zfs D
cv_wait_common zfs D

@behlendorf (Contributor)

If you're not hitting the above ASSERT, then there doesn't appear to be anything wrong. Does the pool appear to be hung in some way?

@smallma (Author) commented Oct 11, 2012

This time I am not hitting the ASSERT problem, but a lot of warning messages occurred. When I execute zfs list or zpool status, the console gets stuck, and I can't write anything to the zfs mountpoint. Do you know how to trace it? Thanks.

By the way, I tried this patch and RC11; the zfs and zpool commands also hung.

@behlendorf (Contributor)

@smallma Can you post the contents of /proc/spl/kstat/zfs/dmu_tx when this occurs? And this wouldn't happen to be a 32-bit machine, would it? We have some patches which might help if it is.

@behlendorf (Contributor)

@smallma I'm going to tentatively close this issue, since 3c60f50 sounds like it resolved the race you were hitting. If that's not the case, or if somebody else hits this issue with the above patch applied, we can reopen it.

@smallma (Author) commented Oct 17, 2012

behlendorf, thanks. I'll keep track of it :)

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
Bump async-backtrace-attributes from 0.2.5 to 0.2.6 in /cmd/zfs_object_agent