-
Notifications
You must be signed in to change notification settings - Fork 178
Conversation
@behlendorf Nice catch. But shouldn't |
@tuxoko technically yes but in practice it should safe because |
@behlendorf It needs |
Ack, so it does. OK, I'll refresh this and correct the ordering. |
To prevent taskq_member holding tq_lock and doing linear search, thus causing contention. We store the taskq pointer to which the thread belongs in tsd. This way taskq_member will not need to touch tq_lock, and tsd has per slot spinlock. So the contention should be reduced greatly. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#500 Issue openzfs#504 Issue openzfs#505
967a5fa
to
5ce4a5a
Compare
Refreshed, and requeued for testing with openzfs/zfs#4108. |
There be dragons here, the lurking data corrupting type. Here's the original crash, and the subsequent crashes it caused in uksmd (which only works in userspace memory, so should not have been affected) and some internal cache in the kernel around ecryptfs far as i can tell:
After that the scheduler loses its mind, and everything goes south pretty quick to a hard lock (usually sysrq can reboot, but not always). After reboot, it keeps throwing the following and proceeding to kill the entire system:
Earlier working builds of ZFS do not fix this - there's some damage at the data layer, but scrub says everything is ok in ZFS. Manually pruning caches and sqlite dbs from firefox and other apps which keep things in my home directory seems to stop the issue from resurfacing, but it is a bit of a PITA to lose all the url completions and histories from browsers/editors. C'est la vie, my own fault for using myself as a crash test dummy. Can we add ecryptfs testing to the buildbots? This isnt the first time ecryptfs has shown some strange new error in ZoL and the repairs can get a bit painful, snapshots and all (since mounting a snap doesnt really help access the encrypted data within when the user is logged into the live home dir with the same keys). Dynamic threads were enabled in SPL when this hit, disabled since, but not too eager to test that build until i have some more time to blow up a VM or 10. Anyone else see strange things happening with this patch? |
@sempervictus |
Dmesg doesn't have a precursor to it, i was doing a send/recv over SSH though. |
@sempervictus thanks for the heads up but lets determine if in fact it is this patch. The |
Taking apart my changelogs, i see spl as:
and zfs as:
As you can see, i wasn't kidding about the crashtest piece - it had built and worked fine in a VM for a few hours. Just more proof that gremlins prefer eating real data. I can pull up the branch when i free up a bit and try to hunt down the commit which introduced that message. Thank Git we have revision history (and snapshots). |
@tuxoko , @tuxoko most probably from openzfs/zfs#3830 issues #2217, #3681 - set of commits dealing with zvol__minor_() processing specifically commit: bprotopopov/zfs@3586fa9 zfsonlinux issue #3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() |
up-to-date test results from the buildbots should be available soon via openzfs/zfs#4131 [rebase + buildbot retest, pull #3830] issues #2217, #3681 - set of commits dealing with zvol__minor_() processing |
Did we determine if there's actually a problem with this patch? Or was it from an unrelated commit? |
I don't think the above discussion is related to this pull request. |
no problems observed for now - using in: https://github.com/kernelOfTruth/spl/commits/spl_kOT_08.01.2016 |
Merged, the reported issue is unrelated. 16522ac Use tsd to store tq for taskq_member |
To prevent taskq_member holding tq_lock and doing linear search, thus causing
contention. We store the taskq pointer to which the thread belongs in tsd.
This way taskq_member will not need to touch tq_lock, and tsd has per slot
spinlock. So the contention should be reduced greatly.
Signed-off-by: Chunwei Chen [email protected]
Signed-off-by: Brian Behlendorf [email protected]
Issue #500
Issue #504
Issue #505