ZFS_OBJ_HOLD_ENTER deadlock #1101
This is likely the cause of the lockup in #676.
http://dehacked.2y.net/sizes-4h.png The workaround you gave in IRC (increase ZFS_OBJ_MTX_SZ in include/sys/zfs_vfs.h to 256) does wonders for fixing issue 676. Normally this system would have hung within 30 minutes.
Excellent, so that's pretty good confirmation this is the root cause of your deadlocks. Increasing the array size sounds like a reasonable short-term workaround, but it just makes the issue less likely; it doesn't actually fix it. If you need to, you could increase the value further until we have a proper fix.
Here's the deadlock; it's basically a lock inversion. Now that I have the exact stacks I can put together a patch to prevent it. In the meanwhile, increasing ZFS_OBJ_MTX_SZ is a decent workaround.
@DeHackEd Can you please apply the above patch to the SPL? It should resolve the hard deadlock.
Allowing the spl_cache_grow_work() function to reclaim inodes allows for two unlikely deadlocks. Therefore, we clear __GFP_FS for these allocations. The two deadlocks are:

* While holding the ZFS_OBJ_HOLD_ENTER(zsb, obj1) lock, a function calls kmem_cache_alloc(), which happens to need to allocate a new slab. To allocate the new slab we enter FS-level reclaim and attempt to evict several inodes. To evict these inodes we need to take the ZFS_OBJ_HOLD_ENTER(zsb, obj2) lock, and it just happens that obj1 and obj2 use the same hashed lock.

* Similar to the first case, except instead of getting blocked on the hash lock we block in txg_wait_open(), which is waiting for the next txg, which isn't coming because the txg_sync thread is blocked in kmem_cache_alloc().

Note this isn't a 100% fix because vmalloc() won't strictly honor __GFP_FS. However, in practice this is sufficient because several very unlikely things must all occur concurrently.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs/zfs#1101
Running off the latest Git masters for both SPL and ZFS I hung the system like so:

```
# free
             total       used       free     shared    buffers     cached
Mem:      16448180   16109116     339064          0       1520      16212
-/+ buffers/cache:   16091384     356796
Swap:      2097148     157880    1939268
```

Interestingly it's not hard-deadlocked and I can still work around the areas of the disk which are not ZFS filesystems (with choppy performance). After doing `killall -9 rsync` it unjammed itself.
@DeHackEd Did you have similar issues with 6c5207088f732168569d1a0b29f5f949b91bb503 applied?
Due to the slightly increased size of the ZFS super block caused by 30315d2 there are now allocation warnings. The allocation size is still small (just over 8k) and super blocks are rarely allocated so we suppress the warning. Signed-off-by: Brian Behlendorf <[email protected]> Issue #1101
Combining 6c5207088f732168569d1a0b29f5f949b91bb503 with the other patches (the SPL patch and ZFS_OBJ_MTX_SZ -> 256) produced a system that under similar load may still exceed its arc_meta_limit but doesn't crash. I think stacking all these patches together is outright necessary on my workload. :/
@DeHackEd Just to be clear: with all of these patches applied your system is stable (aside from exceeding the limit)? If so I'll revisit 6c5207088f732168569d1a0b29f5f949b91bb503 to see if it can be reworked into something which can be merged. I'd hoped to defer further changes in that area until 0.7.0, but perhaps something minimal can be done.
Increasing this limit costs us 6144 bytes of memory per mounted filesystem, but this is a small price to pay for accomplishing the following:

* Allows for up to 256-way concurrency when performing lookups, which helps performance when there are a large number of processes.

* Minimizes the likelihood of encountering the deadlock described in issue openzfs#1101. Because vmalloc() won't strictly honor __GFP_FS there is still a very remote chance of a deadlock. See the openzfs/spl@043f9b57 commit.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1101
This appears to be caused by a deadlock on the ZFS_OBJ_HOLD_ENTER() mutex. I was able to recreate it in spl/zfs-0.6.0-rc12 by running 100 concurrent rsyncs of a directory containing 1/4 million files and 1 du.