iput_async() can deadlock in direct reclaim #3055
My original theory was not correct. There is code in the Linux VFS to prevent concurrent threads from trying to evict the same inode. kswapd0 is also blocked in the back traces:
It is not 100% clear what happened, but it looks like one thread blocked on [...]. What is clear is that allowing direct reclaim inside inode eviction allows for arbitrarily long stack recursion, so my original idea of making sure all allocations under eviction paths are under spl_fstrans_mark() is worth implementing. It might even prevent the deadlock that occurred on the buildbot, but it is not clear to me how that deadlock happened: it would not make sense for two znodes to share the same dbuf, and the fact that spinlocks function as barriers on amd64 should have made I_FREEING visible.
@ryao in the zfs_iput_taskq back trace
There are regions in the ZFS code where it is desirable to be able to set PF_FSTRANS while a specific mutex is held. The ZFS code could be updated to set/clear this flag in all the correct places, but this is undesirable for a few reasons. 1) It would require changes to a significant amount of the ZFS code. This would complicate applying patches from upstream. 2) It would be easy to accidentally miss a critical region in the initial patch or to have a future change introduce a new one. Both of these concerns can be addressed by adding a new mutex type which is responsible for managing PF_FSTRANS. This lets us make a small change to the ZFS source where the mutex is initialized and then be certain that all future use of that mutex will be safe. NOTES: The ht_locks are no longer aligned on 64-byte boundaries. We've never studied whether this is actually critical for performance when there are a large number of hash buckets. The dbuf hash has never made this optimization. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#3050 Issue openzfs#3055
There are regions in the ZFS code where it is desirable to be able to set PF_FSTRANS while a specific mutex is held. The ZFS code could be updated to set/clear this flag in all the correct places, but this is undesirable for a few reasons. 1) It would require changes to a significant amount of the ZFS code. This would complicate applying patches from upstream. 2) It would be easy to accidentally miss a critical region in the initial patch or to have a future change introduce a new one. Both of these concerns can be addressed by adding a new mutex type which is responsible for managing PF_FSTRANS. This lets us make a small change to the ZFS source where the mutex is initialized and then be certain that all future use of that mutex will be safe. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#3050 Issue openzfs#3055 Signed-off-by: Pavel Snajdr <[email protected]>
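For illustration, here is a minimal sketch of what an FSTRANS-managing mutex type could look like, built on the Linux kernel mutex and the SPL spl_fstrans_mark()/spl_fstrans_unmark() interface. The MUTEX_FSTRANS name and the m_fstrans_cookie field are assumptions made for this sketch, not necessarily what the committed code uses:

```c
#include <linux/mutex.h>     /* struct mutex, mutex_lock(), mutex_unlock() */
#include <sys/kmem.h>        /* fstrans_cookie_t, spl_fstrans_mark()/unmark() */

/* Sketch only: field and constant names here are illustrative assumptions. */
typedef struct kmutex {
	struct mutex		m_mutex;
	int			m_type;			/* e.g. MUTEX_DEFAULT or MUTEX_FSTRANS */
	fstrans_cookie_t	m_fstrans_cookie;	/* valid only while held, if FSTRANS */
} kmutex_t;

static inline void
mutex_enter(kmutex_t *mp)
{
	mutex_lock(&mp->m_mutex);
	/*
	 * For an FSTRANS mutex, set PF_FSTRANS for as long as the lock is
	 * held, so any allocation performed in the critical section cannot
	 * recurse into the filesystem via direct reclaim.
	 */
	if (mp->m_type == MUTEX_FSTRANS)
		mp->m_fstrans_cookie = spl_fstrans_mark();
}

static inline void
mutex_exit(kmutex_t *mp)
{
	/*
	 * Restore the saved flags before dropping the lock; while we still
	 * hold the mutex no other thread can overwrite the stored cookie.
	 */
	if (mp->m_type == MUTEX_FSTRANS)
		spl_fstrans_unmark(mp->m_fstrans_cookie);
	mutex_unlock(&mp->m_mutex);
}
```

With a wrapper like this, a lock such as db->db_mtx only needs a one-line change where it is initialized; every later critical section under that lock is then automatically protected.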
I spotted the following in the buildbot output:
This deadlock is of a similar form to the one in #3050. Specifically, we are holding db->db_mtx in iput(), perform a memory allocation that triggers direct reclaim, and then try to evict the inode that we are already evicting. Of course, that eviction wants db->db_mtx, so we deadlock. spl_fstrans_mark()/spl_fstrans_unmark() could prevent this.
That said, it seems like we need to do some tracing to find all locks taken in direct reclaim paths and make sure that all allocations under those locks are protected by spl_fstrans_mark().
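As a rough illustration of the mark/unmark pattern described above, here is a hypothetical helper (the function name and its arguments are made up for this example) that performs an allocation while holding db_mtx without risking re-entry into eviction, assuming the SPL kmem_alloc() and spl_fstrans_mark()/spl_fstrans_unmark() interfaces:

```c
#include <sys/kmem.h>	/* kmem_alloc(), KM_SLEEP, spl_fstrans_mark()/unmark() */

/* Hypothetical example of the pattern; not taken from the ZFS source. */
static void *
alloc_under_db_mtx(kmutex_t *db_mtx, size_t size)
{
	fstrans_cookie_t cookie;
	void *buf;

	mutex_enter(db_mtx);

	/*
	 * Set PF_FSTRANS so this allocation cannot enter direct reclaim
	 * and re-enter inode eviction, which would try to take db_mtx
	 * again and deadlock.
	 */
	cookie = spl_fstrans_mark();
	buf = kmem_alloc(size, KM_SLEEP);
	spl_fstrans_unmark(cookie);

	mutex_exit(db_mtx);
	return (buf);
}
```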