-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid memory allocations in the ARC eviction thread #12985
Conversation
When the eviction thread goes to shrink an ARC state, it allocates a set of marker buffers used to hold its place in the state's sublists. This can be problematic in low memory conditions, since 1) the allocation can be substantial, as we allocate NCPU markers; 2) on at least FreeBSD, page reclamation can block in arc_wait_for_eviction() In particular, in stress tests it's possible to hit a deadlock on FreeBSD when the number of free pages is very low, wherein the system is waiting for the page daemon to reclaim memory, the page daemon is waiting for the ARC eviction thread to finish, and the ARC eviction thread is blocked waiting for more memory. Try to reduce the likelihood of such deadlocks by pre-allocating markers for the eviction thread at ARC initialization time. When evicting buffers from an ARC state, check to see if the current thread is the ARC eviction thread, and use the pre-allocated markers for that purpose rather than dynamically allocating them. Signed-off-by: Mark Johnston <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I agree that memory allocations in evict thread is a bad idea. I don't see any problems in this patch, but generally in perspective I'd really like eviction to be multi-threaded, since single thread is a known bottleneck. That would require some changes to this code, but that may be handled later.
If we can avoid the allocation here, I agree that would be a good thing. On a related note, we may have avoided this particular deadlock on Linux because of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the proposed change, don't see any problems with it, thank you. As mentioned in #12987, it seems to help with the arc_evict and arc_prune contention when there is pressure on the ARC.
I think the larger difference is that FreeBSD doesn't do direct reclaim: if a page allocation fails, the allocating thread must either handle the failure or sleep waiting for system threads (the "page daemon") to reclaim memory. That approach has some advantages - it makes system behaviour in low memory conditions much easier to reason about for one - but it does mean that threads responsible for reclaiming memory have to be disciplined and avoid memory allocations (or pre-allocate memory reserves to avoid deadlocks). This is easy to handle within the page daemon, but gets tricky once the page daemon has to request work from other threads (the ARC eviction thread in this case; system I/O threads if the page daemon is writing dirty pages to swap; etc.). For memory reclamation purposes, FreeBSD doesn't maintain a real distinction between anonymous pages and file-backed pages the way Linux does, so there's no real way to implement |
When the eviction thread goes to shrink an ARC state, it allocates a set of marker buffers used to hold its place in the state's sublists. This can be problematic in low memory conditions, since 1) the allocation can be substantial, as we allocate NCPU markers; 2) on at least FreeBSD, page reclamation can block in arc_wait_for_eviction() In particular, in stress tests it's possible to hit a deadlock on FreeBSD when the number of free pages is very low, wherein the system is waiting for the page daemon to reclaim memory, the page daemon is waiting for the ARC eviction thread to finish, and the ARC eviction thread is blocked waiting for more memory. Try to reduce the likelihood of such deadlocks by pre-allocating markers for the eviction thread at ARC initialization time. When evicting buffers from an ARC state, check to see if the current thread is the ARC eviction thread, and use the pre-allocated markers for that purpose rather than dynamically allocating them. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#12985
When the eviction thread goes to shrink an ARC state, it allocates a set of marker buffers used to hold its place in the state's sublists. This can be problematic in low memory conditions, since 1) the allocation can be substantial, as we allocate NCPU markers; 2) on at least FreeBSD, page reclamation can block in arc_wait_for_eviction() In particular, in stress tests it's possible to hit a deadlock on FreeBSD when the number of free pages is very low, wherein the system is waiting for the page daemon to reclaim memory, the page daemon is waiting for the ARC eviction thread to finish, and the ARC eviction thread is blocked waiting for more memory. Try to reduce the likelihood of such deadlocks by pre-allocating markers for the eviction thread at ARC initialization time. When evicting buffers from an ARC state, check to see if the current thread is the ARC eviction thread, and use the pre-allocated markers for that purpose rather than dynamically allocating them. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12985
When the eviction thread goes to shrink an ARC state, it allocates a set of marker buffers used to hold its place in the state's sublists. This can be problematic in low memory conditions, since 1) the allocation can be substantial, as we allocate NCPU markers; 2) on at least FreeBSD, page reclamation can block in arc_wait_for_eviction() In particular, in stress tests it's possible to hit a deadlock on FreeBSD when the number of free pages is very low, wherein the system is waiting for the page daemon to reclaim memory, the page daemon is waiting for the ARC eviction thread to finish, and the ARC eviction thread is blocked waiting for more memory. Try to reduce the likelihood of such deadlocks by pre-allocating markers for the eviction thread at ARC initialization time. When evicting buffers from an ARC state, check to see if the current thread is the ARC eviction thread, and use the pre-allocated markers for that purpose rather than dynamically allocating them. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#12985
When the eviction thread goes to shrink an ARC state, it allocates a set of marker buffers used to hold its place in the state's sublists. This can be problematic in low memory conditions, since 1) the allocation can be substantial, as we allocate NCPU markers; 2) on at least FreeBSD, page reclamation can block in arc_wait_for_eviction() In particular, in stress tests it's possible to hit a deadlock on FreeBSD when the number of free pages is very low, wherein the system is waiting for the page daemon to reclaim memory, the page daemon is waiting for the ARC eviction thread to finish, and the ARC eviction thread is blocked waiting for more memory. Try to reduce the likelihood of such deadlocks by pre-allocating markers for the eviction thread at ARC initialization time. When evicting buffers from an ARC state, check to see if the current thread is the ARC eviction thread, and use the pre-allocated markers for that purpose rather than dynamically allocating them. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#12985
When the eviction thread goes to shrink an ARC state, it allocates a set of marker buffers used to hold its place in the state's sublists. This can be problematic in low memory conditions, since 1) the allocation can be substantial, as we allocate NCPU markers; 2) on at least FreeBSD, page reclamation can block in arc_wait_for_eviction() In particular, in stress tests it's possible to hit a deadlock on FreeBSD when the number of free pages is very low, wherein the system is waiting for the page daemon to reclaim memory, the page daemon is waiting for the ARC eviction thread to finish, and the ARC eviction thread is blocked waiting for more memory. Try to reduce the likelihood of such deadlocks by pre-allocating markers for the eviction thread at ARC initialization time. When evicting buffers from an ARC state, check to see if the current thread is the ARC eviction thread, and use the pre-allocated markers for that purpose rather than dynamically allocating them. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#12985
When the eviction thread goes to shrink an ARC state, it allocates a set of marker buffers used to hold its place in the state's sublists. This can be problematic in low memory conditions, since 1) the allocation can be substantial, as we allocate NCPU markers; 2) on at least FreeBSD, page reclamation can block in arc_wait_for_eviction() In particular, in stress tests it's possible to hit a deadlock on FreeBSD when the number of free pages is very low, wherein the system is waiting for the page daemon to reclaim memory, the page daemon is waiting for the ARC eviction thread to finish, and the ARC eviction thread is blocked waiting for more memory. Try to reduce the likelihood of such deadlocks by pre-allocating markers for the eviction thread at ARC initialization time. When evicting buffers from an ARC state, check to see if the current thread is the ARC eviction thread, and use the pre-allocated markers for that purpose rather than dynamically allocating them. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#12985
When the eviction thread goes to shrink an ARC state, it allocates a set of marker buffers used to hold its place in the state's sublists. This can be problematic in low memory conditions, since 1) the allocation can be substantial, as we allocate NCPU markers; 2) on at least FreeBSD, page reclamation can block in arc_wait_for_eviction() In particular, in stress tests it's possible to hit a deadlock on FreeBSD when the number of free pages is very low, wherein the system is waiting for the page daemon to reclaim memory, the page daemon is waiting for the ARC eviction thread to finish, and the ARC eviction thread is blocked waiting for more memory. Try to reduce the likelihood of such deadlocks by pre-allocating markers for the eviction thread at ARC initialization time. When evicting buffers from an ARC state, check to see if the current thread is the ARC eviction thread, and use the pre-allocated markers for that purpose rather than dynamically allocating them. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#12985
This is not associated with a specific upstream commit but apparently a local diff applied as part of: commit e92ffd9b626833ebdbf2742c8ffddc6cd94b963e Merge: 3c3df3660072 17b2ae0 Author: Martin Matuska <[email protected]> Date: Sat Jan 22 23:05:15 2022 +0100 zfs: merge openzfs/zfs@17b2ae0b2 (master) into main Notable upstream pull request merges: openzfs#12766 Fix error propagation from lzc_send_redacted openzfs#12805 Updated the lz4 decompressor openzfs#12851 FreeBSD: Provide correct file generation number openzfs#12857 Verify dRAID empty sectors openzfs#12874 FreeBSD: Update argument types for VOP_READDIR openzfs#12896 Reduce number of arc_prune threads openzfs#12934 FreeBSD: Fix zvol_*_open() locking openzfs#12947 lz4: Cherrypick fix for CVE-2021-3520 openzfs#12961 FreeBSD: Fix leaked strings in libspl mnttab openzfs#12964 Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD openzfs#12981 Introduce a flag to skip comparing the local mac when raw sending openzfs#12985 Avoid memory allocations in the ARC eviction thread Obtained from: OpenZFS OpenZFS commit: 17b2ae0
This change mitigates a low-memory deadlock observed on FreeBSD, where the ARC eviction thread fails to allocate memory and sleeps, blocking the page reclamation thread.
Motivation and Context
The change removes some memory allocations from the ARC eviction thread, thus helping ensure that arc_wait_for_eviction() won't block a thread that's attempting to reclaim pages on behalf of the rest of the system.
Description
The change modifies arc_init() to pre-allocate a set of marker buffer headers for use as placeholders in arc_evict_state(). When the ARC eviction thread calls arc_evict_state(), it uses the pre-allocated markers rather than performing a blocking memory allocation.
How Has This Been Tested?
I was able to reproduce the deadlock occasionally using parallel pgbench and FreeBSD kernel builds to incur memory pressure. With this patch I haven't been able to trigger any hangs.
Types of changes
Checklist:
Signed-off-by
.