-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lock contention on arcs mtx #3481
Lock contention on arcs mtx #3481
Conversation
…loc" This reverts commit 16fcdea in preparation for the illumos 5497 "lock contention on arcs_mtx" patch which eliminates "marker" within the ARC code. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
This reverts commit 037763e in preparation for the illumos 5497 "lock contention on arcs_mtx" patch which includes a fix for this very problem. ZoL had picked up a subset of the illumos 5497 patch to deal with the l2arc compression buffer leak. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
Illumos 5497 "lock contention on arcs_mtx" reworks eviction and obviates the need for this. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
This reverts only the l2arc_hdr part of commit ecf3d9b in preparation for the illumos 5497 "lock contention on arcs_mtx" patch which does the same thing but uses the newer two-level ARC structure following the Illumos 5408 "managing ZFS cache devices requires lots of RAM" patch. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
5369 arc flags should be an enum 5370 consistent arc_buf_hdr_t naming scheme Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Sebastien Roy <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Richard Lowe <[email protected]> Porting notes: ZoL has moved some ARC definitions into arc_impl.h. Signed-off-by: Brian Behlendorf <[email protected]> Ported by: Tim Chase <[email protected]>
5408 managing ZFS cache devices requires lots of RAM Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Don Brady <[email protected]> Reviewed by: Josef 'Jeff' Sipek <[email protected]> Approved by: Garrett D'Amore <[email protected]> Porting notes: Due to the restructuring of the ARC-related structures, this patch conflicts with at least the following existing ZoL commits: 6e1d727 Fix inaccurate arcstat_l2_hdr_size calculations The ARC_SPACE_HDRS constant no longer exists and has been somewhat equivalently replaced by HDR_L2ONLY_SIZE. e0b0ca9 Add visibility in to cached dbufs The new layering of l{1,2}arc_buf_hdr_t within the arc_buf_hdr struct requires additional structure member names to be used when referencing the inner items. Also, the presence of L1 or L2 inner member is indicated by flags using the new HDR_HAS_L{1,2}HDR macros. Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
Reviewed by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Dan McDonald <[email protected]> Porting notes and other significant code changes: The illumos 5368 patch (ARC should cache more metadata), which was never picked up by ZoL, is mostly reverted by this patch. Since ZoL relies on the kernel asynchronously calling the shrinker to actually reap memory, the shrinker wakes up arc_reclaim_waiters_cv every time it runs. The arc_adapt_thread() function no longer calls arc_do_user_evicts() since the newly-added arc_user_evicts_thread() calls it periodically. Notable conflicting ZoL commits which conflicted with this patch or whose effects are either duplicated or un-done by this patch: 302f753 - Integrate ARC more tightly with Linux 39e055c - Adjust arc_p based on "bytes" in arc_shrink f521ce1 - Allow "arc_p" to drop to zero or grow to "arc_c" 77765b5 - Remove "arc_meta_used" from arc_adjust calculation 94520ca - Prune metadata from ghost lists in arc_adjust_meta Trace support for multilist_insert() and multilist_remove() has been added and produces the following output: fio-12498 [077] .... 112936.448324: zfs_multilist__insert: ml { offset 240 numsublists 80 sublistidx 63 } fio-12498 [077] .... 112936.448347: zfs_multilist__remove: ml { offset 240 numsublists 80 sublistidx 29 } The following arcstats have been removed: recycle_miss - Used by arcstat.py and arc_summary.py, both of which have been updated appropriately. l2_writes_hdr_miss The following arcstats have been added: evict_not_enough - Number of times arc_evict_state() was unable to evict enough buffers to reach its target amount. evict_l2_skip - Number of times arc_evict_hdr() skipped eviction because it was being written to the l2arc. l2_writes_lock_retry - Replaces l2_writes_hdr_miss. Number of times l2arc_write_done() failed to acquire hash_lock (and re-tries). arc_meta_min - Shows the value of the zfs_arc_meta_min module parameter (see below). The "index" column of the "dbuf" kstat has been removed since it doesn't have a direct analog in the new multilist scheme. Additional multilist- related stats could be added in the future but would likely require extensions to the mulilist API. The following module parameters have been added: zfs_arc_evict_batch_limit - Number of ARC headers to free per sub-list before moving on to the next sub-list. zfs_arc_meta_min - Enforce a floor on the amount of metadata in the ARC. zfs_arc_num_sublists_per_state - Number of multilist sub-lists per ARC state. zfs_arc_overflow_shift - Controls amount by which the ARC must exceed the target size to be considered "overflowing". Ported-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]
SPL commit behlendorf/spl@9cef1b5 adds the taskq_wait_outstanding() interface. See the commit log for the full justification for this addition. This patch adds the required user space counterpart. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]>
Replace taskq_wait() with taskq_wait_oustanding(). This way callers will only block until previously submitted tasks have been completed. This was the previous behavior of task_wait() prior to the introduction of taskq_wait_outstanding() so this isn't really a functionalty change for these callers. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
As described in the comment above arc_adapt_thread() it is critical that the arc_adapt_thread() function never sleep while holding a hash lock. This behavior was possible in the Linux implementation because the arc_prune() logic was implemented to be synchronous. Under illumos the analogous dnlc_reduce_cache() function is asynchronous. To address this the arc_do_user_prune() function is has been reworked in to two new functions as follows: * arc_prune_async() is an asynchronous implementation which dispatches the prune callback to be run by the system taskq. This makes it suitable to use in the context of the arc_adapt_thread(). * arc_prune() is a synchronous implementation which depends on the arc_prune_async() implementation but blocks until the outstanding callbacks complete. This is used in arc_kmem_reap_now() where it is safe, and expected, that memory will be freed. This patch additionally adds the zfs_arc_meta_strategy module option while allows the meta reclaim strategy to be configured. It defaults to a balanced strategy which has been proved to work well under Linux but the illumos meta-only strategy can be enabled. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
ZoL had lowered the minimum ARC size to 4MiB to better accommodate tiny systems such as the raspberry pi, however, as of addition of large block support, the arc_adapt() function depends on arc_c being >= 32MiB (2 * SPA_MAXBLOCKSIZE). This patch raises the minimum ARC size to 32MiB and adds a VERIFY test to arc_adapt() for future-proofing. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
e253e18
to
21fcadc
Compare
…account-for-ashift' into zfs_kOT_11.06.2015 adapted to patchstack from openzfs#3481
In short, what does this all mean in plain english? :) |
Significantly reduced lock contention in the ARC which should translate in to improved cache performance once a couple other bottlenecks are removed. But this was a biggie. |
Ok, thanx. |
Maybe we should have cut a 0.6.5 first, before accepting this… This seems like a big thing, and it will take some time for it to 'percolate' on peoples machines before we can cut a new version. To late now, but something to think about for next time perhaps? |
well, it depends on when you were expecting 0.6.5 ... there is ~ one release per year, so if 0.6.5 is released in february 2016, we will have plenty of time to check this commit :) |
|
I can live with unstable release that been tested over a week without losing data :x |
Right, I'd like to shoot for every 3 months or so but it looks like we might run a bit over because we'd planned to also get the ABD changes in the the next tag too. Which is what I'm going to push to get merged that. That way the next major tag will contain a big chunk of the memory / ARC restructuring which is a big step forward. In the meanwhile we'll definitely keep the release branch up to date with critical fixes. Which means right now we have a few build issues for newer kernels which need to be finalized. |
Technically (according to the schedule), we're only about four, five weeks away from next release… |
@FransUrbo it's definitely ambitious and it likely means the release date will slide a month or two. But this is one of the major items planned for the next tag. Roughly the full remaining planned list looks like this. We could bump ABD to the next tag but I'm inclined for the moment to try and get it finalized and merged. If that turns out to be unworkable we can bump it.
As an aside what do you think about re-enabling the wiki and using it to post this kind of information. |
The finalized patch stack for the ARC mutex lock contention changes.