More adaptive ARC eviction. #14359

Merged
merged 1 commit into from
Mar 8, 2023
Conversation

amotin
Member

@amotin amotin commented Jan 7, 2023

Traditionally, ARC adaptation was limited to the MRU/MFU distribution. But for years people with metadata-centric workloads have demanded mechanisms to also manage the data/metadata distribution, which in the original ZFS was just a FIFO. As a result, ZFS effectively got separate states for data and metadata, minimum and maximum metadata limits, etc., but it all required manual tuning, was not adaptive, and at its heart remained a bad FIFO.

This change removes most of the existing eviction logic, rewriting it from scratch. This makes MRU/MFU adaptation individual for data and metadata, and makes the distribution between data and metadata themselves adaptive in the same way. Since most of the required state separation was already done, it only required making the arcs_size state field specific per data/metadata.

The adaptation logic is still based on the previous concept of ghost hits; it just now balances ARC capacity between 4 states: MRU data, MRU metadata, MFU data and MFU metadata. To simplify arc_c changes, instead of arc_p measured in bytes, this code uses 3 variables, arc_meta, arc_pd and arc_pm, representing the ARC balance between metadata and data, between MRU and MFU for data, and between MRU and MFU for metadata, respectively, as 32-bit fixed-point fractions. Since we care about the math result only when we need to evict, this moves all the logic from arc_adapt() to arc_evict(). That reduces per-block overhead, since per-block operations are limited to stats collection, now moved from arc_adapt() to arc_access() and using cheaper wmsums. This also allows removing the ugly ARC_HDR_DO_ADAPT flag from many places.
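
As a rough illustration of how three fixed-point fractions could split arc_c into four per-state targets, here is a minimal C sketch. It is not the code from this PR: FRAC_ONE, sketch_frac_mul(), sketch_targets_t and sketch_split() are hypothetical names, and the sketch assumes that arc_meta is the metadata share of the whole cache and that arc_pd and arc_pm are the MRU shares of data and metadata respectively.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point 1.0: a fraction f stands for f / 2^32. */
#define	FRAC_ONE	(1ULL << 32)

/* Byte targets for the four states (hypothetical type). */
typedef struct sketch_targets {
	uint64_t mru_data;
	uint64_t mfu_data;
	uint64_t mru_meta;
	uint64_t mfu_meta;
} sketch_targets_t;

/*
 * Scale a byte count by a fixed-point fraction.  The low 16 bits of
 * the fraction are dropped so the product fits in 64 bits, which is
 * why the sketch assumes bytes < 2^48.
 */
uint64_t
sketch_frac_mul(uint64_t bytes, uint64_t frac)
{
	assert(frac <= FRAC_ONE);
	assert(bytes < (1ULL << 48));
	return ((bytes * (frac >> 16)) >> 16);
}

/* Split capacity c by the three fractions; remainders go to MFU. */
sketch_targets_t
sketch_split(uint64_t c, uint64_t meta, uint64_t pd, uint64_t pm)
{
	sketch_targets_t t;
	uint64_t meta_bytes = sketch_frac_mul(c, meta);
	uint64_t data_bytes = c - meta_bytes;

	t.mru_meta = sketch_frac_mul(meta_bytes, pm);
	t.mfu_meta = meta_bytes - t.mru_meta;
	t.mru_data = sketch_frac_mul(data_bytes, pd);
	t.mfu_data = data_bytes - t.mru_data;
	return (t);
}
```

With c = 1 GiB, meta = FRAC_ONE / 4 and pd = pm = FRAC_ONE / 2, the sketch yields 128 MiB each for MRU and MFU metadata and 384 MiB each for MRU and MFU data. Because the balance is stored as fractions rather than bytes, a change of c needs no rescaling of the three variables.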

This change also removes a number of metadata-specific tunables, some of which were actually not functioning correctly, since not all metadata are equal and some (like L2ARC headers) are not really evictable. Instead it introduces a single opaque knob, zfs_arc_meta_balance, which tunes ARC's reaction to ghost hits and allows the administrator to give more or less preference to metadata without setting strict limits.

Some old code parts like arc_evict_meta() are just removed, because since the introduction of ABD ARC they really make no sense: only headers referenced by a small number of buffers are not evictable, and they are really not evictable no matter what this code does. Instead, just call arc_prune_async() if too much metadata appears not evictable.

How Has This Been Tested?

By manually simulating different access patterns, I was able to observe the expected arc_meta, arc_pd and arc_pm changes.

@amotin amotin added Status: Code Review Needed Ready for review and testing Status: Design Review Needed Architecture or design is under discussion labels Jan 7, 2023
@amotin amotin force-pushed the arc_evict branch 2 times, most recently from 0feb27f to 310fbc7 Compare January 7, 2023 22:15
@amotin amotin force-pushed the arc_evict branch 3 times, most recently from 8e24992 to 3b9dd0d Compare January 9, 2023 20:16
@adamdmoss
Contributor

(Sorry for the low-fidelity pic :) ) I get a panic during import when I test this PR; it looks like some incompatibility with L2ARC rebuild. [Image: IMG_0129c]

@amotin
Member Author

amotin commented Jan 18, 2023

I get a panic during import when I test this PR - looks like some incompatibility with L2ARC rebuild

@adamdmoss Thank you for the report. It appears I unintentionally changed the persistent L2ARC on-disk format. I have added a simple shim to fix it.

@adamdmoss
Contributor

Verified fixed - thanks!

@devZer0

devZer0 commented Feb 1, 2023

@amotin, thanks for making this. I am currently trying to test it.

I have a question.

The manpage says:

 zfs_arc_meta_balance=500 (uint)
         Balance between metadata and data on ghost hits.  Values above 100 increase metadata caching by proportionally reducing effect of ghost data hits on target data/metadata rate.

What exactly does a value of "500" mean? What should it be set to in order to maximise the ARC being used for metadata and avoid metadata eviction? Proportional in relation to what?

@amotin
Member Author

amotin commented Feb 1, 2023

What exactly does a value of "500" mean? What should it be set to in order to maximise the ARC being used for metadata and avoid metadata eviction? Proportional in relation to what?

@devZer0 It means data ghost hits cause a 5 times smaller metadata cache reduction than the data cache reduction caused by metadata ghost hits. There is no upper limit. The higher you set it, the smaller the pressure on metadata will be. It is not absolute: some metadata will likely still be evicted, otherwise there would be no ghost state to indicate pressure. But after some time it should settle at a balance point where data and metadata ghost hits (read "almost a cache hit, but no") balance each other according to this coefficient. That is the whole point of being adaptive.
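
To make the "5 times smaller" relation concrete, here is a minimal C sketch of a balance-weighted adjustment. It only illustrates the ratio described above and is not the adjustment code from this PR: FRAC_ONE, sketch_bytes_to_frac() and sketch_adjust_meta() are hypothetical names, the metadata share is assumed to be a 32-bit fixed-point fraction, and scaling the data-side step by 100 / balance is an assumption taken from the manpage wording.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point 1.0: a fraction f stands for f / 2^32. */
#define	FRAC_ONE	(1ULL << 32)

/* Express `bytes` as a fixed-point share of `total` (total < 2^48). */
uint64_t
sketch_bytes_to_frac(uint64_t bytes, uint64_t total)
{
	assert(total < (1ULL << 48));
	if (total == 0)
		return (0);
	if (bytes > total)
		bytes = total;
	return (((bytes << 16) / total) << 16);
}

/*
 * Move the metadata fraction up for metadata ghost hits and down for
 * data ghost hits.  The downward step is scaled by 100 / balance, so
 * balance = 500 makes data ghost hits count five times less.
 */
uint64_t
sketch_adjust_meta(uint64_t frac, uint64_t total, uint64_t meta_ghost,
    uint64_t data_ghost, unsigned balance)
{
	uint64_t up = sketch_bytes_to_frac(meta_ghost, total);
	uint64_t down = sketch_bytes_to_frac(data_ghost, total);

	if (balance == 0)
		return (frac);
	down = down * 100 / balance;

	/* Clamp the result to the range [0, FRAC_ONE]. */
	frac = (frac + up > FRAC_ONE) ? FRAC_ONE : frac + up;
	return ((down > frac) ? 0 : frac - down);
}
```

In this sketch, with a 1 GiB cache and 1 MiB of data ghost hits, the metadata fraction drops by 2^22 (roughly 0.1% of the cache) at a balance of 100 but only by about a fifth of that at 500, while 1 MiB of metadata ghost hits raises it by the full 2^22 either way.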

Contributor

@allanjude allanjude left a comment

Reviewed-by: Allan Jude <[email protected]>

@amotin amotin force-pushed the arc_evict branch 2 times, most recently from df3c3af to 2d948d6 Compare March 2, 2023 15:03
@amotin
Member Author

amotin commented Mar 2, 2023

While there, I decided to remove the unused spa argument from arc_evict_impl() and reorder the remaining arguments more logically.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 2, 2023
@behlendorf
Contributor

@ahrens @grwilson I'd like to integrate this ARC change early next week, after a long weekend of stress testing. If you have a chance to look it over before then that would be great.

Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
@amotin
Member Author

amotin commented Mar 6, 2023

I've noticed there is no longer any reason for arcstat_dnode_size to be an aggsum, since it is now read only once per arc_evict(), so I demoted it to a cheaper wmsum.
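
The trade-off behind that choice can be sketched in a few lines of C: a write-mostly sum keeps one counter per slot, so an update touches a single slot while a read has to walk all of them, which is acceptable when reads are rare. This is only the general idea, not the OpenZFS wmsum implementation: sketch_wmsum_t, sketch_wmsum_add() and sketch_wmsum_value() are hypothetical names, and the sketch is single-threaded, whereas a real counter would need per-CPU storage and updates that are safe under concurrency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of slots; stands in for the number of CPUs in this sketch. */
#define	SKETCH_NSLOTS	8

typedef struct sketch_wmsum {
	int64_t slot[SKETCH_NSLOTS];
} sketch_wmsum_t;

/* Cheap write path: update only the slot belonging to the caller. */
void
sketch_wmsum_add(sketch_wmsum_t *s, unsigned cpu, int64_t delta)
{
	assert(s != NULL);
	s->slot[cpu % SKETCH_NSLOTS] += delta;
}

/* Expensive read path: add up every slot to get the total. */
int64_t
sketch_wmsum_value(const sketch_wmsum_t *s)
{
	int64_t sum = 0;

	assert(s != NULL);
	for (size_t i = 0; i < SKETCH_NSLOTS; i++)
		sum += s->slot[i];
	return (sum);
}
```

A counter that is updated on every block but read only once per eviction pass fits this shape: the per-update cost stays minimal, and the full walk happens only on the rare read.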

@behlendorf behlendorf merged commit a8d83e2 into openzfs:master Mar 8, 2023
@behlendorf
Contributor

Merged. These changes worked as intended in my testing.

@amotin amotin deleted the arc_evict branch March 8, 2023 19:23
mcmilk pushed a commit to mcmilk/zfs that referenced this pull request Mar 13, 2023
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#14359
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 16, 2023
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#14359
@nasbdh9

nasbdh9 commented Mar 20, 2023

Are these changes expected to be ported to 2.1.10?

@amotin
Member Author

amotin commented Mar 20, 2023

Are these changes expected to be ported to 2.1.10?

No. Same as a few other ARC refactoring PRs of mine, it will stay in 2.2. Those are quite big and invasive changes for a minor release.

behlendorf added a commit that referenced this pull request Jun 30, 2023
New features:
- Fully adaptive ARC eviction (#14359)
- Block cloning (#13392)
- Scrub error log (#12812, #12355)
- Linux container support (#14070, #14097, #12263)
- BLAKE3 Checksums (#12918)
- Corrective "zfs receive" (#9372)

Signed-off-by: Brian Behlendorf <[email protected]>
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#14359
behlendorf added a commit that referenced this pull request Oct 13, 2023
New Features
- Block cloning (#13392)
- Linux container support (#14070, #14097, #12263)
- Scrub error log (#12812, #12355)
- BLAKE3 checksums (#12918)
- Corrective "zfs receive"
- Vdev and zpool user properties

Performance
- Fully adaptive ARC (#14359)
- SHA2 checksums (#13741)
- Edon-R checksums (#13618)
- Zstd early abort (#13244)
- Prefetch improvements (#14603, #14516, #14402, #14243, #13452)
- General optimization (#14121, #14123, #14039, #13680, #13613,
  #13606, #13576, #13553, #12789, #14925, #14948)

Signed-off-by: Brian Behlendorf <[email protected]>
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
New features:
- Fully adaptive ARC eviction (openzfs#14359)
- Block cloning (openzfs#13392)
- Scrub error log (openzfs#12812, openzfs#12355)
- Linux container support (openzfs#14070, openzfs#14097, openzfs#12263)
- BLAKE3 Checksums (openzfs#12918)
- Corrective "zfs receive" (openzfs#9372)

Signed-off-by: Brian Behlendorf <[email protected]>
@gertvdijk

Hi @amotin thanks again so much for your work on this one. My tests on 2.2 (2.2.6) show a much more stable ARC size and use compared to the problematic prune storms on 2.1.x reported here: #9966 (comment).

I did notice that there is some residue in the code and docs. If I understand correctly, the module parameter zfs_arc_meta_strategy is removed in 2.2, but the arc_strategy enum is still in the header file include/sys/arc.h; is that intentional or can this be removed?

zfs/include/sys/arc.h

Lines 110 to 113 in e0039c7

typedef enum arc_strategy {
	ARC_STRATEGY_META_ONLY = 0, /* Evict only meta data buffers */
	ARC_STRATEGY_META_BALANCED = 1, /* Evict data buffers if needed */
} arc_strategy_t;

Also in openzfs-docs repo this module parameter is still mentioned for tuning - should I go and fix that with a note for 2.2+? 😃
https://github.com/openzfs/openzfs-docs/blob/2df53a3b8594b8663257dce1f4032f71f6880006/docs/Performance%20and%20Tuning/Module%20Parameters.rst#L2572-L2600

@amotin amotin removed the Status: Design Review Needed Architecture or design is under discussion label Dec 11, 2024
@amotin
Member Author

amotin commented Dec 11, 2024

@gertvdijk You are right, it should be removed. Would you like to create a PR, or would you prefer that I do? As for openzfs-docs, I have no idea; I have never touched it. I guess it could benefit from a PR as well.
