Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARC Sync Up #3533

Closed
wants to merge 7 commits into from
Closed

ARC Sync Up #3533

wants to merge 7 commits into from

Conversation

behlendorf
Copy link
Contributor

The intention of this patch stack is port the remaining Illumos ARC changes to Linux and to reconcile some of the differences which have been gradually introduced. This patch stack is a step in that direction but we also must make an effort to push some of our bug fixes upstream.

Additional changes which need to be made downstream.

  • Slightly different #includes
  • Allow C99'isms
  • ARC prune / DNLC refactoring
  • arc_watch refactoring
  • arc_buf_info refactoring for dbufstat.py
  • arc_memory_throttle() / arc_memory_available() refactoring
  • arc shrinker refactoring

Changes which need to be submitted upstream:

  • Structure definitions pulled out to headers
  • Fixes for compiler warnings
  • Fixes for missing initializers
  • Fixes for spelling mistakes / grammar
  • Change cv_timedwait() to cv_timedwait_sig()
  • Additional ARC kstat entries
  • Remove b_thawed (maybe)
  • Add ->arcs_state for debugging

Unavoidable differences:

  • Different types for module options
  • Different default tuning values
  • Use of spl_fstrans_mark()
  • Added arc_tuning_update()

@behlendorf
Copy link
Contributor Author

@dweeezil if you can spare the time any feedback you can offer on this patch stack would be welcome. In a sense it's a continuation of the work you did in #3115. It still needs some stress testing and additional work but it brings up much closer to in sync with the upstream ARC.

@dweeezil
Copy link
Contributor

@behlendorf I'll be looking over this patch stack and will try testing with it. Speaking of the #3115 patch, you should probably also add the fix outlined by Alek Pinchukin in https://www.illumos.org/issues/6033.

@behlendorf
Copy link
Contributor Author

@dweeezil I'll definitely add illumos 6033 to the stack, thanks for pointing that out. I'll also probably push a few patches which do some of the alluded to refactoring but don't include any functional changes. Lastly I should give you a heads up that Illumos #5376 ended up getting a little bigger than I expected and has some slightly unrelated changes. It may make sense to split up that patch. Thanks again.

@behlendorf
Copy link
Contributor Author

@dweeezil have you observed any issues with these patches? I'd like to make sure this is part of the 0.6.5 tag to bring the ARC back in sync as much as possible.

@dweeezil
Copy link
Contributor

@behlendorf Sorry for not following up earlier, but I'm not going to have a chance to look at these further until the week of July 20th, 2015 as I'm on (a rare) vacation now. I have got my test system set up to test this branch and do plan on running a full battery of tests on it.

@behlendorf
Copy link
Contributor Author

@dweeezil no problem. Enjoy the vacation. I'll try and make sure these patches are in good order by the time you get to them.

@behlendorf behlendorf changed the title ARC Sync Up [WIP] ARC Sync Up Jul 17, 2015
@sempervictus
Copy link
Contributor

Currently building a new stack with these changes (hoping they might address the issue i posted in #3441) and noticed a surprisingly small amount or merge fixes required to run with #3441 & @rdolbeau's raidz parity calculation changes for ABD. Will probably not know for a couple of days if this fixes the strange behavior seen in prior stack, but will report back when i know either way.

@sempervictus
Copy link
Contributor

So far so good in terms of the strangely asymptomatic hang (may take a day or two to manifest), one strange thing that i'm seeing however is that /sys/module/zfs/parameters/zfs_arc_max shows 10737418240 while arcstat claims 6.5/15G on a 32G system, while free and htop report that the entire system is only using 12G, so i think arcstat may be somewhat confused.

@rdolbeau
Copy link
Contributor

to run with #3441 #3441 &
@rdolbeau https://github.com/rdolbeau's raidz parity calculation
changes for ABD

I've updated the branch to pick the fastest implementation, and added
preliminary support for Aarch64 using NEON.

@behlendorf
Copy link
Contributor Author

@sempervictus thanks for the additional testing. It does look like there might be some trouble with setting some of the ARC tuning module options. That code was restructured slightly in this patch stack, I'll look it over again.

Aside from that I'd like to add that the patch stack has held up well for me too over the last 3 days of testing under a fairly abusive workload. So looking good.

@dweeezil
Copy link
Contributor

@behlendorf, I've finally started looking at this. In the spirit of illumos alignment, have you considered setting DBUF_MUTEXES (not directly ARC-related, of course) and/or BUF_LOCKS back to 256. I'll note that neither illumos nor Nexenta have seen fit to increase these and so much has changed since #1291 I'm not sure if they're really applicable any more.

Speaking of which, I've been meaning to port Nexenta's Nexenta/illumos-nexenta@530ed44 to address the large memory scalability issue which I've mentioned in the context of other arc-related issues.

@behlendorf
Copy link
Contributor Author

@dweeezil yes, it definitely crossed by mind. But we increased it specifically because contention was observed there for certain workloads. I'm reluctant to revert it until we have evidence to the contrary or until we just make the entire thing dynamic like it ought to be. I think it would be great if we could take Nexenta's fix and go one step farther with it.

…no_grow

5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Steven Hartland <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Dan McDonald <[email protected]>

References:
  https://www.illumos.org/issues/5376
  illumos/illumos-gate@2ec99e3

Porting Notes:

The good news is that many of the recent changes made upstream to the
ARC tackled issues previously observed by ZoL with similar solutions.
The bad news is those solution weren't identical to the ones we applied.
This patch is designed to split the difference and apply as much of the
upstream work as possible.

* The arc_available_memory() function was removed previous in ZoL but
due to the upstream changes it makes sense to add it back.  This function
has been customized for Linux so that it can be used to determine a low
memory.  This provides the same basic functionality as the illumos version
allowing us to minimize changes through the rest of the code base.  The
exact mechanism used to detect a low memory state remains unchanged so
this change isn't a significant as it might first appear.

* This patch includes the long standing fix for arc_shrink() which was
originally proposed in openzfs#2167.  Since there were related changes to this
function it made sense to include that work.

* The arc_init() function has been re-factored.  As before it sets sane
default values for the ARC but then calls arc_tuning_update() to apply
user specific tuning made via module options.  The arc_tuning_update()
function is then called periodically by the arc_reclaim_thread() to
apply changes to the tunings made during normal operation.

Ported-by: Brian Behlendorf <[email protected]>
5445 Add more visibility via arcstats; specifically arc_state_t
stats and differentiate between "data" and "metadata"
Reviewed by: Basil Crow <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Bayard Bell <[email protected]>
Approved by: Robert Mustacchi <[email protected]>

References:
  https://www.illumos.org/issues/5445
  illumos/illumos-gate@4076b1b

Porting Notes:

This patch is an improved version of cc7f677 which was previously
merged in ZoL.  This patch incorporates the additional improvements
which were made upstream.

Ported-by: Brian Behlendorf <[email protected]>
5817 change type of arcs_size from uint64_t to refcount_t
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Garrett D'Amore <[email protected]>

References:
  https://www.illumos.org/issues/5817
  illumos/illumos-gate@2fd872a

Ported-by: Brian Behlendorf <[email protected]>
Commit f521ce1 removed the minimum value for "arc_p" allowing it to
drop to zero or grow to "arc_c".  This was done to improve specific
workload which constantly dirties new "metadata" but also frequently
touches a "small" amount of mfu data (e.g. mkdir's).

This change may still be desirable but it needs to be re-investigated.
in the context of the recent ARC changes from upstream.  Therefore
this code is being restored to facilitate benchmarking.  By setting
"zfs_arc_p_min_shift=64" we easily compare the performance.

Signed-off-by: Brian Behlendorf <[email protected]>
Originally removed because it wasn't required under Linux.  However,
there may still be some utility in signaling the arc reclaim thread
under Linux via reclaim.  This should already have happened by other
means but it's not harmless and reduces another point of divergence
with upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Commit d962d5d didn't quite properly resolve the HDR_L2ONLY_SIZE
accounting.  Accounting is now performed only in the constructor
and destructor which is a nice simplification.  It should have
been removed the from create and destroy functions.  This brings
up back in sync with upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Address minor differences in style between upstream and ZoL.  This
patch contains no functional differences and is solely designed to
minimize the delta from upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
@behlendorf
Copy link
Contributor Author

Refreshed against master. This version improved to handing of the various ARC module options in arc_tuning_update(). Specifically it ensures that when those values are changed on a running system reasonable defaults are always applied to any related tunings.

This patch is currently testing very well for me and I haven't observed any of the symptoms described by @sempervictus. I'd like to get this merged this week so if you can provide additional feedback or testing with real workload it would be appreciated.

@sempervictus
Copy link
Contributor

I'll need to rebuild to test the changes you just added, but i do have some thoughts on real workloads:
1 - i've been running an openstack and/or cloudstack from virtualbox atop this build, which creates significant memory pressure on the system as i've killed all swap (running a bunch of services and virtual hosts from inside VMs seemed as mean a thing to do as i could think of given my current task list).
2 - ARC appears to shrink in "real-world" terms under these conditions, but arcstat has taken to smoking crack and reports that the actual arc size is still 15G which isn't possible unless it added RAM to the system while i wasnt looking.
3 - When adding L2ARC to the mix, the usual lookup performance issues (stutters) i've seen on similar setups seem to be gone. Not sure if this patch stack is responsible or the current ABD impl, either way, i'm happy as can be with that.

I'll update my build in the next day or two to incorporate the changes you just introduced, but aside from the weird misreporting on ARC size (which may be caused by other things i'm running in this deep stack anyway), i'd say merge it if nobody else has a show-stopper related to this and get wider testing done. Getting ARC in-line with upstream and allowing things like the nexenta patch to merge atop the consistent code base sounds like a major win to me.

@siebenmann
Copy link
Contributor

I'm now running the as of more or less now version of this pull request on my machine. I'll report back if I observe any problems (which may not be soon, my issues in #3616 usually take several days to show up).

behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jul 23, 2015
5445 Add more visibility via arcstats; specifically arc_state_t
stats and differentiate between "data" and "metadata"
Reviewed by: Basil Crow <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Bayard Bell <[email protected]>
Approved by: Robert Mustacchi <[email protected]>

References:
  https://www.illumos.org/issues/5445
  illumos/illumos-gate@4076b1b

Porting Notes:

This patch is an improved version of cc7f677 which was previously
merged in ZoL.  This patch incorporates the additional improvements
which were made upstream.

Ported-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jul 23, 2015
5817 change type of arcs_size from uint64_t to refcount_t
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Garrett D'Amore <[email protected]>

References:
  https://www.illumos.org/issues/5817
  illumos/illumos-gate@2fd872a

Ported-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
behlendorf added a commit to behlendorf/zfs that referenced this pull request Jul 23, 2015
Commit f521ce1 removed the minimum value for "arc_p" allowing it to
drop to zero or grow to "arc_c".  This was done to improve specific
workload which constantly dirties new "metadata" but also frequently
touches a "small" amount of mfu data (e.g. mkdir's).

This change may still be desirable but it needs to be re-investigated.
in the context of the recent ARC changes from upstream.  Therefore
this code is being restored to facilitate benchmarking.  By setting
"zfs_arc_p_min_shift=64" we easily compare the performance.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
behlendorf added a commit to behlendorf/zfs that referenced this pull request Jul 23, 2015
Originally removed because it wasn't required under Linux.  However,
there may still be some utility in signaling the arc reclaim thread
under Linux via reclaim.  This should already have happened by other
means but it's not harmless and reduces another point of divergence
with upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
behlendorf added a commit to behlendorf/zfs that referenced this pull request Jul 23, 2015
Commit d962d5d didn't quite properly resolve the HDR_L2ONLY_SIZE
accounting.  Accounting is now performed only in the constructor
and destructor which is a nice simplification.  It should have
been removed the from create and destroy functions.  This brings
up back in sync with upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
behlendorf added a commit to behlendorf/zfs that referenced this pull request Jul 23, 2015
Address minor differences in style between upstream and ZoL.  This
patch contains no functional differences and is solely designed to
minimize the delta from upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
@behlendorf
Copy link
Contributor Author

Thanks @sempervictus @siebenmann @dweeezil for the testing and review. I've merged this patch stack to master. After a nearly a week of torture testing I haven't observed any issues and it working quite well. This patch stack also does resolve some issues that were in master like #3616 and #2167.

96c080c Minor style cleanup
3056818 Remove double counting HDR_L2ONLY_SIZE
8c8af9d Add hdr_recl() reclaim callback
728d6ae Reinstate zfs_arc_p_min_shift
36da08e Illumos 5817 - change type of arcs_size from uint64_t to refcount_t
500445c Illumos 5445 - Add more visibility via arcstats
ca67b33 Illumos 5376 - arc_kmem_reap_now() should not result in clearing arc_no_grow

@behlendorf behlendorf closed this Jul 23, 2015
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jul 23, 2015
5445 Add more visibility via arcstats; specifically arc_state_t
stats and differentiate between "data" and "metadata"
Reviewed by: Basil Crow <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Bayard Bell <[email protected]>
Approved by: Robert Mustacchi <[email protected]>

References:
  https://www.illumos.org/issues/5445
  illumos/illumos-gate@4076b1b

Porting Notes:

This patch is an improved version of cc7f677 which was previously
merged in ZoL.  This patch incorporates the additional improvements
which were made upstream.

Ported-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jul 23, 2015
5817 change type of arcs_size from uint64_t to refcount_t
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Garrett D'Amore <[email protected]>

References:
  https://www.illumos.org/issues/5817
  illumos/illumos-gate@2fd872a

Ported-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jul 23, 2015
Commit f521ce1 removed the minimum value for "arc_p" allowing it to
drop to zero or grow to "arc_c".  This was done to improve specific
workload which constantly dirties new "metadata" but also frequently
touches a "small" amount of mfu data (e.g. mkdir's).

This change may still be desirable but it needs to be re-investigated.
in the context of the recent ARC changes from upstream.  Therefore
this code is being restored to facilitate benchmarking.  By setting
"zfs_arc_p_min_shift=64" we easily compare the performance.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jul 23, 2015
Originally removed because it wasn't required under Linux.  However,
there may still be some utility in signaling the arc reclaim thread
under Linux via reclaim.  This should already have happened by other
means but it's not harmless and reduces another point of divergence
with upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jul 23, 2015
Commit d962d5d didn't quite properly resolve the HDR_L2ONLY_SIZE
accounting.  Accounting is now performed only in the constructor
and destructor which is a nice simplification.  It should have
been removed the from create and destroy functions.  This brings
up back in sync with upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request Jul 23, 2015
Address minor differences in style between upstream and ZoL.  This
patch contains no functional differences and is solely designed to
minimize the delta from upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
janlam7 pushed a commit to janlam7/zfs that referenced this pull request Jul 25, 2015
5445 Add more visibility via arcstats; specifically arc_state_t
stats and differentiate between "data" and "metadata"
Reviewed by: Basil Crow <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Bayard Bell <[email protected]>
Approved by: Robert Mustacchi <[email protected]>

References:
  https://www.illumos.org/issues/5445
  illumos/illumos-gate@4076b1b

Porting Notes:

This patch is an improved version of cc7f677 which was previously
merged in ZoL.  This patch incorporates the additional improvements
which were made upstream.

Ported-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
janlam7 pushed a commit to janlam7/zfs that referenced this pull request Jul 25, 2015
5817 change type of arcs_size from uint64_t to refcount_t
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Garrett D'Amore <[email protected]>

References:
  https://www.illumos.org/issues/5817
  illumos/illumos-gate@2fd872a

Ported-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
janlam7 pushed a commit to janlam7/zfs that referenced this pull request Jul 25, 2015
Commit f521ce1 removed the minimum value for "arc_p" allowing it to
drop to zero or grow to "arc_c".  This was done to improve specific
workload which constantly dirties new "metadata" but also frequently
touches a "small" amount of mfu data (e.g. mkdir's).

This change may still be desirable but it needs to be re-investigated.
in the context of the recent ARC changes from upstream.  Therefore
this code is being restored to facilitate benchmarking.  By setting
"zfs_arc_p_min_shift=64" we easily compare the performance.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
janlam7 pushed a commit to janlam7/zfs that referenced this pull request Jul 25, 2015
Originally removed because it wasn't required under Linux.  However,
there may still be some utility in signaling the arc reclaim thread
under Linux via reclaim.  This should already have happened by other
means but it's not harmless and reduces another point of divergence
with upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
janlam7 pushed a commit to janlam7/zfs that referenced this pull request Jul 25, 2015
Commit d962d5d didn't quite properly resolve the HDR_L2ONLY_SIZE
accounting.  Accounting is now performed only in the constructor
and destructor which is a nice simplification.  It should have
been removed the from create and destroy functions.  This brings
up back in sync with upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
janlam7 pushed a commit to janlam7/zfs that referenced this pull request Jul 25, 2015
Address minor differences in style between upstream and ZoL.  This
patch contains no functional differences and is solely designed to
minimize the delta from upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3533
@behlendorf behlendorf deleted the arc-sync-up branch April 19, 2021 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants