
plugable ARC #10516

Open
zenaan opened this issue Jun 30, 2020 · 7 comments
Labels
Type: Feature (Feature request or new feature) · Type: Performance (Performance improvement or performance problem)

Comments


zenaan commented Jun 30, 2020

The ARC and the Linux page cache duplicate one another with respect to cached data.

The ARC has various memory-management issues because it is a bolt-on and is not integrated with Linux's page cache.

The ARC also has performance and tuning issues for the same reason.

OpenZFS is 'bolted on' to a number of kernels: illumos, BSD, Linux, etc.

A first step toward rationalizing the ZFS ARC issues may be to make it pluggable/optional.

This would require at least some thought and cleanup of the "ARC API", if we can call it that.

Once the existing API is pluggable/optional, it should provide a "code environment" in which experiments are much easier (a hypothetical interface sketch follows this list):

  • performance comparisons, ARC vs no ARC

  • API experiments and evolution

  • custom per-OS and/or per-deployment "ARC module"s

  • etc
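
To make the idea concrete, here is a purely hypothetical sketch of what a pluggable cache interface might look like. None of these type or function names exist in the OpenZFS tree, and a real API would have to cover far more (compression, ghost lists, prefetch, L2ARC, etc.):

```c
/*
 * Hypothetical sketch only -- none of these types or functions exist in
 * OpenZFS today.  The idea: reduce the caching layer to an ops table that a
 * platform-specific module (or a "no cache, let the page cache handle it"
 * stub) can plug in when a pool is opened.
 */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint64_t */

typedef struct zfs_cache_ops {
	/* Set up the cache with an upper bound on memory use. */
	int  (*cache_init)(void **handle, size_t max_bytes);
	void (*cache_fini)(void *handle);

	/* Look up a block: return 0 and fill buf on a hit, nonzero on a miss. */
	int  (*cache_read)(void *handle, uint64_t blkid, void *buf, size_t len);

	/* Insert or update a block after a disk read or a write. */
	int  (*cache_insert)(void *handle, uint64_t blkid,
	    const void *buf, size_t len);

	/* Release roughly bytes_wanted, e.g. on memory pressure or pool export. */
	void (*cache_evict)(void *handle, size_t bytes_wanted);
} zfs_cache_ops_t;

/*
 * A deployment that wants no ZFS-side caching could register a stub whose
 * cache_read always misses and whose cache_insert is a no-op, which would
 * make "ARC vs no ARC" performance comparisons straightforward.
 */
```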

Apart from this, the requirement for OpenZFS to stay cross-OS compatible appears to make the effort to improve it too burdensome to attract contributors; e.g. see here:

http://freebsd.1045724.x6.nabble.com/ZFS-ARC-and-mmap-page-cache-coherency-question-td6110500.html

Jul 06, 2016; 4:40am
Re: ZFS ARC and mmap/page cache coherency question
Lionel Cons

So what Oracle did (based on work done by SUN for Opensolaris) was to:

  1. Modify ZFS to prevent ANY double/multi caching [this is considered a design defect]
  2. Introduce a new VM subsystem which scales a lot better and provides hooks for [1] so there are never two or more copies of the same data in the system

Given that this was a huge, paid, multiyear effort its not likely going to happen that the design defects in opensource ZFS will ever go away.

Lionel

(See usual URLs such as https://pthree.org/2012/12/07/zfs-administration-part-iv-the-adjustable-replacement-cache/)

@ahrens added the Type: Feature and Type: Performance labels on Jun 30, 2020

bghira commented Aug 13, 2020

Sounds like a maintenance nightmare. I look forward to your PR.


snajpa commented Feb 5, 2021

The ARC is IMHO the best feature and the most important differentiator of OpenZFS from other storage systems. I agree that double caching is bad, but I'd much rather have the ability to prevent OpenZFS contents from being cached in the Linux page cache. That would make more sense to me, as the page cache has abysmal performance in most multitenant environments I have ever seen, whereas OpenZFS saves the day and enables those workloads to achieve >90% hit rates >90% of the time (honestly, it's more like >98% hit rates >98% of the time, even better). Something the page cache can only dream of :)

I'm just not sure this is possible with current Linux (AFAIK it isn't), so Linux is the place where it makes the most sense to go and do something about this issue.


IsaacVaughn commented Feb 6, 2021

I think it would be helpful if this issue could be broken down into smaller components that can actually be addressed, instead of an all-or-nothing request to get rid of the ARC entirely.

Do we have any performance data confirming that double caching is a problem outside of mmap? I have never seen the page cache using huge amounts of memory on my personal ZFS systems. Some data:

  • ZFS-on-root desktop: 11.4G ARC, 896M page cache

  • ZFS data, XFS root server: 3.8G ARC, 4.5G page cache

At first glance, it looks like the page cache is mostly being filled by non-ZFS filesystems rather than ARC contents. Avoiding extra copies is great, but it isn't clear whether the problem is widespread or specific to mmap.
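
For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how those two numbers can be pulled on Linux. It assumes the usual ZFS-on-Linux kstat path and /proc/meminfo layout, and the parsing is deliberately naive:

```c
/*
 * Rough sketch: print ARC size vs. Linux page cache size.
 * Assumes the usual ZFS-on-Linux kstat path and /proc/meminfo layout;
 * error handling and parsing are deliberately minimal.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the n-th number following `key` on the first matching line, or -1. */
static long long
field(const char *path, const char *key, int n)
{
	FILE *f = fopen(path, "r");
	char line[256];
	long long val = -1;

	if (f == NULL)
		return (-1);
	while (fgets(line, sizeof (line), f) != NULL) {
		if (strncmp(line, key, strlen(key)) == 0) {
			char *p = line + strlen(key);
			for (int i = 0; i < n; i++)
				val = strtoll(p, &p, 10);
			break;
		}
	}
	fclose(f);
	return (val);
}

int
main(void)
{
	/* arcstats "size" line is "name type value", so take the 2nd number. */
	long long arc = field("/proc/spl/kstat/zfs/arcstats", "size", 2);
	/* meminfo "Cached:" line is "Cached: <kB> kB", so take the 1st number. */
	long long pc = field("/proc/meminfo", "Cached:", 1) * 1024;

	printf("ARC size:   %.1f GiB\n", arc / 1073741824.0);
	printf("Page cache: %.1f GiB\n", pc / 1073741824.0);
	return (0);
}
```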

There is also a separate problem brought up: the ARC's responsiveness to memory pressure in other parts of the system. From the FreeBSD thread, it sounds like this could be solved if the kernels provided a means to signal that the ARC should be evicted before attempting to page out to swap, e.g. some way to register "I have x GB of low-priority memory; when memory pressure increases, tell me to drop some." @snajpa is probably correct; this needs to be implemented in Linux first. It also might already exist, but be hidden behind a GPL export.

Edit: The FreeBSD thread also mentions per-vdev write buffering, which would be very nice. A less granular per-pool buffer might be more appropriate, though. SSD and HDD pools have very different performance characteristics, and NVMe is only making the problem worse.


codyps commented Feb 12, 2021

There is also a separate problem brought up: the ARC's responsiveness to memory pressure in other parts of the system. From the FreeBSD thread, it sounds like this could be solved if the kernels provided a means to signal that the ARC should be evicted before attempting to page out to swap, e.g. some way to register "I have x GB of low-priority memory; when memory pressure increases, tell me to drop some." @snajpa is probably correct; this needs to be implemented in Linux first. It also might already exist, but be hidden behind a GPL export.

On Linux, ZFS already implements the "shrinker" API to be notified about memory pressure indicating that caches need to shrink. (I suppose it's possible there are issues with that API's usage in the Linux kernel, or with ZFS's implementation of it.)
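
For reference, this is the mechanism being described: a minimal sketch of the Linux shrinker interface. The two accounting helpers are hypothetical stand-ins for a cache's own bookkeeping, and the exact registration call has changed signature across kernel versions:

```c
/*
 * Minimal sketch of the Linux shrinker interface mentioned above.
 * The my_cache_* helpers are hypothetical stand-ins for a cache's own
 * accounting; the registration call's signature varies across kernel
 * versions, so it is only shown in a comment.
 */
#include <linux/shrinker.h>

/* Hypothetical accounting helpers, provided elsewhere by the cache. */
extern unsigned long my_cache_evictable_count(void);
extern unsigned long my_cache_evict(unsigned long nr);

static unsigned long
my_cache_count(struct shrinker *shrink, struct shrink_control *sc)
{
	/* Tell the kernel how many objects could be freed right now. */
	return (my_cache_evictable_count());
}

static unsigned long
my_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
{
	/* Under pressure: free up to sc->nr_to_scan objects, report how many. */
	return (my_cache_evict(sc->nr_to_scan));
}

static struct shrinker my_cache_shrinker = {
	.count_objects = my_cache_count,
	.scan_objects  = my_cache_scan,
	.seeks         = DEFAULT_SEEKS,
};

/*
 * register_shrinker(&my_cache_shrinker, ...) at module init and
 * unregister_shrinker(&my_cache_shrinker) at module exit hook this into
 * the kernel's memory-pressure path.
 */
```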

@ednadolski-ix (Contributor) commented:

As this is approaching two years old, is it (still?) considered a feasible ask? (N.B. the cited work is dated, and was done on Solaris, not Linux or FreeBSD.)

@grahamperrin (Contributor) commented:

#10516 (comment)

See also:

@zenaan, a hint: if you make bullet points for the linked items, and use the URLs alone, then GitHub might automatically show each item's title and status.
