Use scatter-gather lists for ARC buffers #75
Labels: Component: Memory Management (kernel memory management)
behlendorf added a commit to behlendorf/zfs that referenced this issue on Dec 5, 2011:
In the upstream OpenSolaris ZFS code the maximum ARC usage is limited to 3/4 of memory or all but 1GB, whichever is larger. Because of how Linux's VM subsystem is organized these defaults have proven to be too large, which can lead to stability issues. To avoid making everyone manually tune the ARC, the defaults are being changed to 1/2 of memory or all but 4GB. The rationale for this is as follows:

* Desktop systems (less than 8GB of memory): Limiting the ARC to 1/2 of memory is desirable for desktop systems, which have highly dynamic memory requirements. For example, launching your web browser can suddenly result in a demand for several gigabytes of memory. This memory must be reclaimed from the ARC cache, which can take some time. The user experiences this reclaim time as a sluggish system with poor interactive performance. In this case it is preferable to leave the memory free and available for immediate use.

* Server systems (more than 8GB of memory): Using all but 4GB of memory for the ARC is preferable for server systems. These systems often run with minimal user interaction and have long-running daemons with relatively stable memory demands. They benefit most from having as much data cached in memory as possible.

These values should work well for most configurations. However, if you have a desktop system with more than 8GB of memory you may wish to further restrict the ARC. This can still be accomplished by setting the 'zfs_arc_max' module option.

Additionally, keep in mind these aren't currently hard limits. The ARC is based on a slab implementation which can suffer from memory fragmentation. Because this fragmentation is not visible to the ARC, it may believe it is within the specified limits while actually consuming slightly more memory. How much more memory gets consumed is determined by how badly fragmented the slabs are. In the long term this can be mitigated by slab defragmentation code, which was the OpenSolaris solution. Preferably, using the page cache to back the ARC under Linux would be even better; see issue openzfs#75 for the benefits of more tightly integrating with the page cache.

This change also fixes an issue where the default ARC max was being set incorrectly for machines with less than 2GB of memory. The constant in the arc_c_max comparison must be explicitly cast to a uint64_t type to prevent overflow and the wrong conditional branch being taken. This failure was typically observed in VMs, which are commonly created with less than 2GB of memory.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#75
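A minimal sketch (not the actual arc.c code) of how the new default described above could be computed, and why the explicit 64-bit cast matters on small-memory machines:

```c
#include <stdint.h>

/*
 * Hypothetical illustration: pick the larger of 1/2 of memory or all but
 * 4 GiB.  The explicit uint64_t casts matter; an expression such as
 * 4 * (1 << 30) evaluated in 32-bit int arithmetic overflows, which can
 * flip the comparison on machines with little memory.
 */
static uint64_t
default_arc_max(uint64_t allmem)
{
	uint64_t half = allmem / 2;
	uint64_t all_but_4g = 0;

	if (allmem > ((uint64_t)4 << 30))
		all_but_4g = allmem - ((uint64_t)4 << 30);

	return (half > all_but_4g ? half : all_but_4g);
}
```

As noted above, the computed default can still be overridden at module load time via the 'zfs_arc_max' module option.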
Rudd-O pushed a commit to Rudd-O/zfs that referenced this issue on Feb 1, 2012 (same commit message as above).
This was referenced Sep 26, 2013
behlendorf added the Bug - Major label and removed the Type: Feature (feature request or new feature) label on Oct 3, 2014.
Related to #2129 and #3441.
Merged as: 7657def Introduce ARC Buffer Data (ABD)
ahrens pushed a commit to ahrens/zfs that referenced this issue on Sep 17, 2019:
Signed-off-by: Paul Dagnelie <[email protected]>
sdimitro pushed a commit to sdimitro/zfs that referenced this issue on Feb 14, 2022:
Signed-off-by: Paul Dagnelie <[email protected]>
rkojedzinszky pushed a commit to rkojedzinszky/zfs that referenced this issue on Mar 7, 2023:
Avoid duplicated Actions in TrueNAS ZFS CI
Signed-off-by: Umer Saleem <[email protected]>
This is a big change, but we really need to consider updating the ZFS code to use scatter-gather lists for the ARC buffers instead of vmalloc'ed memory. Using a vmalloc'ed buffer is the way it's done on OpenSolaris, but it's less problematic there because the kernel has a more fully featured virtual memory management system. By design the Linux kernel's VM is kept primitive for performance reasons. The only reason things are working reasonably well today is that I've implemented a fairly decent virtual slab in the SPL. This is good, but it goes against the grain of what should be done and it does cause some problems, such as the following (illustrative sketches follow the list):
Deadlocks. Because of the way the zio pipeline is designed in ZFS we must be careful to avoid triggering the synchronous memory reclaim path. If one of the zio threads does enter reclaim then it may deadlock on itself by trying to flush dirty pages from, say, a zvol. This is avoided in most instances by clearing GFP_FS, but we can't clear this flag for vmalloc() calls. Unfortunately, we may be forced to vmalloc() a new slab in the zio pipeline for certain workloads such as compression, and thus we risk deadlocking. Moving to scatter-gather lists would allow us to eliminate this __vmalloc() and the potential deadlock (a sketch of the flag-clearing idea follows this list).
Avoid serializing on the single Linux VM lock. Because the Linux VM is designed to be lightly used, all changes to the virtual address space are serialized through a single lock. The SPL slab does go through some effort to minimize this impact by allocating slabs of objects, but clearly there are scaling concerns here.
VM overhead. In addition to the lock contention, there is overhead involved in locating suitable virtual addresses and setting up the mapping from virtual to physical pages. For a CPU-hungry filesystem, any overhead we can eliminate is worthwhile.
32-bit arch support. The biggest issue with supporting 32-bit arches is that they have a very small virtual address range, usually only hundreds of MB. By moving all ARC data buffers to scatter-gather lists we avoid having to use this limited address range. Instead, all data pages can simply reside in the standard address range, just like with all other Linux filesystems.
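As a small illustration of the GFP_FS point above, here is a hypothetical helper (the name zio_alloc_flags is illustrative, not from the ZFS code base) that strips __GFP_FS from per-page allocations. With individually allocated pages every allocation can be made reclaim-safe this way, whereas a large __vmalloc() of a new slab cannot:

```c
#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Hypothetical helper: strip __GFP_FS so that memory reclaim triggered
 * by this allocation can never call back into the filesystem and
 * deadlock a zio thread on its own dirty data.
 */
static inline gfp_t
zio_alloc_flags(gfp_t flags)
{
	return (flags & ~__GFP_FS);
}

/* Example use when backing a buffer with individual pages. */
static struct page *
zio_alloc_data_page(void)
{
	return (alloc_page(zio_alloc_flags(GFP_KERNEL)));
}
```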
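And a hedged sketch of what a scatter-gather backed ARC buffer could look like, using the kernel's struct scatterlist instead of one large vmalloc'ed region. The sg_abuf_t type and function names are illustrative and not from the ZFS or SPL code base; each backing page is allocated individually, which honors GFP_NOFS, avoids the virtual address setup serialized by the VM lock, and keeps data out of the scarce 32-bit vmalloc range:

```c
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/* Illustrative buffer descriptor: one scatterlist entry per backing page. */
typedef struct sg_abuf {
	struct scatterlist *ab_sgl;
	unsigned int ab_nents;
	size_t ab_size;
} sg_abuf_t;

static sg_abuf_t *
sg_abuf_alloc(size_t size)
{
	unsigned int i, nents = DIV_ROUND_UP(size, PAGE_SIZE);
	sg_abuf_t *ab = kzalloc(sizeof (*ab), GFP_NOFS);

	if (ab == NULL)
		return (NULL);

	ab->ab_sgl = kcalloc(nents, sizeof (struct scatterlist), GFP_NOFS);
	if (ab->ab_sgl == NULL) {
		kfree(ab);
		return (NULL);
	}
	sg_init_table(ab->ab_sgl, nents);

	for (i = 0; i < nents; i++) {
		/* GFP_NOFS keeps reclaim from re-entering the filesystem. */
		struct page *pg = alloc_page(GFP_NOFS);

		if (pg == NULL)
			goto fail;
		sg_set_page(&ab->ab_sgl[i], pg, PAGE_SIZE, 0);
	}
	ab->ab_nents = nents;
	ab->ab_size = size;
	return (ab);

fail:
	while (i--)
		__free_page(sg_page(&ab->ab_sgl[i]));
	kfree(ab->ab_sgl);
	kfree(ab);
	return (NULL);
}
```

Readers and writers would then walk the page list (mapping individual pages with kmap() only as needed) rather than relying on one contiguous virtual address, which is broadly the direction the later ABD work (7657def, referenced above) took.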