Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive memory_throttle_count value on i386 #831

Closed
amospalla opened this issue Jul 14, 2012 · 13 comments
Closed

Excessive memory_throttle_count value on i386 #831

amospalla opened this issue Jul 14, 2012 · 13 comments
Labels
Type: Architecture Indicates an issue is specific to a single processor architecture
Milestone

Comments

@amospalla
Copy link

Gentoo Linux running OpenVZ 042stab057.1 kernel (based upon 2.6.32 Red Hat), arcstats http://pastebin.com/5cbgdxPY.

@behlendorf
Copy link
Contributor

Until we properly abandon vmalloc() there's probably not much we can do about this. 32-bit systems have such a limited virtual address space.

@ryao
Copy link
Contributor

ryao commented Jul 17, 2012

@behlendorf I do not believe that is the case. The following items contradict the idea that the limited virtual address space is the cause:

  1. The arcstats suggest that the virtual address space has not been depleted.
  2. This works on FreeBSD with proper tuning and FreeBSD is subject to the same limitations.

I spent some time on this with @amospalla. He increased vmalloc, zfs_arc_max and zfs_arc_meta_limit, but none of that had any effect. I believe that there is some kind of arithmetic error occurring with 32-bit arithmetic, although I do not currently understand what that is. There were reports that pull request #651 prevents this issue, but I currently do not understand why.

@ryao
Copy link
Contributor

ryao commented Oct 11, 2012

Pull request #1034 might address this issue.

@ryao
Copy link
Contributor

ryao commented Oct 11, 2012

This issue was reproduced on ARM in issue #749 and it has been confirmed that pull request #1034 solves it there (assuming proper vmalloc and ARC tuning).

@ryao
Copy link
Contributor

ryao commented Oct 11, 2012

After talking with @amospalla in IRC, it turns out that pull request #1034 does not completely address his issue. However, adding the following patch to his kernel module does make the excessive throttle issue go away:

diff --git a/module/zfs/arc.c b/module/zfs/arc.c
index 59a3056..402eaaf 100644
--- a/module/zfs/arc.c
+++ b/module/zfs/arc.c
@@ -3567,10 +3567,6 @@ arc_memory_throttle(uint64_t reserve, uint64_t inflight_data, uint64_t txg)

    /* Easily reclaimable memory (free + inactive + arc-evictable) */
    available_memory = ptob(spl_kmem_availrmem()) + arc_evictable_memory();
-#if defined(__i386)
-   available_memory =
-       MIN(available_memory, vmem_size(heap_arena, VMEM_FREE));
-#endif

    if (available_memory <= zfs_write_limit_max) {
        ARCSTAT_INCR(arcstat_memory_throttle_count, 1);

This was only difference between what was being done on @lundman's system and what was being done on @amospalla's system. I provided him with a patch that deleted those lines on the suspicion that it was the problem and a paste of his arcstatus following an rsync involving 2GB of files showed memory_throttle_count to be 0. The implication is that vmem_size(heap_arena, VMEM_FREE) is returning some obscenely small value, causing ARC to throttle indefinitely.

@ryao
Copy link
Contributor

ryao commented Oct 11, 2012

This is surprising:

#define VMALLOC_START   ((unsigned long)high_memory + VMALLOC_OFFSET)
#ifdef CONFIG_X86_PAE
#define LAST_PKMAP 512
#else
#define LAST_PKMAP 1024
#endif

#define PKMAP_BASE ((FIXADDR_BOOT_START - PAGE_SIZE * (LAST_PKMAP + 1)) \
                    & PMD_MASK)

#ifdef CONFIG_HIGHMEM
# define VMALLOC_END    (PKMAP_BASE - 2 * PAGE_SIZE)
#else
# define VMALLOC_END    (FIXADDR_START - 2 * PAGE_SIZE)
#endif
# define VMALLOC_TOTAL          (VMALLOC_END - VMALLOC_START)
size_t
vmem_size(vmem_t *vmp, int typemask)
{
        struct vmalloc_info vmi;
        size_t size = 0;

        ASSERT(vmp == NULL);
        ASSERT(typemask & (VMEM_ALLOC | VMEM_FREE));

        get_vmalloc_info(&vmi);
        if (typemask & VMEM_ALLOC)
                size += (size_t)vmi.used;

        if (typemask & VMEM_FREE)
                size += (size_t)(VMALLOC_TOTAL - vmi.used);

        return size;
}

It looks like the return value of vmem_size(heap_arena, VMEM_FREE) is being set at compile time. This explains why vmalloc did not have any effect. This is also ruins a fairly important calculation in arc_evict_needed() on all architectures:

static int
arc_evict_needed(arc_buf_contents_t type)
{
        if (type == ARC_BUFC_METADATA && arc_meta_used >= arc_meta_limit)
                return (1);

#ifdef _KERNEL
        /*
         * If zio data pages are being allocated out of a separate heap segment,
         * then enforce that the size of available vmem for this area remains
         * above about 1/32nd free.
         */
        if (type == ARC_BUFC_DATA && zio_arena != NULL &&
            vmem_size(zio_arena, VMEM_FREE) <
            (vmem_size(zio_arena, VMEM_ALLOC) >> 5))
                return (1);
#endif

        if (arc_no_grow)
                return (1);

        return (arc_size > arc_c);
}

vmem_size() is only used in ARC and splat, but it needs a rewrite.

@behlendorf
Copy link
Contributor

vmem_size() is only used in ARC and splat, but it needs a rewrite.

Could you be more specific because by my reading of the code vmem_size() is working exactly as intended.

The issue is more that the virtual address space on 32-bit Linux is tiny, tiny, tiny compared to OpenSolaris and FreeBSD. We're talking about 100MB by default on Linux compared to many GBs for the other platforms. This __i386 code was something which I brought over from Solaris unmodified and never seriously focused on since 32-bit systems have never been a high priority.

Now since the expectations about the virtual memory subsystem on Linux are so different than on Solaris it's not to surprising this code is just wrong. Since we have good reason to believe this code is causing problems on the 32-bit Linux system I suggest we just drop it.

#if defined(__i386)
        available_memory =
            MIN(available_memory, vmem_size(heap_arena, VMEM_FREE));
#endif

This hunk should certainly go since almost by definition it's going to be a small value. On 32-bit Linux systems the expectation right now is that almost all this space will be consumed by the ARC.

#ifdef _KERNEL
        /*
         * If zio data pages are being allocated out of a separate heap segment,
         * then enforce that the size of available vmem for this area remains
         * above about 1/32nd free.
         */
        if (type == ARC_BUFC_DATA && zio_arena != NULL &&
            vmem_size(zio_arena, VMEM_FREE) <
            (vmem_size(zio_arena, VMEM_ALLOC) >> 5))
                return (1);
#endif

This is already 100% dead code on Linux for all builds. I never implemented the concept of VM arenas in the SPL layer for Linux so zio_arena is always set to NULL.

        /*
         * On architectures where the physical memory can be larger
         * than the addressable space (intel in 32-bit mode), we may
         * need to limit the cache to 1/8 of VM size.
         */
        arc_c = MIN(arc_c, vmem_size(heap_arena, VMEM_ALLOC | VMEM_FREE) / 8);

For now this still makes sense since the ARC is still largely backed by the virtual address space. We should be careful never to automatically size it larger than this. Once the ARC is backed by the page cache this can also be dropped. At which point we'll be able to remove vmem_size() entirely from the SPL which will be a good thing.

Anyway, I'm proposing @ryao 's original fix for this issue, behlendorf/zfs@cf86aeb with just a little more commentary. Any additional 32-bit testing with this bring would be appreciated.

https://github.com/behlendorf/zfs/commits/issue-831

@lundman
Copy link
Contributor

lundman commented Oct 12, 2012

I went to HEAD, and applied behlendorf@cf86aeb and have no issues with it.

memory_throttle_count           4    0
memory_direct_count             4    3
memory_indirect_count           4    1102

I loaded ZFS with insmod ./module/zfs/zfs.ko zfs_arc_max=41943040 zfs_vdev_cache_size=5242880
and vmalloc=800M

@ryao
Copy link
Contributor

ryao commented Oct 12, 2012

@behlendorf I was wrong to say that the value of vmem_size(zio_arena, VMEM_FREE) was being set at compile time. The value of VMALLOC_TOTAL is set at compile time, even though it can be modified with vmalloc as a kernel commandline parameter. However, it is using a value that was determined at compile time as the total amount of kernel virtual address space.

For instance, arc_c = MIN(arc_c, vmem_size(heap_arena, VMEM_ALLOC | VMEM_FREE) / 8); can be replaced with arc_c = MIN(arc_c, VMALLOC_TOTAL / 8); and it will have the same effect. The value of VMALLOC_TOTAL is something computed by preprocessor definitions. We might not want that.

As for arc_evict_needed(), you are correct that the code that I cited was dead code. I didn't realize that zio_arena was a NULL value when I was looking at it.

@lundman
Copy link
Contributor

lundman commented Oct 12, 2012

I applied #1034 to test on the 32bit ARM, and I did not notice any particular speedup. But then, with previous fixes, the throttle is now at "0" both before and after. The fix does no harm here at least.

before real    3m26.005s
after    real    3m31.933s

@behlendorf
Copy link
Contributor

That's to be expected it was all dead code for ARM. Still this is progress in the right direction. Now -rc12 when tagged should at least build and basically function for ARM. I'm sure there's more tuning which could be done but that's going to be an incremental thing.

@behlendorf
Copy link
Contributor

Thanks for running this one down guys, I've merged it in to master. I'd be very interested to hear what sort of tunings are still needed to run reliably on a 32-bit system.

@ryao
Copy link
Contributor

ryao commented Oct 13, 2012

@behlendorf Issue #749 showed that the following works:

zfs_arc_max=41943040
zfs_vdev_cache_size=5242880
vmalloc=384M

These probably are far from optimal, but they are at least known to be sane.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
There are currently three vmem_size() consumers all of which are
part of the ARC implemention.  However, since the expected behavior
of the Linux and Solaris virtual memory subsystems are so different
the behavior in each of these instances needs to be reevaluated.

* arc_evict_needed() - This is actually dead code.  Arena support
was never added to the SPL and zio_arena is always NULL.  This
support isn't needed so we simply remove this dead code.

* arc_memory_throttle() - On Solaris where virtual memory constitutes
almost all of the address space we can reasonably expect there to be
a fairly large amount free.  However, on Linux by default we only
have about 100MB total and that's heavily used by the ARC.  So the
expectation on Linux is that this will usually be a small value.
Therefore we remove the vmem_size() check for i386 systems because
the expectation is that it will be less than the zfs_write_limit_max.

* arc_init() - Here vmem_size() is used to initially size the ARC.
Since the ARC is currently backed by the virtual address space it
makes sense to use this as a limit on the ARC for 32-bit systems.
This code can be removed when the ARC is backed by the page cache.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#831
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
…zfs#831)

Bumps [linux-raw-sys](https://github.com/sunfishcode/linux-raw-sys) from 0.3.4 to 0.3.6.
- [Release notes](https://github.com/sunfishcode/linux-raw-sys/releases)
- [Commits](sunfishcode/linux-raw-sys@v0.3.4...v0.3.6)

---
updated-dependencies:
- dependency-name: linux-raw-sys
  dependency-type: indirect
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Architecture Indicates an issue is specific to a single processor architecture
Projects
None yet
Development

No branches or pull requests

4 participants