ARC size "exploding" beyond set limit upon heavy NFS traffic, triggering OOM and killing system #12354
Comments
If you look at the arcstats, what is the size of the 'other' portion of the ARC? It may be that you have a very large amount of dirty data: because your demand read rate is so high, the writes never get a chance to flush, so dirty data builds up. The write throttle should eventually slow the incoming rate of writes to resolve this, but it is something to look at.
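For reference, a minimal sketch of how those portions can be read straight from the kernel stats; the exact field names (e.g. `anon_size`, `dnode_size`) are taken from recent OpenZFS releases and may differ between versions:

```sh
# Pull the ARC size breakdown and dirty-data related counters.
# Field names assumed from recent OpenZFS releases; adjust as needed.
grep -E '^(size|c_max|anon_size|mru_size|mfu_size|data_size|metadata_size|hdr_size|dbuf_size|dnode_size|bonus_size)[[:space:]]' \
    /proc/spl/kstat/zfs/arcstats

# Dirty data is throttled against zfs_dirty_data_max (in bytes):
cat /sys/module/zfs/parameters/zfs_dirty_data_max
```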
With ZFS+NFS I have always had to limit ARC to far below the actual system memory, to account for ZFS taking up RAM from unknown slab allocations which it never releases.
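For example, such a cap can be made persistent via a modprobe option; the 32 GiB value below is just an illustration, and whether the setting also needs to be baked into the initramfs depends on the distribution:

```sh
# Persistently cap the ARC at 32 GiB (example value) and apply it right away.
echo "options zfs zfs_arc_max=$((2**35))" > /etc/modprobe.d/zfs.conf
echo $((2**35)) > /sys/module/zfs/parameters/zfs_arc_max
```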
Possibly related: #10302
Does 'other' refer to another "column" from the arcstat output? I am currently creating test data on an otherwise unused NFS box to perform some tests there without interfering with other users' jobs. This will take a bit to set up, but hopefully I will be able to gather some data from it.
Yeah, could be.
The stat I was looking for was the
So far, all my attempts to reproduce the problem have failed. I know the user had a directory with about 2 TByte of tarballs on his NFS share, and each tarball contained a few 1000 entries of either small uncompressed text files or much larger gzipped files, i.e. the individual files were compressed, the outer tarball was not.

Each client only accessed a single tarball, opened it, scanned it for all members and selected all of these based on the file names. Then those file names were sorted and each file was read in turn from the tarball, uncompressed to the local file system and then merged into a HDF5 file. As the files within the tarball are not sorted by name, sorting the file names would obviously lead to random access of the tarball, but all of this happened within a Python script via Python's tarfile module.

But so far, I was unable to generate enough "pressure" or whatever is needed to make the ARC cache barrier break down on my partial copy of the user's data. The user's job was safe with 25 clients but caused problems with 250 clients running simultaneously. So far I have tried with up to 800 clients, but the system just maxes out at about 60k reads and 15% cache misses
(please note that for this test I reduced the available RAM via the kernel's command line, as I hoped to be able to trigger the problem more easily with less RAM).
Any ideas on what kind of scenario I should/could try?
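For what it's worth, the workload described above could be approximated from the shell without the Python tooling; a rough sketch, where the mount point, tarball directory and client count are placeholders rather than the user's actual setup:

```sh
#!/bin/sh
# Rough approximation of the described job: each "client" picks one tarball
# on the NFS mount, lists its members, sorts them by name and then extracts
# them one at a time, which forces out-of-order reads inside the tar file.
# NFS_MOUNT, the tarball directory and NCLIENTS are placeholders.
NFS_MOUNT=/mnt/nfs
NCLIENTS=250

run_client() {
    tarball=$1
    scratch=$(mktemp -d)
    tar -tf "$tarball" | sort | while read -r member; do
        # Each single-member extraction rescans the archive from the start.
        tar -xOf "$tarball" "$member" > "$scratch/extracted" 2>/dev/null
    done
    rm -rf "$scratch"
}

i=0
for tarball in "$NFS_MOUNT"/tarballs/*.tar; do
    [ "$i" -ge "$NCLIENTS" ] && break
    run_client "$tarball" &
    i=$((i + 1))
done
wait
```

Extracting members one by one with `tar` rescans the archive for every member, so the read amplification should be at least as bad as with Python's `tarfile`.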
If you could print out /proc/spl/kstat/zfs/arcstats when it happens, that would be great for debugging.
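Since the box dies with an OOM, a simple logger that keeps snapshotting that file may help preserve the last state before the crash; a minimal sketch (log path and interval are arbitrary choices):

```sh
# Append a timestamped arcstats snapshot every 30 seconds.
while true; do
    { date; cat /proc/spl/kstat/zfs/arcstats; echo; } >> /var/log/arcstats.log
    sleep 30
done
```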
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
On several of our NFS servers, we see the ARC size exceeding zfs_arc_max and growing until OOM kills the system. Sometimes this takes days, sometimes less than an hour, and sometimes it seems to regulate itself.
Initially, we set `zfs_arc_max` to 32 GByte (`echo $((2**35)) | tee /sys/module/zfs/parameters/zfs_arc_max`) and `arcstat` confirms the `size` to be 32G (out of 192 GByte RAM). After prolonged, inefficient random reads by 1000s of NFS clients at about 400 MByte/s, leading to an average of 740 MByte/s read from the pool's 8 vdevs with 3 disks each in raidz configuration:
Initially:
after some time:
(Adding the `waste` column to `arcstat` does not really yield much; it always stays below the 100 kByte level.)
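For completeness, the configured limit versus the actual size can also be read directly from arcstats instead of via `arcstat`; a minimal sketch:

```sh
# Compare the configured ceiling (c_max) with the current ARC size.
awk '$1 == "c_max" || $1 == "size" { printf "%-8s %s\n", $1, $3 }' /proc/spl/kstat/zfs/arcstats
```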
Describe how to reproduce the problem
Hit an NFS (v3) server hard with 100s-1000s of clients reading data off the disks (presumably in small random bits).
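A crude way to approximate that from a single test machine; this is not the user's actual job, the mount point, file selection and client count are made up, and GNU `shuf`/`stat` are assumed:

```sh
#!/bin/sh
# Crude stand-in for many clients doing small random reads over NFS.
# NFS_MOUNT and NCLIENTS are placeholders.
NFS_MOUNT=/mnt/nfs
NCLIENTS=500

find "$NFS_MOUNT" -type f -size +1M > /tmp/nfs-filelist
[ -s /tmp/nfs-filelist ] || exit 1

reader() {
    while true; do
        f=$(shuf -n 1 /tmp/nfs-filelist)
        blocks=$(( $(stat -c%s "$f") / 131072 ))   # file size in 128 KiB blocks
        dd if="$f" of=/dev/null bs=128K count=1 \
           skip=$(shuf -i 0-$((blocks - 1)) -n 1) 2>/dev/null
    done
}

i=0
while [ "$i" -lt "$NCLIENTS" ]; do
    reader &
    i=$((i + 1))
done
wait
```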
Include any warning/errors/backtraces from the system logs
As the system dies with OOM killing a "random" process (e.g. `agetty`), not much output has been found so far. `echo 1 > /proc/sys/vm/drop_caches` does nothing; `echo 2 > /proc/sys/vm/drop_caches` does cause a brief noticeable effect:

but later on, while still running/blocking, the situation grows more and more dire:
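To quantify what that drop actually releases, ARC size and SPL slab usage can be sampled before and after; a sketch, assuming the SPL slab proc file is available:

```sh
# Sample ARC size and SPL slab usage around the drop of reclaimable objects.
awk '$1 == "size" { print "ARC size before:", $3 }' /proc/spl/kstat/zfs/arcstats
cat /proc/spl/kmem/slab > /tmp/slab.before
echo 2 > /proc/sys/vm/drop_caches
awk '$1 == "size" { print "ARC size after: ", $3 }' /proc/spl/kstat/zfs/arcstats
cat /proc/spl/kmem/slab > /tmp/slab.after
diff /tmp/slab.before /tmp/slab.after | head -n 20
```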
I am still investigating, but so far the only "cure" found was restarting the NFS server, which seems to let a lot of cache space "expire".
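The restart itself is the usual service bounce; the unit name depends on the distribution (e.g. `nfs-server` on RHEL/SUSE vs. `nfs-kernel-server` on Debian/Ubuntu):

```sh
# Restart the kernel NFS server; adjust the unit name for your distribution.
systemctl restart nfs-server
```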
Thus, happy for any hint on what I could do to mitigate or work around this problem.