ARC_PRUNE Storm - Server becomes unusable - 1 SSD drive, 60 zfs filesystems mounted #6630
Comments
@thomasfrivold I.e. your ARC has been as high as 37G, the current size is 4.5G, and it's trying to get that down to 369M. Check your non-ZFS memory usage: have a look for what might be causing the memory pressure that's pushing down the ARC size.
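As context for the figures quoted above: the ARC's current size, target, and maximum can be read from `/proc/spl/kstat/zfs/arcstats` on any Linux system with the ZFS module loaded. A minimal sketch (the path is the standard SPL kstat location, but it only exists when ZFS is loaded, so the script falls back to a message otherwise):

```shell
# Print the ARC current size (size), target (c), and maximum (c_max) in MiB.
ARCSTATS=/proc/spl/kstat/zfs/arcstats
if [ -r "$ARCSTATS" ]; then
    awk '$1 == "size" || $1 == "c" || $1 == "c_max" \
         { printf "%-8s %d MiB\n", $1, $3 / 1048576 }' "$ARCSTATS"
else
    echo "arcstats not available: ZFS module not loaded"
fi
```

Comparing `size` against `c` over time shows whether the ARC is being forced below its target, which is the memory-pressure symptom described in the comment above.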
@thomasfrivold Given you're on Linux 4.10 and have a reasonable amount of memory, you may well be suffering from the problem described in this comment in #6635. The patch fixing that issue hasn't yet been committed, but if you're comfortable with git and compiling your own ZFS modules, you might like to give it a try. (I'm running it in production, so it's safe enough...)
Closing. This issue should be resolved in the 0.7 series.
Describe the problem you're observing
We're experiencing extremely high CPU load (400-450 continuously, for hours) and soft lockups multiple times a day involving arc_prune and spl_system_task:
As a result of this bug, the machine becomes practically inoperable.
SSH logins are still possible, but the system cannot stat most entries under /proc/.
So top works, but htop does not.
ps ax works, but ps aux does not.
Describe how to reproduce the problem
The box has 92G of ECC RAM, 2x Xeon E5620 (16 cores) and, possibly crucially, 60 mounted ZFS filesystems (base filesystems + snapshots).
ZFS is installed on an SSD drive. The machine has over 50% free memory, and CPU utilization is 2%-10%.
Number of LXC containers using ZFS, running at the same time: 59
Number of KVM machines using ZFS, booted up: 1
LXC containers using ZFS filesystems are not causing the issue.
The ARC_PRUNE storm, as I've learnt this phenomenon is called, occurs when booting up the KVM virtual machine and performing OS installs, or when copying 10GB+ files onto the VM filesystem.
The server then becomes bogged down by arc_prune processes that never seem to end. Load averages go from 90, 85, 15 to 350, 345, 320 to 450, 450, 450.
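A quick way to quantify such a storm (a hypothetical helper, not part of the original report) is to count the arc_prune kernel threads alongside the load average, since both climb together when the storm hits:

```shell
# Count arc_prune kernel threads and show the current 1/5/15-minute
# load averages. On a healthy box the thread count stays small and
# stable; during an ARC_PRUNE storm it climbs with the load.
count=$(ps ax | grep -c '[a]rc_prune')
echo "arc_prune threads: $count"
echo "load average:      $(cut -d' ' -f1-3 /proc/loadavg)"
```

The `[a]rc_prune` bracket trick stops grep from matching its own process entry, so the count is accurate even without `grep -v grep`.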
Output of lshw -class disk
Output of iostat
Output of iostat -x
Output of ps ax
root@thebank:~# ps ax | grep arc_prune
Output of arcstat
Suggestions and solutions I have found elsewhere
Why I am adding a ticket on this
I am affected by this error as of September 2017, and I am hoping to learn whether I can solve the issue with the currently installed versions of ZFS and SPL.
The rich discussion on a similar/identical ticket provided a workaround where setting a dnode limit seems to have solved the issue: #6223
It seems that the following solves it:
I don't have ZFS v0.7 or newer installed, so I can't tune the dnode limit setting.
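For anyone who does upgrade to the 0.7 series, the dnode-related tunables discussed in #6223 are exposed as ZFS module parameters under sysfs. The parameter names below (`zfs_arc_dnode_limit` and `zfs_arc_dnode_limit_percent`) are the 0.7-series names; they simply won't exist on 0.6.x, which this sketch checks for:

```shell
# Check whether the ZFS 0.7+ dnode limit tunables are available and,
# if so, show their current values. Writing to these files as root
# changes the limit at runtime.
for p in zfs_arc_dnode_limit zfs_arc_dnode_limit_percent; do
    f=/sys/module/zfs/parameters/$p
    if [ -r "$f" ]; then
        echo "$p = $(cat "$f")"
    else
        echo "$p not available (requires ZFS 0.7+)"
    fi
done
```

To make a value persistent across reboots, one would add a line like `options zfs zfs_arc_dnode_limit_percent=10` to `/etc/modprobe.d/zfs.conf`; treat the value 10 here as an illustration only, not a recommendation from this thread.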
Are there any other things I can do?