Ordinary dd from a disk device can kill the ARC even after it finishes #3680
Comments
@siebenmann As of 11f552f, […]
@siebenmann You want to use Direct I/O with dd, which bypasses the file cache and prevents this kind of issue. Just specify "oflag=direct".
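For the read-speed test described below, the direct-I/O flag would go on the input side; a minimal sketch (the device path and sizes are hypothetical):

```sh
# Hypothetical device path; adjust to the actual L2ARC SSD.
# iflag=direct bypasses the page cache on the read side;
# oflag=direct would do the same for writes.
dd if=/dev/sdX of=/dev/null bs=1M count=4096 iflag=direct
```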
@ronnyegner The problem is a bit deeper than that. In the current implementation, a simple dd that floods the page cache can drive the ARC down and keep it there.

The current scheme under Linux w.r.t. the page cache appears to be that filesystems are supposed to use it for their cache, but are also supposed to yield to user programs if possible. However, the scheme has problems when there are multiple filesystems are competing for the page cache (which one wins?). In the case of ZFS, at least we can set […]

EDIT: And, […]

EDIT[2]: Actually, it seems NR_INACTIVE_FILE might be enough to hack around the problem.
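The idea being discussed is to count easily reclaimable page-cache pages as effectively free when deciding whether memory pressure is real. A rough sketch of that idea only (not the actual dweeezil/spl commit), assuming a kernel of that era where global_page_state() and the NR_INACTIVE_FILE counter are available:

```c
#include <linux/mm.h>
#include <linux/vmstat.h>

/*
 * Sketch: treat free pages plus inactive, easily-evictable file-backed
 * page-cache pages as "freeable", so a dd that floods the page cache
 * does not register as genuine memory pressure for the ARC.
 */
static inline unsigned long
freeable_memory_pages(void)
{
	return (global_page_state(NR_FREE_PAGES) +
	    global_page_state(NR_INACTIVE_FILE));
}
```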
A quick test shows that https://github.com/dweeezil/spl/tree/freemem works around most issues related to this.

EDIT: This may very well be incompatible with ABD.
I've added a couple more sources of freeable memory in dweeezil/spl@08807f8 (https://github.com/dweeezil/spl/tree/freemem branch).
@dweeezil What makes you say that? Please explain (I'm eager to learn about the internals :) ). If it compiles with ABD, I'll give it a try, thanks!
@kernelOfTruth Under ABD, we'd like to exclude ABD's own reclaimable pages from the […]

EDIT: Thinking about this a bit more, it might not matter. I've not looked at ABD in a while.
At the moment ABD isn't integrated with the page cache (it's a first step), but once it is we'll need to revisit this. As for this being a heuristic, that's definitely the case. But FWIW, the entire Linux VM is just a collection of evolved heuristics that happen to work well for many workloads. @dweeezil's proposed patch looks like a very sensible way to handle this case for now. Let me run it through some additional testing and get it merged.
Thanks guys, I've merged the fix.
@dweeezil I've been running your fix with ABD for some time now and have observed no issues. Thanks!
I have ZoL 0.6.5-3 in Mint from the PPA; I'm not sure whether that is too old to have the patch. I too have seen a shrinking ARC, in conjunction with creating a sparse 1 TB zvol and putting XFS over LUKS on it. While copying data from another pool to the zvol, I now observe a gradual shrinking of the ARC (initially 12 GB of the machine's 32 GB, down toward the 1 GB limit I set): at some fraction of the read rate (around 50 MB per second) the ARC shrinks, then recovers, but never back to the maximum, e.g. 12, 11.9, 11.8, 11.7 ... 11 ... 11.1, 11.5, 11.4, 11.3 ... 11.1, 10.9 ..., like a sine wave that attenuates until it fades away. I used to just send the zvol to the backup pool, but that resulted in non-sparseness, so I thought I'd simply copy the files over instead in order to get the sparse zvol back down to a sane size. As soon as I issue the "drop caches", the 12 GB are reached again quickly (at the source pool's read rate).

EDIT: I just discovered (closed) #548. Sure enough, my backup pool is a 6-disk raidz2. When writing e.g. 8 MB of random data to the newly created zvol, only with -o volblocksize=16K, 32K, 64K or 128K did the space requirements NOT double. OK, so I learned that I have to use a block size that can be striped across the four data disks.
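For reference, a sparse zvol with an explicit volblocksize of the kind described above can be created roughly like this (the pool and volume names are made up):

```sh
# Hypothetical pool/volume names. -s makes the zvol sparse, and
# volblocksize is chosen so blocks stripe cleanly across the four
# data disks of a 6-disk raidz2.
zfs create -s -V 1T -o volblocksize=16K tank/backupvol
```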
I wanted to see the raw speed of my L2ARC SSD, so I did the obvious thing:
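The exact command does not survive in this copy of the issue; it would have been an ordinary buffered read of the cache device, something along these lines (device path and block size are illustrative only):

```sh
# Illustrative only; the original command was not preserved.
# A plain buffered read like this goes through the page cache.
dd if=/dev/sdX of=/dev/null bs=1M
```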
To my surprise, this wound up destroying my ZFS ARC; the size shrank to 32 MB and stayed there even after the dd finished. In the end I had to do 'echo 1 >/proc/sys/vm/drop_caches' and then reset zfs_arc_max in order to have the ARC recover (although it looks like a slower recovery would have happened over time anyway).
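Concretely, the recovery amounts to something like the following (the zfs_arc_max value shown is just an example; the real value is whatever limit was already configured):

```sh
# Drop the clean page cache that the dd left behind.
echo 1 > /proc/sys/vm/drop_caches

# Re-write the existing ARC limit to nudge the ARC into growing
# again; 17179869184 (16 GiB) is only an example value.
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
```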
What seems to be happening here is that the straightforward dd is flooding the Linux page cache. This drops free memory down to the point where arc_no_grow turns on and ZFS starts shrinking the ARC. Unfortunately the page cache is voracious and will eat all the memory that ZFS frees up, sometimes driving the ARC all the way down to its 32 MB minimum size on my machine (and when the ARC stays above this, it is non-data that remains; the data size goes to zero). When the dd finishes, the machine is still low on free memory, so arc_no_grow doesn't turn off and ZFS never tries to grow the ARC, and thus never puts pressure on the page cache to evict all of those uselessly cached pages from the dd. To unlock the ARC, I can wind up needing to explicitly purge the page cache; in the meantime, ARC performance is terrible. Even once arc_no_grow goes back to zero, ZFS is very reluctant to grow the ARC for data, even in the face of demand for it.
(E.g. 'arcstat.py 1' shows terrible ARC hit rates, and data_size remains almost zero.)
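One way to watch this state directly is through the ARC kstats; a quick check along these lines, assuming the usual /proc/spl/kstat location and the kstat names referenced above:

```sh
# arc_no_grow, the current ARC size, the configured maximum, and the
# data size are all exported by the zfs module's arcstats kstat.
grep -E '^(arc_no_grow|size|c_max|data_size)' /proc/spl/kstat/zfs/arcstats
```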
All of this is with the latest git tip, 6bec435, as I file this report.