OOM triggered, suspect ARC to blame #14686
It seems like the ARC was trying to shrink to the minimum, but can't. I have no clue why that could be. It would help if you could provide full kstat and/or arc_summary output. I wonder if the data can somehow be non-evictable, or if it is really some math issue.
I lost access to the system (build eventually times out, and the system is cleaned up), but here's arcstats from when I was able to log in and inspect..
According to the small "pd" value, the size of MRU data should be very small, but that is where most of the ARC size resides according to the stats. I'll think more about it, but if you can reproduce the problem, it would help understanding to dump everything possible around the arc_evict_impl(arc_mfu, ARC_BUFC_METADATA, e) call, in case some math does not work right there.
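Something along these lines, as a rough sketch only: everything except the quoted call is a placeholder, and `bytes` stands for whatever variable the result is actually assigned to inside arc_evict():

```c
/*
 * Rough instrumentation sketch -- to be spliced around the call quoted
 * above inside arc_evict(); names other than the call itself are
 * placeholders and may not match what is actually in scope there.
 */
zfs_dbgmsg("arc_evict: mfu metadata: requested=%lld", (longlong_t)e);

bytes = arc_evict_impl(arc_mfu, ARC_BUFC_METADATA, e);

zfs_dbgmsg("arc_evict: mfu metadata: requested=%lld evicted=%llu",
    (longlong_t)e, (u_longlong_t)bytes);
```

The messages should then land in /proc/spl/kstat/zfs/dbgmsg (assuming zfs_dbgmsg output is enabled on your build), which would show the requested amount versus what actually got evicted on each pass.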
I can.. it might take me half a day, but I should be able to reproduce it..
Any chance you could provide me a patch, or open a PR, with the code you'd like me to test? I can get that done tomorrow (PST). I appreciate the help!
@amotin we run our systems with
The allocation which failed is also very surprising. It's a tiny zero order GFP_KERNEL allocation which should have been able to get a page via direct reclaim from the ARC. The
I'd say we may not be pruning the Linux dentry and inode caches aggressively enough via
One experiment you might try is setting
@grwilson it looks like you must also be increasing the default
i.e. reproduce a system to this point, and then run these commands, and see what happens to the ARC? I can work on that..
If this was the case, would the stats show things accounted for in
Right. What we'd expect is the Linux virtual memory system to apply pressure to the ARC until everything has been reclaimed down to
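To make that expectation concrete, here is a standalone sketch of the shape of that mechanism; it is deliberately not the real ZFS shrinker code (the actual Linux shrinker callback signatures vary by kernel version), just the count/scan pattern we'd expect to keep driving the ARC down toward its floor under memory pressure:

```c
#include <stdint.h>

static uint64_t cache_size;	/* current cache size, in bytes */
static uint64_t cache_floor;	/* the minimum it will shrink down to */

/* "count" step: how much could be reclaimed right now? */
static uint64_t
cache_reclaimable(void)
{
	return (cache_size > cache_floor ? cache_size - cache_floor : 0);
}

/* "scan" step: free up to 'request' bytes and report what was freed. */
static uint64_t
cache_shrink(uint64_t request)
{
	uint64_t freeable = cache_reclaimable();
	uint64_t freed = (request < freeable) ? request : freeable;

	cache_size -= freed;	/* stand-in for the real eviction work */
	return (freed);
}
```

Under sustained pressure the kernel keeps invoking that count/scan pair, so the expectation is that the cache ends up near its floor rather than staying near its maximum.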
That's the thing, I'd expect it to be accounted for as metadata, not data, which is what makes me think that's not what's going on, unless we're somehow miscategorizing these buffers.
OK.. I appreciate the input and help so far. I'm trying to reproduce the issue now, so I can try the
Additionally, if there are any patches you'd like me to deploy and test with, feel free to send them to me and I can get them into our buildserver image that has the problem. It's not clear to me yet what information would help root-cause this.
I haven't been able to reproduce the issue so far today, after running something like 30+ builds 🤷 Before, it'd hit relatively frequently, maybe every 3-4 builds...
It's always something! I also noticed I made a terrible typo above; you want to set
Hit again:
Run experiment:
While waiting for
Nothing seems to be happening w.r.t. the ARC? I should have the system available for maybe 10 more hours (before it times out), if there's anything I could collect from it to help..?
@grwilson The code behind zfs_arc_meta_strategy was removed completely; there is nothing left of it. I am not sure what your motivation was behind setting that tunable. Setting zfs_arc_meta_balance = 1 would cause metadata to be evicted first if there are any data ghost hits at all, which I really doubt is your goal. zfs_arc_meta_balance should not affect OOMs, only what gets evicted from the ARC under pressure.
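To illustrate what that balance does, here is a deliberately simplified standalone model (this is not the actual arc.c math): ghost hits on data mean recently evicted data was wanted again, so the eviction preference shifts toward metadata, and the balance value scales how strongly metadata is protected from that pressure.

```c
#include <stdint.h>

typedef enum { EVICT_DATA_FIRST, EVICT_METADATA_FIRST } evict_pref_t;

/*
 * Simplified illustration, not the real OpenZFS code: metadata ghost
 * hits are weighted by the balance factor, so it takes roughly
 * "balance" data ghost hits to outweigh one metadata ghost hit.  With
 * balance = 1 and no metadata ghost hits, a single data ghost hit is
 * already enough to prefer evicting metadata, which is the behavior
 * described above.
 */
static evict_pref_t
choose_eviction_target(uint64_t data_ghost_hits, uint64_t meta_ghost_hits,
    uint64_t meta_balance)
{
	if (data_ghost_hits > meta_ghost_hits * meta_balance)
		return (EVICT_METADATA_FIRST);
	return (EVICT_DATA_FIRST);
}
```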
If it's an EC2 VM, doesn't Amazon let you hibernate it? Then you could snapshot the volume and bring it back up, broken like this, on demand, which would be very convenient as long as you kept the snapshot.
After 6 hours, it's finally able to evict data out of the ARC:
I have some Grafana stuff enabled, so I can see clearly that it took 6 hours for this to occur.

Hm.. not sure what happened, but I stepped away, and now that I'm able to start poking around again, it looks like the ARC has evicted.. I don't know why it couldn't (wouldn't?) do that for ~6 hours..

Hm.. Also, here's
Also, FWIW.. as I logged back in hours later, and started poking around.. I ran:
so I think right now, now that the ARC has evicted.. it's returning 0:
So, I think at the point where I was able to log back in and poke around, the system hadn't yet evicted.. but while I was trying to inspect the system, eviction was able to occur.. I don't know if, why, or how my shell activity would have affected ARC eviction, but I just wanted to mention it..
@prakashsurya Could you please test this patch: #14692 ? |
Sure, let me start working on this.. I'll be out part of the day today, so I might not be able to reproduce until tomorrow, but I'll work on this.

Not sure how useful this is, since I think it just gives us the same information we already suspected, but here are some graphs of ARC stat info over that same time period, from the same system that I did the
We can see, around 21:44,
Still not clear why it behaved this way.
Why is
So, maybe we need to understand why
We also run with a larger ARC by default: we use 3/4 of all memory, vs. upstream, which uses 5/8.
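(On a box with roughly 3 GiB of RAM like the one in this report, that's the difference between a cap of about 2.25 GiB and about 1.9 GiB.)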
Hm.. could we be filling the ARC with prefetched data, and then not allowing that to be evicted due to this logic? |
Theoretically we could, but for that we would have to be very bad in our prefetch decisions, since the first demand read clears the prefetch flag, allowing the eviction, and even if not, the buffers should be evicted at most 1 or 6 seconds later. Considering that the eviction thread should try again and again, I am not sure what value of evict_skip I would consider suspicious, but I can't say that the 4M I see makes me worry too much.
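For reference, a simplified standalone model of the check I'm describing (not the actual arc_evict_hdr() code; the 1-second and 6-second values correspond to the default lifetimes for the two prefetch flavors):

```c
#include <stdbool.h>
#include <stdint.h>

/* Roughly the default minimum lifetimes for the two prefetch flavors. */
#define	MIN_PREFETCH_MS			1000
#define	MIN_PRESCIENT_PREFETCH_MS	6000

/*
 * A header still marked as prefetched is skipped by eviction (bumping
 * evict_skip) until it has lived past its minimum lifetime; a demand
 * read clears the prefetch flag, so the buffer becomes evictable right
 * after its first real use.
 */
static bool
prefetched_hdr_evictable(bool prefetch, bool prescient,
    uint64_t now_ms, uint64_t last_access_ms)
{
	if (!prefetch)
		return (true);

	uint64_t min_ms = prescient ?
	    MIN_PRESCIENT_PREFETCH_MS : MIN_PREFETCH_MS;
	return (now_ms - last_access_ms >= min_ms);
}
```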
@prakashsurya Please let me know how it goes and if I should continue searching. |
@amotin sorry for the delay, I kicked off 10 builds with your change applied.. let's see how those go.. With that said, I'm skeptical it's a signed-ness issue at play here, since I'm seeing
Do we have any theories as to why |
@prakashsurya With the MFU data state being persistently beaten into the ground, I am absolutely not surprised by evict_skip and evict_not_enough growing at the same time. I can't quantify their exact values without knowing how many times eviction was called in general, but so far it all fits within my understanding of the problem that should be fixed by the signed-ness PR.
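To spell out the signed-ness failure mode I have in mind, here is a toy example with made-up numbers (these are not the actual arc.c expressions): an "amount over the limit" computed in unsigned math wraps to an enormous value when the state is actually under its limit, instead of coming out negative.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* Hypothetical sizes, not taken from this system. */
	uint64_t allowed = 200ULL << 20;	/* this state may hold 200 MiB */
	uint64_t in_use  = 100ULL << 20;	/* it only holds 100 MiB */

	/* "How much is over the limit?" -- the answer should be "nothing". */
	uint64_t as_unsigned = in_use - allowed;		/* wraps near 2^64 */
	int64_t  as_signed   = (int64_t)(in_use - allowed);	/* -100 MiB */

	printf("unsigned difference: %" PRIu64 "\n", as_unsigned);
	printf("signed difference:   %" PRId64 "\n", as_signed);
	return (0);
}
```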
OK.. if it makes sense to you, then that sounds good to me. I was able to get #14692 applied and ran through about 30 builds, and none of them failed. I'd say, let's work to get that integrated.. Then I can work to get our product builds running those bits and see if we run into any problems.. Looks good so far.
Closing this with #14692 merged.
I'm still investigating things, but I wanted to open an issue with the data I have, since I'm thinking it may be related to #14359.
We have a system/image that's used as our buildserver to build various artifacts for our product (packages, VM images, etc.).
Since we integrated #14359 into our version of ZFS used on these buildserver images, we've been seeing OOM kills, causing build failures.
In the one case that I've been able to inspect, I see the following:
So, the system only has ~3G of RAM.. most of which appears to be used.. but, used by what?
It doesn't look to be used by processes...
Looking at the ARC, I see this:
For a nearly 3G RAM system.. using ~2.9G for ARC seems excessive.. particularly given that the OOM killer is kicking in to kill user-space processes.. e.g. why doesn't the ARC evict data?
Here's an example OOM message:
Additionally, I've attached a gzipped copy of syslog from the system: syslog.gz

I'm still actively investigating this issue.. and not certain that #14359 is to blame yet.. but things point to it so far.. For example, an image generated prior to that change being integrated does not have these OOM issues, but an image generated after that change was integrated does have them.. That, coupled with the fact that nearly all 3G of the system's RAM is being used by the ARC, is leading me to suspect the changes to ARC eviction are at play here..
As I get more information, I'll report back.. but I wanted to post the information I have so far, in case somebody more familiar could chime in and maybe help me pinpoint the problem..
CC: @amotin @behlendorf @allanjude