"Importing ZFS pool xyz Out of memory" crash at boot. #3863

Closed
DannCos opened this issue Oct 1, 2015 · 5 comments
Labels: Component: Memory Management (kernel memory management) · Status: Inactive (not being actively updated)

@DannCos commented Oct 1, 2015

Hello,
I reproduced the following scenario on two different computers with the following specs:

  • OS version: CentOS 6.7 x64
  • ZoL version: latest as of today, 1 October 2015
  • RAM: 4GB
  • DEDUP = off
  • COMPRESSION = off
  • No snapshots

  • Computer one has one zpool of 4 x 500GB drives in raidz1.
  • Computer two has one zpool of 4 x 1000GB drives in raidz1.
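
For clarity, a sketch of how these pools were created (device names are placeholders; the pool name matches the error message below):

```sh
# raidz1 pool across four drives (placeholder device names)
zpool create zpool1000gb raidz1 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# dedup and compression left off, matching the specs above
zfs set dedup=off zpool1000gb
zfs set compression=off zpool1000gb
```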

Issue:
After a certain amount of space is filled on the zpool (see below for how much data was needed to trigger it), and you reboot for any reason unrelated to this problem, an "Out of memory" error appears at boot time while the zpool is being imported and prevents Linux from completing the boot. The error reads:

"Importing ZFS pool zpool1000gb
Out of memory: Kill process XYZ or sacrifice child
Out of memory: Kill process YZX or sacrifice child
Out of memory: Kill process ZYX or sacrifice child"

  • On computer one (4x500GB), the issue occurred after 180GB of data.
  • On computer two (4x1000GB), the issue occurred after only 10GB of data.

In both cases, I had to add another stick of RAM (going from 4GB to 6GB) for the computer to boot properly.
In both cases, there were no issues while copying the data.

I find this very odd, as I have another server with 8 x 2TB in RAID10 and 16GB of RAM, running with the zpool at 90% capacity, and this never happens there.

cheers

@FransUrbo (Contributor) commented:

Might this be related to #3866?

@DannCos (Author) commented Oct 1, 2015

Hello Frans,

I have not yet tried to mount the raidz1 pool by hand inside Linux to see if it complains about memory the same way it does during boot.

I am, however, testing other ZFS RAID layouts instead of raidz1.

So far with RAID10 (2x1TB + 2x1TB mirrors), I have already copied 700GB of data, and the computer boots into Linux just fine. In contrast, with raidz1 the computer will not boot if I have more than 10GB of data on the zpool (4x1TB).

There is a definite issue with raidz1 and out-of-memory problems, at least with 4 hard drives. I will also try raidz1 with 2 and 3 hard drives to narrow this down.
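
For reference, a sketch of the two layouts being compared (pool and device names are placeholders):

```sh
# raidz1 across four drives -- the layout that fails to boot with data on it
zpool create tank raidz1 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# RAID10-style: two mirrored pairs striped together -- this layout boots fine
zpool create tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
```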

@behlendorf added this to the 0.7.0 milestone Oct 2, 2015
@behlendorf (Contributor) commented:

@DannCos from what you're describing it sounds as if we're allocating a significant amount of working memory during the import. Two quick questions might help us narrow this down.

  1. Are you importing a pool which was cleanly exported? If it wasn't cleanly exported, it's possible that a large amount of log replay needs to occur, and this has the potential to consume a significant amount of memory for a short period of time. In your case it sounds like more than is available.
  2. If you're able to reproduce this, it would be very helpful to run slabtop while importing the pool (a sketch follows below). It should give us a good idea of what's consuming the memory and help us narrow down the issue.
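
A sketch of what that would look like (the pool name is a placeholder):

```sh
# Terminal 1: watch kernel slab usage, refreshing every second
slabtop -d 1

# Terminal 2: import the pool while slabtop runs (placeholder pool name)
zpool import zpool1000gb
```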

@DannCos (Author) commented Oct 5, 2015

@behlendorf Am I correct to assume that "sudo shutdown -r now" properly unmounts the zpool before actually rebooting? I never read anywhere that we were supposed to export the zpool before rebooting the OS.

I say this because you might be correct. The issue only happens when rebooting the computer, and only under three simultaneous conditions:

  • raidz1 with 4 or more HDDs
  • atime=off
  • the hard drives must not have any leftover partitions from a destroyed zpool

I created and destroyed my pool many times to reach the following conclusions:

  • With atime=on and 4 or more clean hard drives, there are no problems.
  • With atime=off and 4 or more clean hard drives, the OS will not boot, complaining of "out of memory".
  • With atime=off and 4 or more dirty hard drives, there are no problems. (See the sketch below for what I mean by "clean".)
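
To be concrete about the terms above, a sketch (pool and device names are placeholders):

```sh
# atime is toggled as a dataset property
zfs set atime=off zpool1000gb

# "clean" drive: the start of the disk zeroed out so no leftover labels
# or partitions remain (placeholder device -- destructive!)
dd if=/dev/zero of=/dev/sdb bs=1M count=100
```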

Also, one more thing that supports your assumption:

After the OS hangs at boot with "out of memory", I shut down the computer to insert another 2GB of RAM, and the OS then boots correctly. I shut down the computer again, remove the extra 2GB of RAM, and the OS continues to boot correctly. I then copy another chunk of data to the pool, reboot, and it hangs again. I add another 2GB of RAM, it boots correctly; I remove the 2GB, and it continues to boot correctly.

So every time I copy a chunk of data to the pool and reboot, it will hang.

slabtop will not tell me anything, because every time I export/import the pool inside the OS, it mounts properly. (See the sketch below.)
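
For reference, the manual test that always succeeds (placeholder pool name):

```sh
# Export and re-import by hand from the running system -- this never OOMs
zpool export zpool1000gb
zpool import zpool1000gb
```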

@behlendorf (Contributor) commented:

@DannCos it's entirely safe to just pull the plug on the system, but when you do, there may be some pending work which needs to be completed during the import and subsequent mount. This shouldn't take a significant amount of memory, but clearly something unexpected is going on.

Unfortunately, the only way we're going to get to the bottom of this is with some debugging output showing where the memory is being used at the time of the OOM, or even just a backtrace from the console. If you can drop to a recovery shell after the boot fails and run dmesg, that would be helpful. As long as the sysctls oom_dump_tasks=1 and panic_on_oom=0 are set (a sketch follows below), you should be able to log in on the console.
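
A sketch of how those could be set on CentOS 6 (my assumption on the exact steps):

```sh
# Have the OOM killer log each task's memory usage instead of panicking
sysctl -w vm.oom_dump_tasks=1
sysctl -w vm.panic_on_oom=0

# Persist them so they are already in effect during the failing boot
echo 'vm.oom_dump_tasks = 1' >> /etc/sysctl.conf
echo 'vm.panic_on_oom = 0' >> /etc/sysctl.conf

# After the failed import, from the recovery shell:
dmesg
```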

@behlendorf modified the milestones: 0.8.0, 0.7.0 Mar 26, 2016
@behlendorf removed this from the 0.8.0 milestone Feb 9, 2018