zfs freezes kernel during boot #7466

Closed
mabod opened this issue Apr 20, 2018 · 6 comments

mabod commented Apr 20, 2018

System information

Type                    Version/Name
Distribution Name       Manjaro
Distribution Version    rolling
Linux Kernel            4.16, 4.15
Architecture            amd64
ZFS Version             0.7.8
SPL Version             0.7.8

Describe the problem you're observing

The kernel freezes during boot with the message "Out of memory and no killable processes". When I uninstall the spl and zfs modules, the computer boots just fine.

I can then reinstall the modules and load them manually. zpool import shows the pool as ready to be imported. When I import it, the computer immediately drops from GNOME to the console and freezes completely.

If I then do a hard reset, the computer does not boot again until I uninstall the spl and zfs modules (from a live DVD).

Something seems to be wrong with the pool that causes zfs to crash completely. I have tried different package builds of 0.7.8, from the testing repository to stable, always with the same result.

The pool was running just fine until this morning. After I turned the computer off for a couple of hours, it would no longer boot.

How can I debug this? Any hope to recover the pool?

Describe how to reproduce the problem

Include any warning/errors/backtraces from the system logs

Author

mabod commented Apr 20, 2018

When I install and load the modules manually, I can import another pool from my USB JBOD without problems. So the issue must be related to this one internal pool: zstore.

   pool: zstore
     id: 14393956364711311496
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	zstore               ONLINE
	  mirror-0           ONLINE
	    WD-WCC4E5HF3P4S  ONLINE
	    WD-WCC4E1SSP28F  ONLINE
	  mirror-1           ONLINE
	    WD-WCC4E1SSP6NC  ONLINE
	    WD-WCC7K7EK9VC4  ONLINE

Author

mabod commented Apr 20, 2018

I solved it by running zpool import -F zstore.
That terminated my GNOME session and kicked me back to the console, but at least there was no total freeze.
The pool was imported but not mounted, so I had to do another export/import. Now the pool is fully operational again. I am currently running a scrub just to be sure.
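As a hedged illustration of the recovery sequence described above (rewind import, export and re-import so the datasets mount, then scrub), the Python sketch below only builds and prints the zpool command lines; it does not execute anything. The optional -n check is an addition not used in this thread: combined with -F, zpool reports whether a rewind would make the pool importable without actually performing the recovery. The function name recovery_commands is hypothetical.

```python
import shlex


def recovery_commands(pool: str, dry_run_first: bool = True) -> list[list[str]]:
    """Build (but do not run) the zpool command sequence for a rewind recovery.

    Assumption: the pool is not currently imported. Nothing here calls zpool;
    the caller decides whether and how to run each command.
    """
    cmds: list[list[str]] = []
    if dry_run_first:
        # -n with -F: only check whether a rewind would make the pool importable.
        cmds.append(["zpool", "import", "-F", "-n", pool])
    cmds.append(["zpool", "import", "-F", pool])  # rewind import (discards last few txgs)
    cmds.append(["zpool", "export", pool])        # export again...
    cmds.append(["zpool", "import", pool])        # ...and re-import so the datasets mount
    cmds.append(["zpool", "scrub", pool])         # verify the data afterwards
    return cmds


for cmd in recovery_commands("zstore"):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to try: a rewind import discards recent transactions, so each step is better run by hand after reading its output.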

But what could cause this issue all of a sudden? At the very least, zfs should not bring the whole system down when it has trouble with a pool.

Contributor

loli10K commented Apr 20, 2018

Duplicate of #3863

loli10K marked this as a duplicate of #3863 on Apr 20, 2018
Author

mabod commented Apr 21, 2018

You mean this is a duplicate of an issue reported in 2015? #3863 was assigned to the 0.7.0 milestone on 3 Oct 2015, moved to the 0.8.0 milestone in March 2016, and removed from milestones in Feb. 2018.

This is confusing.

Author

mabod commented Apr 30, 2018

It happened to me again today. This time the first zpool import -F zstore did not help; it froze the system too. Only the second attempt was successful. Unfortunately, the freeze is so severe that I have no logs to provide other than a screenshot of the boot screen:
[Screenshot of the boot screen]

Several hours before the issue happened, I was working on about 2,000 pictures in one directory: converting, copying, etc. This resembles the first incident, when I had also been working on the same pictures a few hours before. In both cases I had also run fio benchmarks with size=64G before the issue occurred.

Now my question: How can I prevent this in the future or at least mitigate the effects?

After reading #3863 I turned atime on:

2018-04-22.12:19:27 zfs set atime=on zstore
2018-04-22.12:19:37 zfs set relatime=on zstore

Obviously that did not help.

Fixing the issue is cumbersome: I have to boot from DVD, chroot and uninstall the zfs modules before I can boot again. Then I install the zfs modules manually and run zpool import -F zstore.

I have three questions:

1.) It would help a lot if I could temporarily disable zfs during boot via a kernel cmdline parameter. This would avoid the DVD boot, which is quite time-consuming.

2.) Would it help to export the pool on every shutdown? What would be the drawback of doing that?

3.) Can I limit the amount of memory used by zfs? My computer has 32 GB of RAM. It is hard to believe that this is not sufficient for a "small" RAID10 pool:

NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zstore  7,25T  3,44T  3,81T         -     1%    47%  1.00x  ONLINE  -

Is there a module parameter I could use to reduce memory consumption?

Contributor

jonathonf commented May 2, 2018

The OOM on boot is weird. Do you have dedup enabled? #2492?

Just to remove it as a factor, have a read of the Arch wiki page about limiting your ARC to see if that helps: https://wiki.archlinux.org/index.php/ZFS#ZFS_is_using_too_much_RAM
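The Arch wiki page linked above covers capping the ARC with the zfs_arc_max module parameter, whose value is given in bytes. As a small sketch, assuming the conventional /etc/modprobe.d/zfs.conf location for the resulting line, the hypothetical helper arc_max_line below converts a limit in GiB into the matching options line. The 8 GiB value is an arbitrary example, not a recommendation for a 32 GB machine, and as the edit below notes, the ARC may not be the cause here at all; this only removes it as a factor.

```python
GIB = 1024 ** 3  # bytes per GiB


def arc_max_line(limit_gib: float) -> str:
    """Return a modprobe.d line that caps the ZFS ARC at `limit_gib` GiB.

    zfs_arc_max expects an integer number of bytes; the file location
    (/etc/modprobe.d/zfs.conf) is an assumption, not something this
    function checks or writes.
    """
    if limit_gib <= 0:
        raise ValueError("limit must be positive")
    return f"options zfs zfs_arc_max={int(limit_gib * GIB)}"


print(arc_max_line(8))  # options zfs zfs_arc_max=8589934592
```

The same byte value can generally also be written to /sys/module/zfs/parameters/zfs_arc_max to try a limit at runtime before making it persistent.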

Edit: Oh, no, your other open issue has more detail. Doesn't look to be dedup or ARC-related from that.

mabod closed this as completed on Aug 19, 2020