Slow import #11034
Note: 5 minutes is a lot faster than version 0.8.4, for which import never completed with default kernel module settings (#10828). As one random guess, I tried disabling multihost, but that did not make a significant difference.
Another guess is the number of snapshots, since listing those takes a similar amount of time with similar iostat and top statistics (i.e., neither the storage devices nor the CPU is anywhere near its performance limit); that listing is based on a smaller number of filesystems.
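As a rough sketch of those checks (the pool name `tank` is an assumption, not taken from the report):

```sh
# Count the snapshots in the pool and time a full listing,
# to compare against the ~5 minute import time.
zfs list -H -t snapshot -o name -r tank | wc -l
time zfs list -t snapshot -r tank > /dev/null

# Multihost (MMP) is a pool property and can be checked or disabled.
zpool get multihost tank
zpool set multihost=off tank
```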
And here are the pool properties:
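A sketch of the command that produces such a listing, again assuming a pool named `tank`:

```sh
# Show every pool property with its current value and source.
zpool get all tank
```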
Removing 80% of the snapshots helped somewhat, but it still takes ~3 min to import.
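A hedged sketch of how such pruning can be done per dataset, using the snapshot range syntax of `zfs destroy` (dataset and snapshot names here are made up):

```sh
# Preview (-n) and then destroy every snapshot between the two named ones, inclusive.
zfs destroy -nv tank/data@auto-2020-01-01%auto-2020-10-01
zfs destroy -v tank/data@auto-2020-01-01%auto-2020-10-01
```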
That's interesting. Good idea to try reducing the number of snapshots. It does seem like the time is being spent loading the config. I'll try creating thousands of snapshots on a local pool and see how that impacts import time.
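A minimal sketch of that experiment, assuming a scratch pool named `testpool` with a dataset `testpool/fs`:

```sh
# Create a few thousand snapshots, then measure export/import time.
for i in $(seq 1 5000); do
    zfs snapshot testpool/fs@stress-$i
done
time zpool export testpool
time zpool import testpool
```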
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
In response to stale bot: I am still interested in faster pool import times for pools with a lot of snapshots.
I don't have a ton of snapshots, and my 66-disk pool takes 6 whole minutes to import from cache on every boot. I'm open to assisting with data in any way I can. It's never been any faster than this; it used to be much worse, as udev was not as well parallelized until recently.
My 22-drive pool takes 4.5 minutes to import from cache on every boot. No snapshots, and the same behavior as @putnam: always slow on every boot. zfs-import-cache.service takes 4min 35.352s.
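For anyone else measuring this, the per-service boot time can be read from systemd (unit names as shipped by the ZFS packages):

```sh
# Time attributed to the ZFS import units during the last boot.
systemd-analyze blame | grep -i zfs
systemctl status zfs-import-cache.service
```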
@amotin Would this apply only to unclean exports? I would be surprised if the default shutdown behavior in Debian caused such an export, but I am not sure exactly what plays out when the machine is rebooted normally.
Shutdown/reboot do not trigger an export, only a TXG flush. By export I mean literally running zpool export.
Pardon the lack of knowledge on this: so on a typical shutdown/reboot, the pool is left in an unclean state and must do some work during the import to fix itself? It must be an intentional choice by the maintainers not to run export at shutdown -- should they?
On reboot all the dirty buffers are flushed and the transaction group is committed, so technically the pool is clean in the sense of possible data loss. On the next import ZFS still scrubs the last 2 TXGs of metadata just to be sure, but that is not strictly required. However, a reboot does not flush the spacemap logs; that is done only on an explicit zpool export. Anyhow, unless you really call zpool export, the accumulated spacemap logs are left for the next import to process.
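A quick way to check whether this feature is in play, and what an explicit export looks like (pool name assumed):

```sh
# Check whether the log_spacemap feature is active on the pool.
zpool get feature@log_spacemap tank

# A real export (unlike a plain reboot) flushes the spacemap logs.
zpool export tank
zpool import tank
```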
log_spacemap is indeed active on my pool. OK, I understand now that a typical reboot/shutdown qualifies as an "unclean export" since it doesn't wait to run a full zpool export, which would flush the spacemap logs. So when the pool is imported again at boot, although it's sped up somewhat because there is no ZIL replay or disk scanning required thanks to the cache, the spacemap flush is likely what is consuming all this time. And your change looks like it's designed to speed this up dramatically! I'm a little wary of cowboy-patching my production system, though I have an offsite backup. Is there any chance of the change corrupting the pool in any way?
We have already been using this patch in TrueNAS builds for several months, and so far nobody has complained. But as you can see, I am still trying to get a proper review. I hope the additional feedback motivates the reviewers.
Would we be able to test your patch by booting the TrueNAS SCALE install disc and importing the pool temporarily there? We could time it and compare.
I've never used the SCALE installer for that, but it may work. Note that only one aspect of the patch (the prefetch) will affect the first import time; the maximum log length is enforced slowly during normal pool operation.
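If the live-environment test is attempted, one hedged approach is to import without mounting anything, under a temporary altroot, time it, and export again (pool name assumed):

```sh
# Import without mounting datasets (-N) under an altroot, then export again.
time zpool import -N -R /mnt tank
zpool export tank
```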
The PR was merged.
I just want to provide an update now that 2.1.6 has landed in Debian unstable. With this patch, the wait time to import my 66-disk pool is now less than 15 seconds, versus approximately 7-8 minutes before. Nice work!
System information
Describe the problem you're observing
zpool import reproducibly takes nearly 5 minutes for a pool with 60 HDDs, without any obvious storage or CPU bottlenecks.
Describe how to reproduce the problem
And a similar time was seen during the import at boot.
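For reference, a manual import of this kind can be timed roughly like so (the device directory and pool name are assumptions, not taken from the report):

```sh
# Time a manual import, scanning a specific device directory.
time zpool import -d /dev/disk/by-id tank
```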
Include any warning/errors/backtraces from the system logs
During initial boot import:
During multiuser import: