ZFS takes 5-6 seconds to import and mount a pool #2190
Comments
In my experience, the big slowdown is all the partition scanning and udev interaction with zvols. It would be interesting to see what …
This is from a different boot than the one pictured above, but you can get an idea.
Note that I'm not using that zvol (or any other, for that matter) for swap or any other purpose at the moment.
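[For readers hitting the zvol-scanning cost described above: ZoL's zvol.c exposes a zvol_inhibit_dev module parameter that skips creating zvol device nodes altogether, so udev has nothing to probe at import. Below is a minimal sketch of the gating pattern; zvol_create_minors_sketch() is a simplified stand-in for the real minor-creation path, not the actual code.]

```c
#include <linux/module.h>

/* Real ZoL tunable: when nonzero, no /dev/zd* nodes are created. */
static unsigned int zvol_inhibit_dev = 0;
module_param(zvol_inhibit_dev, uint, 0644);
MODULE_PARM_DESC(zvol_inhibit_dev, "Do not create zvol device nodes");

/* Simplified stand-in for the minor-creation path in zvol.c. */
static int
zvol_create_minors_sketch(const char *pool)
{
	if (zvol_inhibit_dev)
		return (0);	/* no device node, nothing for udev to scan */

	/* ... walk the pool's zvols and register block devices ... */
	return (0);
}
```

[Setting it (e.g. `options zfs zvol_inhibit_dev=1` in modprobe.d) trades host-side access to zvol partitions for a faster import, which fits the "zvols are only for virtual guests" case.]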
@turl My hunch is that other things known to affect import times are not at work here. Would you include the output of …
@dweeezil Having given this some thought, I believe there is potential for unflushed transactions in the ZIL to force ZIL replay during normal boot. The numbers that the OSv developers shared with me show a 5 to 6 second penalty on Amazon EC2 that can be alleviated by flushing the ZIL at shutdown to avoid ZIL replay. The addition of a flush made them rather happy with things, and they did not report any noticeable increase in shutdown time. This suggests to me that the time needed to flush these things at shutdown is much less than the time needed to perform replay at boot. Consequently, it would be useful to extend the code so that the shutdown process can forcibly flush them and avoid the ZIL replay penalty at boot. @ahrens If you have some time, would you provide a second opinion on what I have said?
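[To make the proposal concrete, here is a rough sketch of what a forced flush in the unmount path could look like. zil_commit() and zil_close() are real ZIL interfaces and zfs_sb_t is the ZoL 0.6.x superblock-private type, but zfs_sb_flush_zil() is a hypothetical helper and the teardown logic around it is heavily condensed; this is an illustration, not the actual patch.]

```c
#include <sys/zfs_vfsops.h>	/* zfs_sb_t (ZoL 0.6.x naming) */
#include <sys/zil.h>		/* zil_commit(), zil_close() */

/*
 * Hypothetical helper: push every pending intent-log entry out to the
 * main pool before closing the log, so the next import finds an empty
 * ZIL and pays no replay penalty at boot.
 */
static void
zfs_sb_flush_zil(zfs_sb_t *zsb)
{
	zilog_t *zilog = zsb->z_log;

	if (zilog == NULL)
		return;

	/* foid == 0 asks zil_commit() to commit all itxs, not one object */
	zil_commit(zilog, 0);
	zil_close(zilog);
	zsb->z_log = NULL;
}
```

[The trade-off described above is that this work happens at shutdown, where a fraction of a second is cheap, instead of at the next boot, where replay was costing the OSv users 5 to 6 seconds.]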
@ryao here it is:
I mainly mentioned the issue because I've experienced this delay myself and have been thinking of ways to mitigate it. On most of my machines, all the zvols are for virtual guests and I really don't care about accessing their partitions (at least not right away after a reboot) from the host system. The system I'm on right now took about 14 seconds to scan its 32 zvols. Things can get a lot worse if the snapshots are exposed. @ryao Shouldn't the ZIL be empty after a "clean" shutdown in which the filesystems are unmounted? Of course, I'm not totally sure whether a normal shutdown actually causes the zfs_umount->zfs_sb_teardown->zil_close chain to be called (for each filesystem).
@dweeezil You would think that, but I see no reason to believe it is, and there is sufficient evidence that it is not.
There's a lot going on in your zfs systemd unit, so it's hard to say for certain what exactly is responsible for the delay. It could certainly be any or all of the things mentioned above; it would depend on your exact situation. We should be able to get a much better idea of where the time is being spent in the 0.6.3 code. @Lalufu did some really nice work (881f45c) adding support for systemd upstream and splitting the bring-up into multiple logical units. In addition, certain parts of the module initialization are now done asynchronously, which I'd expect to speed things up. I'd want to see the …
The last boot on my small storage took ~30 seconds in zfs-mount.service, mounting 24 file systems. If a pool was not shut down cleanly (the system was hard reset), is the work needed to get the pool into a clean state done on import, or on mount?
ZIL replay in particular will be done per-filesystem during mount. However, pool import can potentially also result in significant IO.
I thought I'd add a couple of our observations - we have a really pathological case where pool bring-up can last for 15 to 20 minutes. The system that we are using separates the import and the mount phases (…)
BTW, when looking at ZIL replay we've added timer kstat support and a ZIL replay kstat that collects timing info on various ZIL records. If there is enough interest I can pull-request it. While there is some low-hanging fruit with pool import, there is no simple solution for optimizing ZIL replay during mount - it is done sequentially. And it is not trivial to take advantage of normal file operations - the filesystem does not exist until mount is finished.
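[To illustrate why the sequential constraint bites, here is a toy replay loop loosely modeled on the zil_replay()/zil_parse() shape of dispatching records through a per-txtype table. All names and types below are simplified stand-ins, not the real interfaces.]

```c
#include <stdint.h>

/* Toy record: every ZIL record starts with its transaction type. */
typedef struct lr {
	uint64_t lrc_txtype;	/* index into the replay table */
	/* ... record body ... */
} lr_t;

typedef int (*replay_func_t)(void *fs, lr_t *lr);

/*
 * Each record mutates state that later records depend on (create, then
 * write, then rename, ...), so records must be applied strictly in log
 * order. Replay time therefore scales with log length and cannot simply
 * be parallelized.
 */
static void
replay_log(void *fs, lr_t **records, int nrecords,
    const replay_func_t *replay_table)
{
	for (int i = 0; i < nrecords; i++)
		(void) replay_table[records[i]->lrc_txtype](fs, records[i]);
}
```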
Issue #1526 and this are related. The fix will likely be the same.
This is probably a very stupid question, but here goes: why does the ZIL have to be replayed completely on startup? Can't we just reconstruct the ARC from the ZIL and write out the data and age out the ZIL normally? If the system crashes again we still have the ZIL on disk.
@Lalufu The question is by no means stupid. The deal is that the ZIL doesn't keep raw ARC data. Rather, it keeps very short records describing the high-level operation to be executed at the filesystem level (well, with one small exception, but bear with me). For example, say you are creating a file named FILE1. The record itself is very simple: you just write down the object id of the directory and the name of the file (together with its permissions, timestamps, etc. - i.e. metadata). Overall the amount of data to be recorded is small. When it comes time to execute this operation, it translates into quite a complex piece of work: creating ZPL objects, frequently of a very complex nature. Ultimately it ends up updating a significant number of data blocks on disk. There are only a few record types that may keep the actual raw data to be written, but even then you need to replay them one by one, otherwise you risk losing your data due to unintentional re-ordering. Hm, I am not sure I explained it well, but I'll be glad to elaborate if more details are needed.
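[For a concrete picture of how little a create costs to log, here is a trimmed-down version of the on-disk create record, modeled after lr_t and lr_create_t in ZFS's sys/zil.h; fields are abbreviated and the header is authoritative.]

```c
#include <stdint.h>

/* Common header carried by every intent-log record. */
typedef struct lr {
	uint64_t lrc_txtype;	/* e.g. TX_CREATE */
	uint64_t lrc_reclen;	/* total length of this record */
	uint64_t lrc_txg;	/* transaction group of the operation */
	uint64_t lrc_seq;	/* sequence number, enforces log order */
} lr_t;

/* Trimmed-down create record: pure metadata, no file data at all. */
typedef struct lr_create {
	lr_t	 lr_common;	/* common header above */
	uint64_t lr_doid;	/* object id of the parent directory */
	uint64_t lr_foid;	/* object id of the new file */
	uint64_t lr_mode;	/* permissions */
	uint64_t lr_uid;	/* owner */
	uint64_t lr_gid;	/* group */
	uint64_t lr_crtime[2];	/* creation time */
	/* the new file's name follows the fixed-size portion */
} lr_create_t;
```

[Logging this is roughly a 100-byte write; replaying it means building the ZPL object for real (allocating a dnode, inserting the name into the parent directory's ZAP, writing the resulting metadata blocks), which is where the boot-time cost comes from.]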
With Arch updates today, I got a ZoL git version with the new systemd units. I see a ~2.5s time span on zfs-import-cache.service ("/usr/bin/zpool import -c /etc/zfs/zpool.cache -aN") and a ~1.5s one on zfs-mount.service ("/usr/bin/zfs mount -a")
Import times may be further reduced with #4794.
Closing. Import times have been reduced considerably by various commits. There's definitely still room for improvement in some areas, but let's file new issues as needed.
I'm filing this issue at @ryao's request
The ZFS unit contains
I'm using Arch Linux, ZoL 0.6.2, Linux 3.13.6