Performance improvement for metadata objects reading #14425
Comments
Hi there, I'm trying to understand the implications of such a feature. Does somebody know if Sun's "ZFS On-Disk Specification" draft is still relevant? Reading zdb -dddd output is a bit overwhelming. Best regards,
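For anyone else digging through that output, a minimal sketch of how to narrow a zdb dump down to a single object; the pool/dataset name tank/fs and object number 128 are placeholders:

```sh
# Dump dnodes and block pointers for a whole dataset
# (this is the overwhelming firehose mentioned above):
zdb -dddd tank/fs

# Inspect one object only; the object number equals the
# inode number that `ls -i <file>` reports:
zdb -dddd tank/fs 128
```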
I believe this will help with … Reference: https://www.reddit.com/r/zfs/comments/sgg9iu/quirk_on_using_zfs_dataset_as_rsync_target_xattr/
tl;dr: I don't see real gains here; just use appropriate devices for IOPS (SSD/NVMe/big ARC). PS: I wanted to point at …
Thank you very much CyberCr33p, the suggestion to use a small fast SSD as an L2ARC for metadata only looks promising; I'll investigate it. But in general it seems weird to me that I can unzip an archive with 10 thousand folders and 100 thousand files onto a ZFS filesystem residing on a slow HDD with 150 IOPS in under one minute (the archive is 600M, the sum of the unzipped file sizes is 4GB), yet listing that unzipped folder takes one hour, with the HDD reading small pieces of data all over the disk. Looking at iosnoop output I see about one IO for each file and directory, reading 512 bytes to 3 kbytes of data at "random" locations. I understand that any modification of metadata rewrites the original object to another location; for this reason we have atime and relatime switched off. I think it is worth investigating. For long-term storage that only appends new data and is read occasionally, the ability to read metadata efficiently would be a plus. Thanks,
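A minimal sketch of the metadata-only L2ARC setup being discussed; the pool name tank, dataset tank/fs, and /dev/nvme0n1 are placeholders:

```sh
# Add a small SSD as a cache (L2ARC) device:
zpool add tank cache /dev/nvme0n1

# Restrict the L2ARC to metadata for this dataset,
# so the small SSD isn't flooded with file data:
zfs set secondarycache=metadata tank/fs

# Persistent L2ARC (OpenZFS >= 2.0, on by default there) lets the
# cached metadata survive a reboot instead of repopulating from cold:
echo 1 > /sys/module/zfs/parameters/l2arc_rebuild_enabled
```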
Another option which will guarantee fast metadata access would be to use a special device:
A small SSD or SSD mirror makes metadata accesses very fast, and with the recent ZFS 2.1.6 changes the L2ARC and the special device can complement each other instead of duplicating data.
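As a sketch of that setup (device and dataset names are placeholders; the special vdev is mirrored because losing it loses the pool):

```sh
# Add a mirrored special vdev; metadata written from now on lands on the SSDs:
zpool add tank special mirror /dev/sdb /dev/sdc

# Optionally also send small file blocks (here <= 16K) to the special vdev:
zfs set special_small_blocks=16K tank/fs

# On builds that have it (around 2.1.6 and later), buffers already on the
# special vdev can be excluded from L2ARC so the two don't duplicate:
echo 1 > /sys/module/zfs/parameters/l2arc_exclude_special
```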
Yes, a fast special vdev solves the problem, but it must be redundant, which adds complexity. L2ARC with a cache vdev solves it only partially, because it has to be populated first and will eventually be pushed out of the cache; for data used only occasionally on a big disk it will not help. My use case is a 20TB spinning disk containing millions of files in separate folders. Each such folder, after a year of service, gets zipped. So my problem is that zip spends more time enumerating the files in a folder than zipping the data in them. I'm researching how to optimize this. It could be useful in other scenarios too. Take a look. Regards,
I've experimented with setting … I posted #15118 with the intention of allowing special devices to be used to accelerate reading without the need to worry about redundancy, since that would give much more predictable performance than messing with L2ARC. Otherwise the only way to accelerate your …
If the point is about speeding up metadata demand reads, then adding a fast special device will do the trick (at least for newly written data). If it should be more flexible, such as being able to affect existing data and being removable when no longer needed, then something like the last paragraph of #13460 (comment) could be the most convenient way to scratch this particular itch.
@arg7, fantastic tool!
Besides optimizing object reading, it would be helpful if metadata eviction weren't as inconvenient as it is now. Actually, I know of no method to make metadata stick better in memory, or at least have it served from L2ARC. I have spent hours and hours analyzing and fiddling with ZFS params; it's absolutely frustrating to see that you cannot get predictable metadata performance from the ARC.
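For completeness, the knobs usually fiddled with here, as a hedged sketch: zfs_arc_meta_min exists on the 2.1.x series this thread runs on but was removed in the later ARC rework, and the 2 GiB value is arbitrary:

```sh
# Reserve a floor (in bytes) below which metadata won't be evicted from ARC:
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_meta_min

# Or, per dataset, cache only metadata in ARC, at the cost of
# caching no file data at all for that dataset:
zfs set primarycache=metadata tank/fs
```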
Describe the feature you would like to see added to OpenZFS
ZFS should try to write metadata objects in adjacent sectors of the physical disks so that vdev prefetch and the physical disk cache can kick in.
It could be done by reserving, say, 5-10% of the space on the disks for metadata only, so that every metadata object is written next to its parent object, if possible.
Another possibility is to add a background process, like scrub, which performs metadata object defragmentation.
How will this feature improve OpenZFS?
Currently, ZFS on big slow magnetic disks performs quite poorly when enumerating filesystem content with the ls command. It generates about one IO operation for each file or directory, and when we have thousands of files and folders, a simple ls command can take hours to complete.
Other filesystems, like ext4, do no better in this regard.
I've noticed that once the metadata is in the ARC, ls is instantaneous, and other commands, like rm, are also very fast on slow disks.
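A crude way to confirm this is to run the same recursive listing twice and watch the ARC counters; the path /tank/fs is a placeholder:

```sh
# First pass: cold ARC, every dnode read is a random HDD seek.
time ls -R /tank/fs > /dev/null

# Second pass: the metadata now sits in the ARC, so it completes in seconds.
time ls -R /tank/fs > /dev/null

# Compare ARC hit/miss counters around the two runs:
grep -E '^(hits|misses) ' /proc/spl/kstat/zfs/arcstats
```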
Additional context
I'm using zfs-2.1.4 on Ubuntu 22, and the performance degradation is easily observable on a slow magnetic disk with a lot of files and directories by running ls -R /<your pool>/<fs name>.
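To reproduce from a cold cache without rebooting, exporting and re-importing the pool evicts its ARC contents (pool and path names are placeholders):

```sh
# Empty the ARC for this pool, then time the cold listing again:
zpool export tank
zpool import tank
time ls -R /tank/fs > /dev/null
```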
What do you think about such a feature? Is it doable?
Best regards,
AR