Defragmentation like on btrfs #4785
Comments
As is repeated multiple times, this requires block pointer rewrite.
You can see block pointers with `zdb`.

Fragmentation is a normal aspect of filesystems. It is required for a CoW filesystem to function and is a requirement for filesystems in general; the main exceptions would be tape archives and perhaps log-based filesystems (provided you never unlink or append to a file). Fragmentation is a non-issue when the average IO size times the IOPS constrains the sequential throughput. The 128KB default record size should limit the extent to which fragmentation can affect performance, at the trade-off of read-modify-write IO amplification when your IOs are routinely smaller than it, such as under a database workload (where you will want a smaller recordsize).

The ill effects attributed to file fragmentation in ZFS are often ZFS' own anti-fragmentation measures taking effect when metaslabs reach 96% full to prevent a bad situation from becoming worse. That bad situation is the formation of gang blocks, which satisfy the need for one large allocation by using several small allocations in its place. Gang blocks increase the IOs required to perform reads and writes whenever one must be used, and they consume free space faster. When a metaslab reaches 96% full, the ZFS driver switches from first-fit to best-fit allocation to minimize the number of gang blocks it uses. Best-fit allocation is extremely CPU intensive and can limit IOPS.

ZFS also has a feature called LBA weighting that prefers metaslabs at the outermost tracks (at low LBAs). This increases bandwidth while the pool is relatively empty, but it causes metaslabs to enter the best-fit allocator earlier and makes the performance drop on a relatively full pool more apparent, since mostly only the innermost tracks remain available. This means that when a pool composed of rotational disks is particularly full, sequential writes and sequential reads of recently written data will be slower than they were when the pool was empty. It also triggers best-fit behavior on the outermost tracks early, because those metaslabs can exceed the 96% threshold sooner. The effects of LBA weighting as a pool fills are often misattributed to fragmentation.

LBA weighting can be disabled globally via the `metaslab_lba_weighting_enabled` module parameter. If you are willing to sacrifice some performance while a pool is empty for more consistent performance as it fills, you could disable LBA weighting that way.
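For illustration, here is a minimal sketch of commands touching on the points above; `tank` is a placeholder pool name and the sysfs path assumes ZFS on Linux:

```sh
# Hedged sketch: "tank" is a placeholder pool name; the sysfs path assumes
# the ZFS on Linux kernel module.

# Pool-wide free-space fragmentation and usage.
zpool list -o name,size,allocated,free,fragmentation,capacity tank

# Per-metaslab detail (free-space histograms), useful for spotting
# metaslabs approaching the 96% threshold described above.
zdb -mm tank

# Disable LBA weighting globally, trading some empty-pool bandwidth for
# more consistent behavior as the pool fills; write 1 to re-enable.
echo 0 > /sys/module/zfs/parameters/metaslab_lba_weighting_enabled
```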
Thanks for your very informative response @ryao. I'm using ZFS as storage for raw files for vservers. I added a cronjob which takes snapshots every two hours, and the server exports the diff via send | receive to a backup storage. Rolling back can be done easily if the snapshot is still on the main machine, but I have to copy the entire data back if it has already been deleted there. So the issue of fragmentation is very pronounced in this case: the disks are doing three times the IO they did a year earlier, while the load is about the same. I think my use case is pretty bad for the current approach, because the plain raw files become more and more fragmented while the original chunks remain as snapshot data for fast recovery. It would be nice if it were possible to just rewrite the current data to a new location at once and leave the snapshot data lying fragmented on disk - nobody cares about that here.
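As a rough sketch of the workflow described here (dataset names, the backup host, and the snapshot naming scheme are placeholders, not from the original setup):

```sh
#!/bin/sh
# Hedged sketch of a two-hourly snapshot + incremental send/receive job.
# SRC, DST and "backuphost" are placeholders; assumes at least one prior
# snapshot of SRC already exists.
SRC=tank/vservers
DST=backup/vservers
NOW=$(date +%Y%m%d-%H%M)
# Most recent existing snapshot of SRC (sorted by creation time).
PREV=$(zfs list -H -t snapshot -d 1 -o name -s creation "$SRC" | tail -n 1 | cut -d@ -f2)

zfs snapshot "$SRC@$NOW"
# Send only the blocks changed since the previous snapshot.
zfs send -i "@$PREV" "$SRC@$NOW" | ssh backuphost zfs receive -F "$DST"
```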
@RubenKelevra I suggest reading this: http://open-zfs.org/wiki/Performance_tuning It has some tips. Things like read-modify-write overhead could be happening; read-modify-write is more often an issue than fragmentation. vserver does kernel virtualization, so a blanket tip like recordsize=4K, as I would give for a VM host, might not be the best fit here. It depends on the IO sizes your applications are using; e.g. a PostgreSQL database would do best with recordsize=8K.
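For example, matching recordsize to the application's IO size is a per-dataset setting (the dataset name below is a placeholder; recordsize only affects blocks written after the change):

```sh
# Hedged example: "tank/pgsql" is a placeholder dataset. Existing data
# keeps its current block size until it is rewritten.
zfs set recordsize=8K tank/pgsql
zfs get recordsize tank/pgsql
```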
Actually I use only one DB server per hypervisor, and
It would be nice to implement defragmentation support and a possibility to display the file fragmentation on disk.
Currently it's only possible to view the fragmentation of the free space (how is this calculated??).
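For reference, the free-space fragmentation figure is exposed as a pool property (the pool name below is a placeholder); roughly, it is derived from the size distribution of free segments, so many small free segments yield a high percentage, and it says nothing about per-file layout:

```sh
# Hedged example: "tank" is a placeholder pool name. FRAG describes
# free-space fragmentation, not file fragmentation.
zpool get fragmentation tank
zpool list -o name,fragmentation tank
```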