NTFS on ZPOOL/ZFS with FAST_DEDUP #421

leonik3 · 2024-12-12T19:45:52Z

leonik3
Dec 12, 2024

On debian sid I compiled and installed myself zfs with fast_dedup:
https://github.com/openzfs/zfs/releases/tag/zfs-2.3.0-rc3
I created pool:
zpool create -o autoexpand=on -o autotrim=on -o ashift=12 -O dedup=on -O casesensitivity=insensitive -O compression=zstd -O atime=off -O recordsize=1M -O longname=on zw /dev/sde /dev/sdf /dev/sdg
I created a dataset: zfs create zw/wz
I imported pool on Windows on zfs version with fast_dedup:
https://github.com/openzfsonwindows/openzfs/releases/tag/zfswin-2.2.6rc10
There were big problems in that, more often than not, the wz folder was unopenable in Windows. It showed up as two kinds of shortcuts that could not be accessed. Maybe one in ten pool imports was correct and allowed entry and editing of the wz folder.

As a solution, it turned out that replacing the mountpoint parameter "/zw/wz" with legacy
zfs set mountpoint=legacy zw/wz

I set a drive letter to W:\ that was convenient for me
zfs set driveletter=w zw
I created a virtual vhdx drive of 17 TB in the zw/wz directory. Then a second similar one.

I initialised the new virtual disk

I formatted the new virtual disk in the NTFS file system. It was very important to select an allocation unit that is at least as large as the recordsize in zfs. In my case, a minimum of 1MB also had to be selected in NTFS. If I had not done so, there would have been much worse deduplication.

I copied the following number of files and directories to the first NTFS virtual drive:

It took about 12 hours.

I then copied the following number of files and directories to the second NTFS virtual disk:

I didn't count how much it took, but rather over 2 days.

I have set the automatic import of the pool at windows startup time in the ‘Task scheduler’

I have set the task in the ‘Task scheduler’ to automatically mount vhdx drives at windows logon time.

Mount-DiskImage "W:\wz\K17T.vhdx"

I chose recordsize=1M because I was concerned that the DDT deduplication table should not be too large. At recordsize=1M, one 1 TB file will get a million blocks. If I had chosen recordsize=128K, there would be almost 10 million blocks for that one file. We are talking about one file, but 1TB in small files would still have many times the number of blocks. It seems to me that the large recordsize has a disadvantage in that with small files the block is always filled with zeros to the recordsize and these zeros have to be compressed, but this I have checked experimentally is not a big problem - writing and reading are nevertheless fast, but most importantly I win a lot by reducing the write slowdown with increasing DDT array size.
These are the parameters my pool currently has:

16, What do I gain from all this:
a) Very fast index of all files using the WIZFILE program, which scans all my disks every time Windows starts.
b) DEDUPLICATION on NTFS.
c) SMR disks are no longer such a big problem.

My disks are SMR disks, which cause big problems when writing - every now and then writing slows down to 200-300 KB/sec and such slowdown on the disk lasts 30 minutes, or an hour, or two hours. Previously, these drives worked in a Microsoft striped volume. If there was a slowdown in writing on one disk, it drastically slowed down writing on the entire volume. In the new arrangement, that each of these disks works in zpool stripping, then zpool intelligently stops writing a lot of data to the slowed down disk, and in the meantime writes to the others at full speed.

This is my write buffer configuration on disks:

I could set it up like this, because this data is backed up from time to time

Wizfile:
https://antibody-software.com/wizfile/

It is very good practice to disable real-time file change monitoring, as I have observed various programmes crashing many times after prolonged wizfile operation with file system change monitoring.

This is what an example of uploading a 20 GB file looks like now (using deduplication):

This is what an example of uploading a 20 GB file looks like now (without deduplication):

guenther-alka · 2024-12-12T23:38:27Z

guenther-alka
Dec 12, 2024

Some aspects

Move a pool from Unix to Windows gives mount problems
seems a remaining bug, unclear if fast dedup is involved in the problem
ZSTD enabled
ZSTD can give a higher compress ratio than lz4 but usually with a lower performance
Large recsize and small files
ZFS does not write smaller files in recsize (filled up with zero) but reduces recsize dynamically to filesize (beside draid)
.vhdx as a virtual harddisk is a nice Windows feature, format to ntfs is possible, a higher blocksize for ntfs than the default 4k improves performance
but why ntfs on ZFS? This only adds complexity and reduces performance without additional advantages unless your application absolutely needs ntfs.

The best performance and flexibility would be a simple ZFS filesystem with fast dedup enabled. Size of dedup table can be limited with a quota, a 1M recsize is a good option.

SMR disks are always a problem especially in raid setups: Avoid!
The standard ZFS method for fast file listings is Arc (give ZFS enough RAM) and special vdev for small io and metadata.
A special vdev can also hold fast dedup ddt.

1 reply

leonik3 Dec 13, 2024
Author

Thank you very much for your comments.

ad 1. The mounting problem concerned only the dataset which is a child of the original dataset created with the pool. In Windows it mounts as a folder.

ad 3. I suspect that zero compression is done in the NTFS system on ZFS, because a small file in NTFS is padded with zeros to the size of the record and this state is saved on ZFS. This came out in my tests, because the more I increased the recordsize in ZFS and the allocation unit in NTFS, the more the compression level increased - on the same data.

ad 6. How to use arc and special vdev in practice to quickly search for a file by a fragment of its name?

I wouldn't do this NTFS if I had an alternative to wizfile or wiztree. These programs index large disks in a few seconds because they scan the $MFT table without any interference.

guenther-alka · 2024-12-13T00:46:14Z

guenther-alka
Dec 13, 2024

When you create a ZFS filesystem below a ZFS pool you can mount this filesystem individually. Special on Windows is the option to assign a driveletter either to the pool or to individual ZFS filesystems.

ZFS compression (and dedup) is not related to files but ZFS datablocks (in recsize) independent from data structures ontop ZFS. A larger recsize makes compress and dedup more efficient.

Arc is the rambased ZFS readcache for last/most accessed ZFS datablocks (not files). Unless no other application demands RAM, ZFS is using RAM up to a certain percentage ex 50% of RAM.

Special vdev is a method for hybrid pools (hd + NVMe). It must be a mirror as a vdev lost is a pool lost. It can massively improve small file performance ex up to 64K or 128K and file listings due faster access to metadata.

You can check if this is enough for performance, otherwise wizfile claims support for non ntfs filesystems.

1 reply

leonik3 Dec 13, 2024
Author

Yes, wizfile and wiztree supports all file systems, even over the network, but it then runs maybe 100-1000 times slower, because it scans directory by directory, not the $MFT table.

lundman · 2024-12-13T07:24:28Z

lundman
Dec 13, 2024
Maintainer

There was a race in the mounting, which you detailed - and I believe it is fixed. I have been waiting to hear back on that, and other issues before rc11.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NTFS on ZPOOL/ZFS with FAST_DEDUP #421

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 3 comments 2 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

NTFS on ZPOOL/ZFS with FAST_DEDUP #421

leonik3 Dec 12, 2024

Replies: 3 comments · 2 replies

guenther-alka Dec 12, 2024

leonik3 Dec 13, 2024 Author

guenther-alka Dec 13, 2024

leonik3 Dec 13, 2024 Author

lundman Dec 13, 2024 Maintainer

leonik3
Dec 12, 2024

Replies: 3 comments 2 replies

guenther-alka
Dec 12, 2024

leonik3 Dec 13, 2024
Author

guenther-alka
Dec 13, 2024

leonik3 Dec 13, 2024
Author

lundman
Dec 13, 2024
Maintainer