Windows 32/64 binary, 64-bit HW-accelerated
Pre-release

Faster overall
~5% in the average case, due to "smarter" computation of SHA-1 on fragments
New switch -dataset for ZFS (Unix)
This uses the ZFS filesystem to automagically update files without full filesystem scans
zpaqfranz a /tmp/test.zpaq * -dataset "tank/d"
Normally, making point-in-time copies (e.g., once every hour) requires scanning the entire filesystem.
zpaqfranz has long supported a ZFS backup feature, but at the block level, not at the single-file level
aka: you can very quickly back up "everything", but to restore "something" (a single file) you have to ... restore everything, then pull out the file you want
On large fileservers, or on magnetic disks where the filesystem scan is slow, the issue becomes "painful" whatever software you use (tar, 7z, srep or whatever you want)
TRANSLATION
Suppose you have a mid-sized file server with 1M files
Suppose your system can scan the folders at 500 files/sec (real-world performance for spinning drives): you need AT LEAST ~33 minutes (1M/(500*60)) just to enumerate everything
THEN "you" (whatever software you use) can start to "do things" (aka: deduplicate, compress, whatever)
With SSDs the real-world speed is ~5K files/sec, with NVMe drives ~30K files/sec
=>
you cannot update the backup (in the example) every 10 minutes
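The back-of-the-envelope numbers above can be checked with plain shell arithmetic (the speeds are the rough figures quoted, not a benchmark):

```shell
# Enumeration time for 1M files at the approximate scan speeds above
echo "HDD  (500 files/s):  $((1000000 / 500 / 60)) minutes"
echo "SSD  (5K files/s):   $((1000000 / 5000)) seconds"
echo "NVMe (30K files/s):  $((1000000 / 30000)) seconds"
```

At spinning-disk speed, enumeration alone already blows a 10-minute backup window.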
But, with zpaqfranz on zfs, now you can
the -dataset switch automagically makes a temporary snapshot
On the next run it gets the changed files from the ZFS filesystem itself, instead of scanning again from scratch
First run, nothing done
In this example the dataset is tank/d.
Datasets are (very crudely) parts of a "disk" (I'm glossing over the whole ZFS hierarchy): basically, a folder where you write the data (https://www.illumos.org/books/zfs-ad...r-1.html#ftyue)
root@aserver:/tmp/zp # zpaqfranz a prova2.zpaq * -dataset "tank/d" -verbose
zpaqfranz v58.12m-JIT-L(2023-12-02)
franz:-dataset <<tank/d>>
franz:-verbose
59901: zfs dataset tank/d
59839: dataset path |/tank/d/|
59840: topath |/tank/d/.zfs/snapshot/franco_diff/|
59856: Base snapshot tank/d@franco_base
59856: Temp snapshot tank/d@franco_diff
37720: running Destroy diff snapshot (if any)
38162: x_one zfs destroy tank/d@franco_diff
37720: running Taking diff snapshot
38162: x_one zfs snapshot tank/d@franco_diff
39147: running Getting diff
39149: x_one zfs diff -F tank/d@franco_base tank/d@franco_diff >/tmp/tempdiff.txt
59877: Load a zfsdiff 0 bytes long file <</tmp/tempdiff.txt>>
63108: zfsdiff lines 0
63119: + 0 - 0
59883: zfsdiff to add 0
59896: Nothing to do (from zfsdiff)
0.032 seconds (000:00:00) (with warnings)
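For reference, `zfs diff -F` output (as documented in the zfs-diff man page) is tab-separated: change type (`+`, `-`, `M`, `R`), file type (`F` for regular files, `/` for directories), then the path. A minimal sketch of how such a listing could be reduced to the files worth re-archiving; zpaqfranz's actual parsing may differ:

```shell
# Hypothetical /tmp/tempdiff.txt content after a file was created;
# the real file comes from: zfs diff -F tank/d@franco_base tank/d@franco_diff
printf 'M\t/\t/tank/d/spaz\n+\tF\t/tank/d/spaz/newfile\n' >/tmp/tempdiff.txt

# Keep only added/modified regular files (drop directories and removals)
awk -F'\t' '$1 ~ /^[+M]$/ && $2 == "F" {print $3}' /tmp/tempdiff.txt
```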
Now create a new file somewhere in the dataset, and run again
With conventional "something" you have to enumerate all files, find the "touched" one, then "do something"
zpaqfranz will NOT enumerate all files, but takes just the changed one(s), relying on the change list reported by ZFS
In effect it copies the data from the snapshot, therefore with guaranteed consistency, while automagically rewriting its name (as if the file were in the dataset, and not inside the snapshot). In short, it is transparent to the user
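The path rewriting can be illustrated with the two paths printed in the verbose log above ("dataset path" and "topath"); this is just a sketch of the idea, not zpaqfranz's actual code:

```shell
# Strip the snapshot prefix and re-root the file under the dataset path
snapdir="/tank/d/.zfs/snapshot/franco_diff/"   # "topath" from the log
datadir="/tank/d/"                             # "dataset path" from the log
f="${snapdir}spaz/newfile"                     # file as read from the snapshot
echo "${datadir}${f#"$snapdir"}"               # -> /tank/d/spaz/newfile
```

The data is read from the immutable snapshot, but the archive stores the name the file has in the live dataset.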
root@aserver:/tmp/zp # echo "test" >/tank/d/spaz/newfile
root@aserver:/tmp/zp # zpaqfranz a prova2.zpaq * -dataset "tank/d"
zpaqfranz v58.12m-JIT-L(2023-12-02)
franz:-dataset <<tank/d>>
59901: zfs dataset tank/d
59883: zfsdiff to add 1
Creating prova2.zpaq at offset 0 + 0
Add 2023-12-02 18:17:12 1 5 ( 5.00 B) 16T (0 dirs)
1 +added, 0 -removed.
0 + (5 -> 5 -> 840) = 840 @ 94.00 B/s
0.099 seconds (000:00:00) (all OK)
Now change something again, and run
root@aserver:/tmp/zp # echo "changed" >/tank/d/spaz/newfile
root@aserver:/tmp/zp # zpaqfranz a prova2.zpaq * -dataset "tank/d"
zpaqfranz v58.12m-JIT-L(2023-12-02)
franz:-dataset <<tank/d>>
59901: zfs dataset tank/d
could not find any snapshots to destroy; check snapshot names.
59883: zfsdiff to add 1
prova2.zpaq:
1 versions, 1 files, 840 bytes (840.00 B)
Updating prova2.zpaq at offset 840 + 0
Add 2023-12-02 18:17:55 1 8 ( 8.00 B) 16T (0 dirs)
1 +added, 0 -removed.
840 + (8 -> 8 -> 843) = 1.683 @ 195.00 B/s
0.086 seconds (000:00:00) (all OK)
In the archive the various versions of the file(s) are ready for a point-in-time, file-level rollback
root@aserver:/tmp/zp # zpaqfranz l prova2.zpaq -all
zpaqfranz v58.12m-JIT-L(2023-12-02)
franz:-all 4
prova2.zpaq:
2 versions, 2 files, 1.683 bytes (1.64 KB)
- 2023-12-02 18:17:12 0 0001| +1 -0 -> 840
- 2023-12-02 18:17:08 5 0644 0001|/tank/d/spaz/newfile
- 2023-12-02 18:17:55 0 0002| +1 -0 -> 843
- 2023-12-02 18:17:48 8 0644 0002|/tank/d/spaz/newfile
48650: 13 (13.00 B) of 13 (13.00 B) in 4 files shown
48651: 1.683 compressed Ratio 129.462 <<prova2.zpaq>>
0.001 seconds (000:00:00) (all OK)
Obviously, the archiving time remains the same (if the changed files are very large, it will take the necessary time).
However, for fileservers used for e-mails, Word documents, etc., written by a few dozen users, the files are relatively small, and can be updated in a matter of seconds.
The real problem is quickly locating the new file "foo.docx" written somewhere
Sure, it's not a suitable method for giant virtual-machine disks, but its goal is different
Default buffer size is now 1 MB (was 4 KB)
Time to update read-from-file for the solid-state world
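A rough sense of why this matters: with a bigger buffer the same amount of data needs far fewer read calls (plain arithmetic, not a benchmark):

```shell
echo "4 KB buffer: $((1048576 / 4096)) reads per MB"    # old default
echo "1 MB buffer: $((1048576 / 1048576)) read per MB"  # new default
```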
New command redu
A quite complex command, part of developing new "smarter" methods under the hood
zpaqfranz redu z:\*.exe