mergerfs.balance reverse filesystem walk #39
Comments
Accessing the sub-second file modification timestamps would be the best way to find the newest files, similar to what SnapRAID does.
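How about this:
import os
# os.stat_float_times() was removed in Python 3.7; os.path.getmtime() already returns a float with sub-second precision.
dated_files = [(os.path.getmtime(os.path.join(path, fn)), fn) for fn in os.listdir(path)]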
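A slightly fuller sketch of the same idea, assuming Python 3: walk one branch recursively and sort its regular files newest first using the nanosecond-resolution `st_mtime_ns` field. The function name `files_newest_first` is made up for illustration and is not part of mergerfs.balance.

```python
import os
import stat


def files_newest_first(root):
    """Return (mtime_ns, path) pairs for regular files under root, newest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):  # ignore symlinks, sockets, etc.
                entries.append((st.st_mtime_ns, path))
    # Integer nanoseconds avoid the rounding that float mtimes can introduce.
    entries.sort(reverse=True)
    return entries
```

Whether the sub-second part is meaningful depends on the underlying filesystem's timestamp resolution.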
I can look at doing that but it's somewhat niche. Under normal usage patterns newer doesn't mean end of disk. Does it really make that much of a difference? I'd think fragmentation in general would be a bigger issue.
For example, an 8TB Seagate Archive drive has a maximum read speed of almost 200 MB/s at the beginning of the disk and a minimum of around 80 MB/s at the end. If one drive is reading near the end while the others are reading from the beginning, that one drive bottlenecks all the others.
I've been thinking further about this. It's probably most efficient to build the file lists as a snapshot at the beginning of the process, then move each array entry to the list of the disk its file has been moved to (in a stack fashion, to maintain the newness ordering).
It's niche, yes, but if you've been filling all your disks evenly (mfs policy) and then add a new blank drive to the array, it makes sense from a SnapRAID performance perspective to have all the disks physically utilized the same way. In a WORM hard drive usage pattern, you can generally expect newer files to be towards the end of the disk.
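To make the snapshot-and-stack idea concrete, here is a minimal planning sketch. It is not how mergerfs.balance works today. It assumes equal-sized drives (so free bytes stand in for percent used), only ever considers the newest file on the fullest drive, and only plans moves rather than performing them. The name `plan_newest_first` and the data layout are hypothetical.

```python
def plan_newest_first(snapshot, free_bytes):
    """Plan moves from a one-time snapshot of every drive.

    snapshot:   {drive: [(mtime_ns, size, path), ...]} taken once at startup
    free_bytes: {drive: free space in bytes}
    Returns a list of (path, src_drive, dst_drive) in the order to perform them.
    """
    # One stack per drive: oldest at the bottom, newest on top.
    stacks = {drive: sorted(files) for drive, files in snapshot.items()}
    free = dict(free_bytes)
    moves = []
    while True:
        src = min(free, key=free.get)  # fullest drive (least free space)
        dst = max(free, key=free.get)  # emptiest drive (most free space)
        if src == dst or not stacks[src]:
            break
        mtime_ns, size, path = stacks[src][-1]
        # Stop once moving the newest file would no longer narrow the gap.
        if free[dst] - free[src] <= size:
            break
        # Pop from the source stack and push onto the destination stack,
        # so the snapshot stays in step without rescanning any drive.
        stacks[src].pop()
        stacks[dst].append((mtime_ns, size, path))
        free[src] += size
        free[dst] -= size
        moves.append((path, src, dst))
    return moves
```

For example, with three 10-byte files on a full `d1` and an empty `d2` that has 40 bytes free, the plan moves the two newest files and leaves the oldest in place.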
I've several archive drives and I can't say I've noticed that behavior but I've not paid that much attention. That'll obviously increase the startup cost quite a bit but it's not difficult. I'll take a stab at it.
Initially, I wanted to make a PR for the But there are a few major differences between the tool I wrote and mergerfs.balance:
Here is how you can create a fs db:
pip install xklb
library fsadd --filesystem fs.db /mnt/d1/* &
library fsadd --filesystem fs.db /mnt/d2/* &
...
library fsadd --filesystem fs.db /mnt/d7/* &
wait
I have five million files and this took about half an hour... After everything is done you can run
In your case @qweasdzxc787 you might do something like this, which would check the balance between the most recently modified 10,000 files:
library scatter -m /mnt/d1:/mnt/d2:/mnt/d3:/mnt/d4/:/mnt/d5:/mnt/d6:/mnt/d7 --sort 'time_modified desc' --group size --limit 10000 fs.db /
Current path distribution:
╒═════════╤══════════════╤══════════════╤═══════════════╤════════════════╤═════════════════╤════════════════╕
│ mount │ file_count │ total_size │ median_size │ time_created │ time_modified │ time_scanned │
╞═════════╪══════════════╪══════════════╪═══════════════╪════════════════╪═════════════════╪════════════════╡
│ /mnt/d1 │ 2806 │ 17.0 GB │ 1.3 MB │ Jan 27 │ Jan 27 │ Jan 31 │
├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
│ /mnt/d2 │ 2836 │ 8.6 GB │ 1.3 MB │ Jan 27 │ Jan 27 │ Jan 31 │
├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
│ /mnt/d3 │ 1049 │ 2.9 GB │ 287.7 kB │ Jan 29 │ Jan 27 │ Jan 31 │
├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
│ /mnt/d4 │ 2604 │ 20.1 GB │ 403.2 kB │ Jan 31 │ Jan 26 │ Jan 31 │
├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
│ /mnt/d5 │ 705 │ 112.4 GB │ 59.3 MB │ yesterday │ Jan 25 │ today │
╘═════════╧══════════════╧══════════════╧═══════════════╧════════════════╧═════════════════╧════════════════╛
Simulated path distribution:
494 files should be moved
9506 files should not be moved
╒═════════╤══════════════╤══════════════╤═══════════════╤════════════════╤═════════════════╤════════════════╕
│ mount │ file_count │ total_size │ median_size │ time_created │ time_modified │ time_scanned │
╞═════════╪══════════════╪══════════════╪═══════════════╪════════════════╪═════════════════╪════════════════╡
│ /mnt/d1 │ 2887 │ 30.8 GB │ 1.4 MB │ Jan 27 │ Jan 27 │ Jan 31 │
├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
│ /mnt/d2 │ 2919 │ 21.3 GB │ 1.3 MB │ Jan 27 │ Jan 27 │ Jan 31 │
├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
│ /mnt/d3 │ 1148 │ 25.5 GB │ 318.6 kB │ Jan 29 │ Jan 27 │ Jan 31 │
├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
│ /mnt/d4 │ 2670 │ 32.4 GB │ 411.7 kB │ Jan 31 │ Jan 26 │ Jan 31 │
├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
│ /mnt/d5 │ 211 │ 24.5 GB │ 525.2 kB │ today │ yesterday │ today │
├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
│ /mnt/d6 │ 69 │ 11.3 GB │ 111.7 MB │ yesterday │ Jan 25 │ today │
├─────────┼──────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼────────────────┤
│ /mnt/d7 │ 96 │ 15.2 GB │ 107.9 MB │ yesterday │ Jan 25 │ today │
╘═════════╧══════════════╧══════════════╧═══════════════╧════════════════╧═════════════════╧════════════════╛
######### Commands to run #########
### Move 96 files to /mnt/d7: ###
rsync -aE --xattrs --info=progress2 --no-inc-recursive --remove-source-files --files-from=/tmp/tmpmm8g95qw / /mnt/d7
### Move 69 files to /mnt/d6: ###
rsync -aE --xattrs --info=progress2 --no-inc-recursive --remove-source-files --files-from=/tmp/tmp16m8kj28 / /mnt/d6
### Move 99 files to /mnt/d3: ###
rsync -aE --xattrs --info=progress2 --no-inc-recursive --remove-source-files --files-from=/tmp/tmpsc49r7oo / /mnt/d3
### Move 81 files to /mnt/d1: ###
rsync -aE --xattrs --info=progress2 --no-inc-recursive --remove-source-files --files-from=/tmp/tmp96iks8ot / /mnt/d1
### Move 83 files to /mnt/d2: ###
rsync -aE --xattrs --info=progress2 --no-inc-recursive --remove-source-files --files-from=/tmp/tmpmdppltyw / /mnt/d2
### Move 66 files to /mnt/d4: ###
rsync -aE --xattrs --info=progress2 --no-inc-recursive --remove-source-files --files-from=/tmp/tmpdujeaue9 / /mnt/d4
Afterward running the library fsupdate fs.db
It would be nice to have a command line switch to walk the filesystem in reverse rather than moving the first item found. I have rearranged all my files with rsync, so the files are all at the beginning of each disk. If I run mergerfs.balance after I add an empty disk, it will create large empty spaces at the beginning of all the existing disks and create a speed mismatch between the disks when performing a SnapRAID sync.
Basically, I would like mergerfs.balance to move the newest files first, leaving the oldest files in place and keeping all free space at the end of the disks so they are balanced performance-wise.