RFC/WIP: moving hard links along with file #85

Open · wants to merge 1 commit into master

Conversation

kinghrothgar (Contributor) commented Jun 5, 2019

Explanation

I wanted to open this PR to see if you would be willing to merge one that adds support for moving a file's hard links along with it, behind an option. If you are interested, I will work this PR into a fully featured one. I make heavy use of hard links in various file sorting and organization systems, so without this modification I can't use mergerfs.balance: it duplicates the files and breaks the links.

Proposed solution:

If you enable the links option, the script uses GNU find with the -samefile argument to locate a file's other hard links. You have already enabled rsync's hard link handling with the -H option, so all I had to do was pass the other hard linked paths along with the "original" as src files in the rsync command, and rsync handles the rest.
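
A rough sketch of what I mean, with illustrative names rather than the script's actual functions, and the real rsync flag/path handling elided:

```python
# Sketch only: find every hard link of a file with GNU find's
# -samefile, then pass all of them to a single rsync -H invocation.
import subprocess

def find_hardlinks(src_mount, file_path):
    # -xdev keeps the search on src_mount's filesystem; -samefile
    # matches every path sharing file_path's inode.
    out = subprocess.check_output(
        ['find', src_mount, '-xdev', '-samefile', file_path])
    return out.decode('utf-8', 'surrogateescape').splitlines()

def build_rsync_cmd(file_path, src_mount, dst_mount):
    # rsync -H only preserves hard links among files within a single
    # transfer, so every link of the inode is listed as a source.
    # (--relative/path-layout details stay as the script already does them.)
    srcs = find_hardlinks(src_mount, file_path)
    return ['rsync', '-aH'] + srcs + [dst_mount + '/']
```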

Questions/Complications:

  1. Should the found hard links be subject to the various include and exclude options? I personally don't want the links filtered, but I could see a user expecting or wanting this. As such, maybe this should be configurable via an option?
  2. If we do filter the hard links, what do we do when a found hard link is excluded? Simply excluding it would duplicate the file, which doesn't seem like the behavior a user who has asked the program to preserve hard links would want. The two options I see are to either exit with a message or modify find_a_file() to support skipping seen files (see the sketch after this list).
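
For option two, something like this (a simplified find_a_file(); the real one in mergerfs.balance looks different):

```python
import os

def find_a_file(path, seen):
    # `seen` is a set of (st_dev, st_ino) pairs persisted across calls;
    # links of an already-handled inode are skipped instead of being
    # picked up again and duplicated.
    for (dirpath, _dirnames, filenames) in os.walk(path):
        for name in filenames:
            fullpath = os.path.join(dirpath, name)
            st = os.lstat(fullpath)
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            return fullpath
    return None
```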

trapexit (Owner) commented Jun 6, 2019

The problem is that that solution is really expensive, since you'll end up scanning the filesystem N times rather than once. Preferably you'd scan the filesystem once at the beginning and then add to the source list.
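
Something along these lines (illustrative names, not from the actual script): walk each srcmount once at startup and index paths by inode, so a file's other links become a dictionary lookup.

```python
import os
from collections import defaultdict

def build_inode_index(src_mount):
    # Single pass over the mount: map (st_dev, st_ino) -> [paths].
    # Only inodes with more than one link need remembering.
    index = defaultdict(list)
    for (dirpath, _dirnames, filenames) in os.walk(src_mount):
        for name in filenames:
            fullpath = os.path.join(dirpath, name)
            st = os.lstat(fullpath)
            if st.st_nlink > 1:
                index[(st.st_dev, st.st_ino)].append(fullpath)
    return index

# Later, a file's links are a lookup against its own lstat result:
#   st = os.lstat(file_path)
#   links = index.get((st.st_dev, st.st_ino), [file_path])
```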

kinghrothgar (Contributor, Author) commented Jun 6, 2019

Are you suggesting I load the full filesystem file list, with inode info, for each srcmount at program start? The downside is the race condition: a run of the program can take many hours, so there is a long window for the FS to change and diverge from the loaded state. That will lead to one of two things:

  1. The program will exit because of a failed rsync caused by a link no longer existing
  2. A link will be missed and the file will be duplicated

I can do it that way if that is what you prefer.

trapexit (Owner) commented Jun 6, 2019

There will always be a race condition. It's a matter of degree, not kind. Your current method is no different: scanning the drive takes time, and the first file it sees could be a link of the file to move, as could the last. With a ton of files the scan could take an hour, and the first link could be removed before you even find the last one.

If you do it as you do now you still have a race condition, and you will ultimately do N * M file scans, where N = the number of files on the drive and M = the number of files being moved. That's a hell of a lot more expensive than 1 * M.

So yes, I'm fine with this feature so long as it's optional (given the overhead), but it shouldn't have an O(N*M) runtime.

kinghrothgar (Contributor, Author)

Alright, I'll load the file list beforehand.

Do you care how I handle either of the two Questions in the original comment?

trapexit (Owner) commented Jun 6, 2019

If you can see either way being valid, make it optional.
