Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mergerfs.dedup - doesn't validate if duplicate is the same file (all copies might be deleted if symlinks are involved) #124

Open
fedy-cz opened this issue Apr 6, 2022 · 3 comments

Comments

@fedy-cz
Copy link

fedy-cz commented Apr 6, 2022

I have a mergerfs filesystem where some branches contain symlinks to a directory at the same path within a different branch.
Example:
/mnt/branch1/dir - symlink to /mnt/branch2/dir
/mnt/branch2/dir - a directory

The reasons for doing such a thing are historical (there are existing processes that work with certain branch paths directly, the data had to be moved to a different volume, ...). Maybe I'm using mergerfs horribly wrong, but didn't find any warning about this in the docs and so far it seems to work fine.

The issue:
It seems like when the mergerfs.dedup is called on the merged filesystem it incorrectly identifies the files within both paths as different files (different copies) and would attempt to delete them. That would result in 0 copies of the affected files.

Suggested solution:
There should be an option (or even better it might be the default), where:

  • Before file deletion mergerfs.dedup performs readlink on both paths of the duplicates and compares the results - if they are the same, the deletion shouldn't be performed.

  • Alternatively: It could check if both files are the same inode within the same volume and if so it should reject deletion when the refcount is only 1.

Basically the idea is that a tool made to delete duplicates should make especially sure that everything it deletes are truly duplicates.

@trapexit
Copy link
Owner

trapexit commented Apr 6, 2022

  • Maybe I'm using mergerfs horribly wrong, but didn't find any warning about this in the docs and so far it seems to work fine.

As the docs say if you're overlapped files are not in sync you will have some issues. It's on the user to take on the responsibility for such things.

These tools are just random things I purpose built for other people as template for their own tooling. If you have a non-standard setup where overlapping files are in fact not the same or even same type then it will have to be modified for that setup. This really isn't much different from situations where people "move" files from a mount into a bind mount of the same.

fedy-cz added a commit to fedy-cz/mergerfs-tools that referenced this issue Apr 7, 2022
…ies (including the version/branch we decided to keep) - described in the issue:

trapexit#124
- both methods (inode and realpath) are implemented
- the inode method skips the suggested refcount=1 check (it is extra strict)
@fedy-cz
Copy link
Author

fedy-cz commented Apr 7, 2022

Implemented the proposed additional checks in my fork:
https://github.com/fedy-cz/mergerfs-tools

mergerfs is awesome, thanks

@trapexit
Copy link
Owner

trapexit commented Apr 7, 2022

👍

If you think it is generally valuable and won't interfere with the general use cases then feel free to submit a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants