Better handle hard links #1267
Hard links are already ignored on Linux and macOS. Because Windows is not a Unix-like OS, there is no implementation for Windows yet.
I found the crate file_id, which implements a file ID getter for Windows. I'm learning Rust and trying to add this function, but it will take some time (and may never get done).
I don't see hard links (HLs) being ignored, and the way czkawka converts duplicates to HLs is not acceptable for my use. I considered using the czkawka GUI hard link command, but I couldn't find anywhere to specify which file of each pair should be the target. I want the 2023 files to be the target (the copies I'm keeping), and the 2020 files to be replaced with HLs to the 2023 versions; "target" is used the way the man page uses it. In the end I used
When I then went back to look for duplicates by name in czkawka, they were all still there. Thinking it was a cache problem, I compiled czkawka on another machine and searched for duplicates from that machine over NFS, and they're still there. So: 1) allow the user to specify the direction in which the HLs are made, i.e., let me specify the TARGET. I can't say that 100% of the duplicates are HLs (some of them did fail), but as I look back through the log, most of them show a hard link count of 2.
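The requested "choose the target" behaviour can be sketched in Rust, the language czkawka is written in. This is a minimal, Unix-only sketch, not czkawka's code: the function name `replace_with_hard_link` and the `.hardlink-tmp` suffix are hypothetical. The file passed as `keep` (e.g. the 2023 copy) survives unchanged, and `replace` (e.g. the 2020 copy) becomes another name for it. Both paths must be on the same filesystem.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper: make `replace` a hard link to `keep`, so the file the
/// user chose to keep is the link target. Unix-only sketch; both paths must be
/// on the same filesystem, because hard links cannot cross devices.
fn replace_with_hard_link(keep: &Path, replace: &Path) -> io::Result<()> {
    let (k, r) = (fs::metadata(keep)?, fs::metadata(replace)?);
    // Already the same file (same device and inode): nothing to do.
    if k.dev() == r.dev() && k.ino() == r.ino() {
        return Ok(());
    }
    // Refuse to link files whose contents differ (byte-for-byte comparison,
    // reading both files fully into memory for simplicity).
    if fs::read(keep)? != fs::read(replace)? {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "contents differ"));
    }
    // Create the new link under a temporary name next to `replace`...
    let name = replace
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let mut tmp_name = name.to_os_string();
    tmp_name.push(".hardlink-tmp");
    let tmp = replace.with_file_name(tmp_name);
    fs::hard_link(keep, &tmp)?;
    // ...then rename it over `replace`. rename(2) replaces the destination
    // atomically, so `replace` always names either the old or the new file.
    if let Err(e) = fs::rename(&tmp, replace) {
        let _ = fs::remove_file(&tmp); // clean up the temporary link
        return Err(e);
    }
    Ok(())
}
```

Linking under a temporary name and renaming it over the old path means the 2020 name never disappears, even if the program is interrupted. Reading whole files into memory keeps the sketch short; large archives would want a streaming comparison.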
So after running my HL script, 10% of the files have not been hard linked, either because they have no counterpart in the new data or because of some other problem. 10% of 4.4 GB is about 440 MB, and I'm not going to trace all of that down by hand.
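Files that a linking script left alone can be listed by their hard link count (the same "2" that appears in the log). This is a minimal Unix-only sketch; `unlinked_files` is a hypothetical helper, not a czkawka feature. Note that a count above 1 only shows that another name exists somewhere on the same filesystem, not necessarily inside the scanned tree.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collect regular files under `root` whose
/// link count is 1, i.e. files with no other hard-linked name. Unix-only.
fn unlinked_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            // symlink_metadata does not follow symlinks, so symlinks are skipped.
            let meta = fs::symlink_metadata(&path)?;
            if meta.is_dir() {
                stack.push(path);
            } else if meta.is_file() && meta.nlink() == 1 {
                found.push(path);
            }
        }
    }
    found.sort(); // deterministic output order
    Ok(found)
}
```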
This is a problem because the hash search never showed these as duplicates, and now the name view shows files, 90% of which are hard-linked to each other. I investigated a bit more: of the remaining ca. 220 files that the script did not hard-link, about 189 look like probable duplicates whose names differ only in details such as punctuation.
This can probably be scripted, and 200-300 hits are few enough that I can approve each HL replacement of the 2020 archive copy individually, but... tl;dr: isn't this what czkawka is supposed to do? I pointed it at the directory to search, let it search, and it found the duplicates I knew were there; then I couldn't HL them the way I wanted, wrote a script from scratch to link 90% of them, and I still can't turn up with czkawka the duplicates that I know exist. Are you interested in looking at what could be done to address this use case?
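The punctuation-only name differences described above could be scripted roughly as follows. This sketch assumes that letter case and non-alphanumeric characters are the only differences; `name_key` and `punctuation_variants` are hypothetical helpers. It only proposes candidate groups: their contents should still be compared before anything is hard-linked.

```rust
use std::collections::BTreeMap;

/// Hypothetical normalization: keep only letters and digits, lowercased, so
/// "Trip - Day 1.jpg" and "trip_day1.jpg" map to the same key.
fn name_key(name: &str) -> String {
    name.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Group names by their normalized key and return only groups with more than
/// one member (candidate duplicates whose names differ in punctuation/case).
fn punctuation_variants<'a>(names: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut groups: BTreeMap<String, Vec<&'a str>> = BTreeMap::new();
    for &name in names {
        groups.entry(name_key(name)).or_default().push(name);
    }
    groups.into_values().filter(|g| g.len() > 1).collect()
}
```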
Windows does support hard links, through the CreateHardLinkA function (https://learn.microsoft.com/en-us/windows/desktop/api/WinBase/nf-winbase-createhardlinka). Whether hard links are supported depends more on the filesystem than on the host OS.
The OP did not mention Windows, and I did not take this thread to be about HLs on that proprietary operating system.
Ubuntu 22.04 / btrfs
The purpose of the duplicate file finder is to find identical files, not similar ones. So at this time the direction of hard links is not a problem: all hard links are names for the same file (even the so-called target).
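The point that hard links have no direction can be checked directly. In this Unix-only sketch (the helper `same_file` is hypothetical), two names refer to the same file when their device and inode numbers match. Writing through either name changes the data seen through the other, and deleting the "original" name leaves the other name and its data intact.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper: true when both paths name the same file on disk
/// (same device and inode). After a hard link, neither path is "the
/// original"; both are equally valid names for one file. Unix-only sketch.
fn same_file(a: &Path, b: &Path) -> io::Result<bool> {
    let (ma, mb) = (fs::metadata(a)?, fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}
```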
Hard links are currently reported as duplicates.
It would be nice if they could be ignored.
I'm not sure how files are processed, but if a file is a hard link to another, there is no point in calculating its hash.
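Skipping the hash for hard links could work by grouping paths by (device, inode) first and hashing only one path per group. This is a minimal Unix-only sketch, not how czkawka currently processes files; `group_by_inode` is a hypothetical helper. Two separate files with identical contents still land in different groups, so real duplicates are still found by hashing the representatives.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

/// Hypothetical helper: group paths that are hard links to the same file.
/// Groups keep first-seen order; only the first path of each group needs to
/// be hashed, because the others are the same data under another name.
fn group_by_inode(paths: &[PathBuf]) -> io::Result<Vec<Vec<PathBuf>>> {
    // (device, inode) identifies a file on a Unix system.
    let mut index: HashMap<(u64, u64), usize> = HashMap::new();
    let mut groups: Vec<Vec<PathBuf>> = Vec::new();
    for path in paths {
        let meta = fs::metadata(path)?;
        let next = groups.len();
        // Look up the group for this inode, registering a new one if unseen.
        let i = *index.entry((meta.dev(), meta.ino())).or_insert(next);
        if i == next {
            groups.push(vec![path.clone()]);
        } else {
            groups[i].push(path.clone());
        }
    }
    Ok(groups)
}
```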
It would also be nice to have an option to eliminate duplicates by replacing all but one of them with hard links to the remaining one.
(I have a large photo archive where the photos are ordered by date, but some of them are also in topic-specific folders; I would like to physically have only one copy but still see the photo in multiple folders.)