Mac ZFS causes duplicate-finding programs to fail randomly #808
Comments
Do you actually mean it fails randomly, i.e. that you get different results every time you run a scan? Or do you mean you get the same results every time but you don't understand why you are getting false negatives? |
The second one. For example, in Gemini, I had it scan duplicate Word, PowerPoint and Excel files, and it consistently acted as if the Excel file was not a duplicate. But the md5 says otherwise. In Duplicate File Finder, it was consistently skipping over two JMP files (JMP is a statistics package). Once again, md5 says they're identical to their originals. So my guess is there is a problem with file iteration, not with the files themselves. But this is hard to explain, since file iteration obviously works in the Finder, and with rsync and FreeFileSync. |
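For anyone who wants to reproduce the md5 comparison independently of Gemini or Duplicate File Finder, here is a minimal sketch of a hash-based duplicate check. It is not how either app works (their algorithms are not known here); `file_md5` and `find_duplicates` are hypothetical helper names, and the sketch assumes plain content hashing over regular files only.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group regular files under root by MD5 and keep groups of two or more."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                groups[file_md5(path)].append(path)
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}
```

Running `find_duplicates` on the same folders the apps were pointed at gives a baseline list to compare against whatever the apps report.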
I'd be curious how it is implemented in these apps. They might be referring to hard links, which are hard to detect once you make them. macOS does have fcntls for "next" and "prev" links, which is something we had to attempt to implement. There's been no good way to test whether they work, so there could be a bug in there. macOS also has "file_id" (like POSIX) and, in addition, "link_id": file_id should be the same between all duplicates, but each one should have a unique "link_id". There could be a bug in there. Then if you query /.vol/123456789/1234, which is a secret way to look up files by ID or link ID, it should return the correct name (if you use the link ID). There could be a bug in there too. The last two can be tested with any getattrlist program, like FSMegaTool. |
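As a rough first check on the hard-link theory, the sketch below groups paths that share the same device and inode number. It uses only POSIX `stat` fields (`st_dev`, `st_ino`, `st_nlink`) through Python's `os.lstat`; it does not query the macOS-specific link ID or the `/.vol` lookup described above, so it can only show whether hard links exist in the scanned tree. `hard_link_groups` is a hypothetical helper name.

```python
import os
import stat
from collections import defaultdict


def hard_link_groups(root):
    """Return groups of paths under root that are hard links to the same file."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            # Only regular files with more than one link can be hard-link siblings.
            if stat.S_ISREG(info.st_mode) and info.st_nlink > 1:
                groups[(info.st_dev, info.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

If this returns no groups for the affected folders, the skipped files are ordinary separate files and the link-ID paths are less likely to be involved.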
Last two seem OK
|
Any chance there are extended attributes or similar involved? Just guessing that just because the file contents are the same/have the same hash it doesn't mean the files are considered identical by these tools. |
These are all regular files. I do believe they're essentially using hashes though Gemini says they have a "proprietary" algorithm. Duplicate File Finder has an option for a "slower" check, and I suspect here it's computing hashes. It's more likely files are being skipped during iteration. For example, when I tested Duplicate File Finder, I have two parent folders I'm checking. There is a subfolder in each, which contains a file. When I test the subfolder, it declared this file as a duplicate, but when I told it to scan the parent folder, it didn't. So it sounds like something is throwing off the iteration and it just didn't see this file when starting with the parent folder. |
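The parent-versus-subfolder observation can be checked without either app. The sketch below walks a parent folder and one of its subfolders separately and reports any file that shows up in the subfolder walk but not in the parent walk. It assumes the subfolder path is given underneath the parent path; `list_files` and `missing_from_parent_walk` are hypothetical helper names, and Python's `os.walk` may not enumerate directories the same way these apps do.

```python
import os


def _raise(error):
    # os.walk ignores listing errors by default; surface them instead.
    raise error


def list_files(root):
    """Return the absolute paths of all files found by walking root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root, onerror=_raise):
        for name in filenames:
            found.add(os.path.abspath(os.path.join(dirpath, name)))
    return found


def missing_from_parent_walk(parent, subfolder):
    """Files seen when walking subfolder directly but not when walking parent."""
    return sorted(list_files(subfolder) - list_files(parent))
```

A non-empty result on a ZFS dataset would point at directory iteration; an empty result would suggest the enumeration itself is fine at this level and the problem lies elsewhere.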
It just seems weird because there are many other pieces of software, from Finder to command line tools, that have no trouble listing the contents of a directory. Why are these two duplicate-finders the exception? |
I agree. The Finder works. Rsync works, FreeFileSync works. The file navigation dialogs work. But I had a Mac VM with two ZFS disks and Duplicate File Finder was missing a whole slew of file duplicates. When I reformatted as APFS, the program worked great. The way forward is for you guys to test them out yourselves and confirm that my machine isn't possessed and that I'm not crazy. |
I discovered that Gemini fails sporadically on APFS too, so right now it's only Duplicate File Finder that fails specifically on Mac ZFS. |
There is a serious bug in Mac ZFS wherein programs that identify files as duplicates often fail randomly to detect duplicates.
Two such programs are Gemini and Duplicate File Finder.
These programs work perfectly on APFS disks, but on Mac ZFS, they randomly generate false negatives. There is no rhyme or reason as to which duplicated file gets falsely reported as not being a duplicate of its twin. The md5 hash of the files in question is always the same, so Mac ZFS is not altering the contents of any file. It seems more likely to be a problem with the iteration of the file/folder directory tree, where ZFS is somehow not returning every file present. It's not clear whether this is occurring in the source directory, the destination directory, or both.
This bug is present in 2.16 (Monterey Intel) and at least some earlier builds.