Move to ZFS volume creates correctly sized file filled with \0 #3125
Note: source and destination are not on the same filesystem, so an actual copy/unlink takes place.
Works for me. Is this some old zol version, perhaps?

$ dpkg -l zfs-dkms spl-dkms
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                 Version                 Architecture            Description
+++-====================================-=======================-=======================-==============================================================================
ii  spl-dkms                             0.6.3-16~a3c1eb~wheezy  all                     Solaris Porting Layer kernel modules for Linux
ii  zfs-dkms                             0.6.3-26~2d9d57~wheezy  all                     Native ZFS filesystem kernel modules for Linux

$ ./test.sh
+ df -h . ../d2
Filesystem     Size  Used Avail Use% Mounted on
rpool/test/d1   47G     0   47G   0% /rpool/test/d1
rpool/test/d2   47G     0   47G   0% /rpool/test/d2
+ dd if=/dev/urandom of=a.txt bs=32 count=1
1+0 records in
1+0 records out
32 bytes (32 B) copied, 0.000312707 s, 102 kB/s
+ cp a.txt b.txt
+ sha1sum a.txt b.txt
2ad172f4e6de75e974df4b00bd97ddeb5e1135e0  a.txt
2ad172f4e6de75e974df4b00bd97ddeb5e1135e0  b.txt
+ cp b.txt ../d2
+ sha1sum a.txt b.txt ../d2/b.txt
2ad172f4e6de75e974df4b00bd97ddeb5e1135e0  a.txt
2ad172f4e6de75e974df4b00bd97ddeb5e1135e0  b.txt
2ad172f4e6de75e974df4b00bd97ddeb5e1135e0  ../d2/b.txt
+ rm ../d2/b.txt
+ mv b.txt ../d2/b.txt
+ sha1sum a.txt ../d2/b.txt
2ad172f4e6de75e974df4b00bd97ddeb5e1135e0  a.txt
2ad172f4e6de75e974df4b00bd97ddeb5e1135e0  ../d2/b.txt
+ hexdump -C ../d2/b.txt
00000000  f7 e3 a6 5c 86 e9 40 b3  58 1f 0c cd a3 0c 15 07  |...\..@.X.......|
00000010  01 1b c6 3c 6c 6c 55 1a  f1 59 ea 4e c5 21 f7 09  |...<llU..Y.N.!..|
This happens only on one of the datasets on this pool, others are fine. On this single one it's reproducible, though. Versions (forgot about those):
@Lalufu that's very troubling. Is the problem with the new file persistent? That is, if you export/import the pool, does the empty file then have good contents, or is it still empty? Can you reliably reproduce this for that dataset?
Is a reboot equivalent to an export/import for this purpose? That would be easier to test than exporting the pool on a running system. On this particular dataset the issue is reliably reproducible. Looking at an strace of a (working) cp and a (not working) mv, the main difference seems to be that mv (after copying the file over to the destination) does the following syscalls: utimensat() (on the destination)
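(The exact strace invocations are not preserved here; a comparable pair of traces could be captured roughly as follows. The file and directory names are placeholders mirroring the test.sh run quoted above, not taken from the report.)

strace -f -o cp.strace cp b.txt ../d2/
rm ../d2/b.txt
strace -f -o mv.strace mv b.txt ../d2/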
@Lalufu your strace output plus the
Also, if you have used older versions of ZoL on this pool, they were likely missing other SA fixes, which opens the possibility that the filesystem and/or the pool is permanently corrupted. It might be worth trying the current master code or cherry-picking a62d1b0 into 0.6.3-1.2 (it applies cleanly) and seeing what happens. It would also be a good idea to run
Yes, I thought about xattr as well. Unfortunately most of the preconditions you mentioned are there: the system is running SELinux, and the pool was created on 0.6.2 (as far as I can remember; is there an easy way to check this?). I'll try the zdb call later to see what gives, and try to build a zfs version with the check.
zdb -b seems to be happy:
Still need to cherry-pick the patch and see if the zero-filled file is still empty after a reboot.
The file is still empty after a clean reboot.
I've cherry-picked a62d1b0, and it does not make a difference (I've only rebuilt the kernel module, not user space, with the patch). I've uploaded the strace of a cp and a mv of the same file to the same destination, for comparison.
http://tara.camperquake.de/temp/cp.strace
Those links will expire after ca. 14 days.
Hi, @Lalufu
I've reuploaded the files.
For the record:
I've upgraded to 0.6.4.1 recently, and the problem has disappeared.
I can reproduce this problem on 0.7.1. I use a gentoo derivative, and the packages are compiled on a different ZFS filesystem (mounted at /var/tmp) than the root. When I install packages, there is a random chance that some of the files will be zeroed. It seems bursty -- I can install a package a few times with it corrupting each time, and then spend a while unable to reproduce it. When it does corrupt, that too seems bursty, in that often it'll be just a bunch of man pages corrupted, or just a bunch of shared libraries, not a random selection of files.
I have a working theory that I'm testing. Can you set /var/tmp/portage to sync=always? Edit: here's a little test script I banged out to find these corrupted files.
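(The script itself did not survive in this copy of the thread. Below is a rough sketch of what such a check could look like, assuming "corrupted" means a non-empty regular file that reads back as nothing but NUL bytes; the script name, options, and paths are illustrative only.)

#!/bin/bash
# find-zeroed-files.sh -- report non-empty regular files that contain only NUL bytes.
# Sketch only: assumes bash and GNU coreutils (stat -c, head -c).
scan_dir=${1:-.}

find "$scan_dir" -type f -size +0c -print0 |
while IFS= read -r -d '' f; do
    size=$(stat -c %s "$f")
    # A file is "zeroed" if it compares equal to an equally long run of zero bytes.
    if head -c "$size" /dev/zero | cmp -s - "$f"; then
        printf 'ZEROED: %s\n' "$f"
    fi
done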
I set it to sync=always and set it to recompile nss in a loop, checking for the presence of corrupted files after each iteration, and went to bed. It ran for... probably a bit over 12 hours, without detecting any corruption. That said, with how dramatically it was slowed, it's unclear how many times it actually completed. Still, I expect this is evidence that it works.
This is slower than yours, but it is more precise: it will only detect files that are fully zeroed. When doing broader testing using
Hmm, at least I'm not the only one who sees this. The question now is: what causes these files to get zero-filled? @behlendorf @tuxoko any ideas?
@klkblake Since you seem to be able to reproduce this problem, could you perform a
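(The command being requested is cut off above. As a generic way to check whether the destination file is actually sparse on disk, something like the following works; the path is a placeholder.)

du -h --apparent-size /path/to/corrupted.file   # logical size
du -h /path/to/corrupted.file                   # space actually allocated; ~0 for a fully sparse file
stat -c 'size=%s blocks=%b' /path/to/corrupted.file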
Currently running kernel version 4.12.13.
@klkblake Somewhat as expected, the file is completely sparse. Have you got any of the following enabled on the dataset: compression, dedup, non-standard checksumming? Are the corrupted files always ".so.debug" files?
Compression is on, deduplication is off, checksum is set to sha512. They aren't always
@klkblake The reason I asked about those specific properties is that they open other avenues (than traditional lseek and hole-punching) for creating sparse files or otherwise affecting the write path (nop-write, zero suppression, etc.). Could you please try the same test with
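(The property combination being requested is cut off above. For reference, the properties in question can be inspected and temporarily changed roughly like this; pool/dataset is a placeholder.)

zfs get compression,dedup,checksum pool/dataset   # show the settings that enable the alternative write paths
zfs set checksum=on pool/dataset                  # example: fall back to the default checksum for a test run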
FWIW I still use default checksumming. I've only ever seen this on libraries and man pages.
Let's decisively rule out NOP writes as a possible cause by disabling them entirely. This can be done by setting the module option
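(The option name is cut off above. Assuming it refers to the zfs_nopwrite_enabled module parameter, NOP writes can be disabled at runtime like this; adjust for your distribution as needed.)

echo 0 > /sys/module/zfs/parameters/zfs_nopwrite_enabled                 # 1 = allowed (default), 0 = disabled
echo 'options zfs zfs_nopwrite_enabled=0' >> /etc/modprobe.d/zfs.conf    # make it persistent across reboots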
@bunder2015
On Sat, Oct 21, 2017 at 08:16:47AM -0700, hunbalazs wrote:
After reading this bug report I dug into the portage code and found these:
https://github.com/gentoo/portage/blob/83b44fc48c07a4d8f0b6cb361f61b1493965fcb2/pym/portage/util/file_copy/__init__.py
https://github.com/gentoo/portage/blob/83b44fc48c07a4d8f0b6cb361f61b1493965fcb2/src/portage_util_file_copy_reflink_linux.c
I think this issue only occurs on Gentoo because of the way portage copies files from /var/tmp/portage to /: it uses the in-kernel reflink_linux copy operations.
This would be very easy to check by just patching portage to always use shutil.copyfile and running it.
I created a branch from 0.7.3 with 66aca24 reverted at https://github.com/janlam7/zfs/tree/issue/3125. On my machine it prevents the issue from occurring.
@ryao has asked me to file a gentoo bug on this issue: https://bugs.gentoo.org/635002
@Lalufu indeed
I'll try that. I tried to reproduce by using portage's _optimized_copyfile over and over but my short stress test didn't give any result. I'll try to continue testing overnight.
News from the gentoo side...
Since I can't reproduce this very easily, can someone test this portage patch and report any changes? https://bugs.gentoo.org/635002#c6
On my system the patch in https://bugs.gentoo.org/635002#c6 does not prevent this issue from occurring.
Using shutil.copyfile (@Lalufu) and https://bugs.gentoo.org/635002#c6 (@bunder2015) did not help. Mounting /var/tmp/portage as tmpfs fixes it.
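(For reference, the tmpfs workaround is typically an fstab entry along these lines; the size and ownership values are illustrative and not taken from this thread.)

tmpfs   /var/tmp/portage   tmpfs   size=8G,uid=portage,gid=portage,mode=775,noatime   0 0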
With kernel 4.1.43-gentoo-r1 and zfs v0.7.2-r0-gentoo I can reproduce it.
The gentoo folks have pushed out a revision to the patch I linked yesterday, same URL.
@bunder2015 I'm still able to reproduce this with version 5 of said patch and sync=disabled.
Would reverting the dirty check fix the issue?

diff --git a/module/zfs/dmu.c b/module/zfs/dmu.c
index b3cf10c..6569131 100644
--- a/module/zfs/dmu.c
+++ b/module/zfs/dmu.c
@@ -2054,12 +2054,10 @@ dmu_offset_next(objset_t *os, uint64_t object, boolean_t hole, uint64_t *off)
 	/*
 	 * Check if dnode is dirty
 	 */
-	if (dn->dn_dirtyctx != DN_UNDIRTIED) {
-		for (i = 0; i < TXG_SIZE; i++) {
-			if (!list_is_empty(&dn->dn_dirty_records[i])) {
-				clean = B_FALSE;
-				break;
-			}
-		}
-	}
+	for (i = 0; i < TXG_SIZE; i++) {
+		if (list_link_active(&dn->dn_dirty_link[i])) {
+			clean = B_FALSE;
+			break;
+		}
+	}
Sorry for not keeping an eye on this during the week... Got word back from gentoo again yesterday after their last round of portage patches.
Has anyone had a chance to test @tuxoko's patch?
@tuxoko's patch works for me.
@tuxoko good thought, that would explain things. Better to solely check the dnode for dirty status. It would be great if we could get a few other users to verify the fix and a PR opened.
@tuxoko With tuxoko's patch I haven't been able to reproduce this bug so far either. Thanks!
@behlendorf
@tuxoko we should use the one you proposed, which is what was here prior to 66aca24 for SEEK_HOLE. This was a case of over-optimizing the dirty check which slipped through, since it's almost always right. As for mmap we should be OK here since
@behlendorf
@tuxoko my understanding from the
The correct way to determine if a dnode is dirty is to check if any of the dn->dn_dirty_link's are active. Relying solely on the dn->dn_dirtyctx can result in the dnode being mistakenly reported as clean.

Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3125
Closes #6867
Test case:
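(The commands themselves were lost in this copy of the issue. A minimal sketch of the reproduction, modeled on the test.sh run quoted earlier in the thread; d1 and d2 stand in for two datasets on the same pool, and on an affected dataset the moved file reportedly reads back as all zeros.)

cd /rpool/test/d1
dd if=/dev/urandom of=a.txt bs=32 count=1
cp a.txt b.txt
mv b.txt ../d2/b.txt          # cross-dataset move, i.e. copy + unlink
sha1sum a.txt ../d2/b.txt
hexdump -C ../d2/b.txt        # expected: the random bytes; observed on the bad dataset: all \0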
This seems to only happen on a single dataset so far.
Tracing through mv with gdb suggests this is somehow timing related, as stepping through seems to prevent the effect from happening.