Silently corrupted file in snapshots after send/receive #4809
Comments
Could this be the same bug as #4050? IIRC 6.5.3 was affected and you seem to be running that version. Anyway, all these "ZFS send/recv corrupts data" bug reports are really scary, considering that it's a ZFS feature primarily used for replication/backups. |
Thanks for the suggestion @loli10K, it would not have occurred to me to check that issue.
I am not sure, but to trigger this same issue on "nas1FS", don't I need to modify the file? |
I have no objections to a complete upgrade to 0.6.5.7. I was able to crash the system when doing an md5sum of the problematic file in a snapshot in which it did not exist on the source pool ("nas1FS/backup@20151121_1"), so I thought there could also be some kind of problem with the source pool. |
You should probably try to replicate the lock/crash on 0.6.5.7 and/or master and then report that as a distinct problem, if it persists. Presuming this is bug #4050, the bug only manifests in the stream when you transmit via incremental send, so the source pool's data is not affected, and upgrading your ZoL version should be safe. |
Thanks, I will upgrade to 0.6.5.7 and test it tonight or tomorrow. |
I have upgraded to v0.6.5.7 and the problem has become more interesting.
When I tried to transfer (on v0.6.5.7) a test fs ("nas1FS/test_fs") created with the same commands on v0.6.5.3, I got a checksum mismatch:
How should I proceed? |
What are the S.M.A.R.T. stats for those drives? Anything unusual? Can you post those somewhere?
Is there a possibility you can swap the drive and/or cables for different ones?
LSI 2116 controller for 16x SATA3: are any issues known with those and ZFS? Firmware or driver updates? That's at least what comes to mind right now ... |
All disks seem to be OK; nothing unusual about them from smartctl. The nas1FS drives look very similar in smartctl:
All disks are connected to the same controller (LSI) on the A1SA7-2750F; the motherboard is as received (no FW updates). I have repeated the tests with the pool backed by a ramdisk file, so no bad cables or controllers are involved, and I can still trigger the bug with the script from #4809 (comment). I have also retested on a laptop, and on another (much older) computer: |
@pwolny I'm half-asleep so I might not be reading this right, but this looks like the same bug, but with a somewhat different behavior wrt. size and delay: https://gist.github.com/lnicola/1e069f2abaee1dbaaeafc05437b0777a |
@lnicola It seems that your script output exhibits the same checksum mismatch footprint as mine. What kernel / zfs version were you using? |
Sorry for not mentioning those. I'm on 4.6.3-1-ARCH with 0.6.5.7_4.6.3_1-1, i.e. the latest versions. |
That's pretty concerning; I'm mostly backing up my files only with zfs send nowadays too, and am also running 4.6.3. ZFS is on top of cryptsetup for me. @pwolny are you using any encryption or partition "alteration" mechanism (cryptsetup, lvm, etc.)? @pwolny @lnicola are your pools running with the latest ZFS feature set? What does the zpool upgrade command say? (I can't currently provide sample output of my pool since I'm in Windows; I haven't upgraded it in a while, so as far as I know 2 features are still missing on my /home pool.) edit: I'm currently occupied otherwise but will see if I can reproduce it, too, here later in the day |
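For reference, a quick way to check the feature-flag state (a sketch; "tank" is a placeholder pool name, and feature@hole_birth is the flag most relevant to this thread):

    zpool upgrade                        # lists pools that do not have all supported features enabled
    zpool get all tank | grep feature@   # shows each feature flag, e.g. feature@hole_birth  active

|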
@kernelOfTruth yes, I do have |
@pwolny you're showing that issue with the samsung drive and gzip-9 compression;
does this also happen when you explicitly set lz4 compression, or one of the other types? |
Referencing: #4530 "Data Corruption During ZFS send/receive". @pwolny please take a look at comment #4530 (comment), especially the part related to destroying the snapshot. |
About the #4530 (comment):
|
@pwolny What if you replace the use of |
I have retested multiple versions of spl/zfs (built from source) on a virtual machine running Knoppix 7.4.2. During testing I have seen three possible footprints:
To summarize the tested versions (different software and hardware): it seems that the issue (or possibly two issues) was introduced just before v0.6.4 and was possibly aggravated in v0.6.5. |
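A rough sketch of the per-version build used for this kind of regression testing (repository URLs and tag names are examples, not taken from the comment above):

    git clone https://github.com/zfsonlinux/spl && git clone https://github.com/zfsonlinux/zfs
    (cd spl && git checkout spl-0.6.3 && ./autogen.sh && ./configure && make && sudo make install)
    (cd zfs && git checkout zfs-0.6.3 && ./autogen.sh && ./configure --with-spl=$PWD/../spl && make && sudo make install)

|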
@lnicola If I replace: |
What about the other |
I have tested a few modifications of the testing script.
It seems that only the second truncate/dd command has any impact on triggering the issue (a sketch of the sequence follows below). |
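For context, a minimal sketch of the sequence being discussed, reconstructed from fragments quoted elsewhere in this thread (the hb/backup dataset names and the seek=1408 dd come from those fragments; the truncate sizes are illustrative, not the authoritative script):

    ZFSPOOL=testpool                     # assumes the pool already exists
    zfs create $ZFSPOOL/hb
    truncate -s 1M /$ZFSPOOL/hb/large_file
    dd if=/dev/urandom of=/$ZFSPOOL/hb/large_file bs=4k count=10 conv=notrunc
    zfs snapshot $ZFSPOOL/hb@snap1
    zfs send $ZFSPOOL/hb@snap1 | zfs receive $ZFSPOOL/backup

    # the second truncate/dd pair is the one that appears to matter
    truncate -s 10M /$ZFSPOOL/hb/large_file
    dd if=/dev/urandom of=/$ZFSPOOL/hb/large_file bs=4k count=10 seek=1408 conv=notrunc
    zfs snapshot $ZFSPOOL/hb@snap2
    zfs send -i $ZFSPOOL/hb@snap1 $ZFSPOOL/hb@snap2 | zfs receive -F $ZFSPOOL/backup

    md5sum /$ZFSPOOL/hb/large_file /$ZFSPOOL/backup/large_file   # a mismatch means the bug triggered

|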
Maybe someone will find this version of the script useful. It will try to create a "test_matrix_output.txt" file in the "/tmp/zfs" directory (the script does not create this directory). An example "test_matrix_output.txt" with errors looks like this (output with spl/zfs v0.6.5.7):
|
Hi, pwolny, in the interests of clarity, can the bug be reproduced by running one version of one script with reasonable frequency, and if so, could you please publish the shortest version of such a script? Ideally, the script would include creation and cleanup of a pool based on file vdevs. I am trying to abstract away the site specifics. To my knowledge, the original issues with partly filled holes and reused dnodes (Illumos 6370, Illumos 6513) can be reproduced on such pools. Best regards, Boris. |
Hello @bprotopopov, does it not trigger on your system? Are there any "E" letters in the "test_matrix_output.txt" file in the "/tmp/zfs" directory? If you need to minimize test time then just set: Anyway, the latest script version sets up a pool backed by a file and removes it afterwards; in fact it does that about 55 times in total in the testing loop. Best regards |
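For anyone wanting to reproduce this without touching real disks, a sketch of the file-backed throwaway pool setup/teardown the script performs (file names and sizes here are placeholders):

    mkdir -p /tmp/zfs
    truncate -s 1G /tmp/zfs/vdev0
    zpool create -f testpool /tmp/zfs/vdev0
    # ... run the reproducer against testpool ...
    zpool destroy testpool
    rm /tmp/zfs/vdev0

|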
I have done some regression testing and it seems that the error was introduced in this commit (zfs-0.6.3-27):
It generates the following footprint:
While a previous commit (zfs-0.6.3-26)
Both were tested with spl-0.6.3-10, commit:
Could someone take a look at those code changes? Unfortunately that is as far as I can go with debugging this. |
@bprotopopov L2 metadata of p1/hb/large_file,
and L2 metadata of p1/backup/large_file,
|
We have a zpool which I think is affected by this issue and I'm trying to work out the correct commits to build to resolve this. I am working on top of the zfs-0.6.5.7 tag:
Illumos 6844 - dnode_next_offset can detect fictional holes - 463a8cf
root# cat /sys/module/zfs/parameters/ignore_hole_birth
Originally we saw this issue because the incremental received stream showed the content diverging between the zvol and the file. Doing some testing now, though, it seems that the file gets corrupted as soon as we take the snapshot in the source pool. The WriteDiffs.py tool is something we developed to write only the changed blocks of a file, so we get CoW benefits.
root# WriteDiffs.py -c -s /dev/zvol/ztank/reference/xvda2@ref4000 /ztank/reference/config.ext4
root# sha256sum /dev/zvol/ztank/reference/xvda2@ref4000 /ztank/reference/config.ext4
root# zfs snapshot -r hapseed/reference/filesystems/imageslv@james
When we send/receive the snapshot, both the @ref4000 snapshot and the file in the filesystem are corrupted to the snapshot checksum:
root@other-host# sha256sum /dev/zvol/ztank/reference/xvda2@ref4000 /ztank/reference/config.ext4 /ztank/reference/.zfs/snapshot/ref4000/config.ext4 |
@JKDingwall Fascinating. Unless I'm misrecalling how this works, you shouldn't be able to see this bug without using differential send-recv or parsing ZDB output, because the only thing in ZFS that cares about the incorrect data is the calculation of whether to send holes for differential send (and any data corruption only "happens" on the receiver of the snapshot, because it didn't send some holes). So, if you're seeing corruption happening before you do any send|recv, I don't think this is the bug in question? I guess it depends on what WriteDiffs.py is doing. |
This zpool is on a master system for our CI system which produces an incremental patch stream each night for that build against our previous release. We started seeing problems in the incremental stream on receive last week and because of how we update the config.ext4 image assumed it was related to this bug. It was only today I looked at the checksums on the master zpool and saw that the file is corrupted in the snapshot before sending. What would the interesting tests or details about the system be to confirm if this is a different issue? |
@JKDingwall The biggest thing is that we need to know how WriteDiffs.py works. This bug is in a very specific set of code, and if you're encountering this issue by running userland scripts, there's only a couple ways that could be done. |
Referencing: openzfs/openzfs#173 (7176 Yet another hole birth issue [OpenZFS]) and #4950 (OpenZFS 7176 Yet another hole birth issue [PR4950]). |
@kernelOfTruth, I fixed the build error of your pull request, but I can still reproduce the issue with my script. |
@pcd1193182 I have created a gist with the WriteDiffs.py code: https://gist.github.com/JKDingwall/7f7200a99f52fd3c61b734fbde31edee/49b86a26cb3b9cfd1819d7958e7c8808f6fd991e On our system, after a reboot, the checksum of the file in the filesystem also became corrupted to match the snapshot. I believe, however, that there was a bug in the code where the mmap for the output file was not flushed before it was closed. The HEAD of the gist contains the latest version, which does flush it, and so far we have not seen the problem again. It would be useful to know if you think this could have been the cause. If so, we have been lucky not to have been caught out before. |
@JKDingwall so you are no longer able to reproduce the problem? Can you confirm whether the patch "Want tunable to ignore hole_birth - 31b8601", with the tunable set to 1, was the one that finally stopped the issue from happening? |
@clopez - I can still reproduce the problem. Patches as given above, ignore_hole_birth=1. Last night I got: I think if I reboot I will see the file corrupt to the snapshot checksum... Could it be a cache consistency problem with holes? |
@JKDingwall If you modify the pool on your master system using zfs send/recv (I mean if you receive into that master pool), then this might be an issue with missed holes. If you do not receive into the master pool and still see some corruption, then it is a different issue. There are ways of examining the metadata (block pointers) of the suspect files using zdb, as discussed in this thread, that let you see if there were holes missed. But the prerequisite for this type of bug to occur is that the dataset is modified with zfs receive. |
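A sketch of the kind of zdb inspection being referred to (the p1/hb names follow the example quoted earlier in this thread; the object number 8 is a placeholder obtained from the first two commands):

    ls -i /p1/hb/large_file      # the inode number is the ZFS object number
    zdb -dddd p1/hb              # alternatively, list the dataset's objects and find large_file
    zdb -ddddd p1/hb 8           # dump that object's indirect/L0 block pointers and their birth TXGs

|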
@bprotopopov - there is no zfs receive involved so I will try to gather further information with a reproducible test case and then open a new issue. |
I do not know if this belongs here. I have used the test script from above and find the result very strange. The relevant part:
#!/bin/sh
zfs destroy $ZFSPOOL/backup -r
sleep 5 # If sleep >= 5 seconds, the checksums differ. I do not know why 5 seconds; maybe it needs to be more on other systems.
dd if=/dev/urandom of=/$ZFSPOOL/hb/large_file bs=4k count=10 seek=1408 conv=notrunc
I have called it on two different systems, more than 50 times. With sleep < 5 seconds everything is OK; with sleep >= 5 seconds I get the error. System 1: Debian 8, ZFS 0.6.5.7-8-jessie |
@datiscum Unless this is going to become an overarching bug for all hole_birth bugs, this isn't a case of the bug in illumos 6513, but instead of illumos 7176, which has an open PR for ZoL of #4981. So in short, I think your problem isn't the problem being discussed in this bug, but is a known issue. |
@JKDingwall Presuming your issue hasn't been magically fixed in the interim, could you open a new issue about it, since I don't think your bug is the same bug this thread is about? |
Closing. The hole birth optimization has been disabled by default in the 0.6.5.8 release, 3a8e136. Work to resolve the various underlying issues is ongoing in master. |
@behlendorf can we also disable it for 0.7.0 and master? Even though I have been following the discussion, I just noticed that I had forgotten to disable that setting at module load time. I didn't use zfs send for the last few weeks but intend to do so again in the near future. For e.g. Gentoo or other bleeding-edge users who aren't aware of the details, this could cause some headaches if there are still dormant issues in the code. Thanks |
Yes, that's a good idea. We'll definitely disable this in the final 0.7.0 when tagging if the various issues aren't all wrapped up by then. Disabling it in master also makes sense now that we've started putting out release candidates for wider testing. Opened PR #5099 to change the default. |
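Until the default changes, the tunable can be flipped manually; a sketch (the parameter path matches the one quoted earlier in this thread):

    cat /sys/module/zfs/parameters/ignore_hole_birth       # 0 = hole_birth optimization still in use
    echo 1 > /sys/module/zfs/parameters/ignore_hole_birth  # ignore hole_birth when generating send streams
    # or persistently, e.g. in /etc/modprobe.d/zfs.conf:
    #   options zfs ignore_hole_birth=1

|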
Enable ignore_hole_birth by default until all known hole birth bugs have been resolved and relevant test cases added. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#4809
Can someone watching this bug please review the patch in #5099, which changes the default value? |
Enable ignore_hole_birth by default until all known hole birth bugs have been resolved and relevant test cases added. Reviewed-by: Boris Protopopov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4809 Closes #5099
I have experienced a silently corrupted snapshot via send/receive: the checksum of a single file does not match between source and target, both on the filesystem and in the snapshots.
Repeated scrubs of the tested pools do not show any errors.
Trying to replicate the source filesystem results in a modification of a single file on the target pool, in the received filesystem and in all its snapshots.
The source pool is a 6x1TB RAIDZ2 array on Debian 8.2.0 (Jessie, kernel 3.16.0-4-amd64, installed from DVDs, no additional updates) with version 0.6.5.3 of ZFS/SPL built from source (standard configure and no patches).
My source pool (“nas1FS”) was created on a non-ECC RAM machine and, after being filled with data through a samba share (standard samba server; sharesmb=off on the source fs), was moved to a different computer with ECC RAM (I used the same operating system on both machines).
I can understand non-ECC RAM in the first computer causing permanent corruption of data that is visible on both source and target pool, but in this case the data is silently changed only during the transfer on the new computer with ECC RAM, and the source pool data seems to be fine.
This corruption of data during send/receive is repeatable.
To better explain what I have done:
First I created a 6x1TB raidz2 pool ("nas1FS") on my old computer.
After filling this pool with data I moved the array from the old computer to the new one and tried to back up the data on the “nas1FS” pool to a different pool (“backup_raidz”).
The “nas1FS” pool contained the following snapshots of interest for this issue:
I have created a “backup_raidz” pool for backup (with compression turned on):
# zpool create -o ashift=12 -O compression=gzip-9 backup_raidz raidz1 /dev/disk/by-id/ata-SAMSUNG_HD103UJ_SerialNo1 /dev/disk/by-id/ata-SAMSUNG_HD103UJ_SerialNo2 /dev/disk/by-id/ata-HGST_HTS721010A9E630_SerialNo3 -f
Afterwards I tried to replicate the “nas1FS” pool:
# zfs send -R -vvvv "nas1FS@20160618" |zfs receive -vvvv -F "backup_raidz/nas1FS_bak"
This command finished successfully without any error.
I executed the following commands to get a list of file checksums on both source and target:
and compared the resulting files.
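A sketch of one way such a comparison can be done (the original commands are not shown above; the mountpoints are assumed from paths used elsewhere in this report):

    (cd /nas1FS/backup && find . -type f -exec md5sum {} + | sort -k2) > /tmp/src.md5
    (cd /backup_raidz/nas1FS_bak/backup && find . -type f -exec md5sum {} + | sort -k2) > /tmp/dst.md5
    diff /tmp/src.md5 /tmp/dst.md5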
I have found a checksum mismatch on a single file:
The correct checksum is “3178c03d3205ac148372a71d75a835ec”; it was verified against the source used to populate the “nas1FS” filesystem.
This checksum mismatch was propagated through all snapshots in which the file was present on the target pool:
The source pool showed the correct checksum in all snapshots in which the offending file was accessible.
Trying to access this file in a snapshot in which it did not exist (“backup@20151121_1”) results in a “No such file or directory” on the target pool (“backup_raidz” or “backup_raidz_test”).
When I tried to access the offending file on “nas1FS” with the command:
# md5sum /nas1FS/backup/.zfs/snapshot/20151121_1/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
it resulted in a very hard system lockup: I could not get any reaction to “Ctrl-Alt-SysRq-h” and similar key combinations, all I/O to the disks stopped completely and immediately, the system stopped responding to ping, and only a hard reset got any reaction out of it.
After the hard reset everything was working, and the above-mentioned file checksum results were unchanged.
I have also tried a send/receive to a different target pool (a single 1TB HGST disk):
# zfs send -R -vvvv "nas1FS/backup@20160618" |zfs receive -vvvv -F "backup_raidz_test/nas1FS_bak"
which resulted in the same md5sum mismatches.
When sending only the latest snapshot with:
# zfs send -vvvv "nas1FS/backup@20160618" |zfs receive -vvvv -F "backup_raidz_test/backup"
I get a correct md5sum on the target filesystem.
When doing an incremental send/receive starting from the first available snapshot on the source pool:
# zfs send -vvvv "nas1FS/backup@20151121_1" |zfs receive -vvvv -F "backup_raidz_test/backup"
After this base send, the offending file is not present on either the source or the target pool, and trying to access it on the target pool does not cause any issues. Then, after sending the incremental stream:
# zfs send -vvvv -I "nas1FS/backup@20151121_1" "nas1FS/backup@20151124" |zfs receive -vvvv -F "backup_raidz_test/backup"
I again get a checksum mismatch.
When doing an incremental send/receive starting from the second available snapshot on the source pool, I get correct checksums on both snapshots on the target pool ....
It is interesting to note that only a single 4096-byte block at the end of the file (which has a size of 321 x 4096 bytes) is corrupted, and only when the transfer includes the first source snapshot ("nas1FS/backup@20151121_1").
Binary comparison of the offending file:
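A sketch of one way to locate and extract the differing block ($SRC and $DST stand for the two copies of the file; the block index follows from the 321 x 4096-byte size noted above):

    cmp -l "$SRC" "$DST" | head                                   # byte offsets (1-based) and differing byte values
    dd if="$SRC" bs=4096 skip=320 count=1 2>/dev/null | md5sum    # last 4 KiB block of the source copy
    dd if="$DST" bs=4096 skip=320 count=1 2>/dev/null | md5sum    # last 4 KiB block of the target copy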
I have also run zdb on the source pool; the commands I tried did not find any errors:
To summarize the system configurations used:
System used for “nas1FS” data fillup (old computer):
Motherboard MSI X58 Pro (MSI MS-7522/MSI X58 Gold) with Intel Quad Core i7 965 3.2GHz and 14GB non-ECC RAM (MB, CPU and RAM and PS are about 6 years old).
“/boot” : INTEL SSDMCEAW080A4 80GB SSD
“nas1FS” pool: 6x1TB HGST Travelstar 7K1000 in a RAIDZ2 array (HDDs are ~6 months old).
New system to which “nas1FS” was moved (all disks):
Motherboard Supermicro A1SA7-2750F (8-core Intel Atom) with 32GB ECC RAM (MB, RAM and PSU are new).
“/boot” : INTEL SSDMCEAW080A4 80GB SSD
“nas1FS” pool: 6x1TB HGST Travelstar 7K1000 in a RAIDZ2 array (moved from the old computer).
“backup_raidz” pool: 2x1TB Samsung HD103UJ + 1x1TB HGST Travelstar 7K1000 (pool used for backup)
“backup_raidz_test” pool: 1TB Samsung HD103UJ (pool with no parity, for additional tests)
Both systems were tested with memtest, cpuburn etc. without errors.
I am using Debian Jessie booted from a ZFS pool (with a separate boot partition); the same operating system was used on both machines that hosted the “nas1FS” pool.
Kernel Command line:
BOOT_IMAGE=/vmlinuz-3.16.0-4-amd64 root=ZFS=/rootFS/1_jessie ro rpool=nas1FS bootfs=nas1FS/rootFS/1_jessie root=ZFS=nas1FS/rootFS/1_jessie rootfstype=zfs boot=zfs quiet
SPL and ZFS was built from source.
Some excerpts from dmesg (the “blocked task” messages are not connected to the hard lockup of the system):
I hope this information is helpful; feel free to let me know what other tests I can perform to diagnose this issue. I will be happy to provide any other info.