
Holes are not correctly generated (Silent corruption) #4996

Closed
loyou opened this issue Aug 21, 2016 · 4 comments

@loyou
Contributor

loyou commented Aug 21, 2016

I am opening a new issue to track this hole-birth-related corruption separately.
It can be reproduced on 0.6.5.7 and on the latest master, except when the ignore_hole_birth workaround is enabled.
With the following steps, holes are not generated correctly, which ultimately causes send/recv corruption:

zpool create vpool sdb -f -o listsnapshots=on
zfs create vpool/hb
# step 1, create file with seek and take a snapshot
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 seek=1 2> /dev/null
zfs snapshot vpool/hb@s1
# step 2, rewrite (or truncate) this file with a smaller size
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 2> /dev/null
# step 3, then rewrite past step 1's file size with seek, and take a snapshot
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 seek=3 2> /dev/null
zfs snapshot vpool/hb@s2
# step 4, send and receive snapshots
zfs send vpool/hb@s1 | zfs recv vpool/backup
zfs send -i vpool/hb@s1 vpool/hb@s2|zfs recv vpool/backup
# step 5, checksum
md5sum /vpool/*/large_file

We get an md5sum mismatch:

fcbd29c64fe3f7ec9e13c9ea38a294a6  /vpool/backup/large_file
07c67ff54b20c8670302d832a4228646  /vpool/hb/large_file
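
To pin down exactly where the two copies diverge, a byte-wise comparison works (a minimal sketch; cmp prints 1-based decimal offsets, so the first mismatch should be reported at byte 131073, i.e. offset 0x20000):

# compare the received copy against the source file and list the first differing bytes
cmp -l /vpool/backup/large_file /vpool/hb/large_file | head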

Let's check the indirect blocks of large_file in vpool/hb@s1:

root@i-r8d0cfps:~# zdb -ddddd vpool/hb@s1 7
Dataset vpool/hb@s1 [ZPL], ID 46, cr_txg 6, 148K, 7 objects, rootbp DVA[0]=<0:2bc00:200> DVA[1]=<0:7800bc00:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=6L/6P fill=7 cksum=7bf99c799:31f3d1988b4:a53ad4959692:175b02e1c3201f

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         7    2    16K   128K   129K   256K   50.00  ZFS plain file
                                        168   bonus  System attributes
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 1
    path    /large_file
    uid     0
    gid     0
    atime   Sun Aug 21 17:40:01 2016
    mtime   Sun Aug 21 17:40:01 2016
    ctime   Sun Aug 21 17:40:01 2016
    crtime  Sun Aug 21 17:40:01 2016
    gen 6
    mode    100644
    size    262144
    parent  4
    links   1
    pflags  40800000004
Indirect blocks:
               0 L1  0:29400:200 4000L/200P F=1 B=6/6
           20000  L0 0:9400:20000 20000L/20000P F=1 B=6/6

        segment [0000000000020000, 0000000000040000) size  128K

And then the indirect blocks of large_file in vpool/hb@s2:

root@i-r8d0cfps:~# zdb -ddddd vpool/hb@s2 7
Dataset vpool/hb@s2 [ZPL], ID 49, cr_txg 7, 276K, 7 objects, rootbp DVA[0]=<0:70400:200> DVA[1]=<0:78010400:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=7L/7P fill=7 cksum=b3c6342cd:45accc95412:de13aa865bd8:1e4e9d3bd49e5c

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         7    2    16K   128K   257K   512K   50.00  ZFS plain file
                                        168   bonus  System attributes
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 3
    path    /large_file
    uid     0
    gid     0
    atime   Sun Aug 21 17:40:01 2016
    mtime   Sun Aug 21 17:40:02 2016
    ctime   Sun Aug 21 17:40:02 2016
    crtime  Sun Aug 21 17:40:01 2016
    gen 6
    mode    100644
    size    524288
    parent  4
    links   1
    pflags  40800000004
Indirect blocks:
               0 L1  0:6f000:200 4000L/200P F=2 B=7/7
               0  L0 0:2f000:20000 20000L/20000P F=1 B=7/7
           60000  L0 0:4f000:20000 20000L/20000P F=1 B=7/7

        segment [0000000000000000, 0000000000020000) size  128K
        segment [0000000000060000, 0000000000080000) size  128K

In my understanding, the indirect blocks of large_file in vpool/hb@s2 should look like this:

               0 L1  0:6f000:200 4000L/200P F=2 B=7/7
               0  L0 0:2f000:20000 20000L/20000P F=1 B=7/7
           20000  L0 0:0:0 20000L B=7
           60000  L0 0:4f000:20000 20000L/20000P F=1 B=7/7

The L0 block at offset 20000 should be recorded as a hole with a birth time.
This hole is lost in step 2; here are the indirect blocks right after step 2:

root@i-r8d0cfps:~# zdb -ddddd vpool/hb 7
Dataset vpool/hb [ZPL], ID 42, cr_txg 5, 148K, 7 objects, rootbp DVA[0]=<0:5f400:200> DVA[1]=<0:7801f400:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=7L/7P fill=7 cksum=b9ef92296:48d4b5bc9d8:e9ed5565f9da:2012b0d5d1da2c

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         7    2    16K   128K   129K   128K  100.00  ZFS plain file
                                        168   bonus  System attributes
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 0
    path    /large_file
    uid     0
    gid     0
    atime   Sun Aug 21 17:47:12 2016
    mtime   Sun Aug 21 17:47:13 2016
    ctime   Sun Aug 21 17:47:13 2016
    crtime  Sun Aug 21 17:47:12 2016
    gen 6
    mode    100644
    size    131072
    parent  4
    links   1
    pflags  40800000004
Indirect blocks:
               0 L1  0:5e000:200 4000L/200P F=1 B=7/7
               0  L0 0:3e000:20000 20000L/20000P F=1 B=7/7

        segment [0000000000000000, 0000000000020000) size  128K

During testing I also found that if step 3 expands the file enough to require the next indirect level (L2), the problem does not occur.
Comparing the two files, the corrupted area is at offset 20000, which should have been sent as a dump_free (FREE) record in the send stream; see the hexdumps below and the stream-inspection sketch that follows them.

/vpool/backup/large_file                                                        
0001 FFE0: 3B DC 9C 83 57 4E 74 7F  B9 51 84 4B 0A 49 C3 4A  ;...WNt. .Q.K.I.J  
0001 FFF0: F9 0B 2D B3 FB FE DA D6  AB 1E B2 E8 39 6E A4 63  ..-..... ....9n.c  
0002 0000: E7 4D FA FD 02 98 07 B7  4A FF BD 22 0C BC F2 AB  .M...... J.."....  
0002 0010: 81 2C ED 7B 98 52 ED D5  54 6D D2 17 E2 39 84 77  .,.{.R.. Tm...9.w  
0002 0020: 9E 2C F5 9A D8 C5 A2 CC  A7 FA 98 06 79 3E 15 26  .,...... ....y>.&  
0002 0030: C8 1F 43 57 6B 42 9F FF  89 22 4F 3B 46 1E BE 4B  ..CWkB.. ."O;F..K  
0002 0040: E8 6B E6 6C 5A DC 6C 02  24 6C 13 FF 03 57 28 B6  .k.lZ.l. $l...W(.  

/vpool/hb/large_file                                                            
0001 FFE0: 3B DC 9C 83 57 4E 74 7F  B9 51 84 4B 0A 49 C3 4A  ;...WNt. .Q.K.I.J  
0001 FFF0: F9 0B 2D B3 FB FE DA D6  AB 1E B2 E8 39 6E A4 63  ..-..... ....9n.c  
0002 0000: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ ........  
0002 0010: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ ........  
0002 0020: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ ........  
0002 0030: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ ........  
0002 0040: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ ........  
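
To double-check what the incremental stream actually carries for that range, its records can be dumped with zstreamdump (a minimal sketch, assuming the zstreamdump utility from the zfs userland tools; with correct hole birth data a FREE record for object 7 at offset 0x20000, length 0x20000, would be expected here):

# dump the incremental stream's records and look for FREE entries
zfs send -i vpool/hb@s1 vpool/hb@s2 | zstreamdump -v | grep FREE
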
@rincebrain
Contributor

This looks like it would be fixed by either #4754 or #4950, both of which are in master but neither of which is in a 0.6.5.X release (yet).

Could you try git master and report back whether you can still reproduce it?

Also, are you saying this still happens if you have the ignore_hole_birth tunable set to 1 and then do all of these steps?
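
For reference, the tunable can be flipped at runtime through the module parameter (a minimal sketch, using the same path shown in the follow-up below):

echo 1 > /sys/module/zfs/parameters/ignore_hole_birth
cat /sys/module/zfs/parameters/ignore_hole_birth   # should read back as 1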

@loyou
Contributor Author

loyou commented Aug 21, 2016

I have already merged those patches into 0.6.5.7.
The following is the git master test result.

root@i-qdlbf8a3:~# cat /sys/module/zfs/version 
0.6.5-415_g2bce804
root@i-qdlbf8a3:~# cat /sys/module/zfs/parameters/ignore_hole_birth 
0
root@i-qdlbf8a3:~# sh 3.sh 
0d4b02f699bc6f3db26c13139e709f12  /vpool/backup/large_file
74b04ea0f4bdf8dfb1e2fa1800d2ccbb  /vpool/hb/large_file
root@i-qdlbf8a3:~# zpool destroy vpool
root@i-qdlbf8a3:~# echo 1 > /sys/module/zfs/parameters/ignore_hole_birth 
root@i-qdlbf8a3:~# sh 3.sh 
82ef22fbd09b2d5a5e214e282c988236  /vpool/backup/large_file
82ef22fbd09b2d5a5e214e282c988236  /vpool/hb/large_file
root@i-qdlbf8a3:~# cat 3.sh 
zpool create vpool sdb -f -o listsnapshots=on
zfs create vpool/hb
# step 1, create file with seek and take a snapshot
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 seek=1 2> /dev/null
zfs snapshot vpool/hb@s1
# step 2, rewrite (or truncate) this file with a smaller size
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 2> /dev/null
# step 3, then rewrite past step 1's file size with seek, and take a snapshot
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 seek=3 2> /dev/null
zfs snapshot vpool/hb@s2
# step 4, send and receive snapshots
zfs send vpool/hb@s1 | zfs recv vpool/backup
zfs send -i vpool/hb@s1 vpool/hb@s2|zfs recv vpool/backup
# step 5, checksum
md5sum /vpool/*/large_file

@pcd1193182
Contributor

pcd1193182 commented Aug 22, 2016

Just wanted to comment that this is definitely a new issue; it occurs on Delphix's OS as well. The patch in #5004 appears to solve the issue; now it's just a matter of deciding what the best fix is.

loyou added a commit to loyou/zfs that referenced this issue Sep 4, 2016
Issue openzfs#4996 is ultimately caused by a problem in the freeing logic.
At the time the dbufs are bzero'd, ZFS doesn't yet have enough
information to properly handle writes later in the same TXG. By changing
the code to use the new dbuf_write_children_ready code path in more cases,
we address this issue. The only time free_children will bzero a dbuf now is
if we are freeing the entire dnode.

Add a test suite for hole_birth issues: illumos openzfs#7176, ZoL openzfs#4996, ZoL openzfs#5030.

Change the meaningless -1 into DMU_OBJECT_END.
@behlendorf removed the Bug label Sep 30, 2016
@behlendorf modified the milestones: 0.8.0, 0.7.0 Jun 6, 2017
@behlendorf
Contributor

Closing, this was resolved in #5675.

brad-lewis pushed a commit to brad-lewis/openzfs that referenced this issue Apr 6, 2018
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>

As reported by openzfs/zfs#4996, there is yet another hole birth issue. In this one,
if a block is entirely holes, but the birth times are not all the same, we lose that information by creating one
hole with the current txg as its birth time.

The ZoL PR's fix approach is incorrect. Ultimately, the problem here is that when you truncate and write a file
in the same transaction group, the dbuf for the indirect block will be zeroed out to deal with the truncation,
and then written for the write. During this process, we will lose hole birth time information for any holes in
the range. In the case where a dnode is being freed, we need to determine whether the block should be converted
to a higher-level hole in the zio pipeline, and if so do it when the dnode is being synced out.

Upstream bug: DLPX-46861
brad-lewis pushed a commit to brad-lewis/openzfs that referenced this issue Apr 13, 2018
prakashsurya pushed a commit to brad-lewis/openzfs that referenced this issue Jun 18, 2018
prakashsurya pushed a commit to openzfs/openzfs that referenced this issue Jun 19, 2018
lundman pushed a commit to openzfsonosx/zfs that referenced this issue Jun 27, 2018
lundman pushed a commit to openzfsonwindows/ZFSin that referenced this issue Jul 19, 2018
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Jul 26, 2018
behlendorf pushed a commit that referenced this issue Jul 30, 2018
… birth times

As reported by #4996, there is
yet another hole birth issue. In this one, if a block is entirely holes,
but the birth times are not all the same, we lose that information by
creating one hole with the current txg as its birth time.

The ZoL PR's fix approach is incorrect. Ultimately, the problem here is
that when you truncate and write a file in the same transaction group,
the dbuf for the indirect block will be zeroed out to deal with the
truncation, and then written for the write. During this process, we will
lose hole birth time information for any holes in the range. In the case
where a dnode is being freed, we need to determine whether the block
should be converted to a higher-level hole in the zio pipeline, and if
so do it when the dnode is being synced out.

Porting Notes:
* The DMU_OBJECT_END change in zfs_znode.c was already applied.
* Added test cases from #5675 provided by @rincebrain for hole_birth
  issues.  These test cases should be pushed upstream to OpenZFS.
* Updated mk_files which is used by several rsend tests so the
  files created are a little more interesting and may contain holes.

Authored by: Paul Dagnelie <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>

OpenZFS-issue: https://www.illumos.org/issues/9438
OpenZFS-commit: openzfs/openzfs@738e2a3c
External-issue: DLPX-46861
Closes #7746