
Holes are not correctly generated (Silent corruption) #4996

Closed
loyou opened this issue Aug 21, 2016 · 4 comments

@loyou
Contributor

loyou commented Aug 21, 2016

I am opening a new issue to track this hole-birth-related corruption separately.
It can be reproduced on 0.6.5.7 and on the latest master, except when the ignore_hole_birth workaround is enabled.
With the following steps, holes are not generated correctly, which ultimately causes send/recv corruption:

zpool create vpool sdb -f -o listsnapshots=on
zfs create vpool/hb
# step 1, create file with seek and take a snapshot
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 seek=1 2> /dev/null
zfs snapshot vpool/hb@s1
# step 2, rewrite (or truncate) this file with a smaller size
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 2> /dev/null
# step 3, then rewrite past step 1's file size with seek, and take a snapshot
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 seek=3 2> /dev/null
zfs snapshot vpool/hb@s2
# step 4, send and receive snapshots
zfs send vpool/hb@s1 | zfs recv vpool/backup
zfs send -i vpool/hb@s1 vpool/hb@s2|zfs recv vpool/backup
# step 5, checksum
md5sum /vpool/*/large_file

We get an md5sum mismatch:

fcbd29c64fe3f7ec9e13c9ea38a294a6  /vpool/backup/large_file
07c67ff54b20c8670302d832a4228646  /vpool/hb/large_file
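
To pin down exactly where the two copies diverge, a byte-wise comparison works (a minimal sketch; cmp prints 1-based decimal offsets, so the first mismatch should be reported at byte 131073, i.e. offset 0x20000):

# compare the received copy against the source file and list the first differing bytes
cmp -l /vpool/backup/large_file /vpool/hb/large_file | head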

Let's check the indirect blocks of large_file in vpool/hb@s1:

root@i-r8d0cfps:~# zdb -ddddd vpool/hb@s1 7
Dataset vpool/hb@s1 [ZPL], ID 46, cr_txg 6, 148K, 7 objects, rootbp DVA[0]=<0:2bc00:200> DVA[1]=<0:7800bc00:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=6L/6P fill=7 cksum=7bf99c799:31f3d1988b4:a53ad4959692:175b02e1c3201f

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         7    2    16K   128K   129K   256K   50.00  ZFS plain file
                                        168   bonus  System attributes
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 1
    path    /large_file
    uid     0
    gid     0
    atime   Sun Aug 21 17:40:01 2016
    mtime   Sun Aug 21 17:40:01 2016
    ctime   Sun Aug 21 17:40:01 2016
    crtime  Sun Aug 21 17:40:01 2016
    gen 6
    mode    100644
    size    262144
    parent  4
    links   1
    pflags  40800000004
Indirect blocks:
               0 L1  0:29400:200 4000L/200P F=1 B=6/6
           20000  L0 0:9400:20000 20000L/20000P F=1 B=6/6

        segment [0000000000020000, 0000000000040000) size  128K

And then the indirect blocks of large_file in vpool/hb@s2:

root@i-r8d0cfps:~# zdb -ddddd vpool/hb@s2 7
Dataset vpool/hb@s2 [ZPL], ID 49, cr_txg 7, 276K, 7 objects, rootbp DVA[0]=<0:70400:200> DVA[1]=<0:78010400:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=7L/7P fill=7 cksum=b3c6342cd:45accc95412:de13aa865bd8:1e4e9d3bd49e5c

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         7    2    16K   128K   257K   512K   50.00  ZFS plain file
                                        168   bonus  System attributes
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 3
    path    /large_file
    uid     0
    gid     0
    atime   Sun Aug 21 17:40:01 2016
    mtime   Sun Aug 21 17:40:02 2016
    ctime   Sun Aug 21 17:40:02 2016
    crtime  Sun Aug 21 17:40:01 2016
    gen 6
    mode    100644
    size    524288
    parent  4
    links   1
    pflags  40800000004
Indirect blocks:
               0 L1  0:6f000:200 4000L/200P F=2 B=7/7
               0  L0 0:2f000:20000 20000L/20000P F=1 B=7/7
           60000  L0 0:4f000:20000 20000L/20000P F=1 B=7/7

        segment [0000000000000000, 0000000000020000) size  128K
        segment [0000000000060000, 0000000000080000) size  128K

In my understanding, the indirect blocks of large_file in vpool/hb@s2 should look like this:

               0 L1  0:6f000:200 4000L/200P F=2 B=7/7
               0  L0 0:2f000:20000 20000L/20000P F=1 B=7/7
           20000  L0 0:0:0 20000L B=7
           60000  L0 0:4f000:20000 20000L/20000P F=1 B=7/7

The L0 block at offset 20000 should be recorded as a hole with a birth time.
This hole is lost in step 2; here are the indirect blocks right after step 2:

root@i-r8d0cfps:~# zdb -ddddd vpool/hb 7
Dataset vpool/hb [ZPL], ID 42, cr_txg 5, 148K, 7 objects, rootbp DVA[0]=<0:5f400:200> DVA[1]=<0:7801f400:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=7L/7P fill=7 cksum=b9ef92296:48d4b5bc9d8:e9ed5565f9da:2012b0d5d1da2c

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         7    2    16K   128K   129K   128K  100.00  ZFS plain file
                                        168   bonus  System attributes
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 0
    path    /large_file
    uid     0
    gid     0
    atime   Sun Aug 21 17:47:12 2016
    mtime   Sun Aug 21 17:47:13 2016
    ctime   Sun Aug 21 17:47:13 2016
    crtime  Sun Aug 21 17:47:12 2016
    gen 6
    mode    100644
    size    131072
    parent  4
    links   1
    pflags  40800000004
Indirect blocks:
               0 L1  0:5e000:200 4000L/200P F=1 B=7/7
               0  L0 0:3e000:20000 20000L/20000P F=1 B=7/7

        segment [0000000000000000, 0000000000020000) size  128K

During testing I also found that if step 3 expands the file enough to require the next indirect level (L2), the problem does not occur.
Comparing the two files, the corrupted area is at offset 20000, which should have been sent as a dump_free (FREE) record in the send stream; see the hexdumps below and the stream-inspection sketch that follows them.

/vpool/backup/large_file                                                        
0001 FFE0: 3B DC 9C 83 57 4E 74 7F  B9 51 84 4B 0A 49 C3 4A  ;...WNt. .Q.K.I.J  
0001 FFF0: F9 0B 2D B3 FB FE DA D6  AB 1E B2 E8 39 6E A4 63  ..-..... ....9n.c  
0002 0000: E7 4D FA FD 02 98 07 B7  4A FF BD 22 0C BC F2 AB  .M...... J.."....  
0002 0010: 81 2C ED 7B 98 52 ED D5  54 6D D2 17 E2 39 84 77  .,.{.R.. Tm...9.w  
0002 0020: 9E 2C F5 9A D8 C5 A2 CC  A7 FA 98 06 79 3E 15 26  .,...... ....y>.&  
0002 0030: C8 1F 43 57 6B 42 9F FF  89 22 4F 3B 46 1E BE 4B  ..CWkB.. ."O;F..K  
0002 0040: E8 6B E6 6C 5A DC 6C 02  24 6C 13 FF 03 57 28 B6  .k.lZ.l. $l...W(.  

/vpool/hb/large_file                                                            
0001 FFE0: 3B DC 9C 83 57 4E 74 7F  B9 51 84 4B 0A 49 C3 4A  ;...WNt. .Q.K.I.J  
0001 FFF0: F9 0B 2D B3 FB FE DA D6  AB 1E B2 E8 39 6E A4 63  ..-..... ....9n.c  
0002 0000: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ ........  
0002 0010: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ ........  
0002 0020: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ ........  
0002 0030: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ ........  
0002 0040: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ ........  
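
To double-check what the incremental stream actually carries for that range, its records can be dumped with zstreamdump (a minimal sketch, assuming the zstreamdump utility from the zfs userland tools; with correct hole birth data a FREE record for object 7 at offset 0x20000, length 0x20000, would be expected here):

# dump the incremental stream's records and look for FREE entries
zfs send -i vpool/hb@s1 vpool/hb@s2 | zstreamdump -v | grep FREE
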
@rincebrain
Contributor

This looks like it would be fixed by either #4754 or #4950, both of which are in master but neither of which is in a 0.6.5.X release (yet).

Could you try git master and report back whether you can still reproduce it?

Also, are you saying this still happens if you have the ignore_hole_birth tunable set to 1 and then do all of these steps?
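
For reference, the tunable can be flipped at runtime through the module parameter (a minimal sketch, using the same path shown in the follow-up below):

echo 1 > /sys/module/zfs/parameters/ignore_hole_birth
cat /sys/module/zfs/parameters/ignore_hole_birth   # should read back as 1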

@loyou
Contributor Author

loyou commented Aug 21, 2016

I have already merged those patches into 0.6.5.7.
The following is the git master test result.

root@i-qdlbf8a3:~# cat /sys/module/zfs/version 
0.6.5-415_g2bce804
root@i-qdlbf8a3:~# cat /sys/module/zfs/parameters/ignore_hole_birth 
0
root@i-qdlbf8a3:~# sh 3.sh 
0d4b02f699bc6f3db26c13139e709f12  /vpool/backup/large_file
74b04ea0f4bdf8dfb1e2fa1800d2ccbb  /vpool/hb/large_file
root@i-qdlbf8a3:~# zpool destroy vpool
root@i-qdlbf8a3:~# echo 1 > /sys/module/zfs/parameters/ignore_hole_birth 
root@i-qdlbf8a3:~# sh 3.sh 
82ef22fbd09b2d5a5e214e282c988236  /vpool/backup/large_file
82ef22fbd09b2d5a5e214e282c988236  /vpool/hb/large_file
root@i-qdlbf8a3:~# cat 3.sh 
zpool create vpool sdb -f -o listsnapshots=on
zfs create vpool/hb
# step 1, create file with seek and take a snapshot
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 seek=1 2> /dev/null
zfs snapshot vpool/hb@s1
# step 2, rewrite (or truncate) this file with a smaller size
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 2> /dev/null
# step 3, then rewrite past step 1's file size with seek, and take a snapshot
dd if=/dev/urandom of=/vpool/hb/large_file bs=128k count=1 seek=3 2> /dev/null
zfs snapshot vpool/hb@s2
# step 4, send and receive snapshots
zfs send vpool/hb@s1 | zfs recv vpool/backup
zfs send -i vpool/hb@s1 vpool/hb@s2|zfs recv vpool/backup
# step 5, checksum
md5sum /vpool/*/large_file

@pcd1193182
Contributor

pcd1193182 commented Aug 22, 2016

Just wanted to comment that this is definitely a new issue; it occurs on Delphix's OS as well. The patch in #5004 appears to solve the issue; now it's just a matter of deciding what the best fix is.

loyou added a commit to loyou/zfs that referenced this issue Sep 4, 2016
Issue openzfs#4996 is ultimately caused by a problem in the freeing logic.
At the time the dbufs are bzero'd, ZFS doesn't yet have enough
information to properly handle writes later in the same TXG. By changing
the code to use the new dbuf_write_children_ready code path in more cases,
we address this issue. The only time free_children will bzero a dbuf now is
if we are freeing the entire dnode.

Add a test suite for hole_birth issues: illumos openzfs#7176, ZoL openzfs#4996, ZoL openzfs#5030.

Change the meaningless -1 into DMU_OBJECT_END.
@behlendorf removed the Bug label Sep 30, 2016
@behlendorf modified the milestones: 0.8.0, 0.7.0 Jun 6, 2017
@behlendorf
Contributor

Closing, this was resolved in #5675.

brad-lewis pushed a commit to brad-lewis/openzfs that referenced this issue Apr 6, 2018
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>

As reported by openzfs/zfs#4996, there is yet another hole birth issue. In this one,
if a block is entirely holes, but the birth times are not all the same, we lose that information by creating one
hole with the current txg as its birth time.

The ZoL PR's fix approach is incorrect. Ultimately, the problem here is that when you truncate and write a file
in the same transaction group, the dbuf for the indirect block will be zeroed out to deal with the truncation,
and then written for the write. During this process, we will lose hole birth time information for any holes in
the range. In the case where a dnode is being freed, we need to determine whether the block should be converted
to a higher-level hole in the zio pipeline, and if so do it when the dnode is being synced out.

Upstream bug: DLPX-46861
brad-lewis pushed a commit to brad-lewis/openzfs that referenced this issue Apr 13, 2018
prakashsurya pushed a commit to brad-lewis/openzfs that referenced this issue Jun 18, 2018
prakashsurya pushed a commit to openzfs/openzfs that referenced this issue Jun 19, 2018
lundman pushed a commit to openzfsonosx/zfs that referenced this issue Jun 27, 2018
lundman pushed a commit to openzfsonwindows/ZFSin that referenced this issue Jul 19, 2018
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Jul 26, 2018
behlendorf pushed a commit that referenced this issue Jul 30, 2018
… birth times

As reported by #4996, there is
yet another hole birth issue. In this one, if a block is entirely holes,
but the birth times are not all the same, we lose that information by
creating one hole with the current txg as its birth time.

The ZoL PR's fix approach is incorrect. Ultimately, the problem here is
that when you truncate and write a file in the same transaction group,
the dbuf for the indirect block will be zeroed out to deal with the
truncation, and then written for the write. During this process, we will
lose hole birth time information for any holes in the range. In the case
where a dnode is being freed, we need to determine whether the block
should be converted to a higher-level hole in the zio pipeline, and if
so do it when the dnode is being synced out.

Porting Notes:
* The DMU_OBJECT_END change in zfs_znode.c was already applied.
* Added test cases from #5675 provided by @rincebrain for hole_birth
  issues.  These test cases should be pushed upstream to OpenZFS.
* Updated mk_files which is used by several rsend tests so the
  files created are a little more interesting and may contain holes.

Authored by: Paul Dagnelie <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>

OpenZFS-issue: https://www.illumos.org/issues/9438
OpenZFS-commit: openzfs/openzfs@738e2a3c
External-issue: DLPX-46861
Closes #7746