BRT: Does not respect fallocate FALLOC_FL_PUNCH_HOLE before txg sync #16012

Open
rrevans opened this issue Mar 19, 2024 · 0 comments
Assignees
Labels
BRT BRT tracking Type: Defect Incorrect behavior (e.g. crash, hang)


rrevans commented Mar 19, 2024

System information

Type                  Version/Name
Distribution Name     Fedora
Distribution Version  39
Kernel Version        Linux 6.6.8-200.fc39
Architecture          x86_64
OpenZFS Version       head (8f2f6cd) --enable-debug

Describe the problem you're observing

Copying files with reflink does not respect holes added with fallocate(FALLOC_FL_PUNCH_HOLE).

Describe how to reproduce the problem

To reproduce on a dataset with recordsize=128k:

dd if=/dev/random of=out bs=1M count=1 status=none
zpool sync
fallocate -p -o 262144 -l 524288 out
cp --debug out out.2
diff -q out out.2

Output:

'out' -> 'out.2'
copy offload: unknown, reflink: yes, sparse detection: unknown
Files out and out.2 differ

The difference:

$ diff -u <(hexdump -C out) <(hexdump -C out.2)
--- /dev/fd/63  2024-03-19 08:34:33.496600405 -0400
+++ /dev/fd/62  2024-03-19 08:34:33.494600367 -0400
@@ -16382,8 +16382,32774 @@
 0003ffd0  39 0d 00 a3 91 2f 00 b6  a8 79 6d cd c9 83 d8 42  |9..../...ym....B|
 0003ffe0  b2 42 4e c3 9b 79 aa 96  66 68 8d 14 57 4b 98 32  |.BN..y..fh..WK.2|
 0003fff0  1f e7 36 ff 63 c0 b6 53  fd 81 cb 9e c5 bb 39 f9  |..6.c..S......9.|
-00040000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
-*
+00040000  5b 1c ca ce 08 10 72 cb  b9 a6 a2 c0 15 84 79 93  |[.....r.......y.|
+00040010  6a c1 55 9f cb 30 bb a9  a2 05 8d cb 7b a0 a3 42  |j.U..0......{..B|
+00040020  7a 32 05 9a 2a f5 29 91  9f 48 25 12 ac 4e 6b 09  |z2..*.)..H%..Nk.|
+00040030  ac 04 21 13 43 89 e1 96  c3 11 f1 dd e0 31 3c e4  |..!.C........1<.|
+00040040  1d db de 92 f1 67 6d dc  d1 d4 5d 72 ae d9 de 99  |.....gm...]r....|
... snip
+000bffc0  d7 2c 5d bb a0 3b 32 11  37 d1 24 49 b8 0d 88 fc  |.,]..;2.7.$I....|
+000bffd0  ea 79 9d df 25 ae 3d 16  c6 fd 5c 64 b2 9f 56 f2  |.y..%.=...\d..V.|
+000bffe0  4e d6 5d 4c a9 0b 83 47  51 ac 06 5b ec 0c 49 61  |N.]L...GQ..[..Ia|
+000bfff0  de 7c 87 0d e8 bc 8e f4  e3 b2 ef 07 96 3c fd a6  |.|...........<..|
 000c0000  e7 4e 28 f9 dc f8 f8 41  8b 1a d1 62 9d 4c f3 93  |.N(....A...b.L..|
 000c0010  66 88 ad ef 46 ef 78 11  19 08 c0 cb 3c 1a d0 ce  |f...F.x.....<...|
 000c0020  ab ba c3 c5 38 c3 77 95  88 65 d5 b0 28 d5 61 93  |....8.w..e..(.a.|

The issue does not happen if zpool sync is added between fallocate and cp.
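The offsets in the diff above follow directly from the fallocate arguments. As a quick sanity check, the arithmetic below maps the punched byte range onto 128K records; the helper name `punched_records` is made up for illustration and is not part of any ZFS tooling.

```python
RECORDSIZE = 128 * 1024  # recordsize=128k, as in the reproducer


def punched_records(offset, length, recordsize=RECORDSIZE):
    """Return the indices of records fully covered by [offset, offset + length)."""
    first = -(-offset // recordsize)         # round the start up to a record boundary
    last = (offset + length) // recordsize   # round the end down (exclusive)
    return list(range(first, last))


# Arguments from: fallocate -p -o 262144 -l 524288 out
start, length = 262144, 524288
records = punched_records(start, length)

print(records)                          # [2, 3, 4, 5]
print(hex(start), hex(start + length))  # 0x40000 0xc0000
```

The range is record-aligned, so it covers four whole records (2 through 5) and no partial ones. That matches the hexdump, where `out` reads as zeros from 0x40000 up to 0xc0000 while `out.2` still has the original data there.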

Include any warning/errors/backtraces from the system logs

Nothing of note

Possible root cause

Punched holes for full records are processed only at sync time. The L1 block(s) for the affected range get dirtied, but none of the L0 blocks get dirtied.

When holes are read back, dbuf_read_hole detects that the record is within a freed range via dnode_block_freed and synthesizes a zeroed record. Note that until sync clears the pointer, an unsynced hole still has the original non-hole blkptr set in the parent L1 block, so reads of an uncached dbuf must explicitly skip it.

dmu_read_l0_bps does not check for holes and relies on the L0 record being dirty to detect unsynced changes, so zfs_clone_range gets it wrong: it 1) accepts the file as clean and 2) copies the stale blkptrs as part of the clone.
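To make the suspected mismatch concrete, here is a toy model of the two paths. It is only a sketch of the reasoning above, not OpenZFS code: `ToyDnode`, its fields, and the `check_freed` switch are all invented for illustration, and the switch shows one conceivable check rather than a proposed or actual fix.

```python
RECORDS = 8  # a 1M file at recordsize=128k


class ToyDnode:
    """Toy stand-in for a file's synced block pointers plus unsynced state."""

    def __init__(self, nrecords):
        # Block pointers as stored in the parent L1 block (last synced state).
        self.blkptrs = [f"bp{i}" for i in range(nrecords)]
        self.dirty_l0 = set()      # L0 records dirtied in the open txg
        self.pending_free = set()  # records inside an unsynced freed range

    def punch_hole(self, first, last):
        # Full-record frees only record the range; no L0 record is dirtied.
        self.pending_free.update(range(first, last))

    def read(self, blkid):
        # Read path: the freed-range check wins over the stale pointer.
        if blkid in self.pending_free:
            return None  # None models a synthesized zeroed record
        return self.blkptrs[blkid]

    def clone_bps(self, check_freed=False):
        # Clone path as described: only dirty L0 records signal unsynced changes.
        if self.dirty_l0:
            raise BlockingIOError("dirty records, retry after sync")
        return [
            None if (check_freed and i in self.pending_free) else bp
            for i, bp in enumerate(self.blkptrs)
        ]

    def sync(self):
        # Sync applies the pending frees to the stored pointers.
        for i in self.pending_free:
            self.blkptrs[i] = None
        self.pending_free.clear()
        self.dirty_l0.clear()


src = ToyDnode(RECORDS)
src.punch_hole(2, 6)                              # records 2..5, as in the reproducer

reads = [src.read(i) for i in range(RECORDS)]     # what a reader sees
clone = src.clone_bps()                           # what the clone copies
print(reads == clone)                             # False: the clone carries stale pointers
print(src.clone_bps(check_freed=True) == reads)   # True in this toy model

src.sync()
print(src.clone_bps() == reads)                   # True: after sync the paths agree
```

In the model, the clone taken before sync still holds `bp2` through `bp5`, while reads of the source return holes for those records. This mirrors the observation that adding a zpool sync before cp hides the problem.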

Adding some zdb outputs to the reproducer clearly shows the blocks are being miscopied:

+ dd if=/dev/random of=out bs=1M count=1 status=none
+ zpool sync
+ sudo zdb -vv -O test out

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
     33167    2   128K   128K  1.00M     512     1M  100.00  ZFS plain file
                                               184   bonus  System attributes
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 7
	uid     1000
	gid     1000
	atime	Tue Mar 19 08:43:42 2024
	mtime	Tue Mar 19 08:43:53 2024
	ctime	Tue Mar 19 08:43:53 2024
	crtime	Tue Mar 19 08:30:25 2024
	gen	6250830
	mode	100644
	size	1048576
	parent	34
	links	1
	pflags	840800000004
	xattr	33168
Indirect blocks:
               0 L1  0:a00af000:400 20000L/400P F=8 B=6251034/6251034 cksum=000000a4bd334262:000050baf4554e79:0017f678ded9c0b6:056cab4c2d8ca0bc
               0  L0 0:a04e0400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00004036258786ec:1014cfe9358d7677:868112d2d71a99cc:9dc19878ddccdea2
           20000  L0 0:a0500400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00003fd7d0f502b6:0ffcddc561ba2898:fd2a2771c5901615:78e16ffe55e9718b
           40000  L0 0:a0520400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00003f8a6d07d1e4:0fd716e7323db836:9b2964828c495bf4:8135d1cceccc0218
           60000  L0 0:a0540400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=0000403e176b8307:10101dc08fb13f71:ef7d61c53a3cc3b2:9d72033a1b03c594
           80000  L0 0:a0560400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00004054a094eef6:0ffceb259e1c50f4:1f8475bd887e5c40:63cdb77b6791f9ac
           a0000  L0 0:a0580400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00004022454ea9b8:1011bd83355abb7f:7e7e5ffd5ed79d11:e85d05bbac9fac4c
           c0000  L0 0:a05a0400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00004072fa988400:1014e8a2f721c2c0:fbec9a6aec75375d:5ce0368265683524
           e0000  L0 0:a05c0400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00003fe47d597706:0ffd73e6ed22a7fd:796d5363056d9260:de9d85c78af6ef45

		segment [0000000000000000, 0000000000100000) size    1M
+ fallocate -p -o 262144 -l 524288 out
+ cp --debug out out.2
'out' -> 'out.2'
copy offload: unknown, reflink: yes, sparse detection: unknown
+ diff -q out out.2
Files out and out.2 differ
+ zpool sync
+ sudo zdb -vv -O test out

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
     33167    2   128K   128K   514K     512     1M   50.00  ZFS plain file
                                               184   bonus  System attributes
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 7
	uid     1000
	gid     1000
	atime	Tue Mar 19 08:43:53 2024
	mtime	Tue Mar 19 08:43:53 2024
	ctime	Tue Mar 19 08:43:53 2024
	crtime	Tue Mar 19 08:30:25 2024
	gen	6250830
	mode	100644
	size	1048576
	parent	34
	links	1
	pflags	840800000004
	xattr	33168
Indirect blocks:
               0 L1  0:225e00:400 20000L/400P F=4 B=6251037/6251037 cksum=0000009414d5729e:000054d1e2f11a3a:001afd967fe080ab:063e503032610052
               0  L0 0:a04e0400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00004036258786ec:1014cfe9358d7677:868112d2d71a99cc:9dc19878ddccdea2
           20000  L0 0:a0500400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00003fd7d0f502b6:0ffcddc561ba2898:fd2a2771c5901615:78e16ffe55e9718b
           40000  L0 0:0:0 20000L B=6251037
           60000  L0 0:0:0 20000L B=6251037
           80000  L0 0:0:0 20000L B=6251037
           a0000  L0 0:0:0 20000L B=6251037
           c0000  L0 0:a05a0400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00004072fa988400:1014e8a2f721c2c0:fbec9a6aec75375d:5ce0368265683524
           e0000  L0 0:a05c0400:20000 20000L/20000P F=1 B=6251034/6251034 cksum=00003fe47d597706:0ffd73e6ed22a7fd:796d5363056d9260:de9d85c78af6ef45

		segment [0000000000000000, 0000000000040000) size  256K
		segment [00000000000c0000, 0000000000100000) size  256K
+ sudo zdb -vv -O test out.2

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
     33408    2   128K   128K  1.00M     512     1M  100.00  ZFS plain file
                                               184   bonus  System attributes
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 7
	uid     1000
	gid     1000
	atime	Tue Mar 19 08:43:53 2024
	mtime	Tue Mar 19 08:43:53 2024
	ctime	Tue Mar 19 08:43:53 2024
	crtime	Tue Mar 19 08:30:25 2024
	gen	6250833
	mode	100644
	size	1048576
	parent	34
	links	1
	pflags	840800000004
	xattr	33409
Indirect blocks:
               0 L1  0:1002f8e00:400 20000L/400P F=8 B=6251037/6251037 cksum=000000a7218a8495:0000520430917ca7:00186cf00af12809:058bde7be53aef62
               0  L0 0:a04e0400:20000 20000L/20000P F=1 B=6251037/6251034 cksum=00004036258786ec:1014cfe9358d7677:868112d2d71a99cc:9dc19878ddccdea2
           20000  L0 0:a0500400:20000 20000L/20000P F=1 B=6251037/6251034 cksum=00003fd7d0f502b6:0ffcddc561ba2898:fd2a2771c5901615:78e16ffe55e9718b
           40000  L0 0:a0520400:20000 20000L/20000P F=1 B=6251037/6251034 cksum=00003f8a6d07d1e4:0fd716e7323db836:9b2964828c495bf4:8135d1cceccc0218
           60000  L0 0:a0540400:20000 20000L/20000P F=1 B=6251037/6251034 cksum=0000403e176b8307:10101dc08fb13f71:ef7d61c53a3cc3b2:9d72033a1b03c594
           80000  L0 0:a0560400:20000 20000L/20000P F=1 B=6251037/6251034 cksum=00004054a094eef6:0ffceb259e1c50f4:1f8475bd887e5c40:63cdb77b6791f9ac
           a0000  L0 0:a0580400:20000 20000L/20000P F=1 B=6251037/6251034 cksum=00004022454ea9b8:1011bd83355abb7f:7e7e5ffd5ed79d11:e85d05bbac9fac4c
           c0000  L0 0:a05a0400:20000 20000L/20000P F=1 B=6251037/6251034 cksum=00004072fa988400:1014e8a2f721c2c0:fbec9a6aec75375d:5ce0368265683524
           e0000  L0 0:a05c0400:20000 20000L/20000P F=1 B=6251037/6251034 cksum=00003fe47d597706:0ffd73e6ed22a7fd:796d5363056d9260:de9d85c78af6ef45

		segment [0000000000000000, 0000000000100000) size    1M
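
The mismatch can also be pulled out of the dumps mechanically. The sketch below parses the L0 lines copied from the two post-sync zdb outputs above (checksums trimmed for brevity) and lists the offsets where `out` has a hole but `out.2` still points at allocated blocks. It assumes, based only on the output shown here, that a hole is printed with DVA `0:0:0`; the regular expression is tailored to this particular dump and is not a general zdb parser.

```python
import re

# Matches e.g. "   40000  L0 0:a0520400:20000 ..." and captures offset and DVA.
L0_LINE = re.compile(r"^\s*([0-9a-f]+)\s+L0\s+(\S+)")
HOLE = "0:0:0"  # how holes appear in the zdb output above


def parse_l0(zdb_text):
    """Map each L0 offset (as an int) to its DVA string."""
    blocks = {}
    for line in zdb_text.splitlines():
        m = L0_LINE.match(line)
        if m:
            blocks[int(m.group(1), 16)] = m.group(2)
    return blocks


# `out` after the final zpool sync.
OUT = """
       0  L0 0:a04e0400:20000 20000L/20000P F=1 B=6251034/6251034
   20000  L0 0:a0500400:20000 20000L/20000P F=1 B=6251034/6251034
   40000  L0 0:0:0 20000L B=6251037
   60000  L0 0:0:0 20000L B=6251037
   80000  L0 0:0:0 20000L B=6251037
   a0000  L0 0:0:0 20000L B=6251037
   c0000  L0 0:a05a0400:20000 20000L/20000P F=1 B=6251034/6251034
   e0000  L0 0:a05c0400:20000 20000L/20000P F=1 B=6251034/6251034
"""

# `out.2` (the reflink copy) after the final zpool sync.
OUT2 = """
       0  L0 0:a04e0400:20000 20000L/20000P F=1 B=6251037/6251034
   20000  L0 0:a0500400:20000 20000L/20000P F=1 B=6251037/6251034
   40000  L0 0:a0520400:20000 20000L/20000P F=1 B=6251037/6251034
   60000  L0 0:a0540400:20000 20000L/20000P F=1 B=6251037/6251034
   80000  L0 0:a0560400:20000 20000L/20000P F=1 B=6251037/6251034
   a0000  L0 0:a0580400:20000 20000L/20000P F=1 B=6251037/6251034
   c0000  L0 0:a05a0400:20000 20000L/20000P F=1 B=6251037/6251034
   e0000  L0 0:a05c0400:20000 20000L/20000P F=1 B=6251037/6251034
"""

src, dst = parse_l0(OUT), parse_l0(OUT2)
stale = [off for off in sorted(src)
         if src[off] == HOLE and dst.get(off, HOLE) != HOLE]

for off in stale:
    print(f"{off:#x}: hole in out, but out.2 points at {dst[off]}")
```

The four offsets it reports (0x40000, 0x60000, 0x80000, 0xa0000) are exactly the punched records, and the DVAs in `out.2` at those offsets are the same ones `out` had before the hole was punched.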
@rrevans rrevans added the Type: Defect Incorrect behavior (e.g. crash, hang) label Mar 19, 2024
@behlendorf behlendorf self-assigned this May 25, 2024
@behlendorf behlendorf added the BRT BRT tracking label May 25, 2024