Receive may skip objects in FREEOBJECTS record #6694

Closed
nedbass opened this issue Sep 28, 2017 · 8 comments
nedbass commented Sep 28, 2017

As reported in #6366 by @jgottula, it appears receive_freeobjects() may have incorrectly skipped over freeing some objects in a FREEOBJECTS record. This left an object allocated on disk on the receiving side which was unallocated in the corresponding snapshot on the sending side, which in turn caused receiving a subsequent incremental stream to fail.

I am opening this new issue to track the problem, as it seems fundamentally different from #6366. @jgottula, the problem object 101 was unlinked between snapshots pool/Game/TF2/History/ServerLinux@20160625a and pool/Game/TF2/History/ServerLinux@20160707a. For debugging purposes it would be useful to attach a full send of 20160625a and an incremental send up to 20160707a. Rather than using an external site (mega.nz is blocked on my network), you could attach the stream directly in pieces using the split command, e.g.

zfs send pool/Game/TF2/History/ServerLinux@20160625a | gzip > full-20160625a.zfs.gz
split -b $(( 9 * 1024 * 1024 )) full-20160625a.zfs.gz full-20160625a.zfs.
for x in full-20160625a.zfs.a* ; do mv $x $x.gz ; done
nedbass commented Sep 28, 2017

@jgottula I guess you may have to give each chunk a .gz extension to make GitHub happy. I've amended the example above. This way, to receive, we can just do:

cat full-20160625a.zfs.a* | gunzip -c | zfs recv ...

@jgottula
Yeah I was just thinking about that. Will have something that GitHub can't complain about within a few minutes here.

jgottula commented Sep 28, 2017

jgottula commented Sep 29, 2017

Okay, so, I added some printk calls to receive_freeobjects() and dmu_object_next() for debugging purposes.

I did this on a tree at 0c484ab (pretty much the most recent commit on master) where I also un-reverted #6576.

When doing the incremental receive from 20160625a to 20160707a, here's what I get:

[ZFS:receive_freeobjects] rwa: [ds 'lifeboat/test-freeobjects/ServerLinux/%recv']
[ZFS:receive_freeobjects] drrfo: [firstobj 100] [numobjs 16]

[ZFS:receive_freeobjects] obj:100 dmu_object_info returned ENOENT; incrementing obj to 101

	[ZFS:dmu_object_next] initial *objectp: 101
	[ZFS:dmu_object_next] SPA_FEATURE_LARGE_DNODE not in use, so start_obj = 102
	[ZFS:dmu_object_next] offset = cc00
	[ZFS:dmu_object_next] dnode_next_offset set offset = cc00; *objectp = 102

[ZFS:receive_freeobjects] obj:102 dmu_free_long_object OK

	[ZFS:dmu_object_next] initial *objectp: 102
	[ZFS:dmu_object_next] SPA_FEATURE_LARGE_DNODE not in use, so start_obj = 103
	[ZFS:dmu_object_next] offset = ce00
	[ZFS:dmu_object_next] dnode_next_offset set offset = ce00; *objectp = 103

[ZFS:receive_freeobjects] obj:103 dmu_free_long_object OK

	[ZFS:dmu_object_next] initial *objectp: 103
	[ZFS:dmu_object_next] SPA_FEATURE_LARGE_DNODE not in use, so start_obj = 104
	[ZFS:dmu_object_next] offset = d000
	[ZFS:dmu_object_next] dnode_next_offset set offset = d000; *objectp = 104

[ZFS:receive_freeobjects] obj:104 dmu_free_long_object OK

	[ZFS:dmu_object_next] initial *objectp: 104
	[ZFS:dmu_object_next] SPA_FEATURE_LARGE_DNODE not in use, so start_obj = 105
	[ZFS:dmu_object_next] offset = d200
	[ZFS:dmu_object_next] dnode_next_offset set offset = d200; *objectp = 105

[ZFS:receive_freeobjects] obj:105 dmu_free_long_object OK

	[ZFS:dmu_object_next] initial *objectp: 105
	[ZFS:dmu_object_next] SPA_FEATURE_LARGE_DNODE not in use, so start_obj = 106
	[ZFS:dmu_object_next] offset = d400
	[ZFS:dmu_object_next] dnode_next_offset set offset = d400; *objectp = 106

[ZFS:receive_freeobjects] obj:106 dmu_free_long_object OK

	[ZFS:dmu_object_next] initial *objectp: 106
	[ZFS:dmu_object_next] SPA_FEATURE_LARGE_DNODE not in use, so start_obj = 107
	[ZFS:dmu_object_next] offset = d600
	[ZFS:dmu_object_next] dnode_next_offset set offset = d800; *objectp = 108

[ZFS:receive_freeobjects] obj:108 dmu_free_long_object OK

	[ZFS:dmu_object_next] initial *objectp: 108
	[ZFS:dmu_object_next] SPA_FEATURE_LARGE_DNODE not in use, so start_obj = 109
	[ZFS:dmu_object_next] offset = da00
	[ZFS:dmu_object_next] dnode_next_offset set offset = e800; *objectp = 116

Based on this output, you can tell that receive_freeobjects is skipping the obj = 101 iteration of the loop.

Looking at the zdb output for snapshot 20160625a, I see that there's no object 100. So receive_freeobjects gets a FREEOBJECTS record for objects 100 through 115. When it calls dmu_object_info in the obj = 100 loop iteration and gets ENOENT, it increments obj to 101 and continues to the next loop iteration. But there's a little problem here: continuing to the next loop iteration runs the for loop's increment expression, which calls dmu_object_next. That sets obj to the next allocated object after 101, i.e. 102. Hence, the loop skips object 101 entirely.

I'm reasonably convinced that this particular line (the extra obj increment on ENOENT), added in PR #3542, should probably never have existed.

I did take a look at e6d3a84, which messed with the loop condition/increment slightly, but I'm pretty sure it wouldn't have had any effects relevant to this.

If I'm right that the ENOENT obj increment is a mistake, then some of the subsequent commits that came along to fix errors users were experiencing may have been "fixing" them incorrectly. Of particular interest are PR #5532 from @loli10K and PR #6564 from @ofaaland, both of which made changes to dmu_object_next. (It's possible they fixed some problems by making dmu_object_next do things it really ought not to, in order to counteract that extra obj skip; taking the increment out could then re-break those cases.)

@nedbass What do you think?

nedbass added a commit to nedbass/zfs that referenced this issue Sep 29, 2017
When receiving a FREEOBJECTS record, receive_freeobjects()
incorrectly skips a freed object in some cases. This leaves an object
allocated on disk on the receiving side which is unallocated on the
sending side, which may cause receiving subsequent incremental
streams to fail.

The bug was caused by an incorrect increment of the object index
variable when the current object being freed doesn't exist.  The
increment is incorrect because advancing the object index is
already handled by the call to dmu_object_next() in the increment
portion of the for loop statement.

Fixes openzfs#6694
Signed-off-by: Ned Bass <[email protected]>
nedbass commented Sep 29, 2017

@jgottula nice work. I came to the same conclusion and cooked up a patch. I don't think #5532 or #6564 were related to this issue, however. I came up with a simple reproducer for this bug, but I decided it was too specific to add as a test case for the test suite. @behlendorf, let me know if you disagree and would like to see this added to the test suite.

#!/bin/sh

# 1. Create two files with sequential object numbers, f1 and f2
# 2. Delete the lower-numbered object
# 3. Take snapshot A
# 4. Delete the higher-numbered object
# 5. Take snapshot B
# 6. Receive a full send of A
# 7. Receive an incremental send of B
# 8. Fail if f2 exists on received snapshot B

truncate -s 512m /tmp/tank
zpool create -f tank /tmp/tank
zfs create -o mountpoint=/tmp/fish tank/fish

f1=/tmp/fish/f1
f2=/tmp/fish/f2

while :; do
        touch $f1
        touch $f2
        o1=$(ls -li $f1 | awk '{print $1}')
        o2=$(ls -li $f2 | awk '{print $1}')

        if [ $o2 -ne $(( $o1 + 1 )) ] ; then
                rm $f1 $f2
        else
                break
        fi
done

rm $f1
zfs snap tank/fish@A
rm $f2
zfs snap tank/fish@B

zfs send tank/fish@A | zfs recv tank/fish2
zfs send -i tank/fish@A tank/fish@B | zfs recv tank/fish2

zdb -dddd tank/fish2@B $o2

if [ $? -eq 0 ] ; then
        echo FAIL
else
        echo PASS
fi

zpool destroy tank

nedbass added a commit to nedbass/zfs that referenced this issue Sep 29, 2017
When receiving a FREEOBJECTS record, receive_freeobjects()
incorrectly skips a freed object in some cases. Specifically, this
happens when the first object in the range to be freed doesn't exist,
but the second object does. This leaves an object allocated on disk
on the receiving side which is unallocated on the sending side, which
may cause receiving subsequent incremental streams to fail.

The bug was caused by an incorrect increment of the object index
variable when the current object being freed doesn't exist.  The
increment is incorrect because advancing the object index is
already handled by the call to dmu_object_next() in the increment
portion of the for loop statement.

Add test case that exposes this bug.

Fixes openzfs#6694
Signed-off-by: Ned Bass <[email protected]>
@jgottula
@nedbass Thanks for your help!

I've been doing some testing with (master + this fix patch + #6576 un-reverted), and I'm now encountering no errors when receiving datasets that were previously problematic.

I'll see whether any other unanticipated issues crop up, or if I can finally get all of my data off of this dying pool. 😛

jgottula commented Oct 1, 2017

@nedbass Well, this fix did resolve a number of problematic datasets that previously wouldn't receive. But I'm still running into errors with a few particular datasets.

Those ones are still giving me the unhelpful "cannot receive incremental stream: incompatible embedded data stream feature with encrypted receive" error message indicative of some kind of EINVAL coming from somewhere.

(Note that with the latest receives I'm doing with these datasets, I've been making sure to first destroy the partially-received dataset already on the destination pool, if any, and then starting over with a fresh replication stream of that dataset, since the problem here involved previously received snapshots failing to free objects.)

I'll try to do some digging soon to determine if these errors look like they're still due to some variation on this failure-to-free-objects situation, or if they're due to that changed-dnodesize situation from #6366, or if this is something else entirely. (I am still using a tree with #6576 un-reverted for these receives, so presumably it's not a dnodesize thing...)

jgottula commented Oct 2, 2017

Okay so here's the excerpt from dbgmsg:

1506896558   dnode.c:1379:dnode_hold_impl(): error 28
1506896558   dmu_send.c:2495:receive_object(): error 22
1506896558   dmu_send.c:3285:dprintf_drr(): drr_type = OBJECT obj = 8608 type = 19 bonustype = 44 blksz = 36864 bonuslen = 168 cksumtype = 0 compress = 0 dn_slots = 2 err = 22

Will have to do some more investigation on this.

Unfortunately, the 3 datasets of mine that are still being affected by this error all contain somewhat-private data, so I'll have to take a close look myself before I go uploading anything.

(And as I don't yet know whether this is a #6366 problem or a #6694 problem, I'll probably just keep posting here for now.)

aerusso pushed a commit to aerusso/zfs that referenced this issue Oct 11, 2017
When receiving a FREEOBJECTS record, receive_freeobjects()
incorrectly skips a freed object in some cases. Specifically, this
happens when the first object in the range to be freed doesn't exist,
but the second object does. This leaves an object allocated on disk
on the receiving side which is unallocated on the sending side, which
may cause receiving subsequent incremental streams to fail.

The bug was caused by an incorrect increment of the object index
variable when the current object being freed doesn't exist.  The
increment is incorrect because advancing the object index is
already handled by the call to dmu_object_next() in the increment
portion of the for loop statement.

Add test case that exposes this bug.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes openzfs#6694
Closes openzfs#6695
tonyhutter pushed a commit that referenced this issue Oct 16, 2017
(same commit message as above)
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Apr 28, 2019
(same commit message as above)