
send/recv compatibility with 0.6.5.x #6616

Conversation


@Fabian-Gruenbichler Fabian-Gruenbichler commented Sep 8, 2017

Description

ZFS 0.6.5.x does not handle large FREEOBJECTS records that reference non-existent objects well. ZFS 0.7 separately introduced two changes which together lead to an incompatibility when sending from ZFS 0.7.x to ZFS 0.6.5.x:

Receiving full streams as clones (#4221)

generates FREE and FREEOBJECTS records even when generating a full stream, to allow those streams to be received as clones of existing datasets.

Increasing indirect block sizes (#5679)

blows up the number of objects contained in the final FREEOBJECTS record for the "logical hole" of unused space at the end of most datasets.

This PR caps the FREEOBJECTS records at the maximum possible USED object ID. For incremental streams, this is no problem. For full streams received as clones, this means we need an extra step of freeing any potentially leftover objects of the origin dataset, after processing the whole stream. The latter breaks compatibility for receiving full streams generated with this patch as clones on systems without this patch.

How Has This Been Tested?

Sending from patched to patched, patched to non-patched (see caveat in description), and patched to 0.6.5.11.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Sep 8, 2017
@behlendorf behlendorf left a comment

Thanks! I'd suggest squashing the module_param() commit and its documentation commit together.

More fundamentally I think it's worth re-evaluating the decision to not add a feature flag to the send stream for this functionality. The comment in zfs_ioctl.h clearly states it wasn't added initially because the stream format was thought to be fully backwards compatible.

+ * This is not implemented as a feature flag, because the receiving side does
+ * not need to have implemented it to receive this stream; it is fully backward
+ * compatible.  We need a flag, though, because full send streams without it
+ * cannot necessarily be received as a clone correctly.

I haven't dug into the details, but if that's not going to be possible we should consider converting this module option into a proper zfs send command line option and disabling this flag by default for compatibility.

@@ -4010,4 +4010,7 @@ dmu_objset_is_receiving(objset_t *os)
#if defined(_KERNEL)
module_param(zfs_send_corrupt_data, int, 0644);
MODULE_PARM_DESC(zfs_send_corrupt_data, "Allow sending corrupt data");

module_param(zfs_send_set_freerecords_bit, int, 0644);
MODULE_PARM_DESC(zfs_send_set_freerecords_bit, "Do not set freerecords bit for backwards compatibility with 0.6.5.x");
Contributor

nit: 80 character limit.

You can wrap the block with /* BEGIN CSTYLED */ /* END CSTYLED */ to make the style checker happy about the indent. See the bottom of arc.c for an example.

@Fabian-Gruenbichler
Contributor Author

Okay, so further bisecting shows that "OpenZFS 6393 - zfs receive a full send as a clone" (#4221) is NOT the actual culprit here. While it does change the behaviour for full streams (which now include FREEOBJECTS records), that change alone would indeed (as noted in the comment) be backwards compatible. This is somewhat hidden by the fact that it introduces improved error handling on the receiving side, which meant the actual backwards incompatibility was not noticed until later.

The first commit in the 0.6.5 -> 0.7.0 history which actually introduces the breaking behaviour is "OpenZFS 7104 - increase indirect block size" (d7958b4 / #5679), which bumps the default/maximum indirect block size shift from 14 (16K) to 17 (128K), a factor of 8. This value is included when calculating the BP_SPAN when dumping a hole, which inflates numobjs in the final FREEOBJECTS record and also introduces an additional (0, 0) FREEOBJECTS record (the latter does not seem to be a problem).

test case:

following pool setup:

sudo modprobe zfs

sudo zpool create testpool /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
sudo zfs create testpool/send

echo 'aaaa' | sudo tee /testpool/send/testfile
sudo zfs snapshot testpool/send@snap1
echo 'bbbb' | sudo tee /testpool/send/testfile
sudo zfs snapshot testpool/send@snap2

dumping a full and incremental replication stream using:

sudo zfs send -Rv testpool/send@snap1 | sudo zstreamdump -v > full.dump
sudo zfs send -Rv -I testpool/send@snap1 testpool/send@snap2 | sudo zstreamdump -v > incremental.dump

0.7.0 produces streams that end in:

==> 0.7/full.dump <==
FREEOBJECTS firstobj = 64 numobjs = 36028797018963904
FREEOBJECTS firstobj = 0 numobjs = 0
END checksum = 1a441875cdb/26d2f0c3ead633/3eacb4cb93d74e87/941e0bf139b514c9
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 2
        Total DRR_END records = 3
        Total DRR_OBJECT records = 7
        Total DRR_FREEOBJECTS records = 5
        Total DRR_WRITE records = 8
        Total DRR_WRITE_BYREF records = 0
        Total DRR_WRITE_EMBEDDED records = 0
        Total DRR_FREE records = 12
        Total DRR_SPILL records = 0
        Total records = 37
        Total write size = 36864 (0x9000)
        Total stream length = 49204 (0xc034)

==> 0.7/incremental.dump <==
FREEOBJECTS firstobj = 64 numobjs = 36028797018963904
FREEOBJECTS firstobj = 0 numobjs = 0
END checksum = 309fbc68a9/685551b2ff9a/927e876e829f47/977bfcebffe77a14
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 2
        Total DRR_END records = 3
        Total DRR_OBJECT records = 2
        Total DRR_FREEOBJECTS records = 4
        Total DRR_WRITE records = 1
        Total DRR_WRITE_BYREF records = 0
        Total DRR_WRITE_EMBEDDED records = 0
        Total DRR_FREE records = 3
        Total DRR_SPILL records = 0
        Total records = 15
        Total write size = 512 (0x200)
        Total stream length = 5940 (0x1734)

the full stream already hangs zfs recv on 0.6.5.11 on the receiving side.

Applying the following patch, a partial revert of d7958b4, on top of 0.7.0:

diff --git a/include/sys/dnode.h b/include/sys/dnode.h
index d32855d..edb0575 100644
--- a/include/sys/dnode.h
+++ b/include/sys/dnode.h
@@ -65,7 +65,7 @@ extern "C" {
  * 4 levels of indirect blocks would not be able to guarantee addressing an
  * entire object, so 5 levels will be used, but 5 * (20 - 7) = 65.
  */
-#define        DN_MAX_INDBLKSHIFT      17      /* 128k */
+#define        DN_MAX_INDBLKSHIFT      14      /* 16k */
 #define        DNODE_BLOCK_SHIFT       14      /* 16k */
 #define        DNODE_CORE_SIZE         64      /* 64 bytes for dnode sans blkptrs */
 #define        DN_MAX_OBJECT_SHIFT     48      /* 256 trillion (zfs_fid_t limit) */

now, the dumped streams end like this:

==> 0.7-patched/full.dump <==
FREEOBJECTS firstobj = 64 numobjs = 422212465065920
END checksum = 1c6379af216/2bf9931acc6d8d/a5b2c2ebf3d70779/5ff05388267cbd3d
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 2
        Total DRR_END records = 3
        Total DRR_OBJECT records = 7
        Total DRR_FREEOBJECTS records = 4
        Total DRR_WRITE records = 8
        Total DRR_WRITE_BYREF records = 0
        Total DRR_WRITE_EMBEDDED records = 0
        Total DRR_FREE records = 12
        Total DRR_SPILL records = 0
        Total records = 36
        Total write size = 36864 (0x9000)
        Total stream length = 48892 (0xbefc)

==> 0.7-patched/incremental.dump <==
FREEOBJECTS firstobj = 64 numobjs = 422212465065920
END checksum = 32e89f1cf2/68d551585a31/888a46da9f1a2f/8396c872030a02ce
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 2
        Total DRR_END records = 3
        Total DRR_OBJECT records = 2
        Total DRR_FREEOBJECTS records = 3
        Total DRR_WRITE records = 1
        Total DRR_WRITE_BYREF records = 0
        Total DRR_WRITE_EMBEDDED records = 0
        Total DRR_FREE records = 3
        Total DRR_SPILL records = 0
        Total records = 14
        Total write size = 512 (0x200)
        Total stream length = 5628 (0x15fc)

and sending both the full and the incremental stream to a 0.6.5.11 system works as expected.

I only tested with 0.7.0, but I am pretty sure the same applies to 0.7.1 and master.

is there any way to get out of this mess besides backporting the improved receive handling to 0.6.5.11 like in #6602?

@Fabian-Gruenbichler
Contributor Author

@behlendorf @ahrens any chance either of you could take a look at this? Am I correct in assuming that bumping the indirect block size shift default value and limit does not change the actual on-disk format of zpools? Otherwise it would have been hidden behind a pool feature flag and not just bumped between 0.6.5 and 0.7.

If so, one way to ease upgrades would be to revert this change and postpone it (either to 0.8 or a later point in 0.7), at least for downstreams relying on replication using zfs send/recv.

@behlendorf
Contributor

@Fabian-Gruenbichler thanks for isolating the exact commit. To answer your question: increasing the indirect block size didn't require a feature flag at the time, since the size was already being stored on disk and older versions of the software are capable of handling this larger size. That all worked as expected; where things went wrong is this accidental change to the send stream.

if so, one way to ease upgrades would be to revert this change and postpone it

Unfortunately since this size is stored on disk that would only resolve the issue for new pools, not for existing pools. What I think might work would be updating do_dump() to assume the previous maximum indirect shift when generating DRR_FREEOBJECTS records. It looks like you could do this with a for loop in the block you referenced.

We should try and get @dankimmel's thoughts on this too since he's familiar with all this code. He may have a better idea.

@ahrens
Member

ahrens commented Sep 20, 2017

To make sure I understand, let me summarize:

Some FREEOBJECTS records have a very large number of objects, which confuses (appears to hang) old code (0.6.5). These confusing FREEOBJECTS records are generated by both full and incremental sends from current (0.7) bits. This PR makes it possible to set a tunable which changes the full send stream such that it doesn't generate these confusing records (but receiving a full stream as a clone won't work).

So this PR is essentially a workaround (it requires setting a tunable) for part of the bug (it only fixes full send streams). While this is an improvement over the current state, I think we need to find a more complete solution -- at least to work around the problem for incrementals too, and hopefully to actually fix the bug, without requiring changing tunables.

@pcd1193182 any ideas?

@ahrens
Member

ahrens commented Sep 20, 2017

For incrementals, I think we know that there can't be any objects after dn_maxblkid * DNODES_PER_BLOCK. So we don't really need to FREEOBJECTS any objects after that. We could use this knowledge to reduce the numobjs in the big FREEOBJECTS record, which might be enough to eliminate the "hang".

For full sends, we don't know what the max objid is on the target when receiving as a clone. But we could change the code on the receiving system to make it free all objects after the last object in the send stream. An older system couldn't receive-whole-send-as-a-clone from a system with this change, but "receive whole send as a clone" is a relatively recent and rarely used feature, so this is much less bad than not being able to send from 0.7 to earlier releases.

@Fabian-Gruenbichler
Contributor Author

Fabian-Gruenbichler commented Sep 21, 2017 via email

@ahrens
Member

ahrens commented Sep 21, 2017

not sure if this is worth it - if we need a fix on the receiving side, the most straight-forward one IMHO is my other PR backporting the "skip rest of FREEOBJECTS record if object does not exist" commits ;) ideally, we could fix this mess on the sender side only - in many scenarios, the target of a send/recv is not easily updatable (think centralized backup server for lots of infrastructure with conservative upgrade policy).

You're proposing that we need a fix on the receiving side in order to receive any full send stream. I'm proposing that we need a fix on the receiving side in order to receive a full send as a clone. That's a significant difference.

@Fabian-Gruenbichler
Contributor Author

sorry, misunderstood your point there. what you mean is:

  • re-introduce the early return in dump_freeobjects for full streams, and no longer set the DRR flag
  • instead free all objects which were not explicitly part of the stream when receiving as clone

together with reducing the incremental FREEOBJECTS records to the needed minimum, this would allow sending regular streams from > 0.7.1 to 0.6.5.x again. Receiving full streams as clones would still not be supported on 0.6.5.x. Sending from 0.7.0/1 to > 0.7.1 would work for all combinations if we keep the receive compatibility, but sending from > 0.7.1 to 0.7.0/1 would no longer support receiving as a full clone. I still wonder whether putting the full stream with FREE(OBJECTS) records behind a send feature flag would not be easier - that would not require any changes on the receiving side.

@Fabian-Gruenbichler Fabian-Gruenbichler force-pushed the pull/send-compatibility branch 2 times, most recently from 4c09a49 to 930b12a Compare September 22, 2017 13:05
@Fabian-Gruenbichler
Contributor Author

Took a first stab at implementing the maximum possible object ID, and fixed some issues with the previous iteration. I haven't done extensive testing yet, and this is my first foray into the inner workings of ZFS, so I may have missed some important aspects ;) Some basic sending/receiving between patched master and both patched master and 0.6.5.11 showed no obvious errors.

@codecov

codecov bot commented Sep 25, 2017

Codecov Report

Merging #6616 into master will increase coverage by 0.19%.
The diff coverage is 74.28%.


@@            Coverage Diff             @@
##           master    #6616      +/-   ##
==========================================
+ Coverage   74.09%   74.28%   +0.19%     
==========================================
  Files         295      295              
  Lines       93882    93920      +38     
==========================================
+ Hits        69558    69767     +209     
+ Misses      24324    24153     -171

@Fabian-Gruenbichler
Contributor Author

pushed a new iteration to make style checker happy and reduce code duplication.

@ahrens
Member

ahrens commented Sep 25, 2017

re-introduce the early return in dump_freeobjects for full streams, and no longer set the DRR flag
instead free all objects which were not explicitly part of the stream when receiving as clone

Actually, I meant to keep setting the DRR flag, but truncate the last FREEOBJECTS record based on maxblkid as we do for incrementals, and have the receiver free any of their objects that are past the last OBJECT/FREEOBJECTS record.

I still wonder whether putting the full stream with FREE(OBJECTS) behind a send feature flag would not be easier.. that would not require any changes on the receiving side..

Yeah, we could require that you do zfs send --receivable-as-clone (better name/flag TBD) to enable the new DRR flag and FREE/FREEOBJECTS records.

@ahrens ahrens left a comment

The dsa_incremental_maxobj code is nice. Have you been able to test it out, sending to an older release of ZFS?


/*
* reduce numobjs according to maximum possible object id of
* base snapshot
Member

These comments say what is happening, but it isn't much more work to read the code to understand that.

It would be really useful to have a comment explaining why we are doing this (especially since the receive code (now) handles longer FREEOBJECTS records just fine).

Contributor Author

yep, makes sense!

@@ -934,7 +967,7 @@ dmu_send_impl(void *tag, dsl_pool_t *dp, dsl_dataset_t *to_ds,
zfs_bookmark_phys_t *ancestor_zb, boolean_t is_clone,
boolean_t embedok, boolean_t large_block_ok, boolean_t compressok,
boolean_t rawok, int outfd, uint64_t resumeobj, uint64_t resumeoff,
vnode_t *vp, offset_t *off)
vnode_t *vp, offset_t *off, uint64_t incremental_maxobj)
Member

I think this function can figure out incremental_maxobj instead of making the caller pass it in. In addition to making the interface cleaner, it looks like we can eliminate some duplicated code in the callers.

Contributor Author

but we need the maxobj of the fromsnap dataset (because those are the objects we potentially free, unless I misunderstood something), and here we only have the ancestor_zb (which could be a bookmark!). I figured we would not want to duplicate all the dataset-finding code from dmu_send(_obj) in dmu_send_impl? But maybe I am just missing the right helper to get from zfs_bookmark_phys_t to dnode_phys_t ;)

Contributor Author

or should we use tosnap to get maxblkid, because it is guaranteed that maxblkid does not get smaller from one snapshot to the next (not sure about this?) - that would indeed make it easier, and also be reusable for the full send proposal.

@Fabian-Gruenbichler
Contributor Author

yeah, I did some basic tests (lots of files, big files, holes in files, deleting lots of files) and all sends from patched master to patched master and to vanilla 0.6.5.11 worked as expected, and the test suite results look good as well.

@Fabian-Gruenbichler
Contributor Author

and a new iteration:

  • make send full stream as clone-receivable opt-in via a flag, instead of opt-out via module parameter
  • calculate max obj id based on "to" dataset, instead of from (less code, support for bookmarks as base of incremental stream)

I haven't tested resuming or actually receiving a full stream as a clone yet ;)

@Fabian-Gruenbichler Fabian-Gruenbichler force-pushed the pull/send-compatibility branch 2 times, most recently from 5f0d1ab to f714e38 Compare September 27, 2017 06:51
@Fabian-Gruenbichler
Contributor Author

rebased, fixed the test case about receiving full streams as clone to use the new '-C' flag, and actually allow using the '-C' flag when sending.

@pcd1193182
Contributor

I don't love having to use a flag for this... the feature should really just work transparently. Having different "kinds" of full sends just rubs me the wrong way. I think I prefer Matt's proposal for full sends in #6616 (comment) . It requires some receive changes, but they wouldn't be extensive, and it gets the behavior we want in pretty much every case. I'd be happy to help outline the receive changes if needed.

@Fabian-Gruenbichler
Contributor Author

pushed new iteration, now trimming freeobjects records unconditionally for both types of streams, and keeping track of objects referenced/touched when receiving streams, using that information to free all leftover objects after that mark when receiving a full stream as clone.

as discussed with @pcd1193182 on IRC, this would mean that ZoL > 0.7.2 is incompatible regarding the "receive full stream as clone" feature with ZoL 0.7.0-0.7.2, but it would be fully compatible with ZoL 0.6.5.x regarding full and incremental streams without changes in 0.6.5.

feedback welcome ;)

@behlendorf behlendorf left a comment

This strikes me as a pretty reasonable solution. We can live with the "receive full stream as clone" feature being incompatible with 0.7.0 - 0.7.2 tags. The most important thing is that full and incremental streams will be compatible again with 0.6.5.x as originally intended.


for (obj = rwa->last_touched_object + 1;
next_err == 0;
next_err = dmu_object_next(rwa->os, &obj, FALSE, 0)) {
Contributor

nit: I think rewriting this in the form while (next_err == 0) { ... } would be easier to read.

Contributor Author

that was lifted from receive_freeobjects (https://github.com/zfsonlinux/zfs/blob/master/module/zfs/dmu_send.c#L2550), but since the loop condition is much simpler here converting to a while makes sense.

Contributor Author

On second thought - with the pesky continue we'd either need two calls to dmu_object_next or a goto, so I am not so sure.

https://gist.github.com/Fabian-Gruenbichler/8632d82204237a6219ba40c1258d261d
https://gist.github.com/Fabian-Gruenbichler/6234f96c6a33397561541f75889a7d3b

which of the three would be preferred?

Contributor

That makes sense, my preference would be for the last one
https://gist.github.com/Fabian-Gruenbichler/6234f96c6a33397561541f75889a7d3b

@@ -2150,6 +2151,7 @@ struct receive_writer_arg {
boolean_t resumable;
boolean_t raw;
uint64_t last_object, last_offset;
uint64_t last_touched_object;
Contributor

I think using last_touched_object for a name is misleading. To me it implies the last modified object in the stream, when in fact you're tracking the largest_object in the stream.

Contributor Author

yes, I was thinking of "last" as in "highest ID", not "last" as in "last in stream". largest is also misleading though, as it could refer to size? maybe something like max_objid ?

Contributor

Yes, good point. max_objid makes sense to me, or even max_object which would be consistent with last_object.

While you're updating this could you also split last_object and last_offset from the line above so each is declared on its own line.

@@ -3720,6 +3746,38 @@ dmu_recv_stream(dmu_recv_cookie_t *drc, vnode_t *vp, offset_t *voffp,
}
mutex_exit(&rwa->mutex);

/*
* if we are receiving a full stream as clone, we need to free leftover
* objects after the last one referenced in the stream.
Contributor

I think you mean largest one referenced in the stream, not last.

Contributor Author

leftover objects with IDs greater than the maximum ID referenced in the stream.

?

Contributor

Yeah, how about something like this to make it perfectly clear.

"If we are receiving a full stream as a clone, all object ids which
are greater than the maximum ID referenced in the stream are
by definition unused and must be freed."

@@ -3720,6 +3746,38 @@ dmu_recv_stream(dmu_recv_cookie_t *drc, vnode_t *vp, offset_t *voffp,
}
mutex_exit(&rwa->mutex);

/*
* if we are receiving a full stream as clone, we need to free leftover
Contributor

nit: s/if we/If we (capital leading i)

@Fabian-Gruenbichler
Contributor Author

updated PR description for (hopefully) final approach

all objects after the last written or freed object are not supposed to
exist after receiving the stream. free them accordingly, as if a
freeobjects record for them had been included in the stream.

Signed-off-by: Fabian Grünbichler <[email protected]>
@Fabian-Gruenbichler
Contributor Author

updated and included all of the style feedback.

I reduced the while loop even further by skipping the call to dmu_object_info. It was completely redundant, since it was only called to check whether the object exists before calling dmu_free_long_object - which itself begins with dmu_free_long_range, which begins with the exact same call to dnode_hold with identical parameters and error handling as dmu_object_info.

I wonder if we should not do the same in receive_freeobjects as well?

@Fabian-Gruenbichler Fabian-Gruenbichler changed the title [WIP] send/recv compatibility with 0.6.5.x send/recv compatibility with 0.6.5.x Oct 6, 2017
@Fabian-Gruenbichler
Contributor Author

removed WIP from title - this is ready to get merged from my side.

@ahrens @pcd1193182 any objections?

@behlendorf behlendorf left a comment

Looks good. Once everyone's happy with this we'll want to upstream it to OpenZFS.

@odoucet

odoucet commented Oct 6, 2017

Thank you Fabian for this patch !
When can we expect 0.7.3 to be released with this patch? I (and I guess many other people) have been unable to upgrade to 0.7.x due to this bug.

@@ -2824,6 +2861,9 @@ receive_free(struct receive_writer_arg *rwa, struct drr_free *drrf)
if (dmu_object_info(rwa->os, drrf->drr_object, NULL) != 0)
return (SET_ERROR(EINVAL));

if (drrf->drr_object > rwa->max_object)
Contributor

You don't actually need these, since we only consider this value when doing a receive of a full send as a clone, and in that case you have to get an object record before you get any other kind of record for that object. Doesn't hurt to have them, though, and to have it be accurate in all cases.

Contributor Author

any other kind of record except FREEOBJECTS ;) I originally attempted to just re-use last_object, but that obviously did not work out.

I did not feel comfortable enough with my level of understanding of the code to play guessing games (regarding which patterns are possible and which aren't), so I went the "better safe than sorry" route :)

@behlendorf behlendorf removed the Status: Work in Progress Not yet ready for general review label Oct 10, 2017
behlendorf pushed a commit that referenced this pull request Oct 10, 2017
When sending an incremental stream based on a snapshot, the receiving
side must have the same base snapshot.  Thus we do not need to send
FREEOBJECTS records for any objects past the maximum one which exists
locally.

This allows us to send incremental streams (again) to older ZFS
implementations (e.g. ZoL < 0.7) which actually try to free all objects
in a FREEOBJECTS record, instead of bailing out early.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes #5699
Closes #6507
Closes #6616
@ahrens
Member

ahrens commented Oct 10, 2017

Changes look good. Sorry I didn't have a chance to look at this before it was integrated (was on vacation last week). Is anyone signed up to port this to OpenZFS/illumos?

aerusso pushed a commit to aerusso/zfs that referenced this pull request Oct 11, 2017
All objects after the last written or freed object are not supposed to
exist after receiving the stream.  Free them accordingly, as if a
freeobjects record for them had been included in the stream.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes openzfs#5699
Closes openzfs#6507
Closes openzfs#6616
aerusso pushed a commit to aerusso/zfs that referenced this pull request Oct 11, 2017
When sending an incremental stream based on a snapshot, the receiving
side must have the same base snapshot.  Thus we do not need to send
FREEOBJECTS records for any objects past the maximum one which exists
locally.

This allows us to send incremental streams (again) to older ZFS
implementations (e.g. ZoL < 0.7) which actually try to free all objects
in a FREEOBJECTS record, instead of bailing out early.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes openzfs#5699
Closes openzfs#6507
Closes openzfs#6616
tonyhutter pushed a commit that referenced this pull request Oct 16, 2017
All objects after the last written or freed object are not supposed to
exist after receiving the stream.  Free them accordingly, as if a
freeobjects record for them had been included in the stream.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes #5699
Closes #6507
Closes #6616
tonyhutter pushed a commit that referenced this pull request Oct 16, 2017
When sending an incremental stream based on a snapshot, the receiving
side must have the same base snapshot.  Thus we do not need to send
FREEOBJECTS records for any objects past the maximum one which exists
locally.

This allows us to send incremental streams (again) to older ZFS
implementations (e.g. ZoL < 0.7) which actually try to free all objects
in a FREEOBJECTS record, instead of bailing out early.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes #5699
Closes #6507
Closes #6616
@waikontse

Hi,

I know the issue has been closed for almost a year now, but the fix has only been partially ported to the openzfs repo. The partial commit is at openzfs/openzfs#631; the code looks almost identical. The missing part is the backwards compatibility introduced for receiving a dataset that was sent from a newer version of ZFS.

I don't know if there is any need or want to merge the missing part into the openzfs repo, but I had to patch the FreeBSD 11.2 ZFS tree because I still have servers deployed running on 10.3.

I could try to make a PR for the patch, but it looks like there is a lot of red tape.

@Fabian-Gruenbichler
Contributor Author

@waikontse feel free to port it (or file an issue) - the backwards compatibility commit is very small and should be straightforward. Unfortunately I don't have the capacity to do it myself at the moment.

@waikontse

@Fabian-Gruenbichler You're right, it's 10 lines of code at most, I believe. I'll try to have it merged into openzfs. Thanks.

FransUrbo pushed a commit to FransUrbo/zfs that referenced this pull request Apr 28, 2019
All objects after the last written or freed object are not supposed to
exist after receiving the stream.  Free them accordingly, as if a
freeobjects record for them had been included in the stream.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes openzfs#5699
Closes openzfs#6507
Closes openzfs#6616