
send/recv compatibility with 0.6.5.x #6616

Conversation


@Fabian-Gruenbichler Fabian-Gruenbichler commented Sep 8, 2017

Description

ZFS 0.6.5.x does not handle large FREEOBJECTS records that reference non-existent objects well. ZFS 0.7 separately introduced two changes which together lead to an incompatibility when sending from ZFS 0.7.x to ZFS 0.6.5.x:

Receiving full streams as clones (#4221)

generates FREE and FREEOBJECTS records even when generating a full stream, to allow those streams to be received as clones of existing datasets.

Increasing indirect block sizes (#5679)

blows up the number of objects contained in the final FREEOBJECTS record for the "logical hole" of unused space at the end of most datasets.

This PR caps the FREEOBJECTS records at the maximum possible USED object ID. For incremental streams, this is no problem. For full streams received as clones, this means we need an extra step of freeing any potentially leftover objects of the origin dataset, after processing the whole stream. The latter breaks compatibility for receiving full streams generated with this patch as clones on systems without this patch.

How Has This Been Tested?

Sending from patched to patched, patched to non-patched (see caveat in description), and patched to 0.6.5.11.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Sep 8, 2017
@behlendorf behlendorf left a comment

Thanks! I'd suggest squashing the module_param() commit and its documentation commit together.

More fundamentally I think it's worth re-evaluating the decision to not add a feature flag to the send stream for this functionality. The comment in zfs_ioctl.h clearly states it wasn't added initially because the stream format was thought to be fully backwards compatible.

+ * This is not implemented as a feature flag, because the receiving side does
+ * not need to have implemented it to receive this stream; it is fully backward
+ * compatible.  We need a flag, though, because full send streams without it
+ * cannot necessarily be received as a clone correctly.

I haven't dug into the details, but if that's not going to be possible we should consider converting this module option into a proper zfs send command line option and disabling this flag by default for compatibility.

@@ -4010,4 +4010,7 @@ dmu_objset_is_receiving(objset_t *os)
#if defined(_KERNEL)
module_param(zfs_send_corrupt_data, int, 0644);
MODULE_PARM_DESC(zfs_send_corrupt_data, "Allow sending corrupt data");

module_param(zfs_send_set_freerecords_bit, int, 0644);
MODULE_PARM_DESC(zfs_send_set_freerecords_bit, "Do not set freerecords bit for backwards compatibility with 0.6.5.x");
Contributor

nit: 80 character limit.

You can wrap the block with /* BEGIN CSTYLED */ /* END CSTYLED */ to make the style checker happy about the indent. See the bottom of arc.c for an example.

@Fabian-Gruenbichler
Contributor Author

Okay, so further bisecting shows that "OpenZFS 6393 - zfs receive a full send as a clone" (#4221) is NOT the actual culprit here. While it does change the behaviour for full streams (which now include FREEOBJECTS records), that change alone would indeed (as noted in the comment) be backwards compatible. This is somewhat hidden by the fact that it introduces improved error handling on the receiving side, which meant the actual backwards incompatibility was not noticed until later.

The first commit in the 0.6.5 -> 0.7.0 history which actually introduces the breaking behaviour is "OpenZFS 7104 - increase indirect block size" (d7958b4 / #5679), which bumps the default/maximum indirect block size shift from 14 (16K) to 17 (128K), a factor of 8. This value is included when calculating the BP_SPAN when dumping a hole, which inflates numobjs in the final FREEOBJECTS record and also introduces an additional (0, 0) FREEOBJECTS record (the latter does not seem to be a problem).

test case:

following pool setup:

sudo modprobe zfs

sudo zpool create testpool /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
sudo zfs create testpool/send

echo 'aaaa' | sudo tee /testpool/send/testfile
sudo zfs snapshot testpool/send@snap1
echo 'bbbb' | sudo tee /testpool/send/testfile
sudo zfs snapshot testpool/send@snap2

dumping a full and incremental replication stream using:

sudo zfs send -Rv testpool/send@snap1 | sudo zstreamdump -v > full.dump
sudo zfs send -Rv -I testpool/send@snap1 testpool/send@snap2 | sudo zstreamdump -v > incremental.dump

0.7.0 produces streams that end in:

==> 0.7/full.dump <==
FREEOBJECTS firstobj = 64 numobjs = 36028797018963904
FREEOBJECTS firstobj = 0 numobjs = 0
END checksum = 1a441875cdb/26d2f0c3ead633/3eacb4cb93d74e87/941e0bf139b514c9
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 2
        Total DRR_END records = 3
        Total DRR_OBJECT records = 7
        Total DRR_FREEOBJECTS records = 5
        Total DRR_WRITE records = 8
        Total DRR_WRITE_BYREF records = 0
        Total DRR_WRITE_EMBEDDED records = 0
        Total DRR_FREE records = 12
        Total DRR_SPILL records = 0
        Total records = 37
        Total write size = 36864 (0x9000)
        Total stream length = 49204 (0xc034)

==> 0.7/incremental.dump <==
FREEOBJECTS firstobj = 64 numobjs = 36028797018963904
FREEOBJECTS firstobj = 0 numobjs = 0
END checksum = 309fbc68a9/685551b2ff9a/927e876e829f47/977bfcebffe77a14
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 2
        Total DRR_END records = 3
        Total DRR_OBJECT records = 2
        Total DRR_FREEOBJECTS records = 4
        Total DRR_WRITE records = 1
        Total DRR_WRITE_BYREF records = 0
        Total DRR_WRITE_EMBEDDED records = 0
        Total DRR_FREE records = 3
        Total DRR_SPILL records = 0
        Total records = 15
        Total write size = 512 (0x200)
        Total stream length = 5940 (0x1734)

the full stream already hangs zfs recv on 0.6.5.11 on the receiving side.

Applying the following patch, a partial revert of d7958b4, on top of 0.7.0:

diff --git a/include/sys/dnode.h b/include/sys/dnode.h
index d32855d..edb0575 100644
--- a/include/sys/dnode.h
+++ b/include/sys/dnode.h
@@ -65,7 +65,7 @@ extern "C" {
  * 4 levels of indirect blocks would not be able to guarantee addressing an
  * entire object, so 5 levels will be used, but 5 * (20 - 7) = 65.
  */
-#define        DN_MAX_INDBLKSHIFT      17      /* 128k */
+#define        DN_MAX_INDBLKSHIFT      14      /* 16k */
 #define        DNODE_BLOCK_SHIFT       14      /* 16k */
 #define        DNODE_CORE_SIZE         64      /* 64 bytes for dnode sans blkptrs */
 #define        DN_MAX_OBJECT_SHIFT     48      /* 256 trillion (zfs_fid_t limit) */

now, the dumped streams end like this:

==> 0.7-patched/full.dump <==
FREEOBJECTS firstobj = 64 numobjs = 422212465065920
END checksum = 1c6379af216/2bf9931acc6d8d/a5b2c2ebf3d70779/5ff05388267cbd3d
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 2
        Total DRR_END records = 3
        Total DRR_OBJECT records = 7
        Total DRR_FREEOBJECTS records = 4
        Total DRR_WRITE records = 8
        Total DRR_WRITE_BYREF records = 0
        Total DRR_WRITE_EMBEDDED records = 0
        Total DRR_FREE records = 12
        Total DRR_SPILL records = 0
        Total records = 36
        Total write size = 36864 (0x9000)
        Total stream length = 48892 (0xbefc)

==> 0.7-patched/incremental.dump <==
FREEOBJECTS firstobj = 64 numobjs = 422212465065920
END checksum = 32e89f1cf2/68d551585a31/888a46da9f1a2f/8396c872030a02ce
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 2
        Total DRR_END records = 3
        Total DRR_OBJECT records = 2
        Total DRR_FREEOBJECTS records = 3
        Total DRR_WRITE records = 1
        Total DRR_WRITE_BYREF records = 0
        Total DRR_WRITE_EMBEDDED records = 0
        Total DRR_FREE records = 3
        Total DRR_SPILL records = 0
        Total records = 14
        Total write size = 512 (0x200)
        Total stream length = 5628 (0x15fc)

and sending both the full and the incremental stream to a 0.6.5.11 system works as expected.

I only tested with 0.7.0, but I am pretty sure the same applies to 0.7.1 and master.

is there any way to get out of this mess besides backporting the improved receive handling to 0.6.5.11 like in #6602?

@Fabian-Gruenbichler
Contributor Author

@behlendorf @ahrens any chance either of you could take a look at this? Am I correct in assuming that bumping the indirect block size shift default value and limit does not change the actual on-disk format of zpools? Otherwise it would have been hidden behind a pool feature flag and not just bumped between 0.6.5 and 0.7.

If so, one way to ease upgrades would be to revert this change and postpone it (either to 0.8 or a later point in 0.7), at least for downstreams relying on replication using zfs send/recv.

@behlendorf
Contributor

@Fabian-Gruenbichler thanks for isolating the exact commit. To answer your question: increasing the indirect block size didn't require a feature flag at the time, since the size was already being stored on disk and older versions of the software are capable of handling this larger size. That all worked as expected; where things went wrong is this accidental change to the send stream.

if so, one way to ease upgrades would be to revert this change and postpone it

Unfortunately since this size is stored on disk that would only resolve the issue for new pools, not for existing pools. What I think might work would be updating do_dump() to assume the previous maximum indirect shift when generating DRR_FREEOBJECTS records. It looks like you could do this with a for loop in the block you referenced.

We should try and get @dankimmel's thoughts on this too since he's familiar with all this code. He may have a better idea.

@ahrens
Member

ahrens commented Sep 20, 2017

To make sure I understand, let me summarize:

Some FREEOBJECTS records have a very large number of objects, which confuses (appears to hang) old code (0.6.5). These confusing FREEOBJECTS records are generated by both full and incremental sends from current (0.7) bits. This PR makes it possible to set a tunable which changes the full send stream such that it doesn't generate these confusing records (but receiving a full stream as a clone won't work).

So this PR is essentially a workaround (it requires setting a tunable) for part of the bug (it only fixes full send streams). While this is an improvement over the current state, I think we need to find a more complete solution -- at least to work around the problem for incrementals too, and hopefully to actually fix the bug, without requiring changing tunables.

@pcd1193182 any ideas?

@ahrens
Member

ahrens commented Sep 20, 2017

For incrementals, I think we know that there can't be any objects after dn_maxblkid * DNODES_PER_BLOCK. So we don't really need to FREEOBJECTS any objects after that. We could use this knowledge to reduce the numobjs in the big FREEOBJECTS record, which might be enough to eliminate the "hang".

For full sends, we don't know what the max objid is on the target when receiving as a clone. But we could change the code on the receiving system to make it free all objects after the last object in the send stream. An older system couldn't receive-whole-send-as-a-clone from a system with this change, but "receive whole send as a clone" is a relatively recent and rarely used feature, so this is much less bad than not being able to send from 0.7 to earlier releases.

@Fabian-Gruenbichler
Contributor Author

Fabian-Gruenbichler commented Sep 21, 2017 via email

@ahrens
Member

ahrens commented Sep 21, 2017

not sure if this is worth it - if we need a fix on the receiving side, the most straight-forward one IMHO is my other PR backporting the "skip rest of FREEOBJECTS record if object does not exist" commits ;) ideally, we could fix this mess on the sender side only - in many scenarios, the target of a send/recv is not easily updatable (think centralized backup server for lots of infrastructure with conservative upgrade policy).

You're proposing that we need a fix on the receiving side in order to receive any full send stream. I'm proposing that we need a fix on the receiving side in order to receive a full send as a clone. That's a significant difference.

@Fabian-Gruenbichler
Contributor Author

sorry, misunderstood your point there. what you mean is:

  • re-introduce the early return in dump_freeobjects for full streams, and no longer set the DRR flag
  • instead free all objects which were not explicitly part of the stream when receiving as clone

together with reducing the incremental FREEOBJECTS records to the needed minimum, this would allow sending regular streams from > 0.7.1 to 0.6.5.x again. Receiving full streams as clones would still not be supported on 0.6.5.x. Sending from 0.7.0/1 to > 0.7.1 would work for all combinations if we keep the receive compatibility, but sending from > 0.7.1 to 0.7.0/1 would no longer support receiving as a full clone. I still wonder whether putting the full stream with FREE(OBJECTS) records behind a send feature flag would not be easier - that would not require any changes on the receiving side.

@Fabian-Gruenbichler Fabian-Gruenbichler force-pushed the pull/send-compatibility branch 2 times, most recently from 4c09a49 to 930b12a Compare September 22, 2017 13:05
@Fabian-Gruenbichler
Contributor Author

Took a first stab at implementing the maximum possible object ID, and fixed some issues with the previous iteration. I haven't done extensive testing yet, and this is my first foray into the inner workings of ZFS, so I may have missed some important aspects ;) Some basic sending/receiving between patched master and both patched master and 0.6.5.11 showed no obvious errors.

@codecov

codecov bot commented Sep 25, 2017

Codecov Report

Merging #6616 into master will increase coverage by 0.19%.
The diff coverage is 74.28%.


@@            Coverage Diff             @@
##           master    #6616      +/-   ##
==========================================
+ Coverage   74.09%   74.28%   +0.19%     
==========================================
  Files         295      295              
  Lines       93882    93920      +38     
==========================================
+ Hits        69558    69767     +209     
+ Misses      24324    24153     -171

@Fabian-Gruenbichler
Contributor Author

pushed a new iteration to make style checker happy and reduce code duplication.

@ahrens
Member

ahrens commented Sep 25, 2017

re-introduce the early return in dump_freeobjects for full streams, and no longer set the DRR flag
instead free all objects which were not explicitly part of the stream when receiving as clone

Actually, I meant to keep setting the DRR flag, but truncate the last FREEOBJECTS record based on maxblkid as we do for incrementals, and have the receiver free any of their objects that are past the last OBJECT/FREEOBJECTS record.

I still wonder whether putting the full stream with FREE(OBJECTS) behind a send feature flag would not be easier.. that would not require any changes on the receiving side..

Yeah, we could require that you do zfs send --receivable-as-clone (better name/flag TBD) to enable the new DRR flag and FREE/FREEOBJECTS records.

@ahrens ahrens left a comment

The dsa_incremental_maxobj code is nice. Have you been able to test it out, sending to an older release of ZFS?


/*
* reduce numobjs according to maximum possible object id of
* base snapshot
Member

These comments say what is happening, but it isn't much more work to read the code to understand that.

It would be really useful to have a comment explaining why we are doing this (especially since the receive code (now) handles longer FREEOBJECTS records just fine).

Contributor Author

yep, makes sense!

@@ -934,7 +967,7 @@ dmu_send_impl(void *tag, dsl_pool_t *dp, dsl_dataset_t *to_ds,
zfs_bookmark_phys_t *ancestor_zb, boolean_t is_clone,
boolean_t embedok, boolean_t large_block_ok, boolean_t compressok,
boolean_t rawok, int outfd, uint64_t resumeobj, uint64_t resumeoff,
vnode_t *vp, offset_t *off)
vnode_t *vp, offset_t *off, uint64_t incremental_maxobj)
Member

I think this function can figure out incremental_maxobj instead of making the caller pass it in. In addition to making the interface cleaner, it looks like we can eliminate some duplicated code in the callers.

Contributor Author

but we need the maxobj of the fromsnap dataset (because those are the objects we potentially free, unless I misunderstood something), and here we only have the ancestor_zb (which could be a bookmark!). I figured we would not want to duplicate all the dataset-finding code from dmu_send(_obj) in dmu_send_impl? But maybe I am just missing the right helper to get from zfs_bookmark_phys_t to dnode_phys_t ;)

Contributor Author

or should we use tosnap to get maxblkid, because it is guaranteed that maxblkid does not get smaller from one snapshot to the next (not sure about this?) - that would indeed make it easier, and also be reusable for the full send proposal.

@Fabian-Gruenbichler
Contributor Author

yeah, I did some basic tests (lots of files, big files, holes in files, deleting lots of files) and all sends from patched master to patched master and to vanilla 0.6.5.11 worked as expected, and the test suite results look good as well.

@Fabian-Gruenbichler
Contributor Author

and a new iteration:

  • make send full stream as clone-receivable opt-in via a flag, instead of opt-out via module parameter
  • calculate max obj id based on "to" dataset, instead of from (less code, support for bookmarks as base of incremental stream)

I haven't tested resuming or actually receiving a full stream as a clone yet ;)

@Fabian-Gruenbichler Fabian-Gruenbichler force-pushed the pull/send-compatibility branch 2 times, most recently from 5f0d1ab to f714e38 Compare September 27, 2017 06:51
@Fabian-Gruenbichler
Contributor Author

rebased, fixed the test case about receiving full streams as clone to use the new '-C' flag, and actually allow using the '-C' flag when sending.

@pcd1193182
Contributor

I don't love having to use a flag for this... the feature should really just work transparently. Having different "kinds" of full sends just rubs me the wrong way. I think I prefer Matt's proposal for full sends in #6616 (comment) . It requires some receive changes, but they wouldn't be extensive, and it gets the behavior we want in pretty much every case. I'd be happy to help outline the receive changes if needed.

@Fabian-Gruenbichler
Contributor Author

pushed new iteration, now trimming freeobjects records unconditionally for both types of streams, and keeping track of objects referenced/touched when receiving streams, using that information to free all leftover objects after that mark when receiving a full stream as clone.

as discussed with @pcd1193182 on IRC, this would mean that ZoL > 0.7.2 is incompatible regarding the "receive full stream as clone" feature with ZoL 0.7.0-0.7.2, but it would be fully compatible with ZoL 0.6.5.x regarding full and incremental streams without changes in 0.6.5.

feedback welcome ;)

@behlendorf behlendorf left a comment

This strikes me as a pretty reasonable solution. We can live with the "receive full stream as clone" feature being incompatible with 0.7.0 - 0.7.2 tags. The most important thing is that full and incremental streams will be compatible again with 0.6.5.x as originally intended.


for (obj = rwa->last_touched_object + 1;
next_err == 0;
next_err = dmu_object_next(rwa->os, &obj, FALSE, 0)) {
Contributor

nit: I think rewriting this in the form while (next_err == 0) { ... } would be easier to read.

Contributor Author

that was lifted from receive_freeobjects (https://github.com/zfsonlinux/zfs/blob/master/module/zfs/dmu_send.c#L2550), but since the loop condition is much simpler here converting to a while makes sense.

Contributor Author

On second thought - with the pesky continue we'd either need two calls to dmu_object_next or a goto, so I am not so sure.

https://gist.github.com/Fabian-Gruenbichler/8632d82204237a6219ba40c1258d261d
https://gist.github.com/Fabian-Gruenbichler/6234f96c6a33397561541f75889a7d3b

which of the three would be preferred?

Contributor

That makes sense, my preference would be for the last one
https://gist.github.com/Fabian-Gruenbichler/6234f96c6a33397561541f75889a7d3b

@@ -2150,6 +2151,7 @@ struct receive_writer_arg {
boolean_t resumable;
boolean_t raw;
uint64_t last_object, last_offset;
uint64_t last_touched_object;
Contributor

I think using last_touched_object for a name is misleading. To me it implies the last modified object in the stream, when in fact you're tracking the largest_object in the stream.

Contributor Author

yes, I was thinking of "last" as in "highest ID", not "last" as in "last in stream". largest is also misleading though, as it could refer to size? maybe something like max_objid ?

Contributor

Yes, good point. max_objid makes sense to me, or even max_object which would be consistent with last_object.

While you're updating this could you also split last_object and last_offset from the line above so each is declared on its own line.

@@ -3720,6 +3746,38 @@ dmu_recv_stream(dmu_recv_cookie_t *drc, vnode_t *vp, offset_t *voffp,
}
mutex_exit(&rwa->mutex);

/*
* if we are receiving a full stream as clone, we need to free leftover
* objects after the last one referenced in the stream.
Contributor

I think you mean largest one referenced in the stream, not last.

Contributor Author

leftover objects with IDs greater than the maximum ID referenced in the stream.

?

Contributor

Yeah, how about something like this to make it perfectly clear.

"If we are receiving a full stream as a clone, all object ids which
are greater than the maximum ID referenced in the stream are
by definition unused and must be freed."

@@ -3720,6 +3746,38 @@ dmu_recv_stream(dmu_recv_cookie_t *drc, vnode_t *vp, offset_t *voffp,
}
mutex_exit(&rwa->mutex);

/*
* if we are receiving a full stream as clone, we need to free leftover
Contributor

nit: s/if we/If we (capital leading i)

@Fabian-Gruenbichler
Contributor Author

updated PR description for (hopefully) final approach

all objects after the last written or freed object are not supposed to
exist after receiving the stream. free them accordingly, as if a
freeobjects record for them had been included in the stream.

Signed-off-by: Fabian Grünbichler <[email protected]>
@Fabian-Gruenbichler
Contributor Author

updated and included all of the style feedback.

I reduced the while loop even further by skipping the call to dmu_object_info. It was completely redundant, since it was only called to check whether the object exists before calling dmu_free_long_object - which itself begins with dmu_free_long_range, which begins with the exact same call to dnode_hold with identical parameters and error handling as dmu_object_info.

I wonder if we should not do the same in receive_freeobjects as well?

@Fabian-Gruenbichler Fabian-Gruenbichler changed the title [WIP] send/recv compatibility with 0.6.5.x send/recv compatibility with 0.6.5.x Oct 6, 2017
@Fabian-Gruenbichler
Contributor Author

removed WIP from title - this is ready to get merged from my side.

@ahrens @pcd1193182 any objections?

@behlendorf behlendorf left a comment

Looks good. Once everyone's happy with this we'll want to upstream it to OpenZFS.

@odoucet

odoucet commented Oct 6, 2017

Thank you Fabian for this patch !
When can we expect 0.7.3 to be released with this patch? I (and I guess many other people) have been unable to upgrade to 0.7.x due to this bug.

@@ -2824,6 +2861,9 @@ receive_free(struct receive_writer_arg *rwa, struct drr_free *drrf)
if (dmu_object_info(rwa->os, drrf->drr_object, NULL) != 0)
return (SET_ERROR(EINVAL));

if (drrf->drr_object > rwa->max_object)
Contributor

You don't actually need these, since we only consider this value when doing a receive of a full send as a clone, and in that case you have to get an object record before you get any other kind of record for that object. Doesn't hurt to have them, though, and to have it be accurate in all cases.

Contributor Author

any other kind of record except FREEOBJECTS ;) I originally attempted to just re-use last_object, but that obviously did not work out.

I did not feel comfortable enough with my level of understanding of the code to play guessing games (regarding which patterns are possible and which aren't), so I went the "better safe than sorry" route :)

@behlendorf behlendorf removed the Status: Work in Progress Not yet ready for general review label Oct 10, 2017
behlendorf pushed a commit that referenced this pull request Oct 10, 2017
When sending an incremental stream based on a snapshot, the receiving
side must have the same base snapshot.  Thus we do not need to send
FREEOBJECTS records for any objects past the maximum one which exists
locally.

This allows us to send incremental streams (again) to older ZFS
implementations (e.g. ZoL < 0.7) which actually try to free all objects
in a FREEOBJECTS record, instead of bailing out early.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes #5699
Closes #6507
Closes #6616
@ahrens
Member

ahrens commented Oct 10, 2017

Changes look good. Sorry I didn't have a chance to look at this before it was integrated (was on vacation last week). Is anyone signed up to port this to OpenZFS/illumos?

aerusso pushed a commit to aerusso/zfs that referenced this pull request Oct 11, 2017
All objects after the last written or freed object are not supposed to
exist after receiving the stream.  Free them accordingly, as if a
freeobjects record for them had been included in the stream.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes openzfs#5699
Closes openzfs#6507
Closes openzfs#6616
aerusso pushed a commit to aerusso/zfs that referenced this pull request Oct 11, 2017
When sending an incremental stream based on a snapshot, the receiving
side must have the same base snapshot.  Thus we do not need to send
FREEOBJECTS records for any objects past the maximum one which exists
locally.

This allows us to send incremental streams (again) to older ZFS
implementations (e.g. ZoL < 0.7) which actually try to free all objects
in a FREEOBJECTS record, instead of bailing out early.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes openzfs#5699
Closes openzfs#6507
Closes openzfs#6616
tonyhutter pushed a commit that referenced this pull request Oct 16, 2017
All objects after the last written or freed object are not supposed to
exist after receiving the stream.  Free them accordingly, as if a
freeobjects record for them had been included in the stream.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes #5699
Closes #6507
Closes #6616
tonyhutter pushed a commit that referenced this pull request Oct 16, 2017
When sending an incremental stream based on a snapshot, the receiving
side must have the same base snapshot.  Thus we do not need to send
FREEOBJECTS records for any objects past the maximum one which exists
locally.

This allows us to send incremental streams (again) to older ZFS
implementations (e.g. ZoL < 0.7) which actually try to free all objects
in a FREEOBJECTS record, instead of bailing out early.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes #5699
Closes #6507
Closes #6616
@waikontse

Hi,

I know the issue has been closed for almost a year now, but the fix has only been partially ported to the openzfs repo. The partial commit is at openzfs/openzfs#631; the code looks almost identical. The missing part is the backwards compatibility introduced for receiving a dataset that was sent from a newer version of ZFS.

I don't know if there is any need or want to merge the missing part into the openzfs repo, but I had to patch the FreeBSD 11.2 ZFS tree because I still have servers deployed running on 10.3.

I could try to make a PR for the patch, but it looks like there is a lot of red tape.

@Fabian-Gruenbichler
Contributor Author

@waikontse feel free to port it (or file an issue) - the backwards compatibility commit is very small and should be straightforward. Unfortunately I don't have the capacity to do it myself at the moment.

@waikontse

@Fabian-Gruenbichler You're right, it's 10 lines of code at most, I believe. I'll try to have it merged into openzfs. Thanks.

FransUrbo pushed a commit to FransUrbo/zfs that referenced this pull request Apr 28, 2019
All objects after the last written or freed object are not supposed to
exist after receiving the stream.  Free them accordingly, as if a
freeobjects record for them had been included in the stream.

Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fabian Grünbichler <[email protected]>
Closes openzfs#5699
Closes openzfs#6507
Closes openzfs#6616