deprecate deduplicated send streams #10117

Merged: 1 commit, Mar 18, 2020

ahrens (Member) commented Mar 10, 2020

Motivation and Context

Dedup send can only deduplicate over the set of blocks in the send
command being invoked, and it does not take advantage of the dedup table
to do so. This is a very common misconception among not only users, but
developers, and makes the feature seem more useful than it is. As a
result, many users are using the feature but not getting any benefit
from it.

Dedup send requires a nontrivial expenditure of memory and CPU to
operate, especially if the dataset(s) being sent is (are) not already
using a dedup-strength checksum.
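
A toy model may make the two points above concrete. This is only a sketch under stated assumptions, not the ZFS implementation: the function name `dedup_stream`, the record labels `WRITE` and `WRITE_BYREF`, and the use of SHA-256 are illustrative choices. It shows that the lookup table is built from scratch for each send (so nothing is shared with the pool's dedup table or with other sends), and that every block must be checksummed and remembered in memory.

```python
import hashlib


def dedup_stream(blocks):
    """Toy model of one send invocation (hypothetical, not the ZFS format).

    The first time a block's contents appear in this stream, the data is
    emitted in full; later copies are emitted as a reference to the index
    of the first copy.
    """
    seen = {}  # checksum -> index of the first record carrying that data
    records = []
    for index, data in enumerate(blocks):
        digest = hashlib.sha256(data).digest()  # CPU cost paid for every block
        if digest in seen:
            records.append(("WRITE_BYREF", index, seen[digest]))
        else:
            seen[digest] = index  # memory cost grows with unique blocks
            records.append(("WRITE", index, data))
    return records
```

Because `seen` starts empty on every call, two separate calls that each contain the same block both emit it in full; only repeats inside a single call are replaced by references.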

Dedup send adds developer burden. It expands the test matrix when
developing new features, which causes bugs in released code and delays
development by forcing more testing to be done.

Closes #7887

Description

As a result, we are deprecating the use of zfs send -D and the receiving
of such streams. This change adds a warning to the man page, and also
prints the warning whenever dedup send or receive is used.

In a future release, we plan to:

  1. remove the kernel code for generating deduplicated streams
  2. make zfs send -D generate regular, non-deduplicated streams
  3. remove the kernel code for receiving deduplicated streams
  4. make zfs receive of deduplicated streams process them in userland
    to "re-duplicate" them, so that they can still be received.
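
The "re-duplicate" step in item 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the real stream format or the eventual utility: the one-byte record tags `W` and `R`, the length and offset encodings, and all function names are hypothetical. The idea it shows is that a reference record is expanded by seeking back to the earlier record in the input and copying its payload into the output.

```python
import io
import struct


def write_record(out, payload):
    # Hypothetical full-data record: tag "W", 4-byte length, payload.
    out.write(b"W" + struct.pack(">I", len(payload)) + payload)


def write_ref(out, offset):
    # Hypothetical reference record: tag "R", 8-byte offset of a "W" record.
    out.write(b"R" + struct.pack(">Q", offset))


def read_payload_at(src, offset):
    # Seek back to an earlier "W" record, read its payload, then restore
    # the read position. This is the step that needs a seekable input.
    here = src.tell()
    src.seek(offset)
    if src.read(1) != b"W":
        raise ValueError("reference does not point at a data record")
    (length,) = struct.unpack(">I", src.read(4))
    payload = src.read(length)
    src.seek(here)
    return payload


def reduplicate(src, dst):
    # Copy data records unchanged; expand each reference into full data.
    while True:
        kind = src.read(1)
        if not kind:
            break
        if kind == b"W":
            (length,) = struct.unpack(">I", src.read(4))
            write_record(dst, src.read(length))
        elif kind == b"R":
            (offset,) = struct.unpack(">Q", src.read(8))
            write_record(dst, read_payload_at(src, offset))
        else:
            raise ValueError("unknown record tag")


def demo():
    src = io.BytesIO()
    write_record(src, b"hello")  # stored at offset 0
    write_record(src, b"world")
    write_ref(src, 0)            # duplicate of "hello", stored by reference
    src.seek(0)
    dst = io.BytesIO()
    reduplicate(src, dst)
    return dst.getvalue()
```

In this sketch the output of `demo()` contains three full data records and no references, which is what a receiver without dedup support would need.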

How Has This Been Tested?

Examined the man page output and ran the following commands:

$ sudo zfs send -D rpool/ROOT/delphix.yOWvfnp/home@a >file
WARNING: deduplicated send is deprecated, and will be removed in a
future release. (The flag will be accepted, but a regular,
non-deduplicated stream will be generated.)

$ sudo zfs receive rpool/recvd <file
WARNING: This is a deduplicated send stream.  The ability to send and
receive deduplicated send streams is deprecated.  In the future, the
performance of receiving a deduplicated send stream will be reduced, the
memory required will be increased, and the ability to receive a
deduplicated stream from a pipe will be removed.  (A deduplicated send
stream will still be able to be received, as long as it is located in a
seek-able file, rather than provided by a pipe.)
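
The distinction the warning draws between a seek-able file and a pipe can be checked programmatically. The sketch below is illustrative only and is not taken from the ZFS code; `is_seekable` and `demo` are hypothetical names. It relies on `os.lseek` failing on a pipe while succeeding on a regular file.

```python
import os
import tempfile


def is_seekable(fd):
    # lseek() on a pipe, FIFO, or socket fails, so an OSError here means
    # the descriptor cannot be used to jump back to earlier data.
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
    except OSError:
        return False
    return True


def demo():
    # Compare a regular temporary file with the read end of a pipe.
    read_end, write_end = os.pipe()
    try:
        with tempfile.TemporaryFile() as regular:
            return is_seekable(regular.fileno()), is_seekable(read_end)
    finally:
        os.close(read_end)
        os.close(write_end)
```

A tool that must look back at earlier records could run a check like this on its input and refuse to proceed when the stream arrives through a pipe.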

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

ahrens added the labels "Type: Documentation", "Status: Code Review Needed", and "Component: Send/Recv" on Mar 10, 2020
ahrens requested review from pcd1193182 and behlendorf on March 10, 2020 19:30
kithrup (Contributor) commented Mar 10, 2020

FWIW, I approve of this.

pcd1193182 (Contributor) left a comment

Obviously I approve of the concept 😛

Review comment on cmd/zfs/zfs_main.c (outdated, resolved)

codecov-io commented Mar 11, 2020

Codecov Report

Merging #10117 into master will decrease coverage by 4.04%.
The diff coverage is 60.96%.

@@            Coverage Diff             @@
##           master   #10117      +/-   ##
==========================================
- Coverage   79.28%   75.24%   -4.05%     
==========================================
  Files         386      381       -5     
  Lines      122448   121892     -556     
==========================================
- Hits        97087    91716    -5371     
- Misses      25361    30176    +4815     

Flag      Coverage Δ
#kernel   74.17% <70.76%> (-5.18%) ⬇️
#user     64.65% <53.64%> (-2.04%) ⬇️

Impacted Files                        Coverage Δ
include/os/linux/spl/sys/time.h       100.00% <ø> (ø)
include/sys/dmu.h                     100.00% <ø> (ø)
lib/libshare/smb.c                    8.80% <0.00%> (-0.12%) ⬇️
lib/libspl/include/sys/time.h         100.00% <ø> (ø)
lib/libzfs/libzfs_mount.c             83.44% <0.00%> (-1.46%) ⬇️
lib/libzfs/libzfs_util.c              68.11% <0.00%> (-6.25%) ⬇️
module/os/linux/zfs/vdev_disk.c       84.36% <ø> (+0.36%) ⬆️
module/os/linux/zfs/zfs_debug.c       93.05% <ø> (ø)
module/os/linux/zfs/zfs_ioctl_os.c    85.18% <ø> (+0.64%) ⬆️
module/os/linux/zfs/zfs_vfsops.c      72.01% <ø> (-3.78%) ⬇️
... and 195 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 01243e7...3a81a56.

ahrens (Member, Author) commented Mar 11, 2020

I'm working on implementing the "re-duplicating" of send streams. It's slightly tricky because of zfs send -RD, whose first BEGIN record doesn't have the DEDUP flag set. To simplify the utility, I'd like to change the deprecation message to something like:

In the future, the ability to receive a deduplicated send stream with
"zfs receive" will be removed. However, in the future, a utility will
be provided to convert a deduplicated send stream to a regular
(non-deduplicated) stream. This future utility will require that the
send stream be located in a seek-able file, rather than provided by a
pipe.

Would that be acceptable? (feel free to react with 👍)

Dedup send can only deduplicate over the set of blocks in the send
command being invoked, and it does not take advantage of the dedup table
to do so. This is a very common misconception among not only users, but
developers, and makes the feature seem more useful than it is. As a
result, many users are using the feature but not getting any benefit
from it.

Dedup send requires a nontrivial expenditure of memory and CPU to
operate, especially if the dataset(s) being sent is (are) not already
using a dedup-strength checksum.

Dedup send adds developer burden. It expands the test matrix when
developing new features, causing bugs in released code, and delaying
development efforts by forcing more testing to be done.

As a result, we are deprecating the use of `zfs send -D` and receiving
of such streams.  This change adds a warning to the man page, and also
prints the warning whenever dedup send or receive are used.

In a future release, we plan to:
1. remove the kernel code for generating deduplicated streams
2. make `zfs send -D` generate regular, non-deduplicated streams
3. remove the kernel code for receiving deduplicated streams
4. make `zfs receive` of deduplicated streams process them in userland
   to "re-duplicate" them, so that they can still be received.

Closes openzfs#7887
Signed-off-by: Matthew Ahrens <[email protected]>
behlendorf added the label "Status: Accepted" and removed the label "Status: Code Review Needed" on Mar 12, 2020
behlendorf merged commit 652bdc9 into openzfs:master on Mar 18, 2020
ahrens mentioned this pull request on Apr 15, 2020
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Apr 22, 2020
Dedup send can only deduplicate over the set of blocks in the send
command being invoked, and it does not take advantage of the dedup table
to do so. This is a very common misconception among not only users, but
developers, and makes the feature seem more useful than it is. As a
result, many users are using the feature but not getting any benefit
from it.

Dedup send requires a nontrivial expenditure of memory and CPU to
operate, especially if the dataset(s) being sent is (are) not already
using a dedup-strength checksum.

Dedup send adds developer burden. It expands the test matrix when
developing new features, causing bugs in released code, and delaying
development efforts by forcing more testing to be done.

As a result, we are deprecating the use of `zfs send -D` and receiving
of such streams.  This change adds a warning to the man page, and also
prints the warning whenever dedup send or receive are used.

In a future release, we plan to:
1. remove the kernel code for generating deduplicated streams
2. make `zfs send -D` generate regular, non-deduplicated streams
3. remove the kernel code for receiving deduplicated streams
4. make `zfs receive` of deduplicated streams process them in userland
   to "re-duplicate" them, so that they can still be received.

Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes openzfs#7887
Closes openzfs#10117
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Apr 22, 2020
behlendorf pushed a commit that referenced this pull request Apr 23, 2020
Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such
streams) are deprecated.  Deduplicated send streams can be received by
first converting them to non-deduplicated with the `zstream redup`
command.

This commit removes the code for sending and receiving deduplicated send
streams.  `zfs send -D` will now print a warning, ignore the `-D` flag,
and generate a regular (non-deduplicated) send stream.  `zfs receive` of
a deduplicated send stream will print an error message and fail.

The resulting code simplification (especially in the kernel's support
for receiving dedup streams) should help enable future performance
enhancements.

Several new tests are added which leverage `zstream redup`.

Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Issue #7887
Issue #10117
Issue #10156
Closes #10212
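
The commit message above says a deduplicated stream can still be received by first converting it with `zstream redup`. A hedged sketch of how that might be wired up from a script follows. It assumes that `zstream redup` takes the stream file as an argument and writes the converted stream to standard output, and that the caller has the privileges `zfs receive` needs; the function names and the example file and dataset names are hypothetical.

```python
import subprocess


def redup_commands(stream_file, target_dataset):
    # Assumption: `zstream redup FILE` writes a regular stream to stdout,
    # which `zfs receive DATASET` then reads from stdin.
    return (
        ["zstream", "redup", stream_file],
        ["zfs", "receive", target_dataset],
    )


def receive_dedup_stream(stream_file, target_dataset):
    """Run the equivalent of: zstream redup FILE | zfs receive DATASET."""
    redup_cmd, recv_cmd = redup_commands(stream_file, target_dataset)
    redup = subprocess.Popen(redup_cmd, stdout=subprocess.PIPE)
    recv = subprocess.Popen(recv_cmd, stdin=redup.stdout)
    # Close our copy of the pipe so redup sees a broken pipe if the
    # receive side exits early.
    redup.stdout.close()
    recv_status = recv.wait()
    redup_status = redup.wait()
    return redup_status == 0 and recv_status == 0
```

For example, `receive_dedup_stream("file", "rpool/recvd")` would mirror the test commands in the description, with the conversion step added. The input is passed as a file path rather than piped in, consistent with the earlier note that the stream must be in a seek-able file.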
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Apr 28, 2020
tonyhutter pushed a commit that referenced this pull request May 12, 2020
as-com pushed a commit to as-com/zfs that referenced this pull request Jun 20, 2020
(cherry picked from commit 196bee4)
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
Linked issue: Deprecate dedup send/receive