Deprecate dedup send/receive #7887

Closed
pcd1193182 opened this issue Sep 11, 2018 · 5 comments · Fixed by #10117
Labels
Component: Send/Recv "zfs send/recv" feature

@pcd1193182
Contributor

This issue tracks the proposal from the OpenZFS Developer Summit 2018 to deprecate dedup send and remove it from the filesystem. For full context, watch the presentation at https://youtu.be/WvBURTKUy1o?t=5177. A shortened version of the argument is:

  • Dedup send can only deduplicate over the set of blocks in the send command being invoked, and it does not take advantage of the dedup table to do so. The belief that it does is a very common misconception among not only users but also developers, and it makes the feature seem more useful than it is. As a result, many users are using the feature but not getting any benefit from it.
  • Dedup send requires a nontrivial expenditure of memory and CPU to operate, especially if the datasets being sent are not already using a dedup-strength checksum.
  • Dedup send adds developer burden. It expands the test matrix when developing new features, which causes bugs in released code and delays development efforts by forcing more testing to be done.
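
To make the first two points concrete, here is a minimal Python sketch of stream-local deduplication. It is a conceptual model only, not the ZFS implementation: the record tuples, the function name `dedup_stream`, and the use of SHA-256 are assumptions made for illustration. The table of seen checksums is built from scratch on every invocation and knows nothing about the pool's on-disk dedup table.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Tuple, Union

# Hypothetical record shapes, for illustration only (not the real stream format):
#   ("write", block_index, data)                 full copy of a block
#   ("write_byref", block_index, first_index)    reference to an earlier block
Record = Tuple[str, int, Union[bytes, int]]


def dedup_stream(blocks: Iterable[bytes]) -> Iterator[Record]:
    """Deduplicate only across the blocks passed to this one call."""
    seen: Dict[bytes, int] = {}  # checksum -> index of first copy in THIS stream
    for index, data in enumerate(blocks):
        digest = hashlib.sha256(data).digest()  # CPU cost paid per block
        if digest in seen:
            yield ("write_byref", index, seen[digest])
        else:
            seen[digest] = index  # memory cost grows with unique blocks
            yield ("write", index, data)


if __name__ == "__main__":
    blocks = [b"A" * 4, b"B" * 4, b"A" * 4]
    for record in dedup_stream(blocks):
        print(record[0], record[1])
```

In this sketch every unique block costs one hash computation and one table entry, and a second invocation over other snapshots starts again with an empty table, so blocks already sent in an earlier stream are not deduplicated against.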

When considered as a group, these issues with the feature make it a solid candidate for removal from ZFS. This thread is intended to be a hub for discussion and comments around this proposal, as well as to track its progress.

This is a significant milestone not only because we're potentially removing this feature from ZFS, but because we're potentially removing a feature at all. No user-facing feature has ever been removed from ZFS in this manner before, and the protocol undertaken as part of this effort could define how we remove features in the future. And the ability to reduce code complexity and volume is an important part of any open source project, so we need a plan for this at some point.

The proposed next step is to add, as soon as possible, a warning to the ZFS man page and to print that warning whenever dedup send is used. The actual work of removing functionality would be delayed for a significant time to allow end users and vendors to accommodate the change. This timeline is just a proposal, however. We want input from all interested parties on the right way to proceed with this, or even whether to proceed with this. We especially want to hear from users and vendors that take advantage of this feature, and what their use case is.

@snajpa
Contributor

snajpa commented Sep 12, 2018

I've never used dedup send/recv; for our setup, raw send of compressed data is far more relevant.

I wouldn't mind if it went away, and I bet I'm not alone. I'd be interested to hear what other people use it for: specifically, how big the datasets are and how much time/bandwidth it saves in those cases.

@ahrens
Member

ahrens commented Sep 12, 2018

I'd love to see this go away. I think maintaining receivability of dedup streams by "re-duping" them in userland would be a great way to get 99% of the benefit while not actually abandoning this aspect of the send stream format.

@GregorKopka
Contributor

GregorKopka commented Sep 15, 2018

I made an argument in 2016 to deprecate the current use of -r and -R on zfs rollback (and to replace it with a real recursive flag, once the old flags had been non-functional for long enough that unintended use wouldn't happen).

The main argument against this was:

There really is no "reasonable length of time" for phase 2 or 3 of this plan. The "zfs" and "zpool" commands are a committed, documented interface, which people are allowed (and encouraged!) to depend on.

It's unfortunate when design decisions get made that turn out to be suboptimal in hindsight, but these are lessons to be observed when making future design decisions rather than opportunities to make breaking changes.

Following that argument: if dedup send is removed as a feature, the -D switch must be kept, even if it is turned into a NOP.

Nevertheless:

I would prefer (instead of having to drag every wart along for eternity) that ZFS have a clear deprecation process that makes it possible to remove any design decision that proved bad in hindsight, certainly over a reasonable timespan (and with warnings to users of the feature in question) so as not to cause problems in production.

ZFS can't be the last word in filesystems (google search) - or in anything - as long as it's unable to rid itself of diagnosed problems.

@ahrens
Member

ahrens commented Sep 17, 2018

I think there's a substantive difference between this proposal (which at a high level would "just" reduce the dedup ratio achieved by zfs send -D), and the previous proposal which would eventually make zfs rollback -r/-R destroy different data than it used to. I'd like to redirect discussion of zfs rollback to a different channel so that we can concentrate on dedup send/recv in this issue. I've just followed up on the rollback mailing list thread.

I agree that zfs send -D should continue to be a valid argument, and at some point it would just cause a warning message to be printed, and no actual deduping to happen.
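
As a rough illustration of that behavior, here is a Python sketch of a command-line parser that still accepts the flag but turns it into a warning-only no-op. It does not reflect how the `zfs` command is actually written: the program name, the function name `parse_send_args`, and the warning wording are all hypothetical.

```python
import argparse
from typing import List, Tuple

# Hypothetical wording; not the message printed by the real zfs command.
DEDUP_WARNING = (
    "WARNING: deduplicated send is deprecated; "
    "-D is ignored and a regular stream is generated"
)


def parse_send_args(argv: List[str]) -> Tuple[str, bool, List[str]]:
    """Return (snapshot, dedup_enabled, warnings) for a toy 'send' command."""
    parser = argparse.ArgumentParser(prog="send-sketch")
    parser.add_argument(
        "-D", "--dedup", action="store_true",
        help="deprecated: accepted for compatibility, has no effect",
    )
    parser.add_argument("snapshot")
    args = parser.parse_args(argv)

    warnings: List[str] = []
    if args.dedup:
        # The flag stays valid so existing scripts keep working.
        warnings.append(DEDUP_WARNING)
    # Deduplication is never enabled, whether or not -D was given.
    return args.snapshot, False, warnings


if __name__ == "__main__":
    snapshot, dedup, warnings = parse_send_args(["-D", "pool/fs@snap"])
    for message in warnings:
        print(message)
    print(snapshot, dedup)
```

Keeping the option in the parser means scripts that pass it do not start failing with a usage error; they only see the warning.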

behlendorf added the Component: Send/Recv "zfs send/recv" feature label Mar 20, 2019
ahrens self-assigned this Mar 10, 2020
ahrens added a commit to ahrens/zfs that referenced this issue Mar 10, 2020
Dedup send can only deduplicate over the set of blocks in the send
command being invoked, and it does not take advantage of the dedup table
to do so. This is a very common misconception among not only users, but
developers, and makes the feature seem more useful than it is. As a
result, many users are using the feature but not getting any benefit
from it.

Dedup send requires a nontrivial expenditure of memory and CPU to
operate, especially if the dataset(s) being sent is (are) not already
using a dedup-strength checksum.

Dedup send adds developer burden. It expands the test matrix when
developing new features, causing bugs in released code, and delaying
development efforts by forcing more testing to be done.

As a result, we are deprecating the use of `zfs send -D` and receiving
of such streams.  This change adds a warning to the man page, and also
prints the warning whenever dedup send or receive are used.

In a future release, we plan to:
1. remove the kernel code for generating deduplicated streams
2. make `zfs send -D` generate regular, non-deduplicated streams
3. remove the kernel code for receiving deduplicated streams
4. make `zfs receive` of deduplicated streams process them in userland
   to "re-duplicate" them, so that they can still be received.

Closes openzfs#7887
Signed-off-by: Matthew Ahrens <[email protected]>
ahrens added a commit to ahrens/zfs that referenced this issue Mar 11, 2020
ahrens added a commit to ahrens/zfs that referenced this issue Mar 12, 2020
ahrens added a commit to ahrens/zfs that referenced this issue Mar 12, 2020
behlendorf pushed a commit that referenced this issue Mar 18, 2020
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #7887 
Closes #10117
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Apr 22, 2020
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Apr 22, 2020
behlendorf pushed a commit that referenced this issue Apr 23, 2020
Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such
streams) are deprecated.  Deduplicated send streams can be received by
first converting them to non-deduplicated with the `zstream redup`
command.

This commit removes the code for sending and receiving deduplicated send
streams.  `zfs send -D` will now print a warning, ignore the `-D` flag,
and generate a regular (non-deduplicated) send stream.  `zfs receive` of
a deduplicated send stream will print an error message and fail.

The resulting code simplification (especially in the kernel's support
for receiving dedup streams) should help enable future performance
enhancements.

Several new tests are added which leverage `zstream redup`.

Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Issue #7887
Issue #10117
Issue #10156
Closes #10212
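
The idea of converting a deduplicated stream back into a regular one can be sketched in a few lines of Python. This is a conceptual toy, not how `zstream redup` is implemented: the record tuples and the function name `redup_stream` are hypothetical, and the sketch keeps earlier payloads in memory purely for simplicity.

```python
from typing import Dict, Iterable, Iterator, Tuple, Union

# Hypothetical record shapes, for illustration only (not the real stream format):
#   ("write", block_index, data)                 full copy of a block
#   ("write_byref", block_index, first_index)    reference to an earlier block
Record = Tuple[str, int, Union[bytes, int]]


def redup_stream(records: Iterable[Record]) -> Iterator[Record]:
    """Expand every by-reference record into a full write record."""
    payloads: Dict[int, bytes] = {}  # block_index -> data seen so far
    for kind, index, payload in records:
        if kind == "write":
            payloads[index] = payload
            yield (kind, index, payload)
        elif kind == "write_byref":
            data = payloads[payload]  # look up the earlier copy
            payloads[index] = data    # later records may refer to this block
            yield ("write", index, data)
        else:
            raise ValueError(f"unknown record kind: {kind!r}")


if __name__ == "__main__":
    deduped = [("write", 0, b"AAAA"), ("write", 1, b"BBBB"), ("write_byref", 2, 0)]
    for record in redup_stream(deduped):
        print(record)
```

The output contains only plain write records, so a receiver that understands nothing about by-reference records can still consume it.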
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Apr 28, 2020
tonyhutter pushed a commit that referenced this issue May 12, 2020
as-com pushed a commit to as-com/zfs that referenced this issue Jun 20, 2020
(cherry picked from commit 196bee4)
jsai20 pushed a commit to jsai20/zfs that referenced this issue Mar 30, 2021
jsai20 pushed a commit to jsai20/zfs that referenced this issue Mar 30, 2021
@UnitedMarsupials

Having learned about this deprecation from a warning that the nightly cron job started generating after an upgrade, I'll post my stats for the record. All of the backups used --dedup before, while on FreeBSD-11. After the upgrade to FreeBSD-13, the option had to be removed.

The backup script pipes the output of zfs send into xz -9 (and then into ccrypt).

| Filesystem | Dedup | Backup size with -D (bytes) | New backup size (bytes) | Relative increase |
|---|---|---|---|---|
| imap spool | off | 23299280396 | 24124613244 | 3.54% |
| OwnCloud file-store | sha256,verify | 37755248048 | 41634861232 | 10.27% |

Looks like the removed feature was quite effective, and I'm sorry to see it go.
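
For anyone wanting to recheck the last column, the relative increase is (new size - old size) / old size. A small Python sketch using the byte counts above follows; the function name `relative_increase_pct` is just an illustrative choice. The results agree with the posted percentages to within 0.01 percentage point.

```python
def relative_increase_pct(old_bytes: int, new_bytes: int) -> float:
    """Percentage growth from old_bytes to new_bytes."""
    return (new_bytes - old_bytes) / old_bytes * 100.0


# (backup size with -D, new backup size), in bytes, as reported above.
backups = {
    "imap spool": (23299280396, 24124613244),
    "OwnCloud file-store": (37755248048, 41634861232),
}

if __name__ == "__main__":
    for name, (with_dedup, without_dedup) in backups.items():
        pct = relative_increase_pct(with_dedup, without_dedup)
        print(f"{name}: {pct:.4f}%")
```

Note that these are sizes after xz compression, so they measure the effect on the final backup files rather than on the raw send streams.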

7 participants
@pcd1193182 @behlendorf @ahrens @GregorKopka @snajpa @UnitedMarsupials and others