Describe the feature you would like to see added to OpenZFS
When copying a file from one dataset to another, it should be possible for the file to be copied instantly, so long as both datasets are unencrypted or the source and target datasets share the same encryption root.
I would propose a `fastcopy` setting for datasets to give some control over whether files can be fast-copied into them. The settings would be `on` (always fast copy where possible), `off` (never fast copy), or `exact` (fast copy only when settings that affect the on-disk data layout, such as `compression` or `recordsize`, are an exact match between the two datasets; this allows a target to force recompression/restructuring). This setting should probably default to `off` or `exact`.
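As a rough sketch of how this might look, assuming the usual `zfs set`/`zfs get` property syntax (the `fastcopy` property and its values are purely hypothetical, being the proposal itself):

```sh
# Hypothetical usage of the proposed property; fastcopy does not exist today.

# Never fast copy into this dataset; incoming copies always write new blocks.
zfs set fastcopy=off tank/pictures

# Fast copy into this dataset whenever possible (both sides unencrypted, or
# sharing the same encryption root).
zfs set fastcopy=on tank/archive

# Fast copy only when data-shaping properties (compression, recordsize, ...)
# match exactly, so mismatched sources get recompressed/restructured instead.
zfs set fastcopy=exact tank/backups

zfs get fastcopy tank/archive
```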
When fast-copying between datasets that do not have the same `copies` setting, if the target's `copies` value is lower than the source's, the additional copies can simply be ignored; if the target wants more copies, the extras will need to be created (e.g. if the source has `copies=2` and the target has `copies=3`, then two copies are linked and a new third copy is created).
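A small illustration of that mismatch case, using hypothetical dataset names and assuming the proposed `fastcopy` property is enabled on the target:

```sh
# Hypothetical example: copies=2 on the source, copies=3 on the target.
zfs create -o copies=2 tank/src
zfs create -o copies=3 -o fastcopy=on tank/dst   # fastcopy is the proposed property

# A fast copy here would link the two existing copies of each block and
# write only the additional third copy required by the target.
cp --reflink=auto /tank/src/large.img /tank/dst/large.img
```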
For sending and receiving support, the default behaviour would need to be to treat the fast-copied file as a full copy, using metadata to look up and send the blocks as if they belonged to the dataset they were fast-copied into. I'm not sure it would be possible to indicate to `zfs send` that the receiver should already have the necessary blocks, as there seems to be far too much potential to go out of sync (e.g. trying to send a fast-copy of a file that hasn't been sent from the source dataset yet), and the logic to handle a send/receive dependent on multiple datasets on each side seems too complex (definitely so for this request).
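To make that default behaviour concrete, a sketch using ordinary `zfs send` syntax (the block-sharing semantics described in the comments are part of this proposal, not current behaviour):

```sh
# Under the proposed default, the stream carries the fast-copied file's blocks
# in full, exactly as if they had always belonged to tank/dst, even though on
# the source pool those blocks are shared with tank/src.
zfs snapshot tank/dst@snap1
zfs send tank/dst@snap1 | zfs receive backup/dst
```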
How will this feature improve OpenZFS?
While ZFS makes it very easy to use large numbers of datasets to fine-tune settings such as redundancy, record size, etc., copying or moving files between them is comparatively costly, as a whole new copy of the file is created within the same pool; the resulting duplicate data can only be avoided with deduplication, which is usually too costly to be worth it.
A fast copy of this type would make it possible to copy or move files between datasets almost instantly, with no data being rewritten.
While this may mean a file using smaller records is copied into a dataset with a larger record size (or vice versa), in general this will be preferable to creating a whole new copy; and it remains opt-in behaviour so long as the default `fastcopy` setting is `exact` or `off`.
Additional context
This feature is related to, and dependent upon, support for `cp --reflink` as requested by various issues such as #13349.
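For reference, the reflink interface this would build on (standard GNU coreutils `cp` flags; whether a given cross-dataset copy can be cloned is exactly what this issue is about):

```sh
# --reflink=always fails rather than silently falling back to a full copy;
# --reflink=auto falls back when cloning isn't possible.
cp --reflink=always /tank/src/file /tank/src/file.clone   # within one dataset
cp --reflink=always /tank/src/file /tank/dst/file         # across datasets (this request)
```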
I initially couldn't decide whether to request this separately or simply add it as a note to other issues, but this feels like a separate feature that would need to be built on top of reflink support: while basic reflinks are (comparatively) simple, this feature is more complex and will likely be better handled later.