Input/Output Error when sending an encrypted incremental dataset back to its source #11983

Closed
GeneralGresi opened this issue Apr 30, 2021 · 3 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@GeneralGresi

System information

System 1

Type                 | Version/Name
Distribution Name    | Proxmox VE
Distribution Version | 6.3
Linux Kernel         | 5.4.106-1-pve
Architecture         | x86_64
ZFS Version          | 2.0.4-pve1
SPL Version          | 2.0.4-pve1

System 2

Type                 | Version/Name
Distribution Name    | Ubuntu
Distribution Version | 21.04
Linux Kernel         | 5.11.0-16-generic
Architecture         | x86_64
ZFS Version          | 2.0.2-1ubuntu5
SPL Version          | 2.0.2-1ubuntu5

Describe the problem you're observing

I use zfs send/recv for backups.
This is done via incremental raw replication (zfs send -Rw, plus -i for the incremental sends).
When an incremental snapshot is sent in raw mode from the backup server back to the primary server, mounting the dataset on the local side fails with an Input/Output error. If the dataset is mounted locally and something is written to it after the receive but before unmounting, everything works fine. If the dataset is not mounted locally while receiving, a mount fails with an Input/Output error.
In that case, I have to destroy the dataset on the local server and resend the complete dataset from the backup server. For something like a cluster, where data is sent back and forth between two nodes, this completely breaks replication unless files are altered before the dataset is unmounted, which isn't always possible.
Writing data to the remote dataset before incrementally sending it back doesn't make a difference.
Without encryption, this problem doesn't occur.

AFAIK the problem has existed since ZFS 0.8 and occurs on every system I tried. I set up a brand new Ubuntu machine just for testing, and it occurs there as well.
I don't know whether the FreeBSD variant has the same issue.

I didn't find any hint of the cause in the system logs, nor while searching online or in the open issues.
In fact, the system logs don't say anything about it.

Describe how to reproduce the problem

I made a bash script for that purpose: zfs_produce_io_error.zip

This script tests 5 things:

  1. snapshot is sent to remote, mounted/unmounted there and incrementally sent back, while the local dataset is not mounted - fails
  2. snapshot is sent to remote without touching it at all on the remote side and incrementally sent back - fails
  3. same as 2 but unencrypted - works
  4. same as 2 but data is written to test1 before unmounting it - works
  5. same as 1 but with altering data on the remote side - fails
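The attached script is not reproduced in this thread, so the following is only a rough sketch of what the second test case appears to do, reconstructed from the sample output below. It assumes a pool named testpool already exists and that a passphrase file exists at the hypothetical path /tmp/testkey; the function names, the key file, and the exact send/recv flags are assumptions (the report mentions -R, which is left out here for brevity). By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run by default: each step is printed instead of executed.
# Set DRY_RUN=0 to execute (needs root and a ZFS build with native encryption).
DRY_RUN="${DRY_RUN:-1}"
POOL="${POOL:-testpool}"            # assumed to exist already
KEYFILE="${KEYFILE:-/tmp/testkey}"  # hypothetical passphrase file (8+ characters)

step() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

repro_test2() {
    # Encrypted source dataset plus two snapshots replicated raw (-w) to test2.
    step "zfs create -o encryption=on -o keyformat=passphrase -o keylocation=file://$KEYFILE $POOL/test1"
    step "zfs snapshot $POOL/test1@initial"
    step "zfs send -w $POOL/test1@initial | zfs recv $POOL/test2"
    step "zfs snapshot $POOL/test1@incremental"
    step "zfs send -w -i @initial $POOL/test1@incremental | zfs recv $POOL/test2"
    # test2 is never mounted and its key is never loaded; send a new snapshot back.
    step "zfs snapshot $POOL/test2@sendBack"
    step "zfs send -w -i @incremental $POOL/test2@sendBack | zfs recv $POOL/test1"
    # Remount the source; on affected versions the mount is reported to fail with EIO.
    step "zfs unmount $POOL/test1"
    step "zfs mount $POOL/test1"
}

repro_test2
```

Running it unchanged prints the nine steps; with DRY_RUN=0 the last step is where the report sees "cannot mount 'testpool/test1': Input/output error".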

Sample output:

Creating tmp file /tmp/foo
128+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.047836 s, 2.8 GB/s
Creating zpool testpool with file /tmp/foo
############################################# TEST 1 ##############################################
Creating encrypted zfs-dataset test1
Snapshot test1@initial
Sending test@initial to test2
Snapshot test1@incremental
Sending test1@incremental to test2
Loading key on test2
Mounting test2 for testing - this works!
Unmount test2 again
Unmount test1 for unloading key
Unloading test1 key for receiving
Snapshot test2@sendBack
Sending test2@sendBack to test1
Loading key on test1
Mounting test1 for testing - fails with Input/Output error!

cannot mount 'testpool/test1': Input/output error

Cleaning up...
################################################ TEST 2 ##############################################
Testing again without unloading the key on test1 first and also not loading the key / mounting the dataset on test2 at all

Creating encrypted zfs-dataset test1
Snapshot test1@initial
Sending test@initial to test2
Snapshot test1@incremental
Sending test1@incremental to test2
Snapshot test2@sendBack
Sending test2@sendBack to test1
Unmount test1
Mounting test1 for testing - fails with Input/Output error!!

cannot mount 'testpool/test1': Input/output error

Cleaning up...
################################################ TEST 3 ###############################################
Testing the same again unencrypted

Creating encrypted zfs-dataset test1
Snapshot test1@initial
Sending test@initial to test2
Snapshot test1@incremental
Sending test1@incremental to test2
Snapshot test2@sendBack
Sending test2@sendBack to test1
Unmount test1
Mounting test1 for testing - now it works!
Snapshot test1@incremental_1
Sending test1@incremental_1 to test2
Cleaning up...
################################################## TEST 4 #################################################
Testing again but writing data to test1 after receiving but before unmounting

Creating encrypted zfs-dataset test1
Snapshot test1@initial
Sending test@initial to test2
Snapshot test1@incremental
Sending test1@incremental to test2
Snapshot test2@sendBack
Sending test2@sendBack to test1
Writing data to test1
Unmount test1
Mounting test1 for testing - now it works!
Getting the data from test1 back

Testdata

Snapshot test1@incremental_1
Sending test1@incremental_1 to test2
Cleaning up...
################################################## TEST 5 #####################################################
Testing again and altering a file remotely

Creating encrypted zfs-dataset test1
Snapshot test1@initial
Sending test@initial to test2
Snapshot test1@incremental
Sending test1@incremental to test2
Loading key on test2
Mounting test2 for testing - this works!
Writing data to test2
Unmount test2 again
Unmount test1 for unloading key
Unloading test1 key for receiving
Snapshot test2@sendBack
Sending test2@sendBack to test1
Loading key on test1
Mounting test1 for testing - fails with Input/Output error!

cannot mount 'testpool/test1': Input/output error

Cleaning up...
Destroy testpool and remove /tmp/foo
@GeneralGresi GeneralGresi added Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang) labels Apr 30, 2021
@rincebrain
Contributor

rincebrain commented Apr 30, 2021

Fascinating but true data points I found while curiously poking at this:
If you have a dataset testpool/test1, generated with the above test case, and throwing EIO on mount:

  • zfs rollback testpool/test1@sendBack does not affect this error, and zfs get written testpool/test1 says 0, but...
  • If you zfs send -Rw testpool/test1@sendBack | zfs recv testpool/test3;zfs load-key testpool/test3;zfs mount testpool/test3; it works.
  • Exporting and importing the pool does not influence the behavior. Nor does rebooting the system.
  • While zpool status reports an error in <0x0> whenever you try mounting testpool/test1, none of the R/W/C counters for the pool are incremented, and zpool scrub does not report any such error (indeed, scrubbing twice makes it report no errors...until you try zfs mount again.)

edited to add: worth noting the test cases present here stop breaking on 2.1.0-rc4, presumably because of d1d4769, the commit that #11300 is improving upon, but existing broken datasets will still EIO on attempted mount.
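To check which side of that observation a given machine falls on, a small version test along these lines might help. It is a sketch only: the helper name at_least_2_1 is made up for illustration, it treats any 2.1 or later kernel module as "not reproducing" (which glosses over 2.1.0-rc1 through rc3), and it says nothing about datasets that were already broken by an earlier receive.

```shell
#!/bin/sh
# Returns 0 if the given version string is 2.1 or newer, 1 otherwise.
at_least_2_1() {
    ver="${1%%-*}"        # drop suffixes such as -pve1, -rc4, -1ubuntu5
    major="${ver%%.*}"
    rest="${ver#*.}"
    minor="${rest%%.*}"
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 1 ]; }
}

# Only query the running system if the zfs tool is installed.
if command -v zfs >/dev/null 2>&1; then
    # "zfs version" prints the userland and kernel module versions;
    # the kernel module line starts with "zfs-kmod-".
    kmod="$(zfs version 2>/dev/null | sed -n 's/^zfs-kmod-//p')"
    if [ -z "$kmod" ]; then
        echo "could not determine the ZFS kernel module version"
    elif at_least_2_1 "$kmod"; then
        echo "kmod $kmod: the test cases reportedly no longer reproduce"
    else
        echo "kmod $kmod: likely affected by this issue"
    fi
fi
```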

@behlendorf
Contributor

There's actually a WIP PR open for this, see #11300 for all the details. There are still some outstanding concerns which need to be worked through in the PR, but the underlying issue is understood.

@gamanakis
Contributor

Closed by #12981.

@behlendorf behlendorf removed the Status: Triage Needed New issue which needs to be triaged label Feb 4, 2022