
Raw sending to a pool with larger ashift results in unmountable filesystem #13067

Closed
gamanakis opened this issue Feb 5, 2022 · 6 comments · Fixed by #13074
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

gamanakis (Contributor) commented Feb 5, 2022

System information

Type Version/Name
Distribution Name archlinux
Distribution Version rolling
Kernel Version 5.15.16-1-lts
Architecture x86_64
OpenZFS Version v2.1.99-756_gf2c5bc150

Describe the problem you're observing

Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with ashift=12 results in failure when mounting pool2/encrypted (Input/Output error).
Notably, the opposite direction, raw sending from a larger ashift to a smaller one, does not fail.

Describe how to reproduce the problem

zpool create -f pool1 -o ashift=9 ~/vdevs/a
zpool create -f pool2 -o ashift=12 ~/vdevs/b
echo 'password' | zfs create -o encryption=on -o keyformat=passphrase pool1/enc9

zfs snap pool1/enc9@snap
zfs send -w pool1/enc9@snap | zfs receive pool2/enc9

echo password | zfs load-key pool2/enc9

zfs mount pool2/enc9
cannot mount 'pool2/enc9': Input/output error

Setting zfs_flags=512 and looking at zfs_dbgmsg shows that arc_untransform() fails. Instrumenting it with spl_dumpstack() yields two stacks:

[  +0.000001] Call Trace:
[  +0.000002]  <TASK>
[  +0.000001]  dump_stack_lvl+0x46/0x62
[  +0.000054] Showing stack for process 43326
[  -0.000048]  arc_untransform+0x3a/0xa0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000107]  dbuf_read_verify_dnode_crypt+0x162/0x1d0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000104]  ? queued_spin_unlock+0x5/0x10 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000114]  dbuf_read_impl.constprop.0+0x16b/0x560 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000119]  ? spl_kmem_cache_alloc+0x87/0x270 [spl 5d3cb12fc1fda75fa74cd25ad9350e8b2595be43]
[  +0.000018]  dbuf_read+0x126/0x690 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000093]  dmu_buf_hold+0x69/0xa0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000096]  ? dbuf_rele_and_unlock+0x236/0x620 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000090]  zap_lockdir+0x77/0x150 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000154]  zap_lookup_norm+0x62/0xe0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000152]  ? spl_kmem_alloc_impl+0xb2/0xd0 [spl 5d3cb12fc1fda75fa74cd25ad9350e8b2595be43]
[  +0.000011]  zap_lookup+0x12/0x30 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000159]  sa_setup+0x310/0x680 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000138]  zfsvfs_init+0x3c3/0x5b0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000180]  ? spl_kmem_alloc_track+0xf4/0x160 [spl 5d3cb12fc1fda75fa74cd25ad9350e8b2595be43]
[  +0.000032]  zfsvfs_create_impl+0x1e9/0x2b0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000186]  zfsvfs_create+0x98/0x100 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000154]  zfs_domount+0xc8/0x460 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000163]  zpl_fill_super+0x34/0xb0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000164]  zpl_mount_impl+0x165/0x1c0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000153]  zpl_mount+0x2b/0x90 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000189]  legacy_get_tree+0x27/0x50
[  +0.000005]  vfs_get_tree+0x25/0xc0
[  +0.000003]  path_mount+0x485/0xa70
[  +0.000003]  __x64_sys_mount+0x11f/0x160
[  +0.000003]  do_syscall_64+0x5c/0x90
[  +0.000005]  ? exc_page_fault+0x72/0x160
[  +0.000002]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  +0.000001] Call Trace:
[  +0.000002]  <TASK>
[  +0.000003]  dump_stack_lvl+0x46/0x62
[  +0.000006]  arc_untransform+0x3a/0xa0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000116]  dbuf_read_verify_dnode_crypt+0x162/0x1d0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000079] Showing stack for process 43847
[  +0.000018]  ? __list_add+0x12/0x40 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000141]  dbuf_read_bonus+0x20/0x1c0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000094]  ? zrl_add_impl+0xf7/0x150 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000166]  dbuf_read_impl.constprop.0+0x321/0x560 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000136]  ? __raw_spin_unlock+0x5/0x10 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000090]  ? dbuf_rele_and_unlock+0x236/0x620 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000091]  dbuf_read+0x126/0x690 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000091]  dmu_bonus_hold_by_dnode+0x8d/0x210 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000094]  dmu_bonus_hold+0x56/0xa0 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000092]  dmu_objset_space_upgrade+0xa6/0x230 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000097]  dmu_objset_id_quota_upgrade_cb+0xa9/0x190 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000096]  dmu_objset_upgrade_task_cb+0xda/0x160 [zfs 70de6298eedd020573bd4d01f34f042d715f63ca]
[  +0.000096]  taskq_thread+0x232/0x490 [spl 5d3cb12fc1fda75fa74cd25ad9350e8b2595be43]
[  +0.000015]  ? wake_up_q+0x90/0x90
[  +0.000006]  ? taskq_thread_spawn+0x60/0x60 [spl 5d3cb12fc1fda75fa74cd25ad9350e8b2595be43]
[  +0.000011]  kthread+0x127/0x150
[  +0.000005]  ? set_kthread_struct+0x50/0x50
[  +0.000003]  ret_from_fork+0x22/0x30
[  +0.000005]  </TASK>

arc_untransform() fails because arc_hdr_decrypt->spa_do_crypt_abd->zio_do_crypt_data->zio_do_crypt_uio fails.

gamanakis (Contributor Author):

@pcd1193182 would you mind taking a look?

@gamanakis gamanakis changed the title Raw sending to a pool with larger ashift fails Raw sending to a pool with larger ashift results in unmountable dataset Feb 5, 2022
@gamanakis gamanakis changed the title Raw sending to a pool with larger ashift results in unmountable dataset Raw sending to a pool with larger ashift results in unmountable filesystem Feb 5, 2022
gamanakis (Contributor Author) commented Feb 6, 2022

If we dump the send stream of the "faulty" ashift=12 filesystem that received the raw snapshot from the ashift=9 pool, we get:

zfs send -w pool2/enc9@snap

226 WRITE object = 35 type = 46 checksum type = 7 compression type = 15 flags = 0 offset = 0 logical_size = 1536 compressed_size = 1536 payload_size = 1536 props = 8f00020002 salt = bc0345ace03fa2b7 iv = a057c5c4fadf03a2a57577f5 mac = 227444816f161536270982f81f8e4916

Notably this differs from the original send stream:

zfs send -w pool1/enc9@snap

226 WRITE object = 35 type = 46 checksum type = 7 compression type = 15 flags = 0 offset = 0 logical_size = 1536 compressed_size = 512 payload_size = 512 props = 8f00000002 salt = bc0345ace03fa2b7 iv = a057c5c4fadf03a2a57577f5 mac = 227444816f161536270982f81f8e4916

I don't know if this is relevant, but it seems that in the first case we have a faulty psize (1536 != 512; the extra bytes are zero-padded). The faulty psize may be what leads zio_do_crypt_uio(), and subsequently arc_untransform(), to fail.

That WRITE record has type = 46, denoting an SA attribute registration, probably related to the stack involving zfs_mount()->sa_setup() above.

gamanakis (Contributor Author):

This is interesting:
A pool with a single vdev of ashift=12 has:

226 WRITE object = 35 type = 46 checksum type = 7 compression type = 15 flags = 0 offset = 0 logical_size = 1536 compressed_size = 1536 payload_size = 1536 props = 8f00020002 salt = bc0345ace03fa2b7 iv = a057c5c4fadf03a2a57577f5 mac = 227444816f161536270982f81f8e4916

whereas a pool with a single vdev of ashift=9 has:

226 WRITE object = 35 type = 46 checksum type = 7 compression type = 15 flags = 0 offset = 0 logical_size = 1536 compressed_size = 512 payload_size = 512 props = 8f00000002 salt = bc0345ace03fa2b7 iv = a057c5c4fadf03a2a57577f5 mac = 227444816f161536270982f81f8e4916

Again, note the discrepancy in psize between the two.

gamanakis (Contributor Author) commented Feb 6, 2022

Thus, the problem is in zio_compress_write():

When first creating the object in the pool with ashift=9 we have:

1730                         /*
1731                          * Round compressed size up to the minimum allocation
1732                          * size of the smallest-ashift device, and zero the
1733                          * tail. This ensures that the compressed size of the
1734                          * BP (and thus compressratio property) are correct,
1735                          * in that we charge for the padding used to fill out
1736                          * the last sector.
1737                          */
1738                         ASSERT3U(spa->spa_min_alloc, >=, SPA_MINBLOCKSHIFT);
1739                         size_t rounded = (size_t)roundup(psize,
1740                             spa->spa_min_alloc);
1741                         if (rounded >= lsize) {
1742                                 compress = ZIO_COMPRESS_OFF;
1743                                 zio_buf_free(cbuf, lsize);
1744                                 psize = lsize;
1745                         } else {
1746                                 abd_t *cdata = abd_get_from_buf(cbuf, lsize);
1747                                 abd_take_ownership_of_buf(cdata, B_TRUE);
1748                                 abd_zero_off(cdata, psize, rounded - psize);
1749                                 psize = rounded;
1750                                 zio_push_transform(zio, cdata,
1751                                     psize, lsize, NULL);
1752                         }
1753                 }

and when the object is received in the pool with ashift=12 we have:

1776         } else if (zio->io_flags & ZIO_FLAG_RAW_COMPRESS) {
1777                 size_t rounded = MIN((size_t)roundup(psize,
1778                     spa->spa_min_alloc), lsize);
1779
1780                 if (rounded != psize) {
1781                         abd_t *cdata = abd_alloc_linear(rounded, B_TRUE);
1782                         abd_zero_off(cdata, psize, rounded - psize);
1783                         abd_copy_off(cdata, zio->io_abd, 0, 0, psize);
1784                         psize = rounded;
1785                         zio_push_transform(zio, cdata,
1786                             psize, rounded, NULL);
1787                 }

In the second code path, taken upon receiving, the rounded size is 1536 but the psize is 512. psize is therefore set to 1536, and decryption subsequently fails.

gamanakis (Contributor Author) commented Feb 7, 2022

Changing:

1778                     spa->spa_min_alloc), lsize);

to

1778                     spa->spa_min_alloc), psize);

resolves the bug. However, the

1776         } else if (zio->io_flags & ZIO_FLAG_RAW_COMPRESS) {

codepath is still taken when raw receiving an encrypted stream, because both ZIO_FLAG_RAW_COMPRESS and ZIO_FLAG_RAW_ENCRYPT are set (which shouldn't be the case in the first place).

gamanakis (Contributor Author):

Will prepare a PR later today.

behlendorf pushed a commit that referenced this issue Feb 16, 2022
Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with
ashift=12 results in failure when mounting pool2/encrypted (Input/Output
error). Notably, the opposite, raw sending from a greater ashift to a
lower one, does not fail.

This happens because zio_compress_write() incorrectly checks only
ZIO_FLAG_RAW_COMPRESS and not ZIO_FLAG_RAW_ENCRYPT, which is also set in
encrypted raw send streams. In this case it rounds up the psize and, if
not equal to the zio->io_size, modifies the block by zeroing out the
extra bytes. Because this happens in an SA attr. registration object
(type=46), decryption fails upon mounting the filesystem, and zpool
status falsely reports an error.

Fix this by checking both ZIO_FLAG_RAW_COMPRESS and ZIO_FLAG_RAW_ENCRYPT
before deciding whether to zero-pad a block.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #13067 
Closes #13074
The same commit (with an identical message) was subsequently picked up by downstream trees:

tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Feb 24, 2022
nicman23 pushed a commit to nicman23/zfs that referenced this issue Aug 22, 2022 (twice)
andrewc12 pushed a commit to andrewc12/openzfs that referenced this issue Aug 30, 2022
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Sep 1, 2022
andrewc12 pushed a commit to andrewc12/openzfs that referenced this issue Sep 23, 2022 (three times)