Compressed receive with different ashift can result in incorrect PSIZE on disk #8462
Comments
I don't know if I'll have time to work on this in the near future. I'm going to unassign it from me for the moment.
No worries, I accidentally assigned it to you, thinking that I was requesting a code review.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
It looks like this issue may have been partially modified by the lightweight write changes. The code that directly manipulated the arc buf hdr in dmu_recv was removed, except in the case of spill blocks. There is no longer an arc_buf_hdr for the lightweight writes; instead, they stay as lightweight write structures until dbuf_sync_lightweight, when they get For the |
I think those would work with compressed streams, and then you could add assertions in the zio pipeline that the buffer/psize provided is a multiple of 1<<ashift. It might be cleaner if the zio pipeline accepted (raw) writes that are not sector-aligned and rounded them up itself, as it does for normal writes. However, I'm not sure that either of these approaches will work in practice with encrypted streams, because the PSIZE (of L0 blocks) is verified by the MAC, so I think it cannot change across an encrypted send. See
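The alignment check suggested in this comment could look roughly like the sketch below. `is_sector_aligned` is a hypothetical helper written for illustration; it is not an OpenZFS function, and where such a check would sit in the zio pipeline is not specified in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when psize is a multiple of the
 * sector size implied by ashift (1 << ashift).
 */
static bool
is_sector_aligned(uint64_t psize, unsigned ashift)
{
	assert(ashift < 64);
	return ((psize & ((1ULL << ashift) - 1)) == 0);
}
```

A 512-byte PSIZE passes this check for ashift=9 but fails it for ashift=12, which is the case this issue describes.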
After some offline discussions, we decided that the plan we're going to attempt moving forwards is to fix the PSIZE for future cross-ashift compressed-but-not-encrypted receives. This can be done pretty simply in the receive logic or the zio pipeline. Fixing existing incorrect blocks is much more challenging; rewriting the blocks to use the correct PSIZE would be slow, fixing just the ds_ metadata would also be somewhat slow, and no solution we've come up with would allow us to fix the PSIZE of received encrypted blocks. We also considered modifying the |
…E on disk (openzfs#8462) Signed-off-by: Paul Dagnelie <[email protected]>
…E on disk We round up the psize to the nearest multiple of the asize or to the lsize, whichever is smaller. Once that's done, we allocate a new buffer of the appropriate size, zero the tail, and copy the data into it. This adds a small performance cost to these kinds of writes, but fixes the bookkeeping problems. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Co-authored-by: Matthew Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#12522 Closes openzfs#8462
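The rounding and padding described in this commit message can be sketched as below. This is a minimal standalone illustration, not the code from the actual change: `padded_psize` and `pad_raw_buffer` are hypothetical names, and "a multiple of the asize" is assumed here to mean a multiple of `1 << ashift` on the receiving pool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: round psize up to a multiple of (1 << ashift),
 * but never beyond lsize. Assumes 0 < psize <= lsize.
 */
static uint64_t
padded_psize(uint64_t psize, uint64_t lsize, unsigned ashift)
{
	assert(ashift < 64);
	assert(psize > 0 && psize <= lsize);
	uint64_t sector = 1ULL << ashift;
	uint64_t rounded = (psize + sector - 1) & ~(sector - 1);
	return (rounded < lsize ? rounded : lsize);
}

/*
 * Hypothetical helper: allocate a buffer of the padded size, copy the
 * received data into it, and zero the tail. Returns NULL if the
 * allocation fails; the caller frees the result with free().
 */
static void *
pad_raw_buffer(const void *src, uint64_t psize, uint64_t lsize,
    unsigned ashift, uint64_t *out_size)
{
	uint64_t size = padded_psize(psize, lsize, ashift);
	uint8_t *dst = malloc(size);
	if (dst == NULL)
		return (NULL);
	memcpy(dst, src, psize);
	memset(dst + psize, 0, size - psize);
	*out_size = size;
	return (dst);
}
```

With these assumptions, a 512-byte compressed record of a 128 KB logical block received into an ashift=12 pool would be padded to 4096 bytes, while a 1536-byte record of a 2048-byte logical block would be capped at 2048 bytes.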
System information
N/A, problem found from code inspection, verified on Illumos
Describe the problem you're observing
When performing a compressed send, the PSIZE of the block is used as the compressed_size of the record. When the receiving system receives these records, it performs a raw write, taking the compressed data and writing it directly to disk. When the sending pool has disks with ashift=9 and the receiving pool has disks with ashift=12, this results in a bug.
For normal writes, zio_write_compress rounds the PSIZE of the blkptr up to the ashift of the smallest ashift device. For raw writes, this step is bypassed. Instead, the zio's io_size is used directly as the psize. The zio's io_size comes from the arc buf header's size via dbuf_sync_leaf -> dbuf_write -> arc_write -> zio_write. The arc buf header's size is set to be the compressed_size of the record in receive_read_record. Since this size is from a system with a different set of ashifts, this can result in a situation where a block is stored using 4 KB on disk, but the psize is only 512 bytes.

So far the only negative effect of this issue I've found is that the dataset's bookkeeping is messed up, resulting in incorrect lrefer and compression ratio statistics. There may be other issues lurking, however.
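To make the size mismatch concrete, here is a toy model of the arithmetic. It assumes a single-disk vdev where every block occupies a whole number of `1 << ashift` sectors, and it ignores RAID-Z padding and other allocation details. `allocated_bytes` and `unaccounted_bytes` are hypothetical names, not OpenZFS functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model: bytes actually consumed on a device whose sector size is
 * (1 << ashift) when a block of psize bytes is written to it.
 */
static uint64_t
allocated_bytes(uint64_t psize, unsigned ashift)
{
	assert(ashift < 64);
	uint64_t sector = 1ULL << ashift;
	return (((psize + sector - 1) / sector) * sector);
}

/*
 * Gap between what the block pointer records (psize) and what the
 * device consumes under the toy model above.
 */
static uint64_t
unaccounted_bytes(uint64_t recorded_psize, unsigned ashift)
{
	return (allocated_bytes(recorded_psize, ashift) - recorded_psize);
}
```

Under this model, a block recorded with a 512-byte psize on an ashift=12 device consumes 4096 bytes, leaving 3584 bytes that the psize does not account for; on an ashift=9 device the gap is zero.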
Describe how to reproduce the problem
Use zdb -vvvvv to examine the block pointers.
Include any warning/errors/backtraces from the system logs
N/A