Intel QAT support when root is on a ZFS filesystem #8323
@wli5 Hi @geppi, thanks very much for your work. When the QAT driver is "DOWN", the data will be processed by software. I have committed the changes to my repository "https://github.com/cfzhu/zfs.git"; the branch name is "qat". I do not know if these changes are useful. Could you test them? Thank you.
@cfzhu @wli5 Sorry for the delay in coming back, but I was fighting a strange issue when testing the new code. It seems there is a problem with the checksum acceleration. Let me demonstrate. First I have checksum acceleration disabled and create a pool:
I can export the pool and reimport without problems
As soon as I enable checksum acceleration, something strange happens.
At this stage I can still make everything sane by simply disabling checksum acceleration.
But if I now enable checksum acceleration on the imported pool, it gets permanently corrupted.
Disabling checksum acceleration no longer helps. The same happens if you create the pool with checksum acceleration enabled right from the beginning. You can work with the pool as long as it is in the initial imported state. It looks like something is going on with the metadata of the pool, because I didn't even expect the SHA-256 acceleration to be used: the default checksum algorithm is fletcher4, as far as I remember. This is consistent with the fact that the qat counters for checksum in
I would be very interested to understand what exactly is happening to the pool here, because I was a little incautious when testing the new code and now have a mirrored pool that contains data I would love to get back.
Hi @geppi, I tried to reproduce the problem on my server, but ZFS worked well.
Is there something wrong with my steps?
Hello @cfzhu, until that point everything was OK on my side as well. As I described in my last post, the problem started after I exported the zpool while checksum acceleration was turned on. Since I assume that checksum acceleration works well and has been thoroughly tested for the code version without the changes, I can only assume that it's a problem with the particular setup on my side. I'm running this on a system with a Denverton C3758 SoC and will investigate further. I have started to dive into kernel module debugging and am currently creating the setup to do this. Nevertheless, maybe you can repeat the steps above and just add an export and another import of the pool at the end to see if you can replicate the problem? Preferably on a Denverton SoC?
@cfzhu Sorry, I should probably read my own posts more carefully. What I said above is wrong. The procedure you performed should in fact have already shown the problem. As I wrote in my initial post, I cannot import the pool as soon as checksum acceleration is turned on. So the
Obviously you cannot replicate the problem on your side, so it must be something related to my particular setup. This problem is driving me crazy.
Hi @geppi, I just came back from vacation and wanted to know if there are any updates from your side.
@cfzhu @wli5 I do now have a little more information but it doesn't explain why you can't reproduce the problem on your side. First I've tried to figure out what is wrong with a zpool exported with checksum acceleration turned on.
which requires a data size between 4K and 128K to use QAT acceleration.
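The size window mentioned above can be written as a simple predicate. This is an illustrative userland sketch, not the ZFS source: the names `SKETCH_QAT_MIN_BUF`, `SKETCH_QAT_MAX_BUF`, and `sketch_qat_size_ok` are hypothetical, and treating both bounds as inclusive is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants mirroring the 4K..128K window described above. */
#define SKETCH_QAT_MIN_BUF ((size_t)4 * 1024)
#define SKETCH_QAT_MAX_BUF ((size_t)128 * 1024)

/*
 * Returns true when a buffer of `len` bytes falls inside the window.
 * Inclusive bounds are an assumption of this sketch.
 */
bool
sketch_qat_size_ok(size_t len)
{
	return (len >= SKETCH_QAT_MIN_BUF && len <= SKETCH_QAT_MAX_BUF);
}
```

Any buffer outside this window would be handled by the software path in such a scheme.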
That would explain why you can't import the pool anymore when checksum acceleration is turned off.
As you can see, the checksum values calculated now are different from the ones that were calculated when the label was last written! I couldn't drill down deeper to see what's going on in the actual checksum calculation routines, partly because I lack detailed knowledge of the QAT checksum routines, but also because the kernel didn't like being stopped and continued when breaking in those routines, so the system froze very quickly. A notable observation is that "00c0010000000000" is a recurring pattern in the calculated checksum values. The above values are from a zpool created on a 1GB file.
Not only do you see the "00c0010000000000" pattern again, but the checksums for all 4 vdev labels are also the same! As already noted, all this does not explain why I am the only one having this problem.
Hi @geppi, we eventually reproduced the problem and confirmed it is a real issue. The buffer "Cpa8U digest_buffer[sizeof (zio_cksum_t)]" on the stack in qat_checksum() is not guaranteed to be physically contiguous or to stay within a single page, so the DMA of the digest result into this buffer may go wrong. We will create a PR to fix it. Thanks for reporting the problem!
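One part of this hazard can be illustrated in userland: a small buffer may straddle a page boundary, and two adjacent virtual pages need not be physically adjacent. The following is a minimal sketch, assuming 4 KiB pages. `sketch_crosses_page` and `SKETCH_PAGE_SIZE` are hypothetical names and are not part of ZFS or the QAT driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Page size assumed for this sketch; a real kernel exposes its own value. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)

/*
 * Returns true when the byte range [addr, addr + len) touches more than
 * one page. A DMA engine that is handed a single physical address for
 * such a range may write past the end of the first physical page.
 */
bool
sketch_crosses_page(uintptr_t addr, size_t len)
{
	if (len == 0)
		return (false);
	return ((addr / SKETCH_PAGE_SIZE) !=
	    ((addr + len - 1) / SKETCH_PAGE_SIZE));
}
```

For example, a 32-byte digest buffer that starts 16 bytes before a page boundary crosses it, while the same buffer starting exactly on the boundary does not. Allocating the buffer from physically contiguous memory, as the fix does, avoids depending on where the stack happens to sit.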
Hi @geppi, sorry for the late reply. As I found some new problems, I have committed the changes to my repository: https://github.com/cfzhu/zfs/tree/qat. Could you try it again? Thank you.
1. Support QAT when ZFS is root file-system: When ZFS module is loaded before QAT started, the QAT can be started again in post-process, e.g.:
   echo 0 > /sys/module/zfs/parameters/zfs_qat_compress_disable
   echo 0 > /sys/module/zfs/parameters/zfs_qat_encrypt_disable
   echo 0 > /sys/module/zfs/parameters/zfs_qat_checksum_disable
2. Verify Adler checksum of the decompressed result.
3. Allocate Digest, IV and AAD buffer in physically contiguous memory by QAT_PHYS_CONTIG_ALLOC.
4. Update the documentation for zfs_qat_compress_disable, zfs_qat_checksum_disable, zfs_qat_encrypt_disable.

Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Weigang Li <[email protected]>
Signed-off-by: Chengfeix Zhu <[email protected]>
Closes #8323
Closes #8610
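The behavior described in item 1 (retry QAT setup when a disable parameter is written as 0, and fall back to software otherwise) can be modeled in a few lines. This is a userland sketch under stated assumptions, not the code from the commit: the struct and every `sketch_` name are hypothetical, and the driver state is a plain flag rather than a real QAT query.

```c
#include <stdbool.h>

/* Hypothetical stand-in state; none of these names come from ZFS or QAT. */
struct sketch_qat_state {
	bool driver_up;	/* whether the QAT driver is currently usable */
	bool init_done;	/* whether the ZFS-side QAT setup has succeeded */
	int disable;	/* mirrors a zfs_qat_*_disable style tunable */
};

/* Attempt setup; it can only succeed once the driver is up. */
bool
sketch_qat_try_init(struct sketch_qat_state *s)
{
	if (!s->init_done && s->driver_up)
		s->init_done = true;
	return (s->init_done);
}

/* Models a write to the tunable: writing 0 retries the setup. */
void
sketch_qat_set_disable(struct sketch_qat_state *s, int value)
{
	s->disable = value;
	if (value == 0)
		(void) sketch_qat_try_init(s);
}

/* Hardware path only when enabled and set up; otherwise use software. */
bool
sketch_qat_use_accel(const struct sketch_qat_state *s)
{
	return (s->disable == 0 && s->init_done);
}
```

In this model, a setup attempt at module load time fails while the driver is down, and a later write of 0 to the tunable succeeds once the driver is up, which matches the post-boot `echo 0 > ...` step in the commit message.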
System information
This feature request is about enabling Intel Quick Assist Technology (QAT) support when booting from a ZFS filesystem.
Background
Intel Quick Assist Technology (QAT) support was introduced to ZFS initially for hardware accelerated compression. #5846
It was extended to support acceleration of AES-GCM encryption #7282
and the acceleration of SHA256 and SHA512 checksums. #7295
The current implementation initializes the QAT support for ZFS when the ZFS kernel module is loaded.
A prerequisite for this initialization to succeed is that the Intel QAT driver has already been initialized.
If this is not the case the QAT support for ZFS will not be available even if the QAT driver becomes available at a later stage.
Issue
When booting a system from a ZFS filesystem the ZFS kernel module is loaded in an early phase from the root filesystem in the initramfs.
Currently the initramfs does not provide the QAT driver initialization required by the QAT support initialization routine in the ZFS kernel module.
Therefore the initialization of QAT support for ZFS always fails when the root filesystem is ZFS.
Motivation
Using ZFS as the root filesystem is very desirable because copy-on-write guarantees filesystem integrity and because a ZFS root filesystem mirror across multiple disks provides redundancy.
Currently it is not possible to have both at the same time, a ZFS root filesystem and QAT support for ZFS.