AVX2 not available for RAIDZ or Fletcher algorithms on Ubuntu 22.04 #15223
Comments
The new Ubuntu 22.04 LTS HWE kernel 6.2.0-31-generic with the official Ubuntu ZFS version 2.1.9-2ubuntu1.1 also shows the problem. |
I think I'm seeing the same (or a very similar) issue on Debian 11 and 12 using On Debian I'm seeing very high (near 100%) multicore CPU usage during reads (and writes) from an encrypted dataset on a RAID-Z2 pool. htop with "Hide kernel threads" disabled shows multiple
CPUs: Ryzen 5600G (baremetal), 5700X (baremetal) and i7-8650U (in VMs using Distros tested:
(maybe relevant for troubleshooting ideas: #9215) |
My suspicion, based on another report someone gave me once, was that on some systems, it wasn't correctly detecting certain newer architecture features in the compile-time checks, and so compiling them out entirely, leading to FPU functions that are using much less efficient implementations. I'm kind of tempted to either refactor the existing Linux kfpu_begin/end to include which things it thinks are supported or expose it in /proc or something to make it easier to catch that, assuming of course it is the issue at hand... I'll be at home after tomorrow and in a position to test these theories. |
The problem appears to be that e: I suspect that this is the issue that we're seeing, so when that lands, it should go away. But that doesn't help people now, now does it... e2: I think the above link, when that patch lands, will fix it, but if we want this to work in the interim, I don't see a good option other than parsing the feature bits ourselves or just doing what they do and unconditionally make the check pass and assume everyone wanting to check that also has more checks that would break if this wasn't actually true? So something like
in I don't think we can use |
With an Ubuntu 22.04 LTS test VM and a CPU passthrough configuration, I have seen the problem only on host systems with AMD CPUs (Zen3) and not on a host with an Intel CPU (checked with a Haswell E5-1630 v3). |
I literally already linked the patch discussing the bug and the previous patch breaking it. |
This also happens on my system. I'm running Ubuntu 22.04 with the 5.15.0-83-generic kernel and ZFS 2.1.5. My CPU is a Xeon Gold 6154. |
It seems that on kernel 6.1.52-1 (6.1.0-12-amd64 on Debian 12) AVX2 works again; checked on a 5600G and an i7-8650U. The patch mentioned earlier landed in the stable tree in 6.1.50: https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.50 |
FWIW, Ubuntu still hasn't pulled torvalds/linux@2c66ca3, though they pulled torvalds/linux@b81fac906a8f in 6.2.0-30.30. |
FWIW, the next version of Proxmox kernels (6.2.16-14) will contain the cherry-picked fix (already confirmed to fix the regression, but currently still in internal testing): https://git.proxmox.com/?p=pve-kernel.git;a=commit;h=9ba0dde971e6153a12f94e9c7a7337355ab3d0ed also already reported on the Ubuntu side, so should be fixed there at some point in the near future as well: https://bugs.launchpad.net/bugs/2034745 |
(Un)interestingly, this actually causes owners of CPUs with AVX2 to run into #10846. In my case, encryption+sha512 checksums+raidz2: I had a workload where a VM was downloading a Steam game, and I saw all txg syncing grind to a halt. |
The problem is fixed with the release of kernel 5.15.0-88.98 on Ubuntu 22.04. |
I just upgraded to Debian 12, but I'm still getting the same. My CPU is G4400. Anything I'm missing?
❯ uname -r
6.1.0-23-amd64
❯ cat -p /sys/module/zfs/parameters/zfs_fletcher_4_impl
[fastest] scalar superscalar superscalar4 sse2 ssse3
❯ cat -p /sys/module/zfs/parameters/zfs_vdev_raidz_impl
cycle [fastest] original scalar sse2 ssse3 |
Intel ARK doesn't mention AVX2 support for this CPU: https://ark.intel.com/content/www/us/en/ark/compare.html?productIds=88179,124968 |
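One way to tell a CPU that simply lacks AVX2 apart from the kernel regression discussed in this thread is to look at the flags the kernel reports in /proc/cpuinfo. The following is a minimal sketch, assuming a Linux system; `has_flag` is a hypothetical helper name used only for illustration, not something shipped with ZFS or any distribution.

```shell
#!/bin/sh
# has_flag FLAG [FILE]
# Succeeds if FLAG appears as a whole word on the first "flags" line of FILE.
# FILE defaults to /proc/cpuinfo. (Hypothetical helper, for illustration only.)
has_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Put every flag on its own line, then look for an exact match.
    grep -m1 '^flags' "$file" 2>/dev/null | tr ' \t' '\n\n' | grep -qxF -- "$flag"
}

if has_flag avx2; then
    echo "kernel reports avx2 for this CPU"
else
    echo "kernel does not report avx2 for this CPU"
fi
```

If avx2 is not reported there, ZFS not offering an avx2 implementation is expected. If it is reported but missing from the ZFS implementation lists, the kernel-side issue described above is a likely candidate.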
System information
Type | Version/Name
Ubuntu | 22.04 LTS
Distribution Name | Ubuntu
Distribution Version | 22.04
Kernel Version | 5.15.0-82-generic and 6.2.0-31-generic
Architecture | x86
OpenZFS Version | 2.1.12-1 (self compiled) and 2.1.9-2ubuntu1.1
Describe the problem you're observing
After booting the new Ubuntu kernel 5.15.0-82-generic on a dedicated AMD Epyc Zen3 system (with the updated amd64-microcode package version 3.20191218.1ubuntu2.2, which updates the microcode from 0xa001173 to 0xa0011d1) and in a VM hosted on an AMD Epyc Zen3 system (an openSUSE 15.4 host whose kernel and microcode package were not updated), I noticed that AVX2 is no longer available in the RAIDZ or Fletcher algorithms.
Because AVX2 is not detected, the fastest available algorithms are now "ssse3".
Because there was also a microcode patch for Zen3 systems, I tried booting the previous kernel 5.15.0-79-generic with the updated microcode package on the dedicated AMD Epyc Zen3 system. With that kernel, AVX2 is available again.
Describe how to reproduce the problem
# Boot Ubuntu kernel 5.15.0-82-generic on an AMD Epyc Zen3 system (e.g. an AMD EPYC 7443P CPU, or a VM hosted on such a system)
cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
# Output is "cycle [fastest] original scalar sse2 ssse3" instead of the expected "cycle [fastest] original scalar sse2 ssse3 avx2"
cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl
# Output is "[fastest] scalar superscalar superscalar4 sse2 ssse3" instead of the expected "[fastest] scalar superscalar superscalar4 sse2 ssse3 avx2"
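The manual check above can be scripted. This is a rough sketch, assuming the parameter files sit at the paths quoted in this thread (the Fletcher parameter was reported under both the zcommon and zfs modules, so both are tried); `impl_list_has` is a hypothetical helper name, not part of OpenZFS.

```shell
#!/bin/sh
# impl_list_has NAME LIST
# Succeeds if NAME is one of the space-separated entries in LIST.
# Brackets around the active entry (e.g. "[fastest]") are ignored.
# (Hypothetical helper, for illustration only.)
impl_list_has() {
    printf '%s\n' "$2" | tr -d '[]' | tr ' ' '\n' | grep -qxF -- "$1"
}

# Paths as quoted in this thread; files that do not exist are skipped.
for f in /sys/module/zfs/parameters/zfs_vdev_raidz_impl \
         /sys/module/zcommon/parameters/zfs_fletcher_4_impl \
         /sys/module/zfs/parameters/zfs_fletcher_4_impl; do
    [ -r "$f" ] || continue
    if impl_list_has avx2 "$(cat "$f")"; then
        echo "$f: avx2 available"
    else
        echo "$f: avx2 missing"
    fi
done
```

On an affected system each readable file should be reported as "avx2 missing". The script prints nothing if the ZFS modules are not loaded.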
Include any warning/errors/backtraces from the system logs