Linux 4.14, 4.19, 5.0+ compat: SIMD save/restore #9406

Merged 1 commit into openzfs:master on Oct 24, 2019

Conversation

@behlendorf
Contributor

Motivation and Context

Issue #9346

Description

Contrary to initial testing we cannot rely on these kernels to
invalidate the per-cpu FPU state and restore the FPU registers.
Therefore, the kfpu_begin() and kfpu_end() functions have been
updated to unconditionally save and restore the FPU state.

How Has This Been Tested?

Manually tested on a 4.19 and 5.2 kernel using the very effective
reproducer provided in #9346. While scrubbing a pool to force a
large number of checksum operations, mprime -t was run to execute
the internal mprime stress tests.

Without this change mprime errors out almost immediately; with
this PR applied no errors have been observed on either kernel.
The xsave, fxsr, and fnsave save/restore call paths were each
tested by modifying the code to force each call path and then
limiting the allowed SIMD instructions as appropriate (e.g. sse-only
for fxsr).

@Fabian-Gruenbichler would you mind reviewing this change?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 4, 2019
@Fabian-Gruenbichler (Contributor) left a comment

cursory glance looks okay, I'll take a closer look and test early next week!

Review comments on include/os/linux/kernel/linux/simd_x86.h (resolved)
@behlendorf
Contributor Author

Updated to address @Fabian-Gruenbichler 's feedback. Thanks!

@Fabian-Gruenbichler
Contributor

as discussed in #9346, this PR is apparently still incomplete/broken on some systems.. I'll try to dig in today - please don't merge it yet ;)

@behlendorf
Contributor Author

@Fabian-Gruenbichler any assistance would be appreciated, and nothing will be merged until we get this sorted out. Unfortunately, I still haven't been able to reproduce the issue in my testing. I saw in #9346 that you were able to reproduce the issue; would you mind sharing your reproduction steps?

@Fabian-Gruenbichler
Contributor

SETUP

Debian Buster, baremetal:

vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
stepping        : 1
microcode       : 0xb00002e
cpu MHz         : 2099.815
cache size      : 20480 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 4199.63
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

the following FPU information is reported on startup:

[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.

RAIDZ-1 with 4 SSDs, default 0.8.2 settings

the test consists of the following operations run in parallel:

$ mprime -t
$ dd if=/dev/zero of=/testpool/testfile.dat bs=16M count=20000 oflag=direct status=progress
$ zpool scrub

the scrub gets repeated from time to time once it has finished. the whole procedure is left running until mprime reports errors, or a few test rounds have passed successfully.

I tested the following kernels:

/boot/vmlinuz-4.19.0-6-amd64 (stock Debian Buster)
/boot/vmlinuz-4.19.77-041977-generic (4.19.77 mainline, Ubuntu PPA)
/boot/vmlinuz-5.0.21-2-pve (our Proxmox VE kernel, based on Ubuntu Disco's)
/boot/vmlinux-5.2.0-050200-generic (5.2 mainline, Ubuntu PPA)
/boot/vmlinuz-5.2.19-050219-generic (5.2.19 mainline, Ubuntu PPA)
/boot/vmlinuz-5.3.5-050305-generic (5.3.5 mainline, Ubuntu PPA)

Ubuntu Mainline PPA

each individual kernel+code combination got tested with the following sequence:

git checkout FOO
git clean -xdf
./autogen.sh && ./configure && make -j $(nproc)
./scripts/zfs.sh
./cmd/zpool/zpool import testpool
TEST
./scripts/zfs.sh -u

dmesg -wk was monitored to ensure the correct modules are loaded/unloaded

I tested this PR, current master, and current master with the following diff applied:

commit 3f5676a332490b12c8ee4e4052861741288f55a1
Author: Thomas Lamprecht <[email protected]>
Date:   Wed Sep 25 10:48:48 2019 +0200

    FPU register save/restore is also required on 5.0 kernels

    NOTE: the kernel needs to have the copy_kernel_to_xregs_err,
    copy_kernel_to_fxregs_err and copy_kernel_to_fregs_err functions
    backported for this to work.

    Signed-off-by: Thomas Lamprecht <[email protected]>

diff --git a/include/os/linux/kernel/linux/simd_x86.h b/include/os/linux/kernel/linux/simd_x86.h
index c59ba4174..869c3909b 100644
--- a/include/os/linux/kernel/linux/simd_x86.h
+++ b/include/os/linux/kernel/linux/simd_x86.h
@@ -172,7 +172,6 @@ kfpu_begin(void)
        preempt_disable();
        local_irq_disable();

-#if defined(HAVE_KERNEL_TIF_NEED_FPU_LOAD)
        /*
         * The current FPU registers need to be preserved by kfpu_begin()
         * and restored by kfpu_end().  This is required because we can
@@ -181,11 +180,11 @@ kfpu_begin(void)
         * context switch.
         */
        copy_fpregs_to_fpstate(&current->thread.fpu);
-#elif defined(HAVE_KERNEL_FPU_INITIALIZED)
+
+
+#if defined(HAVE_KERNEL_FPU_INITIALIZED)
        /*
-        * There is no need to preserve and restore the FPU registers.
-        * They will always be restored from the task's stored FPU state
-        * when switching contexts.
+        * Was removed with 5.2 as it was always set to 1 there
         */
        WARN_ON_ONCE(current->thread.fpu.initialized == 0);
 #endif
@@ -194,7 +193,6 @@ kfpu_begin(void)
 static inline void
 kfpu_end(void)
 {
-#if defined(HAVE_KERNEL_TIF_NEED_FPU_LOAD)
        union fpregs_state *state = &current->thread.fpu.state;
        int error;

@@ -206,7 +204,6 @@ kfpu_end(void)
                error = copy_kernel_to_fregs_err(&state->fsave);
        }
        WARN_ON_ONCE(error);
-#endif

        local_irq_enable();
        preempt_enable();

RESULTS

works - no ill-effects noticed
broken A - mprime reports rounding errors with values around ~0.5, with < 0.4 expected
broken B - mprime reports rounding errors with values all over the place

sometimes the errors get reported within seconds, sometimes it takes a few minutes. I am very happy about the KABI check parallelization btw, without it this would have taken ages :-P

PR

  • 4.19.0-6-amd64: broken B
  • 4.19.77-041977-generic: broken B
  • 5.0.21-2-pve: broken B
  • 5.2.0-050200-generic: broken B
  • 5.2.19-050219-generic: broken B
  • 5.3.5-050305-generic: broken B

=> very consistently broken

master

  • 4.19.0-6-amd64: broken A (expected)
  • 4.19.77-041977-generic: broken A (expected)
  • 5.0.21-2-pve: broken A (expected)
  • 5.2.0-050200-generic: broken B
  • 5.2.19-050219-generic: broken B
  • 5.3.5-050305-generic: broken B

=> breakage switches from no save/restore < 5.2 to broken save/restore >= 5.2 ?

patched master

  • 4.19.0-6-amd64: not applicable (missing backport of _err functions)
  • 4.19.77-041977-generic: not applicable (missing backport of _err functions)
  • 5.0.21-2-pve: works (expected)
  • 5.2.0-050200-generic: broken B (expected == master)
  • 5.2.19-050219-generic: broken B (expected == master)
  • 5.3.5-050305-generic: broken B (expected == master)

=> save restore works on < 5.2 (with backported _err restore), breaks on >= 5.2 ?

5.2 introduced the "defer FPU state load until return to userspace" change, which seems like a likely culprit for why (patched) master breaks then. but OTOH, this would mean that the code in master never worked for any kernel (< 5.2 was broken since it did no copy/restore whatsoever, and >= 5.2 did a broken copy/restore)?

I don't understand yet why the PR is broken across the board - it does make sense that it is broken on >= 5.2 since it does essentially the same as master (I started some tests yesterday, but those were all on 5.3 which I only realized late in the testing was broken for master as well, so most of that testing went into the wrong direction :-/)

@Fabian-Gruenbichler
Contributor

okay, so I dug a little deeper and AFAIU the situation is as follows

kernel <= 5.1

  • kernel always saves state of old task and restores state of new task when context switching (if FPU is initialized)
  • kernel additionally saves and restores state if a kernel thread uses the FPU via kernel_fpu_begin/end

kernel >= 5.2

  • kernel always saves state of old task when switching from userland to kernel land, and sets flag TIF_NEED_FPU_LOAD
  • kernel restores old state before switching back to userland, if TIF_NEED_FPU_LOAD is set AND FPU state is invalid
  • kernel clears TIF_NEED_FPU_LOAD before switching back to userland, if TIF_NEED_FPU_LOAD is set
  • kernel saves state and sets flag if a NON-kernel thread whose state has not been saved yet uses the FPU via kernel_fpu_begin/end!?

it seems likely that we trigger a situation where we modified the FPU state, but the kernel thinks it is still valid, and thus skips the restore of the state that the kernel saved automatically on context-switch. we can't do the invalidation that the kernel does though, since we don't have access to the needed helpers. I'm a bit stumped on how to proceed - but I'll try to think some more tonight/tomorrow.
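
To make the suspected failure mode concrete, here is a rough sketch of the sequence (conceptual only; the exact bookkeeping inside the kernel differs and the names are illustrative):

/*
 * Suspected failure on >= 5.2 kernels (conceptual sketch, not kernel code):
 *
 *   user task A runs; the FPU registers hold A's live state
 *   -> A enters the kernel; the kernel saves A's state into A's task
 *      struct and sets TIF_NEED_FPU_LOAD
 *   -> ZFS uses SIMD instructions between kfpu_begin() and kfpu_end();
 *      despite the save/restore, the hardware registers end up differing
 *      from the state the kernel saved for A
 *   -> on the return to userland the kernel still considers the
 *      in-register state valid for A, so it skips reloading A's saved
 *      state
 *   -> task A silently continues with the registers ZFS left behind
 *
 * kernel_fpu_begin() avoids this by invalidating the kernel's own
 * "which task owns the FPU registers" tracking, but those helpers are
 * not exported to modules.
 */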

@behlendorf behlendorf added Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels Oct 10, 2019
@behlendorf
Contributor Author

@Fabian-Gruenbichler thanks for digging into this. I ended up coming to basically the same conclusion: we're somehow leaving the kernel FPU in an inconsistent state despite the save/restore to the task struct. Given that, I decided to try a slightly different approach. The updated PR now uses a per-cpu variable to save/restore the FPU state to ensure nothing can modify it. As before, the basic idea is to put everything back as we found it, so what the kernel thinks is valid is still valid and there's no need to invalidate the register state. This is still a WIP and needs some more work/cleanup, but my early results are encouraging using the mprime + dd + scrub test case.
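
For reference, a minimal sketch of the per-cpu idea described above, reduced to the XSAVE path only. The zfs_kfpu_fpregs array matches the allocation excerpt reviewed below; kfpu_save_xsave()/kfpu_restore_xsave() are assumed illustrative wrappers around XSAVE/XRSTOR, not necessarily the PR's exact code:

#include <linux/preempt.h>
#include <linux/irqflags.h>
#include <linux/smp.h>
#include <asm/fpu/types.h>

/* One dedicated save area per possible cpu, allocated at module load. */
static struct fpu **zfs_kfpu_fpregs;

static inline void
kfpu_begin(void)
{
	struct fpu *fpu;

	/*
	 * Stay on this cpu and keep interrupts out so nothing else can
	 * touch the FPU registers while we use them.
	 */
	preempt_disable();
	local_irq_disable();

	/* Preserve whatever state is currently in the registers. */
	fpu = zfs_kfpu_fpregs[smp_processor_id()];
	kfpu_save_xsave(&fpu->state.xsave, ~0ULL);
}

static inline void
kfpu_end(void)
{
	struct fpu *fpu = zfs_kfpu_fpregs[smp_processor_id()];

	/* Put the registers back exactly as kfpu_begin() found them. */
	kfpu_restore_xsave(&fpu->state.xsave, ~0ULL);

	local_irq_enable();
	preempt_enable();
}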

@Fabian-Gruenbichler
Contributor

giving it a spin atm - the approach looks sound, initial testing as well! :)

should be more robust since we just need to watch for new save/restore instructions appearing - that is usually known quite a while before they show up in actual hardware so we should not be in for surprises hopefully ;)

@Fabian-Gruenbichler (Contributor) left a comment

been running it for ~2h without any issues cropping up - looking good ;)

some additional questions/suggestions, mostly stylistic.

Review comments on include/os/linux/kernel/linux/simd_x86.h (resolved)

for_each_possible_cpu(cpu) {
	zfs_kfpu_fpregs[cpu] = kmalloc_node(sizeof (struct fpu),
	    GFP_KERNEL, cpu_to_node(cpu));
}
Contributor

is it guaranteed that all variants of saving overwrite their respective fpregs_state value completely, so that restoring does not leak (uninitialized) memory? or do we need to initialize this here and re-initialize it after each restore (or alternatively, before each save?)

Contributor Author

Good thought. I've updated the patch to zero this memory when it is initially allocated. As for the need to re-initialize after a restore: since we're using dedicated memory for this, available only to zfs, I don't believe we need to re-initialize it after each restore.
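
For illustration, zeroing at allocation time could look like this (simply swapping kmalloc_node for kzalloc_node in the excerpt above; the actual patch may do it differently):

for_each_possible_cpu(cpu) {
	/* kzalloc_node() returns zeroed, node-local memory. */
	zfs_kfpu_fpregs[cpu] = kzalloc_node(sizeof (struct fpu),
	    GFP_KERNEL, cpu_to_node(cpu));
}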

Contributor

sorry for not being verbose enough - what I am worried about is

task A does something with FPU
ZFS saves state (does this always write the full fsave/fxsave/xsave struct?)
ZFS restores state
task A
kernel clears HW FPU state if needed
task B
ZFS saves state (same question?)
ZFS restore (if the two questions above are not a yes, then this might now restore partial state from task A)
task B

or

task A does something with FPU
ZFS saves state (same question)
ZFS restores state
ZFS saves state (same question)
ZFS restores state
task A (potentially sees (partial) state from ZFS FPU operations in HW registers)

it's probably just a question of reading the spec - I won't have time for that today unfortunately, but hopefully on Monday.

Review comment on module/zcommon/zfs_prop.c (outdated, resolved)
@codecov

codecov bot commented Oct 10, 2019

Codecov Report

Merging #9406 into master will decrease coverage by 0.13%.
The diff coverage is 93.54%.


@@            Coverage Diff             @@
##           master    #9406      +/-   ##
==========================================
- Coverage   79.17%   79.03%   -0.14%     
==========================================
  Files         412      412              
  Lines      123602   123577      -25     
==========================================
- Hits        97864    97674     -190     
- Misses      25738    25903     +165
Flag Coverage Δ
#kernel 79.62% <83.87%> (-0.12%) ⬇️
#user 66.73% <93.54%> (-0.17%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@behlendorf behlendorf force-pushed the issue-9346 branch 3 times, most recently from f95fa71 to 4d5cfde Compare October 10, 2019 21:57
Contrary to initial testing we cannot rely on these kernels to
invalidate the per-cpu FPU state and restore the FPU registers.
Nor can we guarantee that the kernel won't modify the FPU state
which we saved in the task struct.

Therefore, the kfpu_begin() and kfpu_end() functions have been
updated to save and restore the FPU state using our own dedicated
per-cpu FPU state variables.

This has the additional advantage of allowing us to use the FPU
again in user threads.  So we remove the code which was added to
use task queues to ensure some functions ran in kernel threads.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#9346
@behlendorf
Contributor Author

This is ready for another round of review and testing.

  • (Most) review feedback incorporated.
  • Additional cleanup / refactoring
  • Remove taskq dispatch code for encryption and micro benchmarks (no longer needed)

@Fabian-Gruenbichler
Contributor

It might make sense to split the revert of the taskq dispatching into its own commit - it's not yet part of 0.8.x, would make for easier backporting of the rest ;)

Currently giving this another round of battering, will report back with results.

@behlendorf
Contributor Author

@Fabian-Gruenbichler were you able to perform any testing with the PR? Any results?

@behlendorf behlendorf added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Oct 18, 2019
@Fabian-Gruenbichler
Contributor

@Fabian-Gruenbichler were you able to perform any testing with the PR? Any results?

sorry for not posting an update - yes, I left our reproducer and some additional stress-testing running for quite a while without any visible issues.

besides the mask constification (which definitely falls into the 'nit' category) there was only the part about whether we need clearing / re-initialization on each begin/end cycle:

sorry for not being verbose enough - what I am worried about is

task A does something with FPU
ZFS saves state (does this always write the full fsave/fxsave/xsave struct?)
ZFS restores state
task A
kernel clears HW FPU state if needed
task B
ZFS saves state (same question?)
ZFS restore (if the two questions above are not a yes, then this might now restore partial state from task A)
task B

or

task A does something with FPU
ZFS saves state (same question)
ZFS restores state
ZFS saves state (same question)
ZFS restores state
task A (potentially sees (partial) state from ZFS FPU operations in HW registers)

not sure whether you saw that or not? I am not very fond of GitHub's review interface...

basically the question is - do we need to care about

  • whether saving into memory can leave some partial prior state in the fpregs struct
  • that could then be restored by the next restore
    (my gut feeling says no)

and

  • whether restore can just restore SOME registers
    (here I am not so sure!)

both could leak ZFS or other data to the task that runs after ZFS.

@behlendorf
Contributor Author

Sorry I missed that comment!

does this always write the full fsave/fxsave/xsave struct?

Good question. Yes, that is my understanding. As long as we set all of the bits in the xsave mask (which we do) we should be saving and restoring the registers exactly as they were and not leaking any ZFS state. We could try to get fancier and only save/restore the registers we're going to use, but I wanted to try and keep things simple.
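
For context, XSAVE and XRSTOR take the requested-feature bitmap in EDX:EAX, and the hardware intersects it with the enabled features in XCR0, so an all-ones mask covers every enabled component. A minimal sketch of that, assuming a 64-byte aligned save area (this is not the PR's exact code):

#include <linux/types.h>
#include <asm/fpu/types.h>

static inline void
xsave_all(struct xregs_state *buf)
{
	u32 lmask = ~0U;	/* EAX: low 32 bits of the requested mask  */
	u32 hmask = ~0U;	/* EDX: high 32 bits of the requested mask */

	/* Save every enabled xstate component into *buf. */
	asm volatile("xsave %[xa]"
	    : [xa] "+m" (*buf)
	    : "a" (lmask), "d" (hmask)
	    : "memory");
}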

@behlendorf
Contributor Author

@Fabian-Gruenbichler if you're satisfied with the current PR, could you please approve it. If not, then let's see if we can get any remaining concerns sorted out.

@interduo

@tonyhutter Are you planning to release a 0.8.3 version once this PR goes upstream?

@Fabian-Gruenbichler (Contributor) left a comment

XSAVE/XRSTOR (and variants) don't always save the full state even with an all-1 mask, but they keep track of what got saved and (re-)initialize the rest upon restore.
FXSAVE/FSAVE always save the full state AFAICT (unless SSE is disabled altogether ;)), FSAVE even clears the registers afterwards. FXRSTOR/FRSTOR thus always restore the full state as well.

the remaining stylistic changes can be pulled in as follow-ups.

@tonyhutter
Contributor

@interduo yes, we'll include it in the next release assuming we don't find any other issues.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 24, 2019
@behlendorf behlendorf merged commit 10fa254 into openzfs:master Oct 24, 2019
@interduo

@tonyhutter my question was a little bit different - could this PR be the reason for the next release?
Because it finally closes one of the major performance issues (#8836 and #8793)

@vorsich-tiger

vorsich-tiger commented Oct 26, 2019

... assuming we don't find any other issues.

@Fabian-Gruenbichler
@behlendorf
I'd very much appreciate it if you'd help establish the final and valid set of assumptions that this new solution is based on.
I'm asking because, while trying to understand the validity of the new solution, I kept running into the question of what kind of implicit assumptions about kernel/scheduler behaviour can be made (and which cannot).
Specifically I'd like to get reasoned confirmation that the following potential loophole in the scheme does not exist:
Let me start with a short summary of how I understand the new approach.
Please correct me where I'm wrong:

  1. zfs module (when FPU API is not available) in future does not intend to rely on any specific kernel fpu treatment behaviour
  2. thus, zfs has a per cpu private FPU state save/restore buffer
  3. when any zfs in kernel thread starts using the fpu, it saves the existing/current fpu state to the thread-current cpu specific zfs save buffer
  4. when any zfs in kernel thread stops using the fpu, it restores the "original" fpu state of the thread-current cpu specific zfs save buffer
  5. OTOH - the current kernels seem to track 1 fpu state per thread though

My worries now concern the following:
What if there exists any (by intention) supported kernel/scheduler (e.g. an RT kernel) which preempts a zfs in kernel thread between 3. and 4.?
There I'd expect a problem if the scheduler assigns the cpu which was active at 3. (but did not yet arrive at 4., so it has effectively "corrupted" the fpu state) to a completely different thread - in that case the kernel's view would be that the fpu's state is identical to the saved state at 3., right?
The kernel might decide not to restore the fpu state from the new thread's save buffer if that thread and the zfs thread alternate on that cpu, right?
Further, when the zfs thread is resumed it fails to restore the fpu state from the time of the preemption, right?
What assumptions can be made that ensure this does not happen?
Is this covered by general kernel design rules?

@Fabian-Gruenbichler
Contributor

Fabian-Gruenbichler commented Oct 28, 2019 via email

@tonyhutter
Contributor

@interduo I would say that this patch is a contributing factor to wanting to put out a new release, but there are other reasons to create a 0.8.3 release as well (like misc bugfixes that have gone into master). I imagine we'll start putting together a 0.8.3 patch list in the upcoming weeks.

@interduo

@tonyhutter is there any hope to see 0.8.3 in this month?

@tonyhutter
Contributor

@interduo probably not, since Thanksgiving is this week. I was planning to talk to @behlendorf tomorrow and go over the potential patchlist though. There's ~260 patches in master that aren't in the release branch, and we need to figure out which ones we want to include.

@interduo

interduo commented Nov 26, 2019

So maybe there is a need to do a 0.9.x-rc1 release and then include all the patches?
I keep wondering why the zfs release cycle always takes so long - the last release was on Sep 26 2019.
This would give us the opportunity to do some more testing in test environments before the final release (...)

Ok, thanks for the answer - we are patiently waiting for release :)

@tonyhutter
Contributor

tonyhutter commented Nov 26, 2019

FYI - we actually plan to skip 0.9.x and go straight to a 2.0 release in 2020. 2.0 will include FreeBSD support. See slide 21 here:
https://drive.google.com/file/d/197jS8_MWtfdW2LyvIFnH58uUasHuNszz/view

The 2.0 release will include all the patches from master. I imagine we'll do 2.0-rc releases like we did with 0.7.0 and 0.8.0.

@faern

faern commented Nov 26, 2019

Ah, the good old Windows/iPhone trick of skipping version 9.

@interduo I think a few months is an extremely fast release cycle for a filesystem. Right now I'm also eager for SIMD fixes, but except for that I don't want what's taking care of all my data to change often at all.

@spacelama

I've had some general instability since updating to 0.8.2-2 (debian backports), including general protection faults apparently not in ZFS code (alas, I've lost all the backtraces) and hangs that went unlogged. Debian does have their own patch on top of the patch:
zfs-linux (0.8.2-2) unstable; urgency=medium

  • Remove a patch implicated in data corruption (related to
    linux-5.0-simd-compat.patch)

Any chance they've screwed up the FPU restore, what would be the symptoms, and would they manifest at random places outside the ZFS code?

What's the patch they should be applying?

@cdluminate
Contributor

@spacelama What you mentioned in the changelog is exactly this patch:

https://salsa.debian.org/zfsonlinux-team/zfs/blob/master/debian/patches/series#L14
https://salsa.debian.org/zfsonlinux-team/zfs/blob/master/debian/patches/Fix-CONFIG_X86_DEBUG_FPU-build-failure.patch

And I'd say that we are not patching ZFS without forwarding to upstream, except for those distro-specific ones.

@interduo

interduo commented Dec 11, 2019

@interduo probably not, since Thanksgiving is this week. I was planning to talk to @behlendorf tomorrow and go over the potential patchlist though. There's ~260 patches in master that aren't in the release branch, and we need to figure out which ones we want to include.

@tonyhutter is there any hope to see 0.8.3 in december or january?

@bobobo1618

Did this make it into the 0.8.3 release? I see in the changes:

  • SIMD: Use alloc_pages_node to force alignment (seems to be 35155c0)
  • Linux 5.0 compat: SIMD compatibility (seems to be 62c034f)

But I don't see 10fa254.

@bobobo1618

Ah, I see. 62c034f squashes 10fa254 and a few other commits, so yes, this is in 0.8.3.
