
Fold aarch64 feature +fp into +neon #91608

Merged (5 commits, Mar 23, 2022)

workingjubilee
Member

@workingjubilee workingjubilee commented Dec 6, 2021

Arm's FEAT_FP and FEAT_AdvSIMD describe the same thing on AArch64:
The Neon unit, which handles both floating point and SIMD instructions.
Moreover, a configuration for AArch64 must include both or neither.
Arm says "entirely proprietary" toolchains may omit floating point:
https://developer.arm.com/documentation/102374/0101/Data-processing---floating-point
In the Programmer's Guide for Armv8-A, Arm says AArch64 can have
both FP and Neon or neither in custom implementations:
https://developer.arm.com/documentation/den0024/a/AArch64-Floating-point-and-NEON

In "Bare metal boot code for Armv8-A", enabling Neon and FP
is just disabling the same trap flag:
https://developer.arm.com/documentation/dai0527/a

In an unlikely future where "Neon and FP" become unrelated,
we can add "[+-]fp" as its own feature flag.
Until then, we can simplify programming with Rust on AArch64 by
folding both into "[+-]neon", which is valid as it supersets both.

"[+-]neon" is retained for niche uses such as firmware, kernels,
"I just hate floats", and so on.
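A minimal sketch of what folding could look like in a frontend, assuming a hypothetical helper that expands a Rust-level toggle into LLVM feature strings. Only the LLVM names `neon` and `fp-armv8` come from this thread; the function names and the table are illustrative, not rustc's actual implementation. The point is that one Rust name flips both units, and `fp` is simply not accepted until FP and Neon gain independent meaning.

```rust
/// Hypothetical table: which LLVM features a Rust-level AArch64 feature controls.
/// With `fp` folded into `neon`, the single name `neon` covers both units.
fn llvm_features_for(rust_name: &str) -> &'static [&'static str] {
    match rust_name {
        "neon" => &["neon", "fp-armv8"],
        // Illustrative one-to-one entries; a real table would be much larger.
        "sve" => &["sve"],
        "sve2" => &["sve2"],
        // `fp` is deliberately absent: it could be added later if the two
        // features ever gain independent meaning.
        _ => &[],
    }
}

/// Expand a toggle such as "+neon" or "-neon" into LLVM-style feature strings.
fn expand_toggle(toggle: &str) -> Result<Vec<String>, String> {
    let (sign, name) = if let Some(n) = toggle.strip_prefix('+') {
        ('+', n)
    } else if let Some(n) = toggle.strip_prefix('-') {
        ('-', n)
    } else {
        return Err(format!("expected a leading '+' or '-': {toggle:?}"));
    };
    let llvm = llvm_features_for(name);
    if llvm.is_empty() {
        return Err(format!("unknown AArch64 feature: {name:?}"));
    }
    // Apply the same sign to every LLVM feature the Rust name controls.
    Ok(llvm.iter().map(|f| format!("{sign}{f}")).collect())
}
```

Under this scheme `+neon` emits both `+neon` and `+fp-armv8`, and `-neon` removes both, so the unspecified half-configuration cannot be spelled at all.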

I am... pretty sure no one is relying on this.

An argument could be made that, as we are not an "entirely proprietary" toolchain, we should not support AArch64 without floats at all. I think that's a bit excessive. However, I want to recognize the intent: programming for AArch64 should be simplified where possible. For x86-64, programmers regularly set up illegal feature configurations because it's hard to understand them, see #89586. And per the above notes, plus the discussion in #86941, there should be no real use cases for leaving these features split: the two should in fact always go together.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 6, 2021
@workingjubilee
Member Author

cc @adamgemmell


@workingjubilee
Member Author

workingjubilee commented Dec 6, 2021

Oh interesting, I'd need to fix those too. nbd, assuming this is a fine direction.

@Amanieu
Member

Amanieu commented Dec 8, 2021

I don't really see the motivation for this. While it's true that you'll never find an actual CPU with only one of FP or NEON, these are still considered separate features in the ISA and that is what we are following for feature names.

You can't end up with an invalid feature combination either since LLVM defines the neon feature as depending on fp-armv8, which means that enabling neon automatically enables fp-armv8.

@workingjubilee
Member Author

This is informed by every other combination having been specifically ruled out several times, and by users in the past having tried to use incorrect feature combinations for various architectures. It is not about whether or not LLVM will necessarily miscompile the code. It is about what users expect. They do not necessarily intuit the nuance that, at the ISA level, floating point requires the vector registers and the vector registers require floating point. If they see two independent features, they are naturally inclined to conclude that they have independent meaning. For AArch64, they do not, and it is doubtful they ever will.

By melding the two into one, we reflect this reality and simplify actual usage of the compiler. There is no "necessary complexity" this would be omitting: the main reason the two features here are split is a legacy detail from being based on Armv8-A, which is a red herring when the execution mode, emitted code, and management of architectural state for A64 are different.

I wish to make things simpler for users so issues like https://project-oak.github.io/rust-verification-tools/2021/05/15/verifying-vectorized-code2.html and other related issues do not come up again.

And at the moment, FEAT_PAuth is one feature as well, yet we are inclined to support an anticipated PACA/PACG split. If we decide to support splitting the ISA-level features before this has actually happened, then we have already arrogated to ourselves the power to decide what is and is not a true differentiator.

@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Dec 9, 2021
@adamgemmell
Contributor

adamgemmell commented Dec 9, 2021

Sorry, I've been ill the last couple of days.

  • What would we do for features that implicitly enable fp? (see my documentation PR). I think it's slightly disingenuous to not mention it, and saying they enable neon is wrong.
  • I'd argue hiding fp worsens the kind of issue you referred to (where sse2 also includes IEEE floating point). We're being less honest about what's going on underneath and muddling otherwise discrete parts of the ISA.
  • We also break the clean mapping of features to their FEAT_ counterparts in the ARMv8/9 architecture, though splitting pauth also does this.

Not really relevant, but bits of the Rust compiler use the LLVM name fp-armv8 directly. I couldn't find any user code doing the same.

@camelid camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 25, 2022
@workingjubilee
Member Author

workingjubilee commented Feb 13, 2022

I have been silent because I worry there is no clear way to express what I want that is not simply repeating myself, and it is hard to express my concerns easily. I worry that I am wrong. I worry that I will go too far if I elaborate too much. I try nonetheless:

Briefly, as to cleanness: the problem is that the Arm architecture isn't clean either. There are no machine instructions defined by Arm to live only in FEAT_FP without FEAT_ASIMD in the A64 machine language. They are of the "same type". Thus, forcing -neon to mean -fp would immediately break compilation, because unlike x86, there is no fallback to the x87 FPU. Users have to enable soft float, instead of the build failing silently.

And there is no architectural state that lives only in FEAT_FP without FEAT_ASIMD either, as far as I am aware. When someone wants to disable the vector registers in a code segment, they usually want to do it because they want to avoid clobbering vector state. This is a concern throughout the Linux kernel. But any floating point operations also clobber vector state, inherently.

As @jacobbramley put it, this is undefined behavior, and LLVM offers no coherent compilation contract that I am aware of to rationalize this combination.

I worry that allowing this to stand makes resolving the current issues of -Ctarget-feature unsoundness harder in the future, because there is no definition for how to handle this combination that I am aware of in the Arm Architecture Procedure Call Standard, and the only sensible answer is to just ignore it and implicitly promote everything handling +fp to the +neon architectural feature anyways.

@Amanieu
Member

Amanieu commented Feb 17, 2022

I would like to keep the actual features separate since they are separate in the ISA spec. However, the compiler could enforce that these two features are enabled/disabled together to disallow invalid combinations. This was done in #93782 for the paca and pacg features which share the same underlying LLVM feature but may return different results from is_aarch64_feature_detected! because they depend on OS support.
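The tying mechanism described here can be modelled as a biconditional (XNOR) check over groups of features: within each group, every member must end up in the same state. The following is a minimal sketch under that assumption; the function name and the explicit-toggle representation are hypothetical, and only the `paca`/`pacg` and `fp`/`neon` pairings come from the thread. It does not model target defaults.

```rust
use std::collections::HashMap;

/// Groups whose members must be enabled or disabled together.
/// `paca`/`pacg` mirrors #93782; `fp`/`neon` is the pairing under discussion.
const TIED: &[&[&str]] = &[&["paca", "pacg"], &["fp", "neon"]];

/// Check explicit toggles (e.g. ["+paca", "-pacg"]) against `TIED`.
/// Simplification: only explicitly written toggles are considered. A group
/// passes if every member has the same state, where "not mentioned" is
/// treated as a state of its own.
fn check_tied(toggles: &[&str]) -> Result<(), String> {
    // The last toggle written for a given name wins.
    let mut state: HashMap<&str, bool> = HashMap::new();
    for t in toggles {
        let (name, on) = if let Some(n) = t.strip_prefix('+') {
            (n, true)
        } else if let Some(n) = t.strip_prefix('-') {
            (n, false)
        } else {
            return Err(format!("expected a leading '+' or '-': {t:?}"));
        };
        state.insert(name, on);
    }
    for group in TIED {
        let states: Vec<Option<bool>> = group.iter().map(|f| state.get(*f).copied()).collect();
        if states.iter().any(|s| *s != states[0]) {
            return Err(format!("{group:?} must all be enabled or disabled together"));
        }
    }
    Ok(())
}
```

Because the check only compares states, it forbids the combinations nobody can define today while leaving room to relax a group later.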

@jacobbramley
Contributor

> And at the moment, FEAT_PAuth is one feature as well ...

Actually, that's split in the CPU architecture, in that there are multiple ID registers describing what, exactly, is implemented.

There are clearly use-cases for disabling both FP and NEON (like the kernel one you gave), though it's semantically a bit unclean if the objective is to avoid writing to V registers. What happens when SVE comes along? It uses Z registers, which alias those. -Cdont_write_v_regs is probably what the programmer means, there, but there's no way to express that currently.

I think what this comes down to is:

  1. What does it mean to disable just one of FP or NEON?
  2. Is it ever going to be useful to do so?

The answer to 1 is not specified (architecturally or by LLVM) so I think we have to assume that we don't know. However, LLVM might define something in the future, perhaps for some non-obvious use-case.
The answer to 2, for me, is "maybe"; I don't have a compelling use-case but I don't want to say that no-one will ever want to do that. My best attempt (on #86941) was as a debugging tool, but that's admittedly quite weak.

Personally, I like the approach of tying fp and neon together, using the same mechanism that @adamgemmell introduced for paca and pacg. This forbids combinations that we don't know how to handle, whilst maintaining compatibility with a hypothetical future LLVM that defines what it means to have just one enabled.

I don't think one should imply the other, though, otherwise disabling just one could be ignored if the other is enabled by default.

@workingjubilee
Member Author

Enforcing the XNOR / biconditional is acceptable.

I do have a lingering concern that it is still in fact simpler to not enable the FEAT_FP switch in the frontend at all as that may make it easier to bypass rustc's opinions using -Cllvm-args for experimentation for a more experienced user who knows Exactly What They Are Doing. Maybe I am wrong on that, though, or maybe that is undesirable, or maybe that is implicitly Possible Anyways! I wish to clarify that detail:

I am more concerned about making sure the language frontend handles this in a "correct" and also somewhat "user-friendly" way, on the thesis that, as the combination is unspecified, it shouldn't be surprising if a codegen backend simply refuses aarch64 with "+fp, -neon", or acts in surprising ways. If someone really tries to go around us, they'll reach there eventually, but they have "voided the warranty" in doing so. Thus I am less concerned about barring it outright for all cases and more that I think reasoning about the combination could be deferred from the frontend in this case (which incidentally minimizes code involved).

> I don't think one should imply the other, though, otherwise disabling just one could be ignored if the other is enabled by default.

Well, if +neon bundles +fp, and -neon is encountered without +fp, then I am pretty sure that forces an error, at the latest when the Rust toolchain has to lower a float (either in Rust or LLVM). That said, handling this directly in Rust and enforcing "+neon, +fp", or else "-neon, -fp, +soft-float" is enough.

@workingjubilee
Member Author

workingjubilee commented Feb 17, 2022

For more future-oriented considerations:

> What happens when SVE comes along? It uses Z registers, which alias those. -Cdont_write_v_regs is probably what the programmer means, there, but there's no way to express that currently.

My understanding is that SVE2 (the one consumers will start holding in their hands Very Soon) also implies or requires Neon, in the eyes of LLVM? And I am pretty sure that LLVM exposes a "use integer registers only" toggle for the AArch64 target... ah, -mgeneral-regs-only, I think?

Where possible, I would like to expose the ability to control more finely which architectural state is used, though I have not yet worked out what that would require.

@adamgemmell
Contributor

adamgemmell commented Feb 21, 2022

> My understanding is that SVE2 (the one consumers will start holding in their hands Very Soon) also implies or requires Neon, in the eyes of LLVM?

It implicitly enables SVE, which in turn enables fp16, which enables fp.

There are several other features that implicitly enable just fp. I think we'd have to put in something to have these also enable neon to fit our model if we tie fp and neon together, as otherwise the documentation would show the hypocrisy. All this also duplicates feature handling logic in LLVM and makes things more fragile. To me this shifts the default to keeping them separate, but please say if you disagree.
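The implication chain described above can be modelled as a transitive closure over a dependency table. The sketch assumes a small table built only from edges named in this thread (`neon` implies `fp-armv8`; `sve2` implies `sve`, which implies `fp16`, which implies `fp-armv8`); LLVM's real table is larger and version-dependent, and `enable_closure` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Direct implications: enabling `.0` also enables `.1`.
/// Assumed edges, taken only from this discussion; not LLVM's full table.
const IMPLIES: &[(&str, &str)] = &[
    ("neon", "fp-armv8"),
    ("sve2", "sve"),
    ("sve", "fp16"),
    ("fp16", "fp-armv8"),
];

/// Compute every feature transitively enabled by `roots`.
fn enable_closure(roots: &[&'static str]) -> BTreeSet<&'static str> {
    let mut enabled: BTreeSet<&'static str> = BTreeSet::new();
    let mut work: Vec<&'static str> = roots.to_vec();
    while let Some(f) = work.pop() {
        // `insert` returns false if `f` was already present, so each
        // feature's implications are expanded at most once.
        if enabled.insert(f) {
            for &(from, to) in IMPLIES {
                if from == f {
                    work.push(to);
                }
            }
        }
    }
    enabled
}
```

Under this table, a `+sve2` function ends up with FP but without Neon, which is exactly the gap being discussed.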

I'm fairly inexperienced with using these flags in practice so please correct me if I'm wrong. However, when reasoning about what target_feature is supposed to do I've been using the following line from the RFC:

> unconditional code generation: using the function attribute #[target_feature(enable = "avx2")] to allow the compiler to generate code under the assumption that this code will only be reached in hosts that support the feature.
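For AArch64, the RFC's "only reached in hosts that support the feature" contract usually takes the shape of a runtime check guarding an `unsafe` call. A minimal sketch, assuming a toolchain that accepts `#[target_feature(enable = "neon")]` on AArch64; `sum_neon` and `sum` are hypothetical names, and the body is plain Rust that the compiler may vectorise, not hand-written intrinsics.

```rust
/// Hypothetical kernel compiled with Neon (and therefore FP) enabled.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn sum_neon(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Safe entry point: only reaches `sum_neon` after a runtime check, so the
/// "only reached on hosts that support the feature" assumption holds.
fn sum(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            // SAFETY: Neon support was just confirmed at runtime.
            return unsafe { sum_neon(xs) };
        }
    }
    // Portable fallback for other architectures or hosts without Neon.
    xs.iter().sum()
}
```

On non-AArch64 targets the `cfg` blocks vanish and only the fallback remains, so the same source builds everywhere.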

The ARM Architecture Reference Manual does describe FP and Neon together. However, while there are instructions appropriate for both, there are also instructions clearly used for FP and others used for SIMD, and LLVM seems to be aware of this (their FP tests don't bother enabling neon). Does this alone not give +fp,-neon a reason to exist?

> -mgeneral-regs-only

This looks to be handled just in Clang - it works by passing -fp-armv8,-crypto,-neon.

@workingjubilee
Member Author

workingjubilee commented Feb 22, 2022

> It implicitly enables SVE, which in turn enables fp16, which enables fp.

That sounds incorrect, then, actually, if SVE2 enables the FP path only and without Neon/ASIMD support, because SVE/SVE2 is predicated on Neon availability. To arbitrarily exclude the integer register instructions being generated seems incorrect.

> The ARM Architecture Reference Manual does describe FP and Neon together. However, while there are instructions appropriate for both, there are also instructions clearly used for FP and others used for SIMD, and LLVM seems to be aware of this (their FP tests don't bother enabling neon). Does this alone not give +fp,-neon a reason to exist?

LLVM may have such an interpretation, but does it promise to maintain such consistently across versions? Does it agree with GCC on the precise instructions? With Cranelift?

Please understand that the RFC is deeply aspirational, and that at the moment, the entirety of all vector architectures in Rust, due to us not having a worked-out story for actually handling passing data in registers when the target features may not perfectly align, labors under this penalty:

```rust
    // This is a fun case! The gist of what this is doing is
    // that we want callers and callees to always agree on the
    // ABI of how they pass SIMD arguments. If we were to *not*
    // make these arguments indirect then they'd be immediates
    // in LLVM, which means that they'd used whatever the
    // appropriate ABI is for the callee and the caller. That
    // means, for example, if the caller doesn't have AVX
    // enabled but the callee does, then passing an AVX argument
    // across this boundary would cause corrupt data to show up.
    //
    // This problem is fixed by unconditionally passing SIMD
    // arguments through memory between callers and callees
    // which should get them all to agree on ABI regardless of
    // target feature sets. Some more information about this
    // issue can be found in #44367.
    //
    // Note that the platform intrinsic ABI is exempt here as
    // that's how we connect up to LLVM and it's unstable
    // anyway, we control all calls to it in libstd.
    Abi::Vector { .. }
        if abi != SpecAbi::PlatformIntrinsic
            && self.tcx.sess.target.simd_types_indirect =>
    {
        arg.make_indirect();
        return;
    }
    _ => return,
}
```

That means that between two functions, if they call each other and are not inlined or otherwise fixed up during mem2reg, even if their target features match precisely, they will still tend to spill the entire vector state to memory. This means vector functions in Rust are always simply slower than their C or C++ equivalents, because even after all the inlining is done, they have to do a ton of cleanup and teardown at the final step. That means not inlining is unusually bad to begin with.
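The match arm quoted above can be restated as a standalone decision rule, which makes the consequence easier to see: every vector-typed argument is forced through memory whether or not the two sides' features match. This is a hypothetical model with invented stand-in types (`ArgAbi`, `CallAbi`, `PassMode`), not rustc's real `Abi`/`SpecAbi` types.

```rust
/// Simplified stand-in for the layout ABI of an argument.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ArgAbi {
    Scalar,
    Vector,
}

/// Simplified stand-in for the calling convention of the call.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CallAbi {
    Rust,
    PlatformIntrinsic,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum PassMode {
    /// Passed as an immediate (in registers, per the callee's features).
    Direct,
    /// Passed through memory, so both sides agree regardless of features.
    Indirect,
}

/// Mirrors the quoted arm: vectors go indirect unless the call is a
/// platform intrinsic or the target does not request indirect SIMD types.
/// Every other case is simplified to `Direct` here.
fn pass_mode(arg: ArgAbi, call: CallAbi, simd_types_indirect: bool) -> PassMode {
    match arg {
        ArgAbi::Vector if call != CallAbi::PlatformIntrinsic && simd_types_indirect => {
            PassMode::Indirect
        }
        _ => PassMode::Direct,
    }
}
```

Note that the rule never looks at the caller's or callee's target features; that is why even perfectly matched functions pay the spill cost when they are not inlined.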

And once you have established that +fp is valid on an aarch64 target, you know it is always valid to call Neon instructions. If a piece of aarch64 code compiled with +fp calls into code compiled with +neon, or vice versa, then LLVM should always be allowed to inline both into each other. Ideally, target_feature code would not even be unsafe to call in Rust in such a nested context, either, because once you had established one of these safety requirements, you would also have established the other (this was one of the more aspirational bits of the target_feature plans). And ideally we would always handle the calling conventions correctly between code, so we didn't have to spill to memory.
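One way to see why folding helps inlining: as far as I understand, LLVM's AArch64 backend by default only inlines a callee whose target features are a subset of the caller's (treat that as an assumption here). The sketch below canonicalises `fp`/`fp-armv8` to `neon` before comparing, so a `+fp` caller and a `+neon` callee count as identical. The `canonical`, `feature_set`, and `can_inline` names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Map architecturally inseparable features to one canonical name.
/// Assumption: on AArch64, FP and Neon always come together.
fn canonical(f: &str) -> &str {
    match f {
        "fp" | "fp-armv8" => "neon",
        other => other,
    }
}

fn feature_set<'a>(features: &[&'a str]) -> BTreeSet<&'a str> {
    features.iter().map(|&f| canonical(f)).collect()
}

/// Assumed inlining rule: the callee may be inlined only if every feature
/// it relies on is also available in the caller.
fn can_inline(caller: &[&str], callee: &[&str]) -> bool {
    feature_set(callee).is_subset(&feature_set(caller))
}
```

The same rule also shows the mismatch raised later in the thread: a `+fp,+neon` caller cannot absorb a `+fp,+sve2` callee, because `sve2` is missing from the caller's set.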

This also means that in order to end that problem, we may have to duplicate quite a bit of LLVM's understanding of target features anyway, just to be able to emit correct code for LLVM without breaking, or else the Rust ABI will depend solely on inlining for performance. Even without committing to such duplication, allowing this combination may only be a performance footgun, but it's still a footgun: most people invoking the compiler don't have the faintest idea about these nuances of the Arm architecture, so they cannot avoid it, even if they compile for AArch64 machines every day. That's not a bug, that's a feature.

Ultimately, setting target features, when the Rust toolchain is invoked, is done at the level of Rust. The fact that the target feature resolution code happens to still live entirely in rustc_codegen_llvm is mostly due to not having lifted it into cg_ssa yet.

@adamgemmell
Contributor

I see the intent clearly now - thanks for taking the time to write that, it's very informative.

@adamgemmell
Contributor

adamgemmell commented Feb 22, 2022

Another consideration when tying fp and neon: this code will break

@workingjubilee
Member Author

workingjubilee commented Feb 23, 2022

Erm, I don't think any of those actually would be broken.
In fact, those are essentially why I raised this concern:

Developers using Rust mostly are not concerned about the FP vs. AdvSIMD split in practice, and functionally do depend on +neon implying both. And when you enter a #[target_feature(enable)] context you are supposed to have proven that feature in question is available, so they've already applied the feature test in question or their code is already broken. And my concern is mostly about AArch64, where the two architecturally imply each other. I am in fact actually greatly concerned, now that you have told me that +sve2 does not actually imply +neon but takes a path that skips directly to +fp, that developers will somehow manage to write incorrect code by reasoning otherwise and get immensely stuck on resulting weird inlining behaviors from LLVM if there is a mismatch between a "+fp, +neon" caller and a "+fp, +sve2" callee.

I have also written and maintained many cases of #[cfg(target_feature)] for Arm and AArch64 targets by now and I have not once depended on +neon not implying +fp, and I have not ever had need for +fp only, except in the vague sense of "The architecture may not have a Neon unit at all."

In fact, the packed_simd crate that appears in your search, which I have been effectively the maintainer of for quite some while, had several misconfigurations regarding Arm and AArch64 features that effectively wound up being no-ops due to this accursedly stringly-typed interface for target features causing silent errors. I had to clear quite a lot of that up to get it working and back on crates.io as packed_simd_2. And it has a nontrivial number of dependents.

I have also had the opportunity to address the questions of and observe the behavior of quite a number of programmers addressing these feature interfaces. The reason I bring up non-intuitive inlining issues is at least partly because they do. The reason I recommend a more terse and simplified interface is because I have seen what they write. People find tinkering with precise target features in this manner exhausting and mostly it is a source of gotchas and frustration, as evidenced in the fact that they have appeared in some of the most popular crates in Rust's SIMD ecosystem.

@adamgemmell
Contributor

adamgemmell commented Feb 28, 2022

I mean that if we tie fp and neon, as paca and pacg are tied, then code that just specifies neon will no longer compile. However, the target_features are still unstable, of course, though we'd like to stabilise them before the next release cycle.

> that developers will somehow manage to write incorrect code by reasoning otherwise and get immensely stuck on resulting weird inlining behaviors from LLVM if there is a mismatch between a "+fp, +neon" caller and a "+fp, +sve2" callee.

This could be fixed by making all features that imply fp also enable neon. I think this is easy to do in to_llvm_feature and as we've established it shouldn't practically break anything. It can even be cleanly explained in the docs by saying the feature implies neon rather than fp.
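The proposal can be sketched as a lowering step that appends `neon` whenever a feature implies FP. This is a hypothetical standalone version of the idea, not rustc's `to_llvm_feature`; the `IMPLIES_FP` list contains only the chain named in this thread and is deliberately incomplete.

```rust
/// Rust-level features that, per this discussion, end up implying FP.
/// Illustrative and incomplete: only the `sve2` -> `sve` -> `fp16` chain.
const IMPLIES_FP: &[&str] = &["sve2", "sve", "fp16"];

/// Hypothetical lowering: emit the feature itself and, if it implies FP,
/// also emit "neon", so no feature can yield FP without Neon.
fn lower_feature(name: &'static str) -> Vec<&'static str> {
    let mut out = vec![name];
    if IMPLIES_FP.contains(&name) {
        out.push("neon");
    }
    out
}
```

Documenting each such feature as "implies neon" then matches what the lowering actually does.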

@bors
Contributor

bors commented Mar 2, 2022

☔ The latest upstream changes (presumably #87402) made this pull request unmergeable. Please resolve the merge conflicts.

@workingjubilee
Member Author

...Hm, that seems like an acceptable resolution but does seem to have looped back around to something equivalent to my initial suggestion ("functionally, make +fp and +neon the same idea for aarch64").

@workingjubilee
Member Author

In light of #95002 I have rebased this PR and resolved the incongruity of the error messages with asm!. I am once again proposing consideration of this PR as a solution to the problem that arose there. It is forward-compatible to add the "+fp" feature if we find a defined subset of Neon's behavior (as all targets will either have "-neon" or "+neon" until then) that we can enable separately as a single feature, and as far as I can tell, it makes stdarch build again.

@jacobbramley
Contributor

Summary: Perhaps the best solution is to try this PR, and fall back to #95044 if it breaks things.


I've been discussing this with @adamgemmell along the way, and have just re-read this thread, and related PRs, to try to understand how we got here. I maintain that the cleanest design, ignoring legacy, is to map features to hardware ID registers (and OS features), like "paca" and "pacg". However, I think I'd overlooked (or misunderstood the importance of) two significant factors:

  1. LLVM's "neon" implies "fp-armv8", so they're already tied to some extent.
  2. These features are used much more often than I'd thought (as in #95002, "aarch64 neon intrinsics broken after #90621"), so we have significant potential breakage to think about. (When I'd done similar work in the past, I had no legacy users to consider.)

In addition, "neon" and "fp" are somewhat special in that they are enabled by default for almost all AArch64 targets. There's an ergonomic problem with simply tying them in that code that implements or uses something called a "NEON intrinsic" also has to enable "fp", which appears superfluous.

So, that leaves us with the two proposed solutions we have today:

  • Fix unnecessary error when using neon target feature #95044, which I think is the most backwards-compatible because uses of "fp" continue to work. These uses appear to be rare, or even non-existent.
  • This PR, which I think is the most forwards-compatible because we can define "fp" to do something else later. It's also simpler overall, as you noted in Fix unnecessary error when using neon target feature #95044. (We still need the tying infrastructure for "paca"/"pacg".)
    • This is also consistent with how we handle FP16: FP16 support in FP and NEON again has separate hardware registers linked by policy, but Rust only exposes a single "fp16".

Re-evaluating all of that, I'm happy to change my position: perhaps the best, most pragmatic solution is to try this PR, and fall back to #95044 in the unlikely event that it breaks some uses of "fp". Were it not for existing uses, I'd propose renaming "neon" to "fpneon", but it's too late for that.

To echo what @adamgemmell said a while back: thank you, @workingjubilee, for spending the time to explain your position so clearly!

@bors
Contributor

bors commented Mar 23, 2022

⌛ Testing commit 8fa4ae8 with merge bb5f3f680e895438843678a021bab464d74975f9...


@bors
Contributor

bors commented Mar 23, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 23, 2022
@workingjubilee
Member Author

Notes to self:

  • You can check smaller, simpler compile tests in Godbolt under --target=${cross_platform}
  • The Arch Linux repos allow pacman -Syu aarch64-linux-gnu-gcc for cross-compiling
  • ./x.py test src/test/ui/* --target aarch64-unknown-linux-gnu is an option (even if it SIGILLs)

Apologies for the thrash.

@bors r=nagisa,Amanieu

@bors
Contributor

bors commented Mar 23, 2022

📌 Commit 6c19dc9 has been approved by nagisa,Amanieu

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 23, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 23, 2022
Rollup of 6 pull requests

Successful merges:

 - rust-lang#91608 (Fold aarch64 feature +fp into +neon)
 - rust-lang#92955 (add perf side effect docs to `Iterator::cloned()`)
 - rust-lang#94713 (Add u16::is_utf16_surrogate)
 - rust-lang#95212 (Replace `this.clone()` with `this.create_snapshot_for_diagnostic()`)
 - rust-lang#95219 (Modernize `alloc-no-oom-handling` test)
 - rust-lang#95222 (interpret/validity: improve clarity)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 67d6cc6 into rust-lang:master Mar 23, 2022
@rustbot rustbot added this to the 1.61.0 milestone Mar 23, 2022
@workingjubilee workingjubilee deleted the fold-neon-fp branch March 23, 2022 06:37
@ehuss
Contributor

ehuss commented Mar 23, 2022

@adamgemmell I'm not sure, does this need an update to the docs?

@workingjubilee
Member Author

Yes. I will follow up.

HeroicKatora added a commit to image-rs/jpeg-decoder that referenced this pull request Mar 24, 2022
The build is currently failing:

<https://github.com/image-rs/jpeg-decoder/runs/5618199478>

1. `aarch64_target_feature` was stabilized in rust-lang/rust#90621
2. `neon`/`fp` must be activated together, which is not yet the case for
   some intrinsics in `std`. See rust-lang/rust#91608 and
   rust-lang/rust#95044.

Once either of the above solutions lands we can remove
`aarch64_target_feature` and unpin nightly again.
HeroicKatora added a commit to image-rs/jpeg-decoder that referenced this pull request Mar 25, 2022
wartmanm pushed a commit to wartmanm/jpeg-decoder that referenced this pull request Sep 15, 2022
@apiraino apiraino removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Feb 3, 2023
JamieCunliffe added a commit to JamieCunliffe/rust that referenced this pull request May 23, 2023
In rust-lang#91608 the fp-armv8 feature was removed as it's tied to the neon
feature. However, disabling neon didn't actually disable the use of
floating point registers and instructions; for this, `-fp-armv8` is
required.
bors added a commit to rust-lang-ci/rust that referenced this pull request May 23, 2023
Fix some issues with folded AArch64 features

In rust-lang#91608 the `fp` feature was removed for AArch64 and folded into the `neon` feature; however, disabling the `neon` feature doesn't actually disable the `fp` feature. If my understanding of that thread is correct, it should.

While doing this, I also noticed that disabling some features would disable features that it shouldn't. For instance, enabling `sve` will enable `neon`; however, disabling `sve` would then also disable `neon`, and I wouldn't expect disabling `sve` to also disable `neon`.

cc `@workingjubilee`
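The disabling behaviour this commit expects can be sketched as: turning a feature off removes it and everything that depends on it, but not the things it depends on, and folded partners go down together. The table uses only edges named in this thread (`sve2` implies `sve`, `sve` implies `neon`, `neon` implies `fp-armv8`); the `disable` function and the `FOLDED` policy table are hypothetical.

```rust
use std::collections::BTreeSet;

/// Direct implications: enabling `.0` enables `.1` (illustrative subset).
const IMPLIES: &[(&str, &str)] = &[("sve2", "sve"), ("sve", "neon"), ("neon", "fp-armv8")];

/// Folded pairs: disabling either member disables both (assumed policy).
const FOLDED: &[(&str, &str)] = &[("neon", "fp-armv8")];

/// Remove `feature` from `enabled`, plus everything that transitively
/// depends on a removed feature. Dependencies of a removed feature stay.
fn disable(enabled: &mut BTreeSet<&'static str>, feature: &'static str) {
    let mut work = vec![feature];
    while let Some(f) = work.pop() {
        // Skip features that are already gone; this also guarantees termination.
        if !enabled.remove(f) {
            continue;
        }
        // Anything that implies `f` can no longer stay enabled.
        for &(from, to) in IMPLIES {
            if to == f {
                work.push(from);
            }
        }
        // Folded partners are disabled together.
        for &(a, b) in FOLDED {
            if a == f {
                work.push(b);
            }
            if b == f {
                work.push(a);
            }
        }
    }
}
```

With this rule, `-sve` removes only `sve` and `sve2`, while `-neon` takes `fp-armv8` and the SVE features with it.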
drunest added a commit to drunest/jpegdecoder that referenced this pull request Jun 28, 2024
Labels
  • O-AArch64: Armv8-A or later processors in AArch64 mode
  • S-waiting-on-bors: Waiting on bors to run and complete tests. Bors will change the label on completion.
  • T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.