MCP: More Cranelift-friendly portable SIMD intrinsics #381

KodrAus · 2020-11-04T11:36:03Z

Proposal

cc @rust-lang/project-portable-simd

Summary

Set an expectation that simd_* intrinsics from extern "platform-intrinsic" will form the basis of backend-agnostic intrinsics within the standard library.
Fill in any missing gaps in existing simd_* intrinsics so that core::simd can avoid any direct LLVM dependency.
Provide non-vectorized reference implementations for all simd_* intrinsics that are exposed through compiler_builtins.

Implementation of this MCP will be coordinated between the Portable SIMD project and Compiler team for Cranelift and LLVM implementations.

Motivations

The user-facing goal of the core::simd API being developed by the @rust-lang/project-portable-simd group is to let users take advantage of SIMD intrinsics in a way that's automatically portable across hardware. Realizing that goal requires special backend support so that core::simd can utilize existing intrinsics rather than having to target ISAs specifically. Since rustc has multiple compiler backends (LLVM and Cranelift) the core::simd implementation also needs to also be portable across compiler backends in order to be portable across hardware in the way we want.

We currently have a best-effort approach to backend-agnostic SIMD intrinsics through the extern "platform-intrinsic" mechanism. There are a number of existing platform intrinsics like simd_add and simd_mul that are used throughout core::arch, but also many ISA idiosyncrasies that aren't reasonable to create platform intrinsics for so it also uses a mix of LLVM intrinsics and inline assembly. The story for core::simd is a bit different. Its goal is specifically to expose standard lowest-common-denominator functionality that's consistent across all supported ISAs. That makes core::simd a great fit for platform intrinsics, and core::simd would like to use them exclusively over LLVM intrinsics.

We can think of core::simd almost like a thin user-friendly wrapper over platform intrinsics. Since it's going to rely on these intrinsics and make promises about their behavior across platforms we should write reference implementations of them all in plain Rust. These reference implementations will both document the conformant behavior of these intrinsics and give compiler backends a way to support core::simd without necessarily having to generate vectorized code for all platforms.

Implementation

The overall picture of how Rust's portable SIMD story will fit together covers a few areas of the compiler and libraries:

Libraries like core::simd call platform intrinsics through extern "platform-intrinsic". These are generic functions that are backend agnostic. They work with types that are #[repr(simd)].
Compiler backends like LLVM or Cranelift receive these generic platform intrinsics and can either emit appropriate ISA-specific vector instructions, or they can emit a function call to a generic fallback implementation.
The fallback implementations that backends can call are monomorphic intrinsics from compiler_builtins. They're named so that backends can easily name them given the platform intrinsic and shape of the inputs. Instead of working with some generic #[repr(simd)] struct T, they work with [T; N]s.
While the fallback implementations are monomorphic so they can live in compiler_builtins, they can be backed by a single generic implementation to make them easier to write.

To get a better idea of how this is expected to hang together, let's consider a working example of a conceptual core::simd and Rust compiler.
This example implements addition for i64x4, a vector type with 4 i64 lanes:

#![feature(core_intrinsics, min_specialization, min_const_generics)]
#![allow(non_camel_case_types)]

#[test]
fn i64x4_add() {
    use crate::simd::i64x4;
    
    let a = i64x4([1, 2, 3, 4]);
    let b = i64x4([4, 3, 2, 1]);
    
    assert_eq!(i64x4([5, 5, 5, 5]), a + b);
}

pub mod simd {
    /*!
    An external crate, `core::simd`.
    
    It wants to call platform intrinsics instead of direct backend intrinsics.
    */

    use std::ops::Add;
    
    use crate::compiler::{ReprSimd, platform_intrinsics};
    
    /**
    A vector type with 4 `i64` lanes.
    */
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct i64x4(pub [i64; 4]);
    
    /**
    Our vector type is `#[repr(simd)]`
    */
    impl ReprSimd for i64x4 {
        type Type = i64;
        const SIZE: usize = 4;
        
        fn into_fields(self) -> Vec<Self::Type> {
            vec![self.0[0], self.0[1], self.0[2], self.0[3]]
        }

        fn from_fields(fields: Vec<Self::Type>) -> Self {
            i64x4([fields[0], fields[1], fields[2], fields[3]])
        }
    }
    
    /**
    We can add two of our vectors together using the `simd_add` platform intrinsic.
    Hopefully it'll generate vectorized code! But it's ok if it doesn't.
    */
    impl Add for i64x4 {
        type Output = Self;
        
        fn add(self, rhs: Self) -> Self::Output {
            platform_intrinsics::simd_add(self, rhs)
        }
    }
}

pub mod compiler {
    /*!
    Our Rust compiler.
    
    It contains a few private child modules to demonstrate where each piece lives
    and how they interact to expose the single `platform-intrinsics` API.
    */

    mod builtins {
        /*!
        The compiler builtins.
        
        We expose manually monomorphized SIMD fallback intrinsics here that
        codegen backends can call into if they don't support generating
        specialized code for a SIMD platform intrinsic yet.
        */

        use std::{
            intrinsics,
            ops::Add,
        };
    
        pub fn simd_fallback_add_i64(a: &[i64], b: &[i64], r: &mut [i64]) { i64::simd_add(a, b, r) }
        pub fn simd_fallback_add_f64(a: &[f64], b: &[f64], r: &mut [f64]) { f64::simd_add(a, b, r) }
    
        /**
        Let's model our `simd_add` operation as a trait so we can specialize.
        It doesn't really matter how we do this, so long as it's readable.
        */
        trait SimdAdd {
            type Output;
        
            fn simd_add(a: &[Self], b: &[Self], r: &mut [Self::Output])
            where
                Self: Sized;
        }
    
        impl<T> SimdAdd for T
        where
            T: Add + Copy,
        {
            type Output = T;
        
            default fn simd_add(a: &[Self], b: &[Self], r: &mut [Self::Output]) {
                for i in 0..r.len() {
                    // behavior on overflow is one of the edge cases the fallback should specify
                    r[i] = intrinsics::wrapping_add(a[i], b[i]);
                }
            }
        }

        impl SimdAdd for f64 {
            fn simd_add(a: &[Self], b: &[Self], r: &mut [Self]) {
                for i in 0..r.len() {
                    // we might want to treat floats specially to guarantee some behavior for NaN or Infinity
                    r[i] = unsafe { intrinsics::fadd_fast(a[i], b[i]) };
                }
            }
        }
    }
    
    mod codegen_support {
        /*!
        Some backend-agnostic helpers.
        */

        use std::{
            mem,
            any::type_name,
        };
    
        use super::builtins;
    
        /**
        Find a builtin fallback function by operation and type.
        
        This is what backends would do if they can't generate specific code
        for a given platform intrinsic.
        */
        pub fn find_simd_fallback<T>(op: &str) -> fn(&[T], &[T], &mut [T]) {
            match (op, type_name::<T>()) {
                ("simd_add", "i64") => unsafe { mem::transmute::<for<'r, 's, 't0> fn(&'r [i64], &'s [i64], &'t0 mut [i64]), for<'r, 's, 't0> fn(&'r [T], &'s [T], &'t0 mut [T])>(builtins::simd_fallback_add_i64) },
                ("simd_add", "f64") => unsafe { mem::transmute::<for<'r, 's, 't0> fn(&'r [f64], &'s [f64], &'t0 mut [f64]), for<'r, 's, 't0> fn(&'r [T], &'s [T], &'t0 mut [T])>(builtins::simd_fallback_add_f64) },
                (op, ty) => unimplemented!("{} is unsupported for {}", op, ty),
            }
        }
    }
    
    mod codegen_cranelift {
        /*!
        An example of the code a backend might use to implement platform intrinsics.
        */

        use super::ReprSimd;
        use super::codegen_support;
    
        pub fn simd_add<T>(a: T, b: T) -> T
        where
            T: ReprSimd,
        {
            // We don't generate specific code for `simd_add`, so use a fallback
            // to find the appropriate function to call
            simd_fallback("simd_add", a, b)
        }

        fn simd_fallback<T>(op: &str, a: T, b: T) -> T
        where
            T: ReprSimd,
        {
            let a = a.into_fields();
            let b = b.into_fields();
            let mut r = vec![T::Type::default(); T::SIZE];
            
            assert_eq!(T::SIZE, a.len());
            assert_eq!(T::SIZE, b.len());
    
            (codegen_support::find_simd_fallback::<T::Type>(op))(&*a, &*b, &mut *r);
            
            T::from_fields(r)
        }
    }
    
    pub mod platform_intrinsics {
        /*!
        Let's expose our backend's `simd_add` as a platform intrinsic.
        
        This is the 'public' API that our `simd` consumer can call.
        */

        pub use super::codegen_cranelift::simd_add;
    }
    
    /**
    A codeified representation of the requirements we expect `#[repr(simd)]`
    types to satisfy.
    
    We're using `Vec` here instead of `[T; N]` just to make things compile
    for this example.
    */
    pub trait ReprSimd {
        type Type: Default + Copy;
        const SIZE: usize;
        
        fn into_fields(self) -> Vec<Self::Type>;
        fn from_fields(fields: Vec<Self::Type>) -> Self;
    }
}

At the top level, in core::simd we call into generic platform intrinsics.

In our compiler backend, we have one of two options; emit ISA-specific vectorized code for simd_add (such as a call to x86's _mm256_add_epi64), or emit a call to the fallback implementation from compiler_builtins. The #[repr(simd)] attribute ensures we can convert to and from arrays for the fallback implementation. Given a #[repr(simd)] struct with N fields of type T we can read them into a [T; N] and then write them from a [T; N] back into the #[repr(simd)] struct.

We pass these fallback vectors to the intrinsic as slices instead of arrays, so that arbitrarily sized inputs can be supported.

New SIMD operations can be defined with a reference implementation in compiler_builtins. They can then be picked up by compiler backends automatically until they choose to generate specific code for them.

There's an implicit bound on the T in simd_add, which is that it somehow maps to an intrinsic like simd_fallback_add_i64. This isn't currently communicated in its contract, and could be masked by compiler backends that don't forward to intrinsics. We suggest compiler backends always validate an appropriate fallback exists for a given platform intrinsic and vector type, even if it doesn't plan to use it.

Who defines intrinsics?

The Portable SIMD group will be responsible for determining the behavior of the simd_* intrinsics it needs. These will take the form of the reference/fallback implementations in compiler_builtins.

What do we need from `T-compiler`?

We'll need a lot of help figuring out where to put reference implementations and how to make them available to the various backends.

Mentors or Reviewers

@bjorn3 who's been working on rustc's Cranelift backend and extern "platform-intrinsic".

Process

The main points of the Major Change Process is as follows:

File an issue describing the proposal.
A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
- Finding a "second" suffices for internal changes. If however you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
- Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

The text was updated successfully, but these errors were encountered:

rustbot · 2020-11-04T11:36:05Z

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

pnkfelix · 2021-04-01T14:43:30Z

@rustbot second

apiraino · 2021-05-06T17:09:29Z

@rustbot label -final-comment-period +major-change-accepted

rustbot · 2021-05-06T17:09:31Z

Error: The feature relabel is not enabled in this repository.
To enable it add its section in the triagebot.toml in the root of the repository.

Please let @rust-lang/release know if you're having trouble with this bot.

KodrAus added T-compiler Add this label so rfcbot knows to poll the compiler team major-change A proposal to make a major change to rustc labels Nov 4, 2020

rustbot added the to-announce Announce this issue on triage meeting label Nov 4, 2020

spastorino removed the to-announce Announce this issue on triage meeting label Nov 5, 2020

KodrAus mentioned this issue Nov 9, 2020

Portable SIMD Project Group rust-lang/libs-team#4

Open

workingjubilee mentioned this issue Dec 10, 2020

Get Cranelifted rust-lang/portable-simd#40

Open

rustbot added the final-comment-period The FCP has started, most (if not all) team members are in agreement label Apr 1, 2021

calebzulawski mentioned this issue Apr 17, 2021

Don't set fast-math for the SIMD operations we set it for previously rust-lang/rust#84274

Merged

apiraino closed this as completed May 6, 2021

apiraino added major-change-accepted A major change proposal that was accepted and removed final-comment-period The FCP has started, most (if not all) team members are in agreement labels May 6, 2021

rustbot added the to-announce Announce this issue on triage meeting label May 6, 2021

apiraino removed the to-announce Announce this issue on triage meeting label May 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MCP: More Cranelift-friendly portable SIMD intrinsics #381

MCP: More Cranelift-friendly portable SIMD intrinsics #381

KodrAus commented Nov 4, 2020 •

edited

Loading

rustbot commented Nov 4, 2020

pnkfelix commented Apr 1, 2021

apiraino commented May 6, 2021

rustbot commented May 6, 2021

MCP: More Cranelift-friendly portable SIMD intrinsics #381

MCP: More Cranelift-friendly portable SIMD intrinsics #381

Comments

KodrAus commented Nov 4, 2020 • edited Loading

Proposal

Summary

Motivations

Implementation

Who defines intrinsics?

What do we need from T-compiler?

Mentors or Reviewers

Process

Comments

rustbot commented Nov 4, 2020

pnkfelix commented Apr 1, 2021

apiraino commented May 6, 2021

rustbot commented May 6, 2021

KodrAus commented Nov 4, 2020 •

edited

Loading

What do we need from `T-compiler`?