Skip to content

Latest commit

 

History

History
498 lines (390 loc) · 14.3 KB

0000-deferenceable-volatile.md

File metadata and controls

498 lines (390 loc) · 14.3 KB
  • Relevant components: svd2rust
  • Feature Name: deferenceable_volatile
  • Start Date: 2019-10-23
  • RFC PR: (leave this empty)
  • Rust Issue: (leave this empty)

Summary

The current implementation of the peripheral API generated by svd2rust is affected by bug rust-lang/rust#55005. The (hypothetical) observable effect of this bug on the svd2rust API is that LLVM may insert spurious reads to MMIO registers in optimized builds. These spurious reads in turn may cause some bits of the register to be cleared (e.g. interrupt flags) or flipped, changing the intended behavior of the program.

The way to prevent this rustc/LLVM bug is to never create a reference (&- or &mut-) to a MMIO register, or in general to memory that is meant to be accessed only through volatile operations. This document explores alternative implementations of the svd2rust API to avoid this bug.

Current implementation

To visualize the problem consider an hypothetical device with 2 similar USART peripherals. Each USART peripheral has two registers: a 32-bit control register (CR) and a 16-bit data register (DR); these registers are laid out next to each other in memory; the CR registers appears first in memory (has the lower address).

In the current implementation these peripherals are represented using the following types:

pub struct USART1 { _not_sync: PhantomData<*mut ()> }
pub struct USART2 { _not_sync: PhantomData<*mut ()> }

impl Deref for USART1 { // repeat for USART2
    type Target = usart::RegisterBlock;

    fn deref(&self) -> &usart::RegisterBlock {
        unsafe { &*(0x4000_0000 as *const usart::RegisterBlock) }
    }
}

// repeat for USART2 with address `0x4001_0000`

mod usart {
    #[repr(C)]
    pub struct RegisterBlock {
        pub CR: CR, // newtype over `VolatileCell<u32>`
        pub DR: DR, // newtype over `VolatileCell<u16>`
    }
}

The VolatileCell<T> is a newtype over UnsafeCell<T> with the following API:

struct VolatileCell<T> { inner: UnsafeCell<T> }

impl<T> VolatileCell<T> {
    unsafe fn read(&self) -> T {  // it also has a `write` method
        ptr::read_volatile(self.inner.get())
    }
}

The UnsafeCell::get(&self) method (implicitly) creates a shared reference to an UnsafeCell<T> and thus runs into rust-lang/rust#55005. Because VolatileCell::{read,write} call UnsafeCell::get they are also affected by the bug.

Code like &usart1.cr (where usart1: USART1) may also trigger the bug because it creates a shared reference to CR, which is a newtype over UnsafeCell<u32>.

Detailed design

The main problem with the current implementation is that RegisterBlock is a sized type that represents a register block in MMIO memory. Therefore all instances of &any::RegisterBlock are references (pointers) into MMIO memory and will run into the bug. The logical solution is to replace the concept of RegisterBlock with something else.

This section explores two alternative implementations.

ZSTs all the way down

One approach is to turn RegisterBlock into a zero sized type (ZST): Registers (*). This Registers structure will contain more ZSTs that represent the registers of a peripheral. Each of these ZST registers will implement today's API by directly calling ptr::{read,write}_volatile_ on a raw pointer -- this way references to MMIO memory are never created. The code below illustrates the idea:

(*) we don't really have to change the name of the type in the actual implementation but using different names makes the change easier to explain.

struct USART1 { _not_sync: PhantomData<*const ()> }

pub mod usart {
    pub struct Registers {
        pub CR: CR,
        pub DR: DR,
    }

    pub struct CR { _not_send_or_sync: PhantomData<*const ()> }

    impl CR {
        // simplified so it doesn't match the current svd2rust API
        pub fn read(&self) -> u32 {
            unsafe { ptr::read_volatile(0x4000_0000 as *const u32) }
        }
    }

    impl DR {
        pub fn read(&self) -> u16 {
            unsafe { ptr::read_volatile(0x4000_0004 as *const u16) }
            //                                    ^            ^^
        }
    }
}

Breaking changes

In today's API one can write a generic function that works with peripherals USART1 and USART2 using the following trait bound: U: Deref<Target = usart::RegisterBlock>. Alternatively, one can use a concrete function that takes an argument of type &usart::RegisterBlock; this function works with both &USART1 and &USART2 because they implement the Deref trait.

fn generic<U>(usart: U) where U: Deref<Target = usart::RegisterBlock> {
    usart.dr.read();
}

fn concrete(usart: &usart::RegisterBlock) {
    usart.dr.read();
}

fn pass_usart1(usart: USART1) {
    concrete(&usart); // (A)
    generic(usart);
}

fn pass_usart2(usart: USART2) {
    //                     ^
    concrete(&usart); // (B)
    generic(usart);
}

The ability to write the concrete function is lost when moving to the ZST approach. The reason is that under this approach &usart::Registers is also a ZST; in contrast, in the current implementation &usart::RegisterBlock is a pointer into MMIO memory so in line (A) we are passing the pointer 0x4000_0000 to concrete and in line (B) we are passing the pointer 0x4001_0000.

Under the ZST approach it's possible to write something like the generic function but the implementation needs to become more complex:

trait Peripheral {
    const BASE_ADDRESS: usize;
}

pub type USART1 = usart::Registers<usart::_1>;
pub type USART2 = usart::Registers<usart::_2>;

mod usart {
    struct _1;
    struct _2;

    // repeat for `_2` but with address = `0x4000_1000`
    impl Peripheral for _1 {
        const BASE_ADDRESS: usize = 0x4000_0000;
    }

    pub struct Registers<P>
    where
        P: crate::Peripheral,
    {
        pub CR: CR<P>,
        pub DR: DR<P>,
    }

    pub struct CR<P>
    where
        P: Peripheral,
    {
        _p: PhantomData<P>,_
        _not_send: PhantomData<*mut ()>,
    }

    // repeat for `DR<P>` but with `OFFSET=0x4` and `u16` instead of `u32`
    impl<P> CR<P>
    where
        P: Peripheral,
    {
        const OFFSET: usize = 0x0;

        pub fn read(&self) -> u32 {
            unsafe {
                ptr::read_volatile((P::BASE_ADDRESS + Self::OFFSET) as *const u32)
            }
        }
    }

    impl<P> Registers<P> {
        pub(crate) unsafe fn new() -> Self {
            Registers {
                CR: CR { _p: PhantomData, _not_send: PhantomData_},
                // ..
            }
        }
    }
}

pub struct Peripherals {
    pub USART1: USART1,
    pub USART2: USART2,
}

impl Peripherals {
    pub fn take() -> Option<Self> {
        // omitted: runtime check

        unsafe {
            Some(Peripherals {
                USART1: usart::Registers::new(),
                USART2: usart::Registers::new(),
            })
        }
    }
}

Then generic code will look like this:

fn generic_zst<P>(usart: USART<P>) where P: Peripheral {
    usart.dr.read();
}

Furthermore, this ZST implementation with generics changes the signatures of registers types: CR<_1> is the control register of USART1 and CR<_2> is the control register of USART2. These two types are ZST so the exact address of the register must be stored in the type system. In today's implementation, the &CR value is a pointer, that points to the particular register (0x4000_0000 or `0x4000_1000**).

In summary, the breaking changes specific to this approach are:

  • You can't write a concrete function that works with all instances of a peripheral.

  • Generic functions that work with all instances of a peripheral need to be modified (generic -> generic_zst).

  • The types of registers changes from a concrete type (today::CR ) to a generic type (zst::CR<P>). Note that registers of peripherals that have only a single instance do not need to be turned into generic types.

VolAddress

Another approach is to not use ZSTs but instead store the address of each register "on the stack" using the VolAddress type. The code below illustrates the idea:

pub struct USART1(usart::Registers);
pub struct USART2(usart::Registers);

impl Deref for USART1 {
    type Target = usart::Registers;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

mod usart {
    pub struct Registers {
        // simplified; these should be newtypes
        pub CR: VolAddress<u32>,
        pub DR: VolAddress<u16>,
    }

    impl Registers {
        pub(crate) unsafe fn new(base_address: usize) -> Self {
            Registers {
                CR: VolAddress::new(base_address + 0x0),
                DR: VolAddress::new(base_address + 0x4),
            }
        }
    }
}

pub struct Peripherals {
    pub USART1: USART1,
    pub USART2: USART2,
}

impl Peripherals {
    pub fn take() -> Option<Self> {
        // omitted: runtime check

        unsafe {
            Some(Peripherals {
                USART1: USART1(usart::Registers::new(0x4000_0000)),
                USART2: USART2(usart::Registers::new(0x4000_1000)),
            })
        }
    }
}

Breaking changes

This approach has no specific breaking changes. The concrete and generic that one can write today will continue to work under this alternative implementation. The register type does not change in any visible way when switching to the VolAddress approach.

Non-zero cost

The VolAddress approach is not zero cost in comparison to today's implementation and to the ZST approach. This can be observed when a peripheral type (or an abstraction that contains a peripheral) is stored in a static variable. This pattern is commonly used in RTFM applications (peripherals are used as late resources) but also appears in cortex-m-rt-only applications that use the Mutex<RefCell<Option<_>>> pattern.

The following program demonstrates the non-zero-cost-ness.

VolAddress

// a late resource
// (or this could be `static _: _Mutex<RefCell<Option<usart::Registers>>>`)
static mut USART1: MaybeUninit<usart::Registers> = MaybeUninit::uninit();

#[entry]
fn main() -> ! {
    // simulate the initialization of a late resource
    unsafe {
        USART1
            .as_mut_ptr()
            .write(usart::Registers::new(0x4000_0000));
    }

    loop {}
}

#[exception]
fn SVCall() {
    unsafe {
        (*USART1.as_ptr()).DR.read();
    }
}

#[exception]
fn SysTick() {
    unsafe {
        (*USART1.as_ptr()).CR.read();
    }
}

Disassembly (arm-none-eabi-objdump):

00000400 <SVCall>:
 400:   f240 0000       movw    r0, #0
 404:   f2c2 0000       movt    r0, #8192       ; 0x2000
 408:   6840            ldr     r0, [r0, #4]
 40a:   8800            ldrh    r0, [r0, #0]
 40c:   4770            bx      lr

0000040e <SysTick>:
 40e:   f240 0000       movw    r0, #0
 412:   f2c2 0000       movt    r0, #8192       ; 0x2000
 416:   6800            ldr     r0, [r0, #0]
 418:   6800            ldr     r0, [r0, #0]
 41a:   4770            bx      lr

Binary size (arm-none-eabi-size):

app  :
section             size         addr
(..)
.bss                 0x8   0x20000000
(..)

Current implementation

static mut USART1: MaybeUninit<USART1> = MaybeUninit::uninit();
//                             ^^^^^^

#[entry]
fn main() -> ! {
    unsafe {
        USART1.as_mut_ptr().write(USART1::steal());
        //                        ^^^^^^^^^^^^^^^
    }

    loop {}
}

// SVCall and SysTick don't change

Disassembly (arm-none-eabi-objdump):

00000400 <SVCall>:
 400:   2004            movs    r0, #4
 402:   f2c4 0000       movt    r0, #16384      ; 0x4000
 406:   8800            ldrh    r0, [r0, #0]
 408:   4770            bx      lr

0000040a <SysTick>:
 40a:   f04f 4080       mov.w   r0, #1073741824 ; 0x40000000
 40e:   6800            ldr     r0, [r0, #0]
 410:   4770            bx      lr

Binary size (arm-none-eabi-size):

app  :
section             size         addr
(..)
.bss                 0x0   0x20000000
(..)

ZST

The ZST approach produces the same machine code as the current implementation.


The difference boils down to voladdress::USART1 not being a ZST. This causes the register pointers (VolAddress values) to be stored in static memory; this can use hundred of bytes of SRAM for peripherals that have many registers (each register uses 4-byte of memory on a 32-bit device). Apart from the memory usage, each register has indirection: the register address is first loaded from static memory.

Common breaking changes

It is not possible to implement the RegisterBlock::ptr API under any of the alternative implementations. This API creates a raw pointer into the register block, and thus into MMIO memory. This action by itself does not trigger the bug (rust-lang/rust#55005) but in both alternative implementations the Registers struct that replaces RegisterBlock no longer represents MMIO memory so *mut Registers* is not equivalent to *mut RegisterBlock. In the ZST approach, &Registers is also a ZST; and in the VolAddress approach, &Registers is a pointer into some stack-allocated value.

Alternatives

Do nothing

An alternative is to keep the svd2rust API as it is and hope for rust-lang/rust#55005 to be fixed upstream (e.g. by removing the deferenceable attribute from &UnsafeCell<T> when rustc lowers it to LLVM-IR).

Conclusion

Three options have been explored in order to address bug rust-lang/rust#55005.

  • Keep the current implementation and hope the bug is fixed soon.

  • Re-implement the svd2rust API using the ZST approach. This approach is as zero cost as the current implementation but causes 4 breaking changes in the public API.

  • Re-implement the svd2rust API using the VolAddress approach. This approach only causes one breaking change but it's not zero cost: it increases memory usage and reduces performance in a common use case.

If it is decided to use an alternative implementation it is also advisable to deprecate the vcell and volatile-register crates and point current direct users of those crates to the alternatives presented in this document.