Atomic operations in PSRAM (esp. with 3rd party libraries) #2027

ProfFan · 2024-08-28T19:19:56Z

AFAIK the current atomic operations on the S3 uses native atomics. However this creates a problem: a 3rd party library could be using atomics on their data structure placed on the SPI PSRAM.

In the IDF, they handled it by first checking for PSRAM address, then try acquire the external RAM lock, then do CAS: https://github.com/espressif/esp-idf/blob/c9df77efbf871d4c3ae9fb828778ff8c4ab36804/components/esp_hw_support/cpu.c#L199

I wonder if a similar thing should be done for the S3 (for example, a PR to portable-atomic that adds a feature for a similar implementation?

Based on discussions with @ivmarkov

The text was updated successfully, but these errors were encountered:

bugadani · 2024-08-29T07:00:54Z

I don't see a simple way out of this without introducing our own, "PSRAM-safe" atomics. Which will be a usability and interoperability nightmare, because library developers will not likely use it, or even know about it.

Turning portable-atomic into a shim of sorts for us is at least technically possible, but still doesn't solve every issue - for MCUs that can use the core atomic types, those ones can't really be shimmed for us to provide our own implementation.

I don't think we can extend portable-atomic to know about our particular memory mappings.

MabezDev · 2024-08-29T08:37:20Z

Do we know if an exception is raised whilst accessing atomics in psram, or does it just silently fail? If we can detect it, that at the very least opens up some options for us.

ivmarkov · 2024-08-30T14:05:56Z

Do we know if an exception is raised whilst accessing atomics in psram, or does it just silently fail? If we can detect it, that at the very least opens up some options for us.

I doubt any exception is raised. From the APP/PRO CPU POV, accessing PSRAM is like accessing regular internal RAM. And it ends up accessing regular internal RAM after all - thanks to the caching MMU.

With that said, I'm not in the position to discuss a possible fix, as I do not understand the operation of the MMUs to a deeper-enough detail. All I know is that ESP-IDF provides a special xchg atomic implemented in software, which is to be used when PSRAM is enabled, as pointed to by @ProfFan .

With that said, whoever wrote and maintains this is probably in that position. (@bjoernQ - is that you?).

I would be happy to know though why hardware atomics cannot work with PSRAM:

Is it because the two CPUs might have their own separate dram caches for the same psram; why would we even want such a setup?
Or is it because an atomic might span across two cache-lines (the end of one, and the beginning of the other) and as such the writing might actually not be atomic at all?

Anyway, I guess me and @ProfFan are just raising the flag that there is an issue with atomics and PSRAM, and somebody needs to look into it. For both ESP-IDF and baremetal by the way, as the ESP-IDF multicore targets are also happily using only hardware atomics ATM; and once we know of a fix, we should probably change the definition of the upstream multicore STD targets from just using hardware atomics to emitting calls to the atomic intrinsic functions, so that the user code can use whatever ESP-IDF had put in the definition of these intrinsic functions, depending on user settings).

Ah, one more question, couldn't resist:

Would hardware atomics work correctly, if the programmer is ensuring that the relevant PSRAM memory region is accessed only by one of the two CPUs? As in, do we even have a workaround for the current situation...

bjoernQ · 2024-08-30T14:19:48Z

With that said, whoever wrote and maintains this is probably in that position. (@bjoernQ - is that you?).

Yes, I ported the code and no I don't know. I also doubt an exception is raised there but we can just give it a try

I'm also curious about the "why doesn't it just work"

MabezDev · 2024-08-30T14:26:25Z

I'll try and find those answers and get back to you.

In the meantime, the only truly bullet-proof way to solve this right now is to have a target variant where we disable hardware atomics, falling back to portable-atomic or something else to ensure atomicity (note: this is exactly what esp-idf does, when the work around is enabled). I'd really like to avoid doing this though if we can help it.

ivmarkov · 2024-08-30T14:37:31Z

I'll try and find those answers and get back to you.

In the meantime, the only truly bullet-proof way to solve this right now is to have a target variant where we disable hardware atomics, falling back to portable-atomic or something else to ensure atomicity (note: this is exactly what esp-idf does, when the work around is enabled). I'd really like to avoid doing this though if we can help it.

The question is, is portable-atomic even having a feature that would enable atomics (at least on bare-metal, for ESP-IDF we have other options) to operate correctly.

The critical-section one is tempting, but do you remember why our support for u64 on multicore riscv got kicked out from upstream STD? Because it essentially used disable-all-interrupts-critical-sections-plus-spinlocks and they said using spinloocks for atomic operations defeats their very purpose in the first place.

Also if you notice, the software PSRAM-safe xchg impl in the ESP-IDF does something different... yet similar. They seem to implement the "fallible" xchg variant only, which can sporadically fail. But then the question is, what would the user do when presented with such an xchg impl, other than re-trying in a loop, which suspiciously looks like a cousin to spinlocks anyway...

ProfFan · 2024-08-30T15:13:10Z

Would hardware atomics work correctly, if the programmer is ensuring that the relevant PSRAM memory region is accessed only by one of the two CPUs?

I think the answer currently for the S3 is most likely true but I am not sure. It's probably good to open an issue in esp-idf and tag @igrr for help 🍭

for MCUs that can use the core atomic types, those ones can't really be shimmed for us to provide our own implementation.

If we have a custom target (for example xtensa-none-elf-esp32s3) this can be put as a part in core right? As we build core with the application anyway.

The critical-section one is tempting, but do you remember why our support for u64 on multicore riscv got kicked out from upstream STD? Because it essentially used disable-all-interrupts-critical-sections-plus-spinlocks and they said using spinloocks for atomic operations defeats their very purpose in the first place.

I wonder what they did for Cortex-M0/M0+ and such? See https://doc.rust-lang.org/std/sync/atomic/index.html#portability
thumbv6m does not support CAS at all.

ivmarkov · 2024-08-30T15:29:41Z

I wonder what they did for Cortex-M0/M0+ and such? See https://doc.rust-lang.org/std/sync/atomic/index.html#portability thumbv6m does not support CAS at all.

If Cortex-M0/M0+ is always single-core (is it?) then disable-all-interrupts-one-way-or-another as a strategy does work fine and since it is a single core, you don't have the "is a spinlock a lock or just waiting" issue.

By the way... by reading the opening paragraph in the "Portability section"... namely:
"
All atomic types in this module are guaranteed to be lock-free if they’re available. This means they don’t internally acquire a global mutex. Atomic types and operations are not guaranteed to be wait-free. This means that operations like fetch_or may be implemented with a compare-and-swap loop.
"

I'm now even more puzzled as to why they kicked out our u64 multicore support...
I mean, if we talk about "spinlocks" (that we used in that implementation...)

Are these "locks" in the terminology of that section? <-- I DOUBT this now
Or are these "waits" in the terminology of that section <-- I now believe this is the case, and if so, our spinlocking impl should've not been kicked out in the first place...

bugadani · 2024-08-30T15:32:42Z

If Cortex-M0/M0+ is always single-core (is it?)

No, see RP2040

ivmarkov · 2024-08-30T15:38:30Z

If Cortex-M0/M0+ is always single-core (is it?)

No, see RP2040

Ah so you mean for rp2040 they use a spinlock+disable-all-interrupts software impl? If yes, then this only reinforces my suspicion, that our u64 atomic impl Shall Be Re-Instated...

I guess at the time (and now still) I was not understanding the subject matter well enough to object their decision on technical ground...

ivmarkov · 2024-08-30T15:40:52Z

...and if spinlocking+disable-all-interrupts is fine as a strategy, then the critical-section impl of portable-atomic is good enough of a workaround for multicore.

ProfFan · 2024-08-30T15:43:41Z

No, see RP2040

Beaten me in 5 secs lol

Ah so you mean for rp2040 they use a spinlock+disable-all-interrupts software impl? If yes, then this only reinforces my suspicion, that our u64 atomic impl Shall Be Re-Instated...

Probably no, see rust-lang/rust#99668

...and if spinlocking+disable-all-interrupts is fine as a strategy, then the critical-section impl of portable-atomic is good enough of a workaround for multicore.

For real-time applications like what we use ESPs for I think this is worse than the IDF way of "best effort" weak CAS, but can live without if it's too complicated to make something similar in Rust.

ivmarkov · 2024-08-30T15:53:49Z

No, see RP2040

Beaten me in 5 secs lol

Ah so you mean for rp2040 they use a spinlock+disable-all-interrupts software impl? If yes, then this only reinforces my suspicion, that our u64 atomic impl Shall Be Re-Instated...

Probably no, see rust-lang/rust#99668

TL;DR :(
If you can summarize for me, what does anybody use for CAS in multicore M0?
Because I don't think - whatever is used - can avoid using a loop and a loop is as close to a spinlock as you can get? As both are essentially just wait-and-check OR wait-and-retry which is essentially the same?

...and if spinlocking+disable-all-interrupts is fine as a strategy, then the critical-section impl of portable-atomic is good enough of a workaround for multicore.

For real-time applications like what we use ESPs for I think this is worse than the IDF way of "best effort" weak CAS, but can live without if it's too complicated to make something similar in Rust.

Because the weak cas esp idf impl does not disable interrupts (which is... slow?) or if not why then?

Because a strong cas impl on top of a weak cas would also require a loop anyway...

MabezDev · 2024-09-11T10:16:58Z

I would be happy to know though why hardware atomics cannot work with PSRAM

Whilst it's related to the cache/MMU setup, the reason it doesn't work isn't PSRAM is treated too much like flash. Unfortunately, the appropriate signals to provide the required locking (which aren't needed when caching flash) were also not implemented for caching PSRAM, meaning the CPU will happily do atomic modifications on PSRAM memory without actually ensuring it's an atomic operation.

As you mentioned, PSRAM is treated just like DRAM in the esp32 and esp32s3, and because of this, we can't use the ATOMCTL register to control what the CPU does when it executes s32c1l on external memories.

Would hardware atomics work correctly, if the programmer is ensuring that the relevant PSRAM memory region is accessed only by one of the two CPUs? As in, do we even have a workaround for the current situation

Hmm, I'm not sure if this would work, consider the case of running one core but sharing an atomic variable between an interrupt. I believe these would be two different accesses, but I suppose the CPU (being a single core in this case) can only do one thing at a time so access would be sequential?

So in summary, we have no way of detecting this, we need to be proactive when using PSRAM. Essentially for every atomic access, we need to

Check if the variable is located in PSRAM
If it is, also acquire a global spinlock to ensure atomicity

I guess the question is where we do this. I already suggested a different target as a nuclear option but we can probably do better. Maybe we can do better, perhaps with something like -Ctarget-feature=-feature to disable atomics and fall back to libcalls that have the work around in place when psram is being used in the program?

MabezDev · 2024-09-13T11:02:29Z

From our discussion in the meeting, we need to determine whether enabling portable-atomic/critical-section even on targets with native atomics will use a critical section. If it does, we at least have a nice-ish work around for the issue as we can detect PSRAM use and enable this feature.

This is not ideal though as it incurs a cost to all atomics, even ones stored in internal memory. Longer term it would be better to have a solution like esp-idf's where it checks where the atomic is before deciding whether to guard access with a spinlock.

bugadani · 2024-09-13T11:54:43Z

As we're using atomic compare_exchange in our multi-core critical section implementation, can we enable portable-atomic/critical-section without things blowing up?

MabezDev · 2024-09-13T12:16:27Z

As we're using atomic compare_exchange in our multi-core critical section implementation, can we enable portable-atomic/critical-section without things blowing up?

Ah I forgot about that, good catch. Fortunately, I think we can control this by using core::sync::atomic::* in that impl, and ensuring that the global spinlock gets placed in internal memory via a linker attribute.

taiki-e · 2024-10-06T17:41:02Z

we need to determine whether enabling portable-atomic/critical-section even on targets with native atomics will use a critical section.

portable-atomic/critical-section is ignored if the target has atomic (i.e., cfg!(target_has_atomic = "ptr") is true).

I'm open to adding a new cfg to adjust the behavior here, but do not intend to change the default or add a new Cargo feature because:

Due to bad behavior of the cargo v1 resolver, other platform-specific dependencies enable this feature, causing build failures or silent performance degradation.
Some people unconditionally enable the portable-atomic/critical-section feature without reading documentation, e.g., regression with portable-atomics in 1.20.0 matklad/once_cell#264.
Not officially documented, but in the current portable-atomic, if cfg!(target_has_atomic = "N") is true, portable_atomic::Atomic{I,U}N is lock-free¹, but changing this behavior risks introducing issues into the code that assumes it.

Except in the case of a compiler or target-spec bug, such as https://github.com/rust-lang/rust/issues/117305, https://github.com/rust-lang/rust/issues/117306 ↩

MabezDev · 2024-10-08T10:55:46Z

Thanks @taiki-e ❤️!

I'm open to adding a new cfg to adjust the behavior here, but do not intend to change the default

I think this entirely reasonable, I have a couple of questions:

Do you have something in mind for the impl, or would you like us to have a go and PR to portable-atomic?
What do you think the name of the feature should be?
Do you think there is any scope for a conditional check? For instance we don't need to use the critical section if the atomic is placed in internal memory, so it would be nice to run a small address range check prior before committing to acquiring the lock.

taiki-e · 2024-10-08T16:13:19Z

Do you have something in mind for the impl, or would you like us to have a go and PR to portable-atomic?

What do you think the name of the feature should be?

Do you think there is any scope for a conditional check? For instance we don't need to use the critical section if the atomic is placed in internal memory, so it would be nice to run a small address range check prior before committing to acquiring the lock.

I think there are two options on what implementation to use:

Add an option to use the (sharded) global lock based fallback (currently used for wide atomics on platforms where CAS is available) even if the corresponding native atomic operations are available.
- If this is done, we also need to add an option to disable interrupts in the lock based fallback.
Add an option to use critical-section impl not the lock based fallback.

(The former is more efficient but more complex to implement (although there is already code related to interrupt disabling).)

And even two options on when to use it:

Platform-independent Implementation that does it in all cases.
Platform-specific implementation that does it only for a specific address

(The latter is more efficient but more complex as well.)

Of course the cfg name will be different depending on which option is selected.

Could you tell me which of these options you would actually like?

MabezDev · 2024-10-10T13:38:14Z

I think there are two options on what implementation to use:

It sounds like using the critical-section impl would be more straight forward I think, but if you have some of the scaffolding ready for the global sharded lock, then it might be worth trying that. Either of these approaches would be acceptable to us.

And even two options on when to use it:

For our specific use case, the latter is more attractive to me. Note that we only need to do this on four targets, xtensa-esp32-none-elf, xtensa-esp32-esp-idf, xtensa-esp32s3-none-elf, xtensa-esp32s3-esp-idf. Are you suggesting special casing some address ranges for these targets, or do you think a more general solution of somehow supplying an address range to portable-atomic?

taiki-e · 2024-10-22T15:26:02Z

It sounds like using the critical-section impl would be more straight forward I think, but if you have some of the scaffolding ready for the global sharded lock, then it might be worth trying that. Either of these approaches would be acceptable to us.

Note that we only need to do this on four targets, xtensa-esp32-none-elf, xtensa-esp32-esp-idf, xtensa-esp32s3-none-elf, xtensa-esp32s3-esp-idf.

Is it safe to call interrupt disable instruction from user code in xtensa-*-esp-idf? For example, RISC-V and Arm usually restrict the use of such instructions in user mode. If the same is true for Xtensa esp-idf, there will be no option to use anything other than critical-section.

Are you suggesting special casing some address ranges for these targets, or do you think a more general solution of somehow supplying an address range to portable-atomic?

I think this depends on the actual spec and guarantees. If it is something that is clearly guaranteed in the spec, then I guess you can hard code it, but if it is something like “vendor may change it in the future” or “different for each chip or firmware configuration” then we can't hard code the value into the portable-atomic code.

MabezDev · 2024-10-24T14:42:37Z

Is it safe to call interrupt disable instruction from user code in xtensa-*-esp-idf? For example, RISC-V and Arm usually restrict the use of such instructions in user mode. If the same is true for Xtensa esp-idf, there will be no option to use anything other than critical-section.

@ivmarkov Could you clarify this, the details are a bit fuzzy for me.

I think this depends on the actual spec and guarantees. If it is something that is clearly guaranteed in the spec, then I guess you can hard code it, but if it is something like “vendor may change it in the future” or “different for each chip or firmware configuration” then we can't hard code the value into the portable-atomic code.

The silicon has a hardcoded address space for PSRAM, it will never change. It is dependant on the chip configuration (at time of production) this means, the esp32 and esp32-s3 have different address spaces for PSRAM so we'll need a solution that accounts for that (I believe we can achieve this with cfg's). Firmware configuration has no impact.

jessebraham added this to esp-rs Aug 28, 2024

github-project-automation bot moved this to Todo in esp-rs Aug 28, 2024

ivmarkov mentioned this issue Aug 31, 2024

ESP-IDF targets' Atomic*64 is not lock-free rust-lang/rust#117305

Closed

bjoernQ mentioned this issue Sep 6, 2024

esp-wifi uses global allocator, esp-alloc supports multiple regions #2099

Merged

6 tasks

MabezDev mentioned this issue Sep 17, 2024

ESP32 PSRAM behaves erratically if flashed using 80Mhz flash mode #1566

Closed

tom-borcin added this to the 0.22.0 milestone Oct 7, 2024

MabezDev added the upstream The issue lies with a dependency label Oct 8, 2024

MabezDev added the status:blocked Unable to progress - dependent on another task label Nov 8, 2024

MabezDev removed this from the 0.22.0 milestone Nov 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Atomic operations in PSRAM (esp. with 3rd party libraries) #2027

Atomic operations in PSRAM (esp. with 3rd party libraries) #2027

ProfFan commented Aug 28, 2024

bugadani commented Aug 29, 2024

MabezDev commented Aug 29, 2024

ivmarkov commented Aug 30, 2024

bjoernQ commented Aug 30, 2024

MabezDev commented Aug 30, 2024

ivmarkov commented Aug 30, 2024

ProfFan commented Aug 30, 2024

ivmarkov commented Aug 30, 2024

bugadani commented Aug 30, 2024

ivmarkov commented Aug 30, 2024 •

edited

Loading

ivmarkov commented Aug 30, 2024

ProfFan commented Aug 30, 2024

ivmarkov commented Aug 30, 2024

MabezDev commented Sep 11, 2024 •

edited

Loading

MabezDev commented Sep 13, 2024

bugadani commented Sep 13, 2024

MabezDev commented Sep 13, 2024

taiki-e commented Oct 6, 2024 •

edited

Loading

MabezDev commented Oct 8, 2024

taiki-e commented Oct 8, 2024 •

edited

Loading

MabezDev commented Oct 10, 2024

taiki-e commented Oct 22, 2024

MabezDev commented Oct 24, 2024

Atomic operations in PSRAM (esp. with 3rd party libraries) #2027

Atomic operations in PSRAM (esp. with 3rd party libraries) #2027

Comments

ProfFan commented Aug 28, 2024

bugadani commented Aug 29, 2024

MabezDev commented Aug 29, 2024

ivmarkov commented Aug 30, 2024

bjoernQ commented Aug 30, 2024

MabezDev commented Aug 30, 2024

ivmarkov commented Aug 30, 2024

ProfFan commented Aug 30, 2024

ivmarkov commented Aug 30, 2024

bugadani commented Aug 30, 2024

ivmarkov commented Aug 30, 2024 • edited Loading

ivmarkov commented Aug 30, 2024

ProfFan commented Aug 30, 2024

ivmarkov commented Aug 30, 2024

MabezDev commented Sep 11, 2024 • edited Loading

MabezDev commented Sep 13, 2024

bugadani commented Sep 13, 2024

MabezDev commented Sep 13, 2024

taiki-e commented Oct 6, 2024 • edited Loading

Footnotes

MabezDev commented Oct 8, 2024

taiki-e commented Oct 8, 2024 • edited Loading

MabezDev commented Oct 10, 2024

taiki-e commented Oct 22, 2024

MabezDev commented Oct 24, 2024

ivmarkov commented Aug 30, 2024 •

edited

Loading

MabezDev commented Sep 11, 2024 •

edited

Loading

taiki-e commented Oct 6, 2024 •

edited

Loading

taiki-e commented Oct 8, 2024 •

edited

Loading