Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RwLock and Mutex on Window theoretically allows undefined behavior in safe code #35836

Closed
retep998 opened this issue Aug 19, 2016 · 41 comments
Closed
Labels
C-bug Category: This is a bug. E-help-wanted Call for participation: Help is requested to fix this issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-windows Operating system: Windows P-medium Medium priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@retep998
Copy link
Member

retep998 commented Aug 19, 2016

According to this back and forth on a Microsoft blog post, it is currently undefined behavior to even try to acquire an SRWLock recursively, even recursive read locks.

It might (and probably will) cause the lock to fail to fulfil its contract in the future (e.g,. allow two simultaneous exclusive acquisitions). And since wait nodes are threaded on the stack, it can result in stack memory corruption.

Also apparently NT keyed events have no stability guarantee so the current implementation of parking_lot on Windows could theoretically break with a new version of Windows. No longer an issue as parking_lot uses the stable WaitOnAddress on newer Windows.

So uh, what do?

@nagisa
Copy link
Member

nagisa commented Aug 19, 2016

cc @rust-lang/libs

@Amanieu
Copy link
Member

Amanieu commented Aug 19, 2016

I don't see any easy way out of this. Consider this example:

let lock = RwLock::new();
mem::forget(lock.read());
lock.read();

To avoid undefined behavior we would have to either:

  • Have the RwLock track the thread IDs of all active readers so that we can detect a recursive lock.
  • Have each thread track the RwLocks that it has locked in thread local storage.
    Both of these solutions require O(n) space and a lot of overhead to manage it.

@nagisa
Copy link
Member

nagisa commented Aug 19, 2016

…or document recursive reader lock as UB for now?

@retep998
Copy link
Member Author

retep998 commented Aug 19, 2016

@nagisa So we're going to accept, even temporarily, that a safe function in std can invoke UB?

@Amanieu
Copy link
Member

Amanieu commented Aug 19, 2016

@nagisa But the code I've just written is entirely safe code. We can't allow UB in safe code.

@alexcrichton alexcrichton added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 20, 2016
@durka
Copy link
Contributor

durka commented Aug 20, 2016

Is there not a non-slim R/W lock primitive on Windows?

@nagisa
Copy link
Member

nagisa commented Aug 20, 2016

No, there isn't.

On Aug 20, 2016 5:44 AM, "Alex Burka" [email protected] wrote:

Is there not a non-slim R/W lock primitive on Windows?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#35836 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AApc0gCFIgk5JgZR7j6tqhtgee6hWulpks5qhmoWgaJpZM4Jo4pj
.

@andlabs
Copy link

andlabs commented Aug 20, 2016

Even though the Windows API might not have non-slim RW locks, it does have loads of other synchronization objects that you can combine to form a recursive RW lock implementation. I wouldn't know what the best ones to use are, sorry.

@yilangmok
Copy link

Boost.Interprocess provides a very complete mutex implementation that allows recursive lock acquisition and upgradeability. (I am not familiar with Rust, but I'd assume you have some way to call into C/C++ code if you're able to call Windows functions).

http://www.boost.org/doc/libs/1_61_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.sharable_upgradable_mutexes

@retep998
Copy link
Member Author

We are definitely not depending on Boost. We can however look at their implementation to create our own mutex inspired by it.

@nagisa
Copy link
Member

nagisa commented Aug 20, 2016

That page is for inter-process implementations of synchronisation primitives. It has considerably different design constraints compared to inter-thread stuff (documented here). The implementation of RWLocks in Boost seems to use a number (3) of semaphores in addition to some state.

@yilangmok
Copy link

Thanks for the correction @nagisa. I knew Boost had a RWLock, so I assumed the first page I found would be the right one. Oops!

@retep998, of course it's up to the Rust community to decide what to use or not use. However, I've found that for some areas (including concurrency and cryptography), it's best to use an existing library that has been battle tested. Boost provides primitives that are well tested, performant and cross-platform. Why not consider it?

@retep998
Copy link
Member Author

retep998 commented Aug 21, 2016

@yilangmok Because Boost is a massive dependency to pull in. Right now libstd is lightweight and pure Rust. Pulling in Boost would be hugely detrimental to libstd's weight. Also Boost is C++ so it would depend on the C++ runtime and standard library as well. Furthermore Rust is only able to easily work with C APIs, C++ is a massively complicated beast. Basically, I don't see any chances of Rust depending on Boost for something as simple as mutexes.

@alexcrichton
Copy link
Member

Discussion at @rust-lang/libs triage today concluded that this seems like a good bug to fix, but not high-enough priority for P-high

@alexcrichton alexcrichton added P-medium Medium priority E-help-wanted Call for participation: Help is requested to fix this issue. and removed I-nominated labels Aug 23, 2016
@nagisa
Copy link
Member

nagisa commented Aug 24, 2016

A nice suggestion from the comment section of the linked blog post:

Why can’t you keep using SRW locks and just keep a separate (possibly thread-local) variable to tell you whether you’ve already acquired the lock or not?

Extending the Windows RWLock structure as thus:

pub struct RWLock { 
    inner: UnsafeCell<c::SRWLOCK>,
    thread_local_reading: DWORD,
}

impl RWLock {
    pub /*const*/ fn new() -> RWLock { // cannot const anymore
        RWLock { 
           inner: UnsafeCell::new(c::SRWLOCK_INIT),
           thread_local_reading: c::TlsAlloc(), // free in drop!
        }
    }
    #[inline]
    pub unsafe fn read(&self) {
        if c::TlsGetValue(self.thread_local_reading) != 0 { panic!("recursive read") }
        TlsSetValue(self.thread_local_reading, 1);
        c::AcquireSRWLockShared(self.inner.get())
    }
    /* similar for try_read */

    #[inline]
    pub unsafe fn read_unlock(&self) {
        TlsSetValue(self.thread_local_reading, 0);
        c::ReleaseSRWLockShared(self.inner.get())
    }
}

would allow to fix the issue. Of course the TLS safeguard could be moved to any of the layers above plain wrapper; idea stays the same.

Alas, this limits the number of locks user could have at the same time to number of available TLS slots, which is 1088.

@mwinterb
Copy link

Is the license for the source code in David Butenhof's "Programming with POSIX Threads" book known? The "best" information that I could find was from the pthreads-win32 COPYING file which has this to say:

The file tests/rwlock7.c is derived from code written by
Dave Butenhof for his book 'Programming With POSIX(R) Threads'.
The original code was obtained by free download from his website
http://home.earthlink.net/~anneart/family/Threads/source.html
and did not contain a copyright or author notice. It is assumed to
be freely distributable.

Currently, the source is available on the Informit.com website, which appears to be associated with the current publisher:
http://www.informit.com/store/programming-with-posix-threads-9780201633924

If the license is acceptable for use in Rust, I've written a fairly direct translation of his rwlock_t to use the Windows Vista+ primitives. By source examination, as it prefers readers over writers, recursive read locks will succeed, recursive write locks will deadlock, and acquiring a write lock while the same thread holds a read lock will deadlock. The text for the book describes how to convert it to a writer preference, which should make most recursive operations deadlock eventually, but as that text is not monetarily freely available, I have not provided that as an alternative. There should be no memory corruptions during normal operations†.

As I'm not a Rust developer, it's in C (maybe with some accidental C++ constructs). Someone more experienced in Rust could likely translate it, but that task feels like a bad fit for "my first Rust program".

The translation is available here:
https://gist.github.com/mwinterb/fcf29c312950e2c51ffa47822c8c5241

It's likely not the most efficient implementation of a reader/writer mutex as it is 10 years old, but it's O(1) space and has no dynamic memory allocations, so presumably pub const fn new() -> RWLock can remain const.

† Internally, some counters use int, so if there are more than INT_MAX read acquisitions, pending readers, or pending writers, undefined behavior will technically occur in the C implementation.

@Amanieu
Copy link
Member

Amanieu commented Aug 26, 2016

@nagisa That could work with my thread_local crate which gives you per-object thread-local storage.

However I still think that the proper solution to this is to use the parking_lot crate which provides implementations of Mutex, RwLock and Condvar that work on all platforms including Windows XP. This was proposed in a RFC (rust-lang/rfcs#1632) but it in the end it was not accepted.

@nagisa
Copy link
Member

nagisa commented Aug 26, 2016

On 2016-08-25 18:26:57-0700, Amanieu wrote:

@nagisa That could work with my thread_local crate
which gives you per-object thread-local storage.

However I still think that the proper solution to this is to use the
parking_lot crate which provides implementations of
Mutex, RwLock and Condvar that work on all platforms including Windows XP. This was proposed
in a RFC (rust-lang/rfcs#1632) but it in the end it was not accepted.

I do not disagree. I’m just brainstorming for simple and quick-to-implement fixes which could paper
over immediate issue without us doing any sort of large scale change.

-- You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#35836 (comment)

@mattico
Copy link
Contributor

mattico commented Mar 12, 2017

I'm interested in fixing this.

I think the best long-term solution is to use parking_lot in the stdlib. For now though, @nagisa's solution (above) is much better than UB, no? It's very unlikely that anyone is using more than 1,088 TLS + RwLock, and we can at least panic if that happens.

I'll send in a PR to do that unless someone has a better idea.

@mattico
Copy link
Contributor

mattico commented Mar 13, 2017

Making RwLock::new() non-const shouldn't be an issue because:

  1. const_fn is unstable so who cares 😄
  2. Everyone should be using sync::RWLock which does not have a const constructor

except panicking.rs in libstd uses the underlying const-fn impl now that StaticRWLock has been removed.

Not an insurmountable problem but there's going to have to be some hackery, like an ad-hoc reimplementation of lazy_static! - which should be in std, IMO.

@alexcrichton
Copy link
Member

@mattico I think such a solution would limit a process to at most 1088 instances of RwLock, right? If that's so that seems... unfortunately too low :(

@mattico
Copy link
Contributor

mattico commented Mar 13, 2017

Oh that's per process, I thought it was per-thread. That might actually be an issue, yes.

@mattico
Copy link
Contributor

mattico commented Mar 13, 2017

Maybe allocate an array per thread (how big?) to use as a bit-set for the RwLock flags, and store a pointer to it in TLS. Then add an index into the set to each RwLock. Now we're only using one TLS per thread that has a RwLock, but it's a bit more of a hack 🙁 . I'll write it up and see how bad it is.

@Amanieu
Copy link
Member

Amanieu commented Mar 13, 2017

@mattico Have a look at the thread_local crate. It allows you to have per-object TLS without using up TLS indexes.

@durka
Copy link
Contributor

durka commented Feb 15, 2018

What is that supposed to mean? It's either defined or not. Do you mean it won't be tagged I-unsound unless someone demonstrates memory corruption?

@RalfJung
Copy link
Member

RalfJung commented Aug 7, 2018

Nobody has been able to actually demonstrate UB in safe code using this yet. It's only theoretical at the moment.

The burden of proof is on the code author to show that their code does not have UB. If the code has UB, it might do anything, including working perfectly well. One cannot show absence of UB by example.


The Windows Mutex implementation is also wrong (lack of proper initialization), and RwLock is UB on macOS as well (and possibly other non-Linux POSIX platforms) because macOS also says recursive locking of an RwLock is UB when a write lock is involved. (Only recursive write and read-write locking though. Making recursive reads UB is just... wtf, what were they thinking?? And the documentation for the relevant function does not even mention this "detail". Wow.)

Oh and the Mutex situation is horrible on all POSIX platforms because there is no const fn way to initialize a sane Mutex. So Mutex initialization is split into two parts in the std::sys layer, and using the "half-initialized" one (which we do in a couple places in libstd) makes recursive locking UB as well. Together with the restrictions about moving things, the API provided by sys::mutex is just plain horrible and I am somewhat nervous about the fact that it might be used incorrectly somewhere in libstd.

I implemented a reentrancy checker for Mutex (which is easier than RwLock) because there are no read locks), and it makes my microbenchmarks slower by ~20%. Not great.

@RalfJung
Copy link
Member

RalfJung commented Aug 7, 2018

Oh, and we are also using SRWLock for Mutex on Windows, so it will have the same UB.


We really just need to migrate to parking_lot already.

I have been thinking about implementing a lazily initialized const fn-capable Mutex for POSIX and maybe even rearrange things a bit to get rid of the restriction about not moving a Mutex around.

But, given all the other problems in particular around RwLock (where at least two major platforms just do not have a reasonable platform-specific implementation) -- it seems more and more reasonable to just roll our own implementation. I also hear that parking_lot is faster than the platform implementations even where those have reasonable semantics.

So, how realistic is it to use parking_lot? Alternatively, we might be able to rearrange the abstractions a bit, moving the (un)park stuff to sys (implemented directly on top of the platform APIs; whether mutexes are recursive should not be a concern here) and then implement Mutex, RwLock and Condvar on top of that ourselves... which I guess is what parking_lot does anyway?

@RalfJung RalfJung changed the title RwLock on Windows theoretically allows undefined behavior in safe code RwLock and Mutex on Window theoretically allows undefined behavior in safe code Aug 13, 2018
kennytm added a commit to kennytm/rust that referenced this issue Aug 24, 2018
Window Mutex: Document that we properly initialize the SRWLock

See rust-lang#35836
@alexcrichton
Copy link
Member

I've opened an internals post to continue more long-form discussion on the topic of continuing to fix these issues.

@Centril Centril added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Jan 2, 2019
bors added a commit that referenced this issue Apr 3, 2019
Use the parking_lot locking primitives

This PR adds the [`parking_lot`](https://crates.io/crates/parking_lot) code to libstd and uses it for the `sys_common::{mutex,rwlock,condvar,remutex}` implementations.

This has been discussed in https://internals.rust-lang.org/t/standard-library-synchronization-primitives-and-undefined-behavior/8439/9

Thanks @Amanieu for mentoring when doing this, and basically all the code is his as well of course.

Fixes #35836
Fixes #53127
bors added a commit that referenced this issue Apr 19, 2019
Use the parking_lot locking primitives

This PR adds the [`parking_lot`](https://crates.io/crates/parking_lot) code to libstd and uses it for the `sys_common::{mutex,rwlock,condvar,remutex}` implementations.

This has been discussed in https://internals.rust-lang.org/t/standard-library-synchronization-primitives-and-undefined-behavior/8439/9

Thanks @Amanieu for mentoring when doing this, and basically all the code is his as well of course.

Fixes #35836
Fixes #53127
@RalfJung
Copy link
Member

I just looked at the SRW lock docs again and now it says explicitly

Exclusive mode SRW locks cannot be acquired recursively. If a thread tries to acquire a lock that it already holds, that attempt will fail (for TryAcquireSRWLockExclusive) or deadlock (for AcquireSRWLockExclusive)

So, uh, did they retroactively define behavior? And if yes, how far back in terms of platform support does this guarantee go?

@RalfJung
Copy link
Member

RalfJung commented May 4, 2020

Cc @retep998 @mati865 would be good if someone more knowledgeable in Windows matters could decide if there still is a bug here.

We have a direct contradiction between the statements in this post (including @retep998 asking "is it really actually UB" and the answer being "yes") and the docs change added here. The former says

It's a programming error. It is your responsibility as a programmer not to call Acquire­SRW­Lock­Shared or Acquire­SRW­Lock­Exclusive from a thread that has already acquired the lock. Failing to comply with this rule will result in undefined behavior.

and the latter now says (and thus presumably guarantees)

Exclusive mode SRW locks cannot be acquired recursively. If a thread tries to acquire a lock that it already holds, that attempt will fail (for TryAcquireSRWLockExclusive) or deadlock (for AcquireSRWLockExclusive)

Looks like Raymond Chen and whoever changed the docs have a different interpretation of what "cannot be acquired recursively" means. The docs version makes total sense; actually counting how often the current thread holds the write lock would require extra state so a simple naive ("slim") RwLock implementation will deadlock on recursive write locks.

I wonder if there is any way to ask Raymond Chen again in light of this new information?

@mati865
Copy link
Contributor

mati865 commented May 4, 2020

I don't know anyone other than retep998 who could know those parts of Windows API well enough.

@ghost
Copy link

ghost commented May 25, 2020

I wanted a simple mutex that is faster than std::sync::Mutex but didn't want to pull in the whole parking_lot dependency so I wrote this crate: https://github.com/stjepang/simple-mutex

The implementation is fairly simple (it should be reviewable at least). Not quite as efficient as parking_lot but still better than std. The overhead is two words + something allocated on the heap on first case of contention.

I wonder if we should consider implementing a simpler version of parking_lot with reasonable performance and get that into the standard library. Think something with 20% of its complexity and 80% of performance.

@Elinvynia Elinvynia added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020
@LeSeulArtichaut LeSeulArtichaut removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020
@RalfJung
Copy link
Member

@rustbot ping windows

If any one of you is familiar enough with the details of SRWLocks on Windows, would be great if you could look into the question posed here. :)

@rustbot rustbot added the O-windows Operating system: Windows label Jun 11, 2020
@rustbot
Copy link
Collaborator

rustbot commented Jun 11, 2020

Hey Windows Group! This bug has been identified as a good "Windows candidate".
In case it's useful, here are some [instructions] for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3
[instructions]: https://rustc-dev-guide.rust-lang.org/notification-groups/windows.html

cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @retep998 @rylev @sivadeilra @spastorino

@kennykerr
Copy link
Contributor

If the lock is already held by the calling thread, AcquireSRWLockExclusive will hang and TryAcquireSRWLockExclusive will return false. Internally, there is a lock bit and this is used to check whether the lock may be acquired irrespective of thread.

@m-ou-se
Copy link
Member

m-ou-se commented Oct 1, 2020

So this is solved then?

The official documentation is pretty clear now. And @kennykerr verified it. :)

And as @RalfJung already mentioned, making an SRW lock do anything other than deadlock on a recursive lock() call would require more effort. There's not really any space in such a lock to keep track of the thread that locked it.

The only thing left is the contradictory comments on that archived blog post. But those are a bit vague (and a bit aggressive 👀), and feels mostly like miscommunication. The author seems to misunderstand that a deadlock is the outcome we want, as opposed to succesfully recursively locking it: "The fact that they cannot be acquired recursively is what makes them slim!" (They are slim because they don't have to keep track of the thread id, which would mean recursive lock() deadlocks, which is good, that's what we want.) The author also says he isn't sure what Peter means with memory unsafe. There are not many languages where getting a deadlock in this case would be a good thing, so the response seems mostly something like 'just don't do this because it's always wrong, and since you shouldn't do this, we're not making any promises'. And that make sense from a C or C++ perspective, where there's not much reason to specify the difference between a deterministic deadlock and undefined behaviour, as both are unwanted and should be avoided anyway. Things are a bit different for Rust, where the distinction is very important to guard the unsafe/safe border. Luckily they decided to do make this promise after all and update the documentation.

@RalfJung
Copy link
Member

RalfJung commented Oct 3, 2020

Ah I did not realize we have a Windows engineer confirming our interpretation of the docs. :)

Indeed, based on all that, I am fine with closing this issue. But I'd leave the final decision to the libs team -- Cc @rust-lang/libs.

@Amanieu Amanieu closed this as completed Oct 3, 2020
@RalfJung
Copy link
Member

FWIW, there was another comment on the docs issue, that confirms that Rust is using SRWLocks correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. E-help-wanted Call for participation: Help is requested to fix this issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-windows Operating system: Windows P-medium Medium priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.