
Fix AtomicCondition compile error ziglang/zig#8051 #8637

Closed

Conversation

@cescude commented Apr 29, 2021

This also changes the `wait(...)` signature to take a `Mutex.Impl.Held`
object instead of a plain `Mutex`, which IIUC is clearer about the intent
of the function (i.e., you should be passing in a locked mutex, not just
any mutex).

For example:

    var mutex = Mutex{};
    var cond = Condition{};

    var held = mutex.acquire();
    defer held.release();

    while (bad_state == true) {
        // cond.wait(&mutex); // <-- Original
        cond.wait(&held);     // <-- Updated
    }

Not really a breaking change for code using `AtomicCondition` (since it
didn't compile before), but this does affect the other implementations
and would presumably break anything using that code as well.

@jedisct1 added the labels "breaking" (Implementing this issue could cause existing code to no longer compile or have different behavior.) and "standard library" (This issue involves writing Zig code for the standard library.) on May 9, 2021.
@LemonBoy (Contributor) commented

cc @kprotty Don't know if you're still planning on overhauling the whole set of synchronization primitives, but I'd like to get this merged.

Can you please add some basic tests, preferably not using any sleep calls?

@kprotty (Member) commented Jun 11, 2021

@LemonBoy still planning to overhaul sync primitives once #9070 is merged. I can try to PR some tests to this branch today. To be clear, not using sleep (i.e. increase the chance of something) or relying on sleep (i.e. test fails if scheduling doesn't align with delays)?

@LemonBoy (Contributor) commented

> still planning to overhaul sync primitives once #9070 is merged.

Awesome news!

> To be clear, not using sleep (i.e. increase the chance of something) or relying on sleep (i.e. test fails if scheduling doesn't align with delays)?

The latter; the VMs used in CI are slow, and many tests where we tried to order the operations on two different threads using sleep started failing at random times.

cescude and others added 3 commits June 11, 2021 09:51
@kprotty (Member) commented Jun 11, 2021

@cescude PR'd with Condition and Semaphore tests.

@kprotty (Member) commented Jun 11, 2021

@cescude You're right that the extra signal() in Semaphore.wait() can be removed, although I'm not sure how ping would release two threads.

ping posts to request, which increases the permits by 1 and does a wakeup. Any previous waiters on request would sleep until this happens. One subsequent waiter on request would see the non-zero permits, decrement it, and not do the extra signal since only one permit was added.

pong can only post to reply after receiving a post() from request. And a post() to request must also wait for the acknowledgement of the reply before posting again. Therefore there should only ever be one thread running pong or ping at a given moment. Do I understand this correctly?

@cescude (Author) commented Jun 11, 2021

Ah, ok, I think you're right.

On a different note: supposing the first call to relay.ping() executes on the main thread before any of the spawned threads are given a chance to run... could there be a problem of losing the first signal? (i.e., the main thread spawns threads, signals self.request, blocks on self.reply, and then the spawned threads get CPU time and block on self.request.)

Nevermind, I'm not thinking clearly here: permits will increase, which makes the first wait go through.

@andrewrk (Member) commented

I'd like to get rid of the Held type entirely. It seemed like a good idea at first to prevent mistakes, but it ultimately hinders composability, and I think it ended up being an unfortunate instance of overengineering.

@cescude (Author) commented Jun 12, 2021

@andrewrk Curious what you're thinking w.r.t. how this negatively affects composability.

When I think about "which operations are valid" on a locked vs. unlocked mutex... there's no overlap (so it makes sense to me to have distinct types exposed for an API).

In getting rid of Held, would you consider releasing an unacquired mutex to be an error or a no-op?

@kprotty (Member) commented Jun 12, 2021

Releasing a mutex without having an acquired handle is common for code which is split by asynchrony (e.g. lock in one function, unlock in another). Also, if you wish to acquire after releasing it, you'd need a reference to the initial Mutex. Having Held and *Mutex both be the same thing underneath feels awkward compared to *Mutex having release() along with acquire().

@cescude (Author) commented Jun 13, 2021

> Releasing a mutex without having an acquired handle is common for code which is split by asynchrony (e.g. lock in one function, unlock in another).

🤔 Should it be common, though? pthread_mutex_unlock can return EPERM if you try to unlock from a different thread, and ReleaseSRWLockExclusive just says "don't."

IIUC splitting by asynchrony could only apply to a single-threaded async implementation; using a multi-threaded event loop, you might get unreachable with pthreads and whatever-the-heck win32 does when you violate a constraint.

> Having Held and *Mutex both be the same thing underneath feels awkward compared to *Mutex having release() along with acquire().

Maybe; it's not like we have a different type for open vs. closed files, so I get that Held is a bit of a cute implementation (API-wise, it's more akin to Dir yielding a File). I guess I just think it clarifies intent & expectations nicely for a primitive that's prone to being coded wrong.

@kprotty (Member) commented Jun 13, 2021

> pthread_mutex_unlock can return EPERM if you try to unlock from a different thread

Zig's stdlib Mutex doesn't (and shouldn't in the near future) have to be based on the OS's mutex implementation. Thread-bound critical sections are mostly used internally for fair scheduling or priority inheritance, things our Mutex type currently doesn't guarantee.

> using a multi-threaded event loop and you might get unreachable

You can bind an asynchronous task to a thread in a multi-threaded event loop. But irrespective of that, the "asynchrony" here doesn't imply Zig's async. Instead it implies one where control flow is split with a continuation. A single-threaded example is using a C library which scans a data structure and calls a user-provided callback with a void*. It also happens when crossing API boundaries for a library which calls into a single-threaded scripting language API.

> I just think it clarifies intent & expectations nicely

It does for simple, linear-lifetime, RAII-like use cases but becomes cumbersome otherwise, IMO. The unlocking/locking scenario requires storing two pointers to the same data structure, for example. A popular Rust mutex impl also has a release() which side-steps the RAII handle. I feel as though Held could be justified as an abstraction if it hosted more context than the Mutex reference itself (like File to Dir).

@Vexu Vexu marked this pull request as draft June 13, 2021 15:08
@cescude (Author) commented Jun 13, 2021

> A single-threaded example is using a C library which scans a data structure and calls a user provided callback with void*. It also happens when crossing API boundaries for a library which invokes into a single-threaded scripting language API.

👍

> It does for simple, linear-lifetime, RAII-like use cases but becomes cumbersome otherwise IMO. The unlocking/locking scenario requires storing two pointers to the same data structure for example. A popular Rust mutex impl also has a release() which side-steps the RAII handle. I feel as though Held could be justified as an abstraction if it hosted more context other than the Mutex reference itself (like File to Dir).

FWIW, it's interesting that the parking_lot::Mutex implementation supports both a "held"-style API (MutexGuard) and an in-place raw_unlock().

Anyways, thanks for taking the time to discuss.

@kprotty (Member) commented Jun 13, 2021

Np, this feels like it should be an issue so that more people could weigh in on their preferences if they have any.

@andrewrk (Member) commented

Closing old draft

@andrewrk andrewrk closed this Jul 28, 2021