Consider using structured concurrency instead of being able to spawn processes at any point #759
This approach has a challenge: if …
The problem here is related to #750. I think we could do something like this to make things better:
Such borrows can be created for any type, though borrowing …

If we introduce structured concurrency, we can allow sharing …

For more complex cases you could use borrows that use runtime borrow counting; let's call those RC borrows.

This basic setup is interesting, but introduces a problem: if a …

In addition, we can make things easier by introducing implicit …

RC borrows can't be shared between processes since they use regular reference counting. We could allow using those by simply exposing …
A further enhancement of the above idea: not being able to store …

To make such patterns possible, we could allow storing borrows in data if and only if the borrow doesn't escape the surrounding scope/function. This likely requires more complicated escape analysis though, so a simplification could be this: a borrow can be stored in another value, but only if that value is not assigned to a variable, and does not receive calls to … The assignment restriction means we can keep the check local to the expression, while the call restriction prevents the borrow from escaping through instance methods that either mutate or move the receiver. This may be overly strict though.
The store restriction for compile-time borrows would be enforced as follows: …
Not being able to capture borrows is annoying, but allowing that would require escape analysis on the closure to determine whether it outlives the borrow or not.
The borrowing proposal has a flaw: if a …

Even if we did allow this somehow, there's the problem of borrowing: an iterator can't yield compile-time borrows, because that would violate the "you can't store them" invariant. This means they'd have to produce RC borrows, which you can't capture in a …

Of course, we can still introduce structured concurrency and tackle the borrowing issue separately.
Here's another example of where structured concurrency is easier/requires less code:
This just illustrates a case where you have a bunch of values and want to compute something asynchronously, then collect the results in some way. This requires a bit of boilerplate (e.g. the …)
This approach has several benefits: …
Openflow uses … A heavily condensed version of the …

If we throw …

The main loop here is what I wanted to avoid by adding … With that said, the …
To put it differently, there are some interesting opposites at play here: structured concurrency using the proposed …
Another thing worth noting: when looking at code, …
Another thought: structured concurrency is nice, but if the only benefit is that after an …

In addition, channels were introduced to make fork-join workflows easier, but this sort of violates the general idea of actors. An alternative would be to (re)introduce the …

A fully async model would be nicer, but this won't work well for fork-join workflows, as the capabilities/messages of the calling process aren't necessarily known, so there's no way for child processes to communicate back their results.
I'm currently leaning towards keeping …
Using futures and processes, we can in fact implement something like …

The caveat is that making it a fixed-size channel is probably a little more tricky, and that there would be some extra overhead due to the message sending. The way you'd use this for e.g. …
Also worth adding: futures would need shared internal state. This would require making them use atomic reference counting. This in turn means that values can be written and received multiple times, meaning they're technically a promise instead of a future. It also means they're just channels with a capacity of 1, though this allows them to be a bit more efficient.
Futures/promises allowing multiple reads and writes might actually not be so bad: in the above example it means you can reuse the same future/promise for every test, instead of having to allocate one for each test.
With futures, using processes is a bit like using iterators: every time you want a process to produce a result, you have to "advance" it by sending a message. Similar to external iterators, this can make certain implementations tricky. For example, if you have a process that walks through the files in a directory, you have two choices: …
If we compare just futures with channels, channels are strictly superior in terms of flexibility, because: …
The latter is also worth highlighting: if you have a bunch of work that needs to be performed across processes, a channel allows that work to be balanced automatically (since it's just a shared FIFO queue). If all we had were futures, we'd need some sort of round-robin approach. This can result in worse performance if one process is performing a big job, because other processes can't steal the still-pending messages.
A benefit in favour of …
Using channels, we'd have to define a …
Regarding load balancing: this can be achieved by spawning a process for each job, i.e. M processes for M jobs, instead of mapping M jobs onto N processes where …

EDIT: actually, this isn't the case. The test suite has 1079 tests, and a bunch of these spawn Inko sub-processes. Each of those uses a bunch of threads, which seems to trigger some thread/process count limit on my machine. In other words, there are times where you do need to limit the amount of concurrency and use channels to balance the load.
I'm going to leave this be for the time being. At this stage I think it's premature to replace our concurrency setup with something else, and I can't really think of anything better either. We'll likely revisit this at some point in the future.
Description
The premise of structured concurrency is simple: you have something that signals a join scope, and a construct to spawn processes in that scope. When the end of the scope is reached, all processes are joined. Conceptually this is the same as having a channel with space for N messages, N processes that use the channel and send it a message when they're done, and N calls to `Channel.receive` to wait for the processes.

Structured concurrency makes reasoning about concurrency easier, because it's clearer where asynchronous work starts and ends. In addition, by making this part of the compiler/type system you can allow for interesting patterns, such as sharing read-only access to data. This article is a good reference on the subject.
A hypothetical setup would be the following:
The idea is that `async` signals a scope in which asynchronous operations can happen. `spawn` in turn spawns a new process in the innermost `async` block. Variables defined outside an `async` scope are exposed as immutable references. Values of type `uni T` are moved into an `async` scope upon capture by it or a `spawn` expression. `spawn` expressions in turn can only capture variables that are value types (e.g. `Int`) or defined outside the `async` expression. If a `spawn` captures a variable of type `uni T`, the variable is moved into the `spawn`. This means such variables need to be assigned a new value if captured inside a `spawn` inside a loop.

When reaching the end of the `async` expression, we discard any unused values as usual, then join all the spawned processes. The return value of an `async` is the last expression, just as with regular scopes.

Processes in turn are given a `value` method. When called, it joins the process and returns whatever value the `spawn` expression returned. This is done by generating a class for each `spawn` with the appropriate fields (one for each capture), and by generating said `value` method. We also generate a dropper that does the same thing, but discards the value. The value is stored by generating a dedicated field (= the first one), writing the return value into that field, and setting a bit somewhere to indicate that we've in fact written a value (we can't use `NULL`/`0`, because then returning the integer `0` wouldn't work).

If the value is a `T`, it can be lifted into a `uni T` through a `recover`. This is safe because, upon termination, any `T` returned can't have any references pointing to it in the old process.

Process messages are removed (i.e. no more `fn async`). Channels remain and would be used instead if some form of communication protocol is necessary. The concept of sendability remains (i.e. you still can't stick a `ref User` in a `Channel`). `uni T` values in turn are used if you want to transfer ownership of some complex data from one process to another, either via it capturing the variable or through a `Channel`.

Compared to the current process setup, structured concurrency would allow for more efficient fork-join workflows, as processes can capture immutable data, something that's not possible with the `class async` setup (as immutable borrows aren't sendable).

For long-lived background processes, you simply create a top-level `async` of sorts, spawn the necessary processes in there, and include the rest of the logic in the `async` block, i.e.:

Related work