[Discussion] Stackful vs stackless coroutines #1408
Comments
@pfeatherstone Certain blog posts I've read have indicated marginal benefits of stackless over stackful coroutines in terms of throughput and response times, but these are all use-case-dependent, so benchmark your fast paths with both. I would assume memory usage with stackful coroutines is slightly higher, since the heuristics that pre-allocate the "stack" are often a bit eager, but it's possible that this reduces the total number of allocations. Additionally, custom allocators can be used with any asynchronous operation, and a pool allocator could help for stackless coroutines.

The implicit strand is not a consequence of some internal magic, it's merely a result of continuations. Conceptually, an awaitable's continuation is the code below its `co_await` expression.

I don't have experience with cobalt, as it just came out, but it uses one thread on an io_context as the only executor, which is disadvantageous for my use cases.
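To illustrate the point about custom allocators, here is a minimal sketch (not part of the comment above, assuming Boost.Asio 1.79 or newer for `asio::bind_allocator`) of attaching a pool allocator to an ordinary async operation; the socket, buffer, and handler body are placeholders.

```cpp
#include <boost/asio.hpp>
#include <boost/asio/bind_allocator.hpp>
#include <memory_resource>

namespace asio = boost::asio;

// Start a read whose intermediate handler allocations go through a pmr pool.
void start_read(asio::ip::tcp::socket& sock,
                std::pmr::unsynchronized_pool_resource& pool,
                char* data, std::size_t size)
{
    std::pmr::polymorphic_allocator<char> alloc(&pool);
    sock.async_read_some(
        asio::buffer(data, size),
        asio::bind_allocator(alloc,
            [](boost::system::error_code ec, std::size_t n)
            {
                // Any temporary state asio keeps for this operation is obtained
                // from the bound allocator instead of the default one.
                (void)ec; (void)n;
            }));
}
```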
I am the author of boost.cobalt and boost.experimental.coro, so I am obviously a fan of C++20 coros.

**Overview of utilities**

The stackful coroutines are implemented on top of boost.context.

**Context switching**

Regarding performance: I benchmarked on Linux, and a context switch with boost.context was 2.1x (gcc) or 5.2x (clang) slower than a C++20 coroutine. Plus, boost.context is (necessarily) assembly, so it's opaque to the compiler. That is, it can't be optimized out, whereas C++20 coroutines can be and are, although rarely.

**Code style**

A major difference between the two kinds is how the suspension takes place. C++20 coroutines require the use of the `co_await` keyword:

```cpp
awaitable<void> coro()
{
    foo();
    co_await bar(); // co_await screams that this is async
}
```
Compare the stackful version:

```cpp
void coro(yield_context yield_)
{
    foo();
    bar(yield_); // idk, is this async?
}
```

That also means you could make async optional with:

```cpp
thread_local asio::yield_context * yield_ = nullptr;

std::size_t my_read(socket & s)
{
    if (yield_)
        return s.async_read(*yield_);
    else
        return s.read();
}
```

**Stack depth**

Because stackful coroutines run on a stack, you can nest calls & suspensions without any overhead:

```cpp
std::size_t do_read(socket & s, yield_context & ctx) // it's essentially a regular function
{
    return s.async_read(ctx);
}
```

This is relevant for your API design. Note however that the coroutine's stack is not very large by default. Due to C++20 coros being stackless, you can have nested calls far deeper than with a stackful one:

```cpp
asio::awaitable<void> recurse_for_no_reason(std::size_t n)
{
    if (n-- > 0)
        co_await recurse_for_no_reason(n);
}
```

Since the coroutine frame does not live on the stack, this will lead to NO stack buildup.

**Allocations**

The downside of C++20 coros is the need to allocate their frame. This can be modified (with boost.cobalt) and is cached with …

```cpp
asio::experimental::coro<void, std::size_t> do_read(socket & s)
{
    co_return co_await s.async_read(asio::deferred);
}
```

Now every call of `do_read` allocates a new coroutine frame. You can work around this by using yielding coros, so that you have one allocation upfront:

```cpp
asio::experimental::coro<std::size_t, std::size_t> reader(socket & s)
{
    while (true)
        co_yield co_await s.async_read(asio::deferred);
    co_return -1;
}
```

And if you need to update arguments halfway through, you can use push-args:

```cpp
asio::experimental::coro<std::size_t(socket &), std::size_t> reader(socket & s)
{
    auto * s_ = &s;
    while (true)
        s_ = & co_yield co_await s_->async_read(asio::deferred);
    co_return -1;
}
```
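For readers comparing the two styles above, here is a minimal sketch (not part of the comment, assuming Boost.Asio 1.80 or newer) of how each kind of coroutine is launched; the task bodies are trivial placeholders.

```cpp
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>

namespace asio = boost::asio;

// C++20 stackless coroutine: the compiler heap-allocates its frame.
asio::awaitable<void> stackless_task()
{
    auto ex = co_await asio::this_coro::executor;
    co_await asio::post(ex, asio::use_awaitable); // suspend and resume once
}

// Stackful coroutine: suspension is hidden inside calls that take yield_context.
void stackful_task(asio::yield_context yield)
{
    asio::post(yield); // suspends the fiber and resumes it on the next run
}

int main()
{
    asio::io_context ctx;
    asio::co_spawn(ctx, stackless_task(), asio::detached); // allocates a coroutine frame
    asio::spawn(ctx, stackful_task, asio::detached);       // allocates a boost.context stack
    ctx.run();
}
```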
Wow, that was beautiful
Since there isn't a discussion tab, I'm having to post an issue.
Does anyone have opinions on the use of stackful vs stackless coroutines in asio?
Some of the discussion points:

- My guess is that C++20 coroutines are more lightweight. However, they might do more allocations? So in a high-performance server handling millions of HTTP requests per second, C++20 coroutines might not be a good choice? I don't know, I'm asking.
- It looks like asio has really good tooling for C++20 coroutines, for example `co_await`ing multiple awaitables (see the sketch after this list). That's a nice feature, since those things can presumably happen in parallel. I'm not sure, but I don't think that's so effortlessly done with asio's stackful coroutines.
- `awaitable` is an implicit strand. Does the same hold for `basic_yield_context`?

Basically, does someone have a good story to tell regarding this?
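As a concrete illustration of the "co_awaiting multiple awaitables" tooling mentioned in the list above, here is a minimal sketch (my addition, assuming Boost.Asio 1.79 or newer) using `asio::experimental::awaitable_operators`; `task_a` and `task_b` are hypothetical placeholders.

```cpp
#include <boost/asio.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>
#include <string>

namespace asio = boost::asio;
using namespace asio::experimental::awaitable_operators;

asio::awaitable<int>         task_a() { co_return 42; }      // placeholder work
asio::awaitable<std::string> task_b() { co_return "hello"; } // placeholder work

asio::awaitable<void> run_both()
{
    // operator&& awaits both children and completes when both are done;
    // operator|| would complete with whichever finishes first and cancel the other.
    auto [a, b] = co_await (task_a() && task_b());
    (void)a; (void)b;
}
```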
Also, Boost.Cobalt provides some library coroutines. Has anyone tried using them with Asio? Why should I use them instead of asio's `awaitable` or `experimental::coro` coroutines?
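For context on the Boost.Cobalt question, here is a minimal, hedged sketch (my addition, assuming Boost 1.84 or newer and linking the compiled boost_cobalt library) of a cobalt-based entry point; cobalt supplies the executor, so there is no explicit `io_context` in user code.

```cpp
#include <boost/cobalt.hpp>
#include <boost/asio/steady_timer.hpp>
#include <chrono>

namespace cobalt = boost::cobalt;
namespace asio   = boost::asio;

// cobalt::main sets up the event loop and runs this coroutine to completion.
cobalt::main co_main(int argc, char** argv)
{
    asio::steady_timer timer{co_await cobalt::this_coro::executor,
                             std::chrono::milliseconds(50)};
    co_await timer.async_wait(cobalt::use_op); // cobalt's default completion token
    co_return 0;
}
```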