Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cooperative multitasking for UnixDatagram #5967

Closed
wants to merge 6 commits into from

Conversation

maminrayej
Copy link
Member

@maminrayej maminrayej commented Aug 31, 2023

The recv function of UnixDatagram does not participate in cooperative multitasking as illustrated in issue #5946.

Motivation

Solution

UnixDatagram utilizes Registration's async_io function, which depends on the Readiness future. I've modified its poll function to be budget-aware.

NOTE: The async_io function is not exclusive to UnixDatagram but is also used in udp.rs. This PR might address potential starvation in those functions as well. I'll verify this through testing.

Closes #5946.

@hawkw
Copy link
Member

hawkw commented Aug 31, 2023

It looks like a number of tests are now failing on CI: https://github.com/tokio-rs/tokio/actions/runs/6037644879/job/16382367407?pr=5967

I believe the reason that we're seeing failures in tests for unrelated APIs such as TcpStream is that these types also use Registration::readiness, in addition to Registration::poll_ready.

@@ -411,3 +411,47 @@ async fn poll_ready() -> io::Result<()> {

Ok(())
}

#[tokio::test(flavor = "current_thread")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test macro will actually default to this


async move {
loop {
tokio::time::sleep(Duration::from_millis(250)).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would recommend avoiding time in tests like this, and instead using yield_now to ensure that tasks wait a certain number of ticks rather than an amount of time. timing makes things brittle

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how would I do that. Are there any existing tests in the code base that I could reference as a template?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'd recommend doing here is rewriting this test such that you have the main task use a pair of UDS sockets to message itself and read the datagram message in a for loop which runs just often enough that it should deplete its budget once, with a second task spawned before the loop which sets a boolean flag and exits to indicate that it run.

You'd then assert after the loop in the main task that the flag was flipped and the second task ran.

@Noah-Kennedy
Copy link
Contributor

It looks like a number of tests are now failing on CI: https://github.com/tokio-rs/tokio/actions/runs/6037644879/job/16382367407?pr=5967

I believe the reason that we're seeing failures in tests for unrelated APIs such as TcpStream is that these types also use Registration::readiness, in addition to Registration::poll_ready.

Yeah, I think we should either pull this out of there or move this up to the leaf functions.

@maminrayej
Copy link
Member Author

I believe the reason that we're seeing failures in tests for unrelated APIs such as TcpStream is that these types also use Registration::readiness, in addition to Registration::poll_ready.

Indeed. One notable example is writable in TcpStream becoming budget-aware and causing tests like try_read_write to fail. Of course I can rewrite the tests so they don't overuse the limited budget, but I believe some functions becoming budget-aware as a result of this PR will have noticeable effects on downstream crates.

Should I open a different issue for this matter, or discussing here is fine?

cc @Darksonn

@Noah-Kennedy
Copy link
Contributor

I believe the reason that we're seeing failures in tests for unrelated APIs such as TcpStream is that these types also use Registration::readiness, in addition to Registration::poll_ready.

Indeed. One notable example is writable in TcpStream becoming budget-aware and causing tests like try_read_write to fail. Of course I can rewrite the tests so they don't overuse the limited budget, but I believe some functions becoming budget-aware as a result of this PR will have noticeable effects on downstream crates.

Should I open a different issue for this matter, or discussing here is fine?

cc @Darksonn

The solution I had been planning on here was moving budgeting into leaf futures and calls rather than doing it in here.

Let's leave the behavior of the try_ functions as is for now.

@Darksonn Darksonn added A-tokio Area: The main tokio crate M-net Module: tokio/net M-coop Module: tokio/coop labels Sep 1, 2023
@maminrayej
Copy link
Member Author

We could introduce a new future named BudgetAwareReadiness that wraps the existing Readiness future. This would enable us to selectively render specific functions, such as those in UnixDatagram and UdpSocket (as outlined in #5946), budget-aware while preserving the current behavior in others, like TcpStream::writable. Does this approach seem like a balanced compromise?
Additionally, I can create a tracking issue to enumerate the tasks required for achieving a more consistent behavior throughout.

@Darksonn
Copy link
Contributor

Darksonn commented Sep 3, 2023

So, one important factor here is that we should only consume budget when we actually perform an action. Having writable return Ready is not really an action. So perhaps this makes sense?

  • Have readable/writable and friends return Pending if the budget is consumed, but don't consume any budget.
  • Have try_* methods consume a budget if they succeed, but they don't otherwise look at the budget, and won't fail if the budget is empty.

Thoughts? @Noah-Kennedy @carllerche

@carllerche
Copy link
Member

@Darksonn Yep, that seems right to me. The key to consider is 1 call to writable() may result in N calls to try_write, so it is important to consume the budget on the actual op.

Comment on lines +474 to +480
if !crate::runtime::coop::has_budget_remaining() {
// Wasn't ready, take the lock (and check again while locked).
let mut waiters = scheduled_io.waiters.lock();

let w = unsafe { &mut *waiter.get() };

if *is_waiter_registered {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already mentioned this on our Discord chat, but I'll put it here too. This code isn't correct because when the coop budget is empty, you're registering the waker with the IO driver. But in this case, we need to be woken up by the coop system, and not by the IO waker. So you should register for readiness with the coop system instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Thank you for pointing it out. I'm swamped right now with my job and university. I'll get to it as soon as I find some free time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. That's perfectly fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maminrayej I can take this over from you if you're busy, I'd like to get this change over the finish line relatively quickly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Noah-Kennedy Absolutely, feel free to take it over. Thanks for stepping in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem!

@Darksonn
Copy link
Contributor

Darksonn commented Nov 5, 2023

Since Noah has offered to take over this, I'll close the PR to remove it from the list of things to review. However, if something changes and you would still like to continue instead of Noah, then feel free to reopen.

@Darksonn Darksonn closed this Nov 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-coop Module: tokio/coop M-net Module: tokio/net
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Datagram sockets do not appear to participate in cooperative multitasking
5 participants