Time based fiber yield points #15

Open
ysbaddaden opened this issue Jun 11, 2024 · 5 comments
Labels: enhancement (New feature or request)

Comments

@ysbaddaden
Owner

CPU-bound fibers may never hit a cancellation point, and can thus monopolize a thread for an unbounded amount of time, blocking other fibers from running.

I'm not even talking about CPU-heavy algorithms (e.g. cryptography): a regular socket may prevent a fiber from yielding, for example when the socket is always ready for read or write (the fiber will never wait on the event loop); a buffered Channel won't suspend the current fiber until the buffer is full (on send) or empty (on receive), which may take a while to happen if the other end is consuming or pushing values quickly.
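
As a contrived illustration of the Channel case (made-up capacity and workload; assume the consumer runs on another thread, as in an MT execution context):

  channel = Channel(Int32).new(128)

  spawn do # fast consumer: keeps the buffer from ever filling up
    loop { channel.receive }
  end

  i = 0
  loop do
    channel.send(i += 1) # rarely finds the buffer full, so it rarely suspends
  end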

One goal of execution contexts is to limit this in practice, by taking advantage of the OS to preempt threads. Still, we should have more places that can yield the current fiber.

For example, Fiber#enqueue could check how long the current fiber has been running and decide to yield once it has run for N milliseconds. Maybe IO methods could do just that (inside the EventLoop). Instead of checking Time.monotonic over and over again, we could have the monitoring thread do the check and mark the fiber (see #5).
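
A minimal sketch of that first idea, assuming a hypothetical start timestamp that the scheduler would refresh every time it resumes a fiber (none of these names are the actual API):

  class Fiber
    # time budget before we ask a fiber to yield (arbitrary value)
    QUANTUM = 10.milliseconds

    # assumed to be reset by the scheduler on every resume
    property start : Time::Span = Time.monotonic

    def self.maybe_yield : Nil
      Fiber.yield if Time.monotonic - Fiber.current.start > QUANTUM
    end
  end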

ysbaddaden added the enhancement label Jun 11, 2024
@ysbaddaden
Owner Author

Note: more yield points means that cooperative shutdown, cooperative stop-the-world, or fiber cancellation could happen faster.

@ysbaddaden
Owner Author

I got the logic implemented:

  1. Fiber.maybe_yield will yield if told to;
  2. Fiber#enqueue always calls Fiber.maybe_yield (*);
  3. start a monitoring thread along with the default execution context;
  4. the monitoring thread wakes up every 10ms, iterates each thread/scheduler, collects the running fiber (with its current schedule tick) and marks it for yield if it didn't change since the previous iteration; this should tell a fiber to yield after it has run for 10ms to 30ms (see the sketch below);
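
Roughly, steps 1 and 4 could look like the following; the Scheduler stand-in, request_yield, and the tick accessor are assumptions derived from the description above, not the shard's actual API:

  # mark-based variant: the monitor sets a flag, the fiber checks it
  # at its next yield point (step 1)
  class Fiber
    @yield_requested = Atomic(Int32).new(0)

    def request_yield : Nil
      @yield_requested.set(1)
    end

    def self.maybe_yield : Nil
      Fiber.yield if Fiber.current.@yield_requested.swap(0) == 1
    end
  end

  # entirely hypothetical stand-in for the per-scheduler state the
  # monitoring thread samples
  class Scheduler
    property running_fiber : Fiber = Fiber.current
    property tick : UInt64 = 0_u64
  end

  # step 4 (NOTE: a real monitor runs on a dedicated thread; a plain
  # sleep keeps the sketch simple)
  def monitor_loop(schedulers : Array(Scheduler))
    previous = {} of Scheduler => {Fiber, UInt64}

    loop do
      sleep 10.milliseconds

      schedulers.each do |scheduler|
        sample = {scheduler.running_fiber, scheduler.tick}
        # same fiber with the same tick as 10ms ago: it has been running
        # for 10ms to 30ms, so mark it to yield at its next check point
        sample[0].request_yield if previous[scheduler]? == sample
        previous[scheduler] = sample
      end
    end
  end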

It works. A busy loop that manually calls Fiber.maybe_yield regularly yields 🎉
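
For illustration, the kind of busy loop in question, with the hypothetical Fiber.maybe_yield from the sketch above:

  counter = 0_u64
  spawn do
    loop do
      counter &+= 1     # CPU-bound work that never suspends on its own
      Fiber.maybe_yield # yields only when the monitor marked this fiber
    end
  end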

But I quickly get a libevent2 warning:

[warn] event_base_loop: reentrant invocation. Only one event_base_loop can run on each event_base at once.

And an eventual crash:

FATAL: can't resume a running fiber #<Fiber:0x7fa4608ebe40: DEFAULT:loop> (#<ExecutionContext::SingleThreaded:0x7fa4608f0f20 DEFAULT>)
  from src/single_threaded.cr:124:52 in 'resume'
  from src/single_threaded.cr:172:11 in 'run_loop'
  from src/single_threaded.cr:68:56 in '->'
  from src/core_ext/fiber.cr:151:11 in 'run'
  from src/core_ext/fiber.cr:57:34 in '->'
  from ???

(*) more places could eventually call Fiber.maybe_yield, for example uncontended IO methods, buffered channels, mutexes, and so on.

@ysbaddaden
Owner Author

ysbaddaden commented Jun 20, 2024

Note about the crash: this is the EC::Scheduler#run_loop fiber trying to resume itself 🤨

EDIT: this may happen if:

  1. the monitoring thread is trying to interrupt a thread's main fiber that is reserved to run the run loop, or the main thread's run loop fiber (in both cases: it mustn't);
  2. the scheduler ends up calling Fiber#enqueue (it mustn't);

NOTE: had it been an MT context, the thread would have deadlocked, waiting for the current fiber to become resumable.

@ysbaddaden
Owner Author

ysbaddaden commented Jun 20, 2024

I think the issue is the monitoring thread marking the run-loop fibers for yield, then the event loop calling Fiber#enqueue to enqueue the ready fibers.

  • Workaround: events should now call ExecutionContext::Scheduler.enqueue(fiber) instead of Fiber#enqueue;
  • Ideally: Crystal::EventLoop#run would return a list of fibers that the scheduler would enqueue in a single batch (instead of 1 by 1), minus one fiber to be resumed immediately, skipping the queue (sketched below).
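
A sketch of that ideal shape, with dummy types standing in for the real event loop and scheduler (Crystal::EventLoop#run does not return fibers today, so this signature is an assumption):

  class FakeEventLoop
    # stand-in for a Crystal::EventLoop#run that returns the woken fibers
    def run(blocking : Bool) : Array(Fiber)
      [] of Fiber
    end
  end

  class SketchScheduler
    @queue = Deque(Fiber).new
    @event_loop = FakeEventLoop.new

    def run_event_loop : Nil
      ready = @event_loop.run(blocking: true)
      return if ready.empty?

      direct = ready.pop   # resume this one immediately, skipping the queue
      @queue.concat(ready) # enqueue the rest in a single batch, not 1 by 1
      direct.resume
    end
  end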

@ysbaddaden
Owner Author

  • Alt workaround: the monitoring thread should skip a scheduler's main fiber.

That fixed the issue.
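
In the monitoring loop sketched earlier, the guard is a single check (main_fiber is a hypothetical accessor):

  # never mark the fiber reserved for a scheduler's run loop:
  # interrupting it is what produced the crash above
  next if scheduler.running_fiber == scheduler.main_fiber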
