(1) When a callback is null, the full master/worker barrier is not needed; a worker-only (symmetric) barrier is enough and is probably quite a bit faster. It would be useful to implement that optimization.
Indeed, when operations are queued, the current implementation still uses the master/worker barrier and the callback mechanism. This means the master must return to the event loop for queued items to be processed, and it actually holds up progress if it does not return to the event loop fairly promptly. Using the worker-only barrier would probably help remove that requirement (which is documented).
The way to implement that is probably with a level of indirection: several complete task queues in the working memory (each with a next and limit pointer), where each queue carries some indication of which barrier to use at the end. (That is a little tricky, since the master must still unblock the workers if they finish the available work before more work is ready. What we really want is a master/worker barrier where the master can dynamically register whether or not it is interested in control and/or a callback. One possible way to make this resilient is for the barrier callback to pass a sequence number, so as to avoid confusion with callbacks sent earlier.)
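To make the idea concrete, here is a minimal single-threaded sketch of a queue descriptor that records which barrier to use, with the master ignoring barrier notifications whose sequence number it is not waiting on. All names here (`TaskQueue`, `Master`, `enqueue`, `onBarrier`) are hypothetical and do not reflect the library's actual layout or API; a real version would keep the descriptors in shared memory.

```typescript
// Hypothetical sketch: none of these names come from the actual implementation.
type BarrierKind = "worker-only" | "master-worker";

interface TaskQueue {
  next: number;          // index of the next unclaimed item
  limit: number;         // one past the last item in this queue
  barrier: BarrierKind;  // which barrier to use once the queue drains
  seq: number;           // sequence number carried by the barrier callback
}

class Master {
  private nextSeq = 0;
  private pending: { [seq: number]: () => void } = {};

  // Queue a batch of work. A null callback means the master takes no part
  // in the barrier, so the workers can use the cheaper symmetric barrier.
  enqueue(items: number, callback: (() => void) | null): TaskQueue {
    const seq = this.nextSeq++;
    if (callback !== null) {
      this.pending[seq] = callback;
    }
    return {
      next: 0,
      limit: items,
      barrier: callback === null ? "worker-only" : "master-worker",
      seq: seq,
    };
  }

  // Handle a barrier notification. Notifications for a sequence number the
  // master is not (or no longer) waiting on are ignored and return false.
  onBarrier(seq: number): boolean {
    const cb: (() => void) | undefined = this.pending[seq];
    if (cb === undefined) {
      return false;
    }
    delete this.pending[seq];
    cb();
    return true;
  }
}
```

The sequence number is what lets the master tell a stale or duplicate notification from the one it is waiting for. This sketch does not address the harder part noted above, namely unblocking workers that run out of work before more is queued.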
(2) There is unnecessary lock overhead in having a single work queue (the pointer for the next item is hotly contended). That might be improved by having per-worker queues with work stealing, or some sort of batch refilling. It should not matter too much if the grain is "right" (see later item on hinting), because then computation will dominate communication, but it would be useful to know, and cheaper communication would allow for better speedup of cheap computations.
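As a rough illustration of batch refilling, the sketch below has each worker claim a range of items from the shared cursor and then consume that range locally, so the contended location is touched once per batch rather than once per item. It is a single-threaded simulation with hypothetical names (`SharedCursor`, `BatchingWorker`, `drain`); in a real shared-memory version, `claim` would presumably be a single `Atomics.add` on the queue's next pointer.

```typescript
// Hypothetical sketch: a single-threaded model of batch refilling.
class SharedCursor {
  private next = 0;
  limit: number;
  touches = 0; // how many times the shared (contended) location was accessed

  constructor(limit: number) {
    this.limit = limit;
  }

  // Claim up to `count` items; returns the half-open range [start, end).
  // An empty range (start === end) means the queue is exhausted.
  claim(count: number): [number, number] {
    this.touches++;
    const start = Math.min(this.next, this.limit);
    const end = Math.min(start + count, this.limit);
    this.next = end;
    return [start, end];
  }
}

class BatchingWorker {
  private lo = 0;
  private hi = 0;
  private cursor: SharedCursor;
  private batch: number;

  constructor(cursor: SharedCursor, batch: number) {
    this.cursor = cursor;
    this.batch = batch;
  }

  // Return the next item index, refilling the private range when it runs
  // out; returns -1 once the shared queue has no more items.
  take(): number {
    if (this.lo === this.hi) {
      const range = this.cursor.claim(this.batch);
      this.lo = range[0];
      this.hi = range[1];
      if (this.lo === this.hi) {
        return -1;
      }
    }
    return this.lo++;
  }
}

// Run `workerCount` workers round-robin until every item has been taken.
function drain(limit: number, workerCount: number, batch: number): { seen: number[]; touches: number } {
  const cursor = new SharedCursor(limit);
  const workers: BatchingWorker[] = [];
  const done: boolean[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(new BatchingWorker(cursor, batch));
    done.push(false);
  }
  const seen: number[] = [];
  let active = workerCount;
  while (active > 0) {
    for (let i = 0; i < workerCount; i++) {
      if (done[i]) continue;
      const item = workers[i].take();
      if (item < 0) {
        done[i] = true;
        active--;
      } else {
        seen.push(item);
      }
    }
  }
  return { seen: seen, touches: cursor.touches };
}
```

Comparing `drain(100, 4, 1).touches` with `drain(100, 4, 10).touches` shows how the batch size reduces traffic on the shared pointer. The cost is coarser load balancing near the end of the queue, which is where work stealing would help.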