Par framework performance improvements #11

Open
lars-t-hansen opened this issue Jan 21, 2015 · 0 comments

(1) When a callback is null, the full master/worker barrier is not needed: a worker-only (symmetric) barrier is enough and is probably quite a bit faster. It would be useful to implement that optimization.

Indeed, when operations are queued, the current implementation still uses the master/worker barrier and the callback mechanism. This means the master must return to the event loop for queued items to be processed, and it actually holds up progress if it does not return to the event loop fairly promptly. Using the worker-only barrier would probably help remove that requirement (which is documented).
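To make the idea concrete, here is a minimal sketch of a worker-only barrier built on `Atomics`. It is not the framework's existing barrier: the class name `WorkerBarrier`, its two-slot memory layout, and the `initialize`/`enter` methods are all assumptions made for illustration. The point is only that no master participation or callback is needed; the last worker to arrive releases the others.

```javascript
// Hypothetical sketch, not the Par framework's actual barrier.
// Uses two consecutive Int32 slots in shared memory: [counter, sequence].
class WorkerBarrier {
  constructor(ia, offset, numWorkers) {
    this.ia = ia;                 // Int32Array on a SharedArrayBuffer
    this.counterLoc = offset;     // workers still to arrive
    this.seqLoc = offset + 1;     // bumped each time the barrier opens
    this.numWorkers = numWorkers;
  }

  // Called once by whoever sets up the shared memory.
  static initialize(ia, offset, numWorkers) {
    Atomics.store(ia, offset, numWorkers);
    Atomics.store(ia, offset + 1, 0);
  }

  // Blocks until all workers have entered.
  // Returns true in the last worker to arrive, false in the others.
  enter() {
    const { ia, counterLoc, seqLoc, numWorkers } = this;
    // Read the sequence number before announcing arrival; it cannot
    // change until every worker, including this one, has arrived.
    const seq = Atomics.load(ia, seqLoc);
    if (Atomics.sub(ia, counterLoc, 1) === 1) {
      // Last arriver: reset for the next round, then release the rest.
      Atomics.store(ia, counterLoc, numWorkers);
      Atomics.add(ia, seqLoc, 1);
      Atomics.notify(ia, seqLoc);
      return true;
    }
    // Loop to guard against spurious wakeups.
    while (Atomics.load(ia, seqLoc) === seq) {
      Atomics.wait(ia, seqLoc, seq);
    }
    return false;
  }
}
```

Because `enter` may block in `Atomics.wait`, it is meant to be called from workers only; browsers do not allow `Atomics.wait` on the main thread, which is part of why the master needs a different mechanism.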

The way to implement that is probably with a level of indirection: the working memory holds several complete task queues (each with a next and limit pointer), and each queue may carry some indication of which barrier to use at the end. (That is a little tricky, since the master must still unblock the workers if they finish the available work before more work is ready. What we really want is a master/worker barrier where the master can dynamically register whether or not it is interested in control and/or a callback. One possible way to make this resilient is for the barrier callback to pass a sequence number, so as to avoid confusion with callbacks sent earlier.)
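A rough sketch of what such queue descriptors could look like follows. Everything here is hypothetical: the three-slot descriptor layout, the `WORKER_ONLY`/`MASTER_WORKER` tags, and the names `initQueue`, `claimItem`, `barrierKind` and `CallbackFilter` are illustrative assumptions, not existing parts of the framework. It leaves out the hard part noted above, namely how the master unblocks workers that run out of work.

```javascript
// Hypothetical layout: queue q occupies three Int32 slots in shared memory.
const NEXT = 0, LIMIT = 1, BARRIER = 2, DESC_SIZE = 3;
// Hypothetical tags for the barrier to use when a queue is exhausted.
const WORKER_ONLY = 0, MASTER_WORKER = 1;

// Set up queue q to hand out item indices first..limit-1.
function initQueue(ia, q, first, limit, barrier) {
  const base = q * DESC_SIZE;
  Atomics.store(ia, base + NEXT, first);
  Atomics.store(ia, base + LIMIT, limit);
  Atomics.store(ia, base + BARRIER, barrier);
}

// Claim the next item index from queue q, or return -1 if it is exhausted.
function claimItem(ia, q) {
  const base = q * DESC_SIZE;
  const limit = Atomics.load(ia, base + LIMIT);
  // Check first so an exhausted queue's next pointer stops growing.
  if (Atomics.load(ia, base + NEXT) >= limit) return -1;
  const item = Atomics.add(ia, base + NEXT, 1);
  return item < limit ? item : -1;
}

// Which barrier the workers should enter once queue q is exhausted.
function barrierKind(ia, q) {
  return Atomics.load(ia, q * DESC_SIZE + BARRIER);
}

// Master side: tag each requested callback with a sequence number and
// ignore any callback that carries an older one.
class CallbackFilter {
  constructor() { this.expected = 0; }
  arm() { return ++this.expected; }          // number to send with the request
  accept(seq) { return seq === this.expected; }
}
```

With this shape, a queue tagged `WORKER_ONLY` never involves the master at all, and a stale callback from an earlier `MASTER_WORKER` queue is dropped by `accept` instead of being mistaken for the current one.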

(2) There is unnecessary lock overhead in having a single work queue (the pointer to the next item is hotly contended). That might be improved by having per-worker queues with work stealing, or some sort of batch refilling. It should not matter too much if the grain is "right" (see the later item on hinting), because then computation will dominate communication, but it would be useful to know, and cheaper communication would allow better speedup for cheap computations.
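As an illustration of the batch-refilling variant only (not work stealing), a worker can claim a run of items with one atomic operation and then work through it locally without touching shared state. The names `claimBatch` and `runWorker`, and the idea of a fixed `batchSize`, are assumptions for this sketch rather than anything the framework provides.

```javascript
// Hypothetical sketch of batch refilling; not the framework's current queue.
// ia[nextLoc] is the shared next pointer; items are indices below limit.
function claimBatch(ia, nextLoc, limit, batchSize) {
  // Check first so the shared pointer does not run far past the limit.
  if (Atomics.load(ia, nextLoc) >= limit) return null;
  // One atomic operation claims up to batchSize items.
  const start = Atomics.add(ia, nextLoc, batchSize);
  if (start >= limit) return null;
  return { start, end: Math.min(start + batchSize, limit) };
}

// Worker loop: refill from the shared pointer only when the local batch
// is used up. Returns how many times the shared pointer was touched
// successfully, to make the reduced contention visible.
function runWorker(ia, nextLoc, limit, batchSize, processItem) {
  let claims = 0;
  for (;;) {
    const batch = claimBatch(ia, nextLoc, limit, batchSize);
    if (batch === null) break;
    claims++;
    for (let i = batch.start; i < batch.end; i++) processItem(i);
  }
  return claims;
}
```

The trade-off is load balance: larger batches mean fewer contended updates but a greater chance that one worker is left holding the last batch while the others idle, which is where work stealing would help.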
