Python-style imap() #4601

Closed (pull request, 5 commits)

Conversation

bjarthur
Contributor

Python provides a whole family of parallel map functions, one of which (imap in the multiprocessing module) is non-blocking and returns an iterator over the results. It comes in very handy when the results wouldn't all fit in memory at once. Rather, each result is stored on the workers until de-referenced in the main thread. I provide this same functionality in Julia with the new ipmap() function.
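For readers unfamiliar with the Python side, here is a minimal sketch of the `imap` behaviour described above. It uses `multiprocessing.pool.ThreadPool` (a thread-backed subclass of `Pool` with the same `imap` method) only so the snippet runs as a plain script; `big_result` is a made-up stand-in for an expensive function.

```python
from multiprocessing.pool import ThreadPool


def big_result(n):
    # Hypothetical stand-in for a job whose output is large.
    return [n] * 3


with ThreadPool(processes=2) as pool:
    # imap returns an iterator right away instead of a finished list.
    lazy = pool.imap(big_result, range(4))

    # Results are handed over one at a time, in input order.
    first = next(lazy)

    # Each chunk can be reduced and dropped before the next is requested.
    rest = [sum(chunk) for chunk in lazy]
```

The caller never holds more than one `chunk` at a time here, which is the memory argument made in the comment above.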

@JeffBezanson
Member

We want to have this, but it should not be implemented by copying and pasting the code for pmap. Maybe an option to pmap to make it return an array of RemoteRefs instead of results?

@tanmaykm
Member

Something similar was required to implement distributed DataFrames using Blocks.

The pmap implemented for Blocks (here: https://github.com/tanmaykm/Blocks.jl/blob/master/src/pmap.jl) takes an additional keyword fetch_results to achieve this by either calling remotecall_wait or remotecall_fetch. Its structure closely resembles Base.pmap and can be easily merged into base if acceptable.
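As a rough illustration of the keyword approach (a Python analogue, not the Blocks.jl or Base code), the sketch below uses `concurrent.futures`. The `pmap` name and `fetch_results` keyword mirror the ones mentioned above; `Future` objects stand in for RemoteRefs, and `wait` is assumed to play the role of `remotecall_wait`.

```python
from concurrent.futures import Future, ThreadPoolExecutor, wait

# Shared pool for the sketch; a real implementation would manage workers itself.
_POOL = ThreadPoolExecutor(max_workers=4)


def pmap(f, xs, fetch_results=True):
    """Parallel map that returns either values or references to them."""
    futures = [_POOL.submit(f, x) for x in xs]
    if fetch_results:
        # Analogue of remotecall_fetch: block and bring every value back.
        return [fut.result() for fut in futures]
    # Analogue of remotecall_wait: block until done, but hand back references.
    wait(futures)
    return futures


def square(x):
    return x * x


values = pmap(square, [1, 2, 3])
refs = pmap(square, [1, 2, 3], fetch_results=False)
```

Both branches block until the work is finished; the keyword only controls whether the caller receives values or handles to them.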

@bjarthur
Contributor Author

I understand not wanting to duplicate code, but there is an added benefit in ipmap not blocking the command line: subsequent code in the calling function can start processing results as they finish. @JeffBezanson and @tanmaykm, your solutions both block, no?

How about rewriting pmap to call ipmap? I have done this and pushed. I think the pull request needs to be re-opened for the change to show up here. If not, let me know if you want me to submit a new pull request.
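The layering proposed here can be sketched in Python as follows. This is an illustration of the idea only, not the code in the pushed commit; `ipmap` and `pmap` are hypothetical Python functions built on `concurrent.futures`.

```python
from concurrent.futures import ThreadPoolExecutor


def ipmap(f, xs, workers=4):
    """Hypothetical non-blocking map: start all jobs, return a lazy iterator."""
    pool = ThreadPoolExecutor(max_workers=workers)
    futures = [pool.submit(f, x) for x in xs]
    # Accept no new work; jobs already submitted still run to completion.
    pool.shutdown(wait=False)

    def results():
        # Yield in input order, blocking only on the result being requested.
        for fut in futures:
            yield fut.result()

    return results()


def pmap(f, xs, workers=4):
    """Blocking map expressed as 'drain the lazy one', so no code is duplicated."""
    return list(ipmap(f, xs, workers))


squares = pmap(lambda x: x * x, range(5))
lazy = ipmap(lambda x: x + 1, [10, 20, 30])
head = next(lazy)
```

With this split the blocking variant is a one-liner, which addresses the duplication concern while keeping the non-blocking entry point.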

bjarthur@4bdec64

@amitmurthy
Contributor

Since Julia multitasking is cooperative rather than pre-emptive, it may not be a good idea to support processing results as they finish, because:

  1. if the post-processing is compute-heavy, it may block the pmap task from scheduling pending jobs from the input array;
  2. if one of the results is on remote worker N, but N is currently busy executing a different index from the input array, the fetch from N will also block, rendering the design moot.
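The first point can be reproduced with any cooperative scheduler. The sketch below uses Python's `asyncio` as a stand-in for Julia tasks (an assumption made purely for illustration); `dispatcher` plays the part of the pmap task handing out jobs and `consumer` plays the post-processing code.

```python
import asyncio


def run_demo(blocking):
    """Return the order of events when a consumer does or does not yield."""
    events = []

    async def dispatcher():
        # Stands in for the pmap task scheduling pending jobs.
        for i in range(3):
            events.append(f"dispatch {i}")
            await asyncio.sleep(0)  # yield to other tasks

    async def consumer():
        events.append("post-process start")
        if blocking:
            # Compute-heavy work with no yield point: nothing else can run.
            sum(i * i for i in range(200_000))
        else:
            # Same task, but it yields control between steps.
            for _ in range(3):
                await asyncio.sleep(0)
        events.append("post-process end")

    async def main():
        await asyncio.gather(consumer(), dispatcher())

    asyncio.run(main())
    return events


hog = run_demo(blocking=True)
fair = run_demo(blocking=False)
```

In `hog` no job is dispatched until the post-processing has finished, whereas in `fair` dispatching interleaves with it. That stall is what the first point warns about.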

As Jeff suggested, simply supporting a keyword argument that makes pmap return an array of RemoteRefs is probably a good compromise.
