Failure when run inside module #105
Comments
Not an expert on the internals so take this with a grain of salt, but my understanding is that Distributed by design expects all processes in the cluster to have the same global state. And defining new modules or importing things changes that state, which causes the errors you see.
The problem is that when you create new workers with …
You should generally not be performing side-effects like this at the top-level of a module - our module top-level is basically "compile-time", if that helps you think about whether this is a good or bad idea. You could maybe do this in …
I have a slightly different setup from the topic starter, and I can say that the problem is not related to his code being in the module's top level. If I have a module with a function that creates a new process with …
@andreyz4k can you provide an MWE?
Ok, here it goes
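(The MWE code blocks were not preserved in this extract. A rough, self-contained sketch of the shape being described, with invented module and function names, might look like this: a module whose function adds a worker and then remote-calls a function defined in that same module.)

```julia
module MyLib
    using Distributed

    work(x) = x * 2

    function run_distributed(x)
        pid = only(addprocs(1))   # spawn a worker from inside the module
        try
            # `work` lives in Main.MyLib on the driver process; the fresh worker
            # has no such module, so this call fails during deserialization
            return remotecall_fetch(work, pid, x)
        finally
            rmprocs(pid)
        end
    end
end

MyLib.run_distributed(21)
```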
The first test is what I would prefer — clean code, no manual initialization. Here is the result: …
In the second test, I tried to adapt the trick from …
Well, it looks like I found a partial workaround
When the called function is exported, it can be accessed in the global scope, where the worker does the evaluation, so it succeeds.
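(The workaround code itself is not included in this extract. A self-contained sketch of the export-plus-initialization idea, with invented module and function names, could look like the following.)

```julia
using Distributed
addprocs(1)

# "Initialization hack": define the module in Main on every process and export
# the function, so its bare name is visible from Main, where the worker
# resolves the incoming call.
@everywhere module MyLib
    export work
    work(x) = x * 2
end
@everywhere using .MyLib

remotecall_fetch(work, 2, 21)    # returns 42; the worker can now find the function
```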
But if I'm trying to use a macro like …
Ahh ok, that makes sense - this is related to the discussion we had in the JuliaHPC call about having …
I'm not sure if it can be solved by just running some user-defined startup code. I've also tried to add an … In any case, the main problem here is that serialization is done in the module scope, while deserialization in the worker is done in Main. I don't think there is any way to change that in an initialization script. Furthermore, the module context can be different for different invocations sent to the same worker, so it should be an invocation-level setting.
I did some more experiments, and the temporary fix is to use …
It still requires the same initialization hack, but I won't need to export everything I want to run. I will use this method in my project for now, but it would be great if this issue could be fixed in the main Distributed codebase.
I am trying to run some parallel code defined inside a module via `include_string`, and I can't quite figure out how it works or how it's supposed to work. It seems like a bug for such a simple case to not work, so I'm making an issue here. But it may be something that can't be fixed with Julia's current module system.
Main question: Is there a way to use a sandbox module without breaking Distributed.jl?
Short failing code
A simple remote call works when run in module Main:
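(The original code block was not preserved here; a minimal sketch of the working case, assuming one extra worker, might be:)

```julia
using Distributed
addprocs(1)

@everywhere f(x) = x + 1     # f is defined in Main on the driver and on the worker

remotecall_fetch(f, 2, 41)   # returns 42
```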
But it doesn't work when run in a different module:
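(Again a sketch, with an assumed module name, of the same call issued from inside a sandbox module:)

```julia
using Distributed
addprocs(1)

module Sandbox
    using Distributed

    g(x) = x + 1

    # `g` is Main.Sandbox.g on the driver; the worker has no Sandbox module,
    # so the remote call fails while the worker deserializes it.
    demo() = remotecall_fetch(g, 2, 41)
end

Sandbox.demo()   # errors on worker 2
```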
Test scripts
I made a test script to figure out how to work with this. I run the script in a sandbox module and in the Main module.
Driver script: `mfe_driver.jl`.
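(The contents of `mfe_driver.jl` were collapsed in the original issue. A sketch of a driver with the described behaviour, running the given file both in a fresh sandbox module and in Main, could be:)

```julia
# mfe_driver.jl (sketch). Usage: julia mfe_driver.jl <script.jl>
script = read(ARGS[1], String)

# Run the given script inside a freshly created sandbox module ...
sandbox = Module(:Sandbox)
println("== running $(ARGS[1]) in a sandbox module ==")
include_string(sandbox, script, ARGS[1])

# ... and again in Main, for comparison.
println("== running $(ARGS[1]) in Main ==")
include_string(Main, script, ARGS[1])
```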
Escape-to-main method: `mfe.jl`
The second is a script that escapes the sandbox module to get the function to run remotely: `mfe.jl`.
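(The file itself was collapsed; a sketch of what an "escape to Main" script might do, with invented details, is:)

```julia
# mfe.jl (sketch): escape the enclosing sandbox by defining the function in
# Main on every process, then call it remotely.
using Distributed

nworkers() < 2 && addprocs(2)

# @everywhere evaluates in Main on every process, no matter which module this
# file was include_string'ed into; that is the "escape".
@everywhere f(x) = x + 1

for p in workers()
    # qualify with Main so the name also resolves when this runs inside a sandbox
    remotecall_wait(Main.f, p, 1)
end
println("remotecall_wait succeeded on all workers")
```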
`remoteprocess_call` works as expected. The escape is done with `@everywhere`. And I don't understand how that is different from defining the function on the driver process without `@everywhere`.

Output of "julia mfe.jl"
`remotecall_wait` works only if the function is defined in Main for each process.

Output of "julia mfe_driver.jl mfe.jl"
Define-sandbox-everywhere method: `mfe_define_module.jl`
I don't want to escape the sandbox to define things in the Main scope, so I tried to work within the sandbox. This file tries to define the sandbox module for the worker processes: `mfe_define_module.jl`.
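(The file contents were collapsed in this extract. A rough sketch of the "define the sandbox module on the workers" approach, with invented names and not a faithful reproduction of the original file, might be:)

```julia
# mfe_define_module.jl (sketch): define the same module on every process, then
# call its function with a fully qualified name.
using Distributed

nworkers() < 2 && addprocs(2)

@everywhere module SandboxShared
    f(x) = x + 1
end

for p in workers()
    remotecall_wait(Main.SandboxShared.f, p, 1)
end
```

Note that this sketch qualifies the call as `Main.SandboxShared.f`; the failure described below suggests the original file instead resolved the function through the sandbox module created by the driver, which the workers do not have.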
… `remotecall_wait`.

Output of "julia mfe_driver.jl mfe_define_module.jl"
Version info
I tested this with Julia installed by juliaup version 1.13.0, with the Julia versions below: … and …