-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Treat python_app input/output default arguments like other magic keywords, and like bash_app #3489
Conversation
…arsl into python_bash_app_parity
|
I'm not able to run the entire test suite locally, but I tried running the individual test that fails and it seems to run fine locally. |
From the logs it appears that |
For future, the error that @astro-friedel is referring to that has appeared a few times looks like this below paste. The non-obvious bit of this (to me) is that the code is trying to serialize a I'll try to dig into debugging this more seriously in the next few days.
|
one thing that is suspicious from that stack is that ...
... it looks like the code is trying to serialize a module (probably the containing module for a defined function) which is almost always not what should be happening in Parsl, and that can sometimes lead to bringing in way more spurious stuff into the object graph... So that's probably the first place I'll start poking. |
i'm able to recreate this locally by running these two test cases in a specific sequence:
something happening in this newly introduced test_inputs_default test is interacting badly with what's happening in that MPI test, to do with module and function definitions... |
another pairing that fails for me locally: |
ok, this is some "interesting" serialization stuff - @arhag23 I would leave this PR as it is and go do something else for now, and then let's look at this again when I've looked deeper, rather than you waiting on this. |
I made this change to a copy of your branch and it seems to make things work. I mostly understand what's going on there (and it's something that shouldn't have been happening in parsl, that just happens to have never apparently broken before...) I'll write up a proper pull request for just this change with more details. |
This addresses behaviour subtle enough we've gone years without noticing but is now causing problems for a test introduced in apparently-unrelated PR #3489. What's happening in the removed line is that update_wrapper is being used accidentally in two ways: i) a functional style: return a remote_side_bash_executor that looks like func and ii) modify remote_side_bash_executor to look like func That second step is problematic: it happens to every single bash_app decoration - so remote_side_bash_executor accumulates "look-alike" context (attributes?) from every function that is being decorated. $ cat uw1.py from functools import update_wrapper def a(): pass def b(): return 7 print("b looks like this:") print(repr(b)) print("update_wrapper of b to look like a:") print(update_wrapper(b, a)) print("b looks like this:") print(repr(b)) $ python3 uw1.py b looks like this: <function b at 0x70fd0ff1e2a0> update_wrapper of b to look like a: <function a at 0x70fd0ff1e2a0> b looks like this: <function a at 0x70fd0ff1e2a0> In the case of PR #3489, that PR introduces a bash_app that cannot be serialized. That's fine, because it only tries to run it in a ThreadPoolExecutor, and executor-specific apps are fine (see, for example, the concept of join apps, which must run in the main DFK process) However, context from that cannot-be-serialized app is accumulated in remote_side_bash_executor, and once that has happened, remote_side_bash_executor can never be serialized again in the same process, meaning that any later defined bash_apps can never run in a serilalized context. In the PR #3489 vs CI case, with randomised test orders, that results in `--config local` tests randomly failing, when they happen to be the first bash_app+remote execution test that runs after the newly introduced PR #3489 tests. Other problems with this update wrapper: it changes the name of the function: parsl has multiple @wraps decorations, for a similar purpose, to disguise wrapped functions as the original app. This is not so necessary in modern times (although is in some subtle cases) - in most cases, this is (I think, @yadudoc?) to get the overridden object to have a name attribute that matches the underlying app because it was sometimes used as the "name" of the app for example in logs. In this case, that isn't so relevant: the remote_side_bash_executor wrapped code is then immediately re-wrapped in a `partial` object, which has a __name__ attribute set on it right after, based on the original func. So there's no need to do anything to the remote_side_bash_executor. Renaming the remote_side_bash_executor globally also causes very confusing object representations when trying to debug: the remote_side_bash_executor now claims to be an app function, and much more confusingly, it can claim to be a *different* app function than is being invoked.
tl;dr bash_app unexpectedly mutates the global remote_side_bash_executor function, which is bad. This PR makes it not do that. This addresses behaviour subtle enough we've gone years without noticing but is now causing problems for a test introduced in apparently-unrelated PR #3489. What was happening before this PR, in the removed line is that update_wrapper is being used accidentally in two ways: i) a functional style: return a remote_side_bash_executor that looks like self.func and ii) modify remote_side_bash_executor to look like self.func That second step is problematic: it modifies a global object (the remote_side_bash_executor callable object) on every bash_app decoration, and so leaves it finally looking like the most recent bash_app decoration. $ cat uw1.py from functools import update_wrapper def a(): pass def b(): return 7 print("b looks like this:") print(repr(b)) print("update_wrapper of b to look like a:") print(update_wrapper(b, a)) print("b looks like this:") print(repr(b)) $ python3 uw1.py b looks like this: <function b at 0x70fd0ff1e2a0> update_wrapper of b to look like a: <function a at 0x70fd0ff1e2a0> b looks like this: <function a at 0x70fd0ff1e2a0> In the case of PR #3489, that PR introduces a bash_app that cannot be serialized. That's fine, because it only tries to run it in a ThreadPoolExecutor, and executor-specific apps are fine (see, for example, the concept of join apps, which must run in the main DFK process) However, context from that cannot-be-serialized app is accumulated in remote_side_bash_executor, and once that has happened, remote_side_bash_executor can never be serialized again in the same process, meaning that any later defined bash_apps can never run in a serilalized context. The "context" here is specifically the __wrapped__ attribute of remote_side_bash_executor: when remote_side_bash_executor is serialized (as a code object, because it does not have a matching global __module__.__name__ due to the decoration), the attributes are serialized including the __wrapped__ attribute, which contains the underlying function of the most recent bash decoration (not the underlying function of the bash app currently being invoked). That function is, in this case, the function that cannot be serialized. In the PR #3489 vs CI case, with randomised test orders, that results in `--config local` tests randomly failing, when they happen to be the first bash_app+remote execution test that runs after the newly introduced PR #3489 tests. Other problems with this update wrapper: it changes the name of the function: parsl has multiple @wraps decorations, for a similar purpose, to disguise wrapped functions as the original app. This is not so necessary in modern times (although is in some subtle cases) - in most cases, this is (I think, @yadudoc?) to get the overridden object to have a name attribute that matches the underlying app because it was sometimes used as the "name" of the app for example in logs. In this case, that isn't so relevant: the remote_side_bash_executor wrapped code is then immediately re-wrapped in a `partial` object, which has a __name__ attribute set on it right after, based on the original func. So there's no need to do anything to the remote_side_bash_executor. Renaming the remote_side_bash_executor globally also causes very confusing object representations when trying to debug: the remote_side_bash_executor now claims to be an app function, and much more confusingly, it can claim to be a *different* app function than is being invoked. Removing this rename probably has some effect on serialization: if the remote_side_bash_executor has a __module__.__name__ that matches the global definition of the same name, then it can be send as a pickle-style name-of-function, rather than a dill-style code object. That's probably good for efficiency. TODO: test that serialization changes by manual observation. I wonder if there is an automated test to check that these parsl internal functions definitely get sent by reference? and are there other top-level functions that are wrapped and so broken this way? (incidentally this is an argument against using wrapper style coding in parsl vs partial style function application... maybe theres a separate issue to note there?) What's the test assertion here for "how things are serialized?" - plug something into dill or the parsl serializer library to say that test serialization must follow a particular code path? this is getting out of scope for this PR, though - so maybe turn this TODO into an issue?
tl;dr bash_app unexpectedly mutates the global remote_side_bash_executor function, which is bad. This PR makes it not do that. This addresses behaviour subtle enough we've gone years without noticing but is now causing problems for a test introduced in apparently-unrelated PR #3489. What was happening before this PR, in the removed line is that update_wrapper is being used in two ways: i) Deliberately, a functional style: return a remote_side_bash_executor that looks like self.func and ii) Accidentally, modify the global remote_side_bash_executor to look like self.func That second step is problematic: it modifies a global object (the remote_side_bash_executor callable object) on every bash_app decoration, and so leaves it finally looking like the most recent bash_app decoration. For example: ``` $ cat uw1.py from functools import update_wrapper def a(): pass def b(): return 7 print("b looks like this:") print(repr(b)) print("update_wrapper of b to look like a:") print(update_wrapper(b, a)) print("b looks like this:") print(repr(b)) $ python3 uw1.py b looks like this: <function b at 0x70fd0ff1e2a0> update_wrapper of b to look like a: <function a at 0x70fd0ff1e2a0> b looks like this: <function a at 0x70fd0ff1e2a0> ``` PR #3489 introduces a bash_app that cannot be serialized. That's fine in the context of that PR, because it only tries to run it in a ThreadPoolExecutor where serialization does not happen, and executor-specific apps are fine - see, for example, the concept of join apps, which must run in the main DFK process in a ThreadPoolExecutor. However, in certain test case orders, the `__wrapped__` value of remote_side_bash_executor points to that "bad" bash_app, and when that has happened, remote_side_bash_executor cannot be serialized as part of an app invocation to a remote worker in a different (for example, htex-using) test. This PR removes that update_wrapper, causing a few changes: because __wrapped__ is now not set, the function for the last-decorated bash_app is no longer sent along side every invocation of every other bash_app. This removes the error in PR #3489. Because the __name__ of remote_side_bash_executor is no longer mutated, the default pickle pass-by-name serialization can happen, as pickle is able to assume that it can import the function as a global on the remote side rather than sending the modified definition of remote_side_bash_executor. These two changes result in a reduction of the serialized form of an example bash_app (measured in DillCallableSerializer.serialize) from 6940 bytes to 2305 bytes. Issue #3941 contains a feature request to look at those remaining 2305 bytes to see if anything else can be removed here. This change also removes some confusing repr() behaviour when debugging: when update_wrapper is used, any reference in reprs to remote_side_bash_executor are output as references to the most recently decorated bash_app. After this PR, when update_wrapper is removed, references are output correctly.
@arhag23 @astro-friedel I'm hoping that PR #3492 fixes this test failure. |
tl;dr bash_app unexpectedly mutates the global remote_side_bash_executor function, which is bad. This PR makes it not do that. This addresses behaviour subtle enough we've gone years without noticing but is now causing problems for a test introduced in apparently-unrelated PR #3489. What was happening before this PR, in the removed line is that update_wrapper is being used in two ways: i) Deliberately, a functional style: return a remote_side_bash_executor that looks like self.func and ii) Accidentally, modify the global remote_side_bash_executor to look like self.func That second step is problematic: it modifies a global object (the remote_side_bash_executor callable object) on every bash_app decoration, and so leaves it finally looking like the most recent bash_app decoration. For example: ``` $ cat uw1.py from functools import update_wrapper def a(): pass def b(): return 7 print("b looks like this:") print(repr(b)) print("update_wrapper of b to look like a:") print(update_wrapper(b, a)) print("b looks like this:") print(repr(b)) $ python3 uw1.py b looks like this: <function b at 0x70fd0ff1e2a0> update_wrapper of b to look like a: <function a at 0x70fd0ff1e2a0> b looks like this: <function a at 0x70fd0ff1e2a0> ``` PR #3489 introduces a bash_app that cannot be serialized. That's fine in the context of that PR, because it only tries to run it in a ThreadPoolExecutor where serialization does not happen, and executor-specific apps are fine - see, for example, the concept of join apps, which must run in the main DFK process in a ThreadPoolExecutor. However, in certain test case orders, the `__wrapped__` value of remote_side_bash_executor points to that "bad" bash_app, and when that has happened, remote_side_bash_executor cannot be serialized as part of an app invocation to a remote worker in a different (for example, htex-using) test. This PR removes that update_wrapper, causing a few changes: because __wrapped__ is now not set, the function for the last-decorated bash_app is no longer sent along side every invocation of every other bash_app. This removes the error in PR #3489. Because the __name__ of remote_side_bash_executor is no longer mutated, the default pickle pass-by-name serialization can happen, as pickle is able to assume that it can import the function as a global on the remote side rather than sending the modified definition of remote_side_bash_executor. These two changes result in a reduction of the serialized form of an example bash_app (measured in DillCallableSerializer.serialize) from 6940 bytes to 2305 bytes. Issue #3941 contains a feature request to look at those remaining 2305 bytes to see if anything else can be removed here. This change also removes some confusing repr() behaviour when debugging: when update_wrapper is used, any reference in reprs to remote_side_bash_executor are output as references to the most recently decorated bash_app. After this PR, when update_wrapper is removed, references are output correctly. Co-authored-by: Kevin Hunter Kesling <[email protected]>
Yes, I'm happy it in this state. |
@arhag23 Can you amend your description "In a previous PR, some fields where the default values of outputs and inputs for AppBase were removed." to link to the PR that you are talking about? |
I amended it. |
Description
In a previous PR (#3485), some fields where the default values of
outputs
andinputs
for AppBase were removed. These fields were not being used in any way. However, like the other parsl reserved parameters, the defaults should have been stored in theAppBase.kwargs
dict which is used to ensure that the default values of these special reserved parameters are ininvocation_kwargs
which is sent to the dataflow kernel and used to resolve the corresponding special behaviors. The correct behavior was only observed inBashApp
since it does additional checking to store all default args (even the non-reserved ones) toinvocation_kwargs
. Now it should be consistent for both types of apps.Changed Behaviour
PythonApp
will now correctly resolve default arguments for theinputs
andoutputs
special parameters.Type of change