From 3ddf68fe40d4a18418cb70daa5de1bc2618cb1f7 Mon Sep 17 00:00:00 2001 From: Ben Clifford Date: Tue, 18 Jun 2024 18:18:00 +0000 Subject: [PATCH] Remove incorrectly-global-mutating update_wrapper in bash_app tl;dr bash_app unexpectedly mutates the global remote_side_bash_executor function, which is bad. This PR makes it not do that. This addresses behaviour subtle enough we've gone years without noticing but is now causing problems for a test introduced in apparently-unrelated PR #3489. What was happening before this PR, in the removed line is that update_wrapper is being used in two ways: i) Deliberately, a functional style: return a remote_side_bash_executor that looks like self.func and ii) Accidentally, modify the global remote_side_bash_executor to look like self.func That second step is problematic: it modifies a global object (the remote_side_bash_executor callable object) on every bash_app decoration, and so leaves it finally looking like the most recent bash_app decoration. For example: ``` $ cat uw1.py from functools import update_wrapper def a(): pass def b(): return 7 print("b looks like this:") print(repr(b)) print("update_wrapper of b to look like a:") print(update_wrapper(b, a)) print("b looks like this:") print(repr(b)) $ python3 uw1.py b looks like this: update_wrapper of b to look like a: b looks like this: ``` PR #3489 introduces a bash_app that cannot be serialized. That's fine in the context of that PR, because it only tries to run it in a ThreadPoolExecutor where serialization does not happen, and executor-specific apps are fine - see, for example, the concept of join apps, which must run in the main DFK process in a ThreadPoolExecutor. However, in certain test case orders, the `__wrapped__` value of remote_side_bash_executor points to that "bad" bash_app, and when that has happened, remote_side_bash_executor cannot be serialized as part of an app invocation to a remote worker in a different (for example, htex-using) test. This PR removes that update_wrapper, causing a few changes: because __wrapped__ is now not set, the function for the last-decorated bash_app is no longer sent along side every invocation of every other bash_app. This removes the error in PR #3489. Because the __name__ of remote_side_bash_executor is no longer mutated, the default pickle pass-by-name serialization can happen, as pickle is able to assume that it can import the function as a global on the remote side rather than sending the modified definition of remote_side_bash_executor. These two changes result in a reduction of the serialized form of an example bash_app (measured in DillCallableSerializer.serialize) from 6940 bytes to 2305 bytes. Issue #3941 contains a feature request to look at those remaining 2305 bytes to see if anything else can be removed here. This change also removes some confusing repr() behaviour when debugging: when update_wrapper is used, any reference in reprs to remote_side_bash_executor are output as references to the most recently decorated bash_app. After this PR, when update_wrapper is removed, references are output correctly. --- parsl/app/bash.py | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/parsl/app/bash.py b/parsl/app/bash.py index 4ab0add68b..36212c172f 100644 --- a/parsl/app/bash.py +++ b/parsl/app/bash.py @@ -1,5 +1,5 @@ import logging -from functools import partial, update_wrapper +from functools import partial from inspect import Parameter, signature from parsl.app.app import AppBase @@ -123,11 +123,10 @@ def __init__(self, func, data_flow_kernel=None, cache=False, executors='all', ig if sig.parameters[s].default is not Parameter.empty: self.kwargs[s] = sig.parameters[s].default - # update_wrapper allows remote_side_bash_executor to masquerade as self.func # partial is used to attach the first arg the "func" to the remote_side_bash_executor # this is done to avoid passing a function type in the args which parsl.serializer # doesn't support - remote_fn = partial(update_wrapper(remote_side_bash_executor, self.func), self.func) + remote_fn = partial(remote_side_bash_executor, self.func) remote_fn.__name__ = self.func.__name__ self.wrapped_remote_function = wrap_error(remote_fn)