FIX-#4507: Do not call 'ray.get()' inside of the kernel executing call queues #6633
Conversation
…executing call queues Signed-off-by: Dmitry Chigarev <[email protected]>
@@ -66,6 +66,43 @@ def __init__(self, data, length=None, width=None, ip=None, call_queue=None):
            )
        )

    @staticmethod
I want us to keep the code for the Ray engine in sync with unidist. If we enable unidist on Ray, then we will have an issue similar to the one this PR resolves, so please add the respective changes to the unidist engine. Perf measurements would also be useful.
Well, I can't test this with unidist because of modin-project/unidist#354.
The issue was fixed on the unidist side and the fix will be available in unidist 0.5.0. I suppose you could test the changes without the .wait method.
We could also release unidist 0.4.2 with that fix if it is really necessary.
We could also release unidist 0.4.2 with that fix if it is really necessary.
I think I'll just test with the master branch of unidist, so there's no urgency for a release.
Do you think it makes sense to port them to the unidist backend even if the problem is not reproducible with unidist?
I think it makes sense to port the changes to the unidist backend because, as I said, if we enable unidist on Ray in the future, then we will have an issue similar to the one this PR resolves.
I've just looked at the code of unidist partitioning, and it looks 100% identical to Ray's partition code. Maybe it's better to come up with some unified classes for both Ray and unidist instead of copying the code?
We already had a similar discussion when integrating unidist into Modin. We ported changes from the Ray engine to unidist and just replaced ray.* calls with unidist.* calls. If we came up with some unified classes for both Ray and unidist instead of copying the code, I assume we would have criss-crossed dependencies for Ray and unidist, which we don't want. Since unidist has a Ray-like API, it looks like code duplication. If unidist had a different API from Ray, we wouldn't see code duplication.
Hm... it is strange. OK, let's then file an issue for porting the changes to the unidist engine and also describe the perf problem there. For this PR we will keep the change Ray-only. Thanks for the measurements.
I also tried to compare these two approaches with pure unidist and was able to reproduce the deadlock caused by calling unidist.get() inside a kernel... Strange that it didn't happen with Modin.
import unidist
import pandas as pd
from timeit import default_timer as timer
import warnings
unidist.init()
def gen_df():
return pd.DataFrame(range(10))
@unidist.remote
def func1(*args):
return args
@unidist.remote
def func2(args):
res = []
for a in args:
res.append(unidist.get(a))
return res
remote_objs = [unidist.put(gen_df()) for _ in range(3)]
warnings.warn("running func1...")
t1 = timer()
unidist.wait(func1.remote(*remote_objs)) # works fine
print(timer() - t1)
remote_objs = [unidist.put(gen_df()) for _ in range(3)]
warnings.warn("running func2...")
t1 = timer()
unidist.wait(func2.remote(remote_objs)) # deadlock
print(timer() - t1)
Strange that it didn't happen with modin
unidist on MPI internally deconstructs all nested object refs and passes them into a kernel, so data is materialized in the kernel. However, your example looks like a problem in unidist anyway. Can you file an issue in unidist? We will take a look at it.
@modin-project/modin-core, please take a look
modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition.py
for func, f_args, f_kwargs in call_queue:
    func = deserialize(func)
    args = deserialize(f_args)
    kwargs = deserialize(f_kwargs)
I wonder if we are passing anything at all through this parameter that could lead to a hang? I believe objects created explicitly using ray.put should work correctly.
I wonder if we are passing anything at all through this parameter that could lead to a hang?
I think that we're indeed not passing anything like that via func nor via f_kwargs, but I thought that it might be a good idea to unify the approach across f_args, func, and f_kwargs to avoid such problems in the future.
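The unified handling being discussed can be sketched roughly like this. This is a toy model, not Modin's actual deserialize implementation: Ref and materialize below are stand-ins for Ray's ObjectRef and ray.get.

```python
class Ref:
    """Stand-in for ray.ObjectRef in this sketch."""

    def __init__(self, value):
        self._value = value


def materialize(ref):
    # Stand-in for ray.get() in this sketch.
    return ref._value


def deserialize(obj):
    """Resolve refs uniformly, whether 'obj' is a single ref,
    a list/tuple containing refs, or a dict with refs as values."""
    if isinstance(obj, Ref):
        return materialize(obj)
    if isinstance(obj, (list, tuple)):
        return type(obj)(deserialize(o) for o in obj)
    if isinstance(obj, dict):
        return {k: deserialize(v) for k, v in obj.items()}
    return obj
```

Applying the same resolution step to func, f_args, and f_kwargs means no code path is left that silently carries an unresolved ref into the kernel.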
Signed-off-by: Dmitry Chigarev <[email protected]>
        was_iterable = False
        value = [value]
    unfolded_queue.extend(value)
    value_lengths.append({"len": len(value), "was_iterable": was_iterable})
Maybe using a tuple (len(value), was_iterable) here would be faster. Consider this only if unboxing/packing performance is insufficient.
Well, it may be faster by a couple of nanoseconds, but I would prefer code readability over saving an insignificant amount of time here.
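The unfold/refold bookkeeping under discussion can be sketched as follows. The function names unfold and refold are illustrative, not Modin's actual API; the metadata dicts mirror the shape shown in the diff above.

```python
def unfold(values):
    """Flatten a mix of scalars and iterables into one flat list,
    recording per-entry metadata needed to rebuild the original shape."""
    unfolded_queue = []
    value_lengths = []
    for value in values:
        if isinstance(value, (list, tuple)):
            was_iterable = True
        else:
            was_iterable = False
            value = [value]
        unfolded_queue.extend(value)
        value_lengths.append({"len": len(value), "was_iterable": was_iterable})
    return unfolded_queue, value_lengths


def refold(flat, value_lengths):
    """Inverse of unfold: restore the original scalar/iterable structure."""
    result = []
    pos = 0
    for meta in value_lengths:
        chunk = flat[pos : pos + meta["len"]]
        pos += meta["len"]
        result.append(chunk if meta["was_iterable"] else chunk[0])
    return result
```

The flat list is what gets splatted into the remote kernel's variable-length parameter; the metadata travels alongside so the kernel can restore each entry's original shape.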
Signed-off-by: Dmitry Chigarev <[email protected]>
LGTM!
…executing call queues (modin-project#6633) Signed-off-by: Dmitry Chigarev <[email protected]>
What do these changes do?
This PR eliminates ray.get() calls from inside the Ray kernel that executes a partition's call queue, which sometimes triggered a deadlock. Before this PR, materialization of all the futures was explicitly triggered inside the kernel using ray.get(); now all the futures are passed to the kernel as a variable-length parameter, allowing Ray to materialize them implicitly beforehand. Performance testing shows no major difference for the measured cases:
case 1 (getitem + transpose); 29 elements in the queue
case 2 (various funcs); 4 elements in the queue
case 3 (various funcs, including those that have a Modin DataFrame as a parameter); 6 elements in the queue
case 4 (various funcs, including those that have a Modin DataFrame as a parameter); 60 elements in the queue
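The contract the fix relies on can be illustrated with a toy model. This is not Ray's implementation: put, submit, and kernel are invented names, and the string-keyed store merely simulates how an engine resolves top-level object refs before invoking a task, so the kernel never calls get() itself.

```python
# Toy object store simulating the scheduling contract: futures passed as
# *top-level* task arguments are resolved by the engine before the kernel
# runs, mirroring how Ray dereferences ObjectRefs passed as direct arguments.
store = {}


def put(value):
    ref = f"ref-{len(store)}"  # toy object ref
    store[ref] = value
    return ref


def submit(kernel, *args):
    # The "engine" materializes top-level refs before invoking the kernel.
    resolved = [
        store[a] if isinstance(a, str) and a.startswith("ref-") else a
        for a in args
    ]
    return kernel(*resolved)


def kernel(*data):
    # Receives already-materialized values; no get() calls needed inside.
    return sum(data)


refs = [put(i) for i in (1, 2, 3)]
# Before the PR the kernel would call get() on each ref inside the task;
# after the PR the refs are unpacked into the argument list instead.
result = submit(kernel, *refs)
```

Passing the refs nested inside a single container argument would hide them from the engine's resolution step, which is exactly the situation the ray.get()-inside-the-kernel pattern created.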
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst is up-to-date