[QST] Returning from multi-thread. TypeError: a bytes-like object is required, not 'dict' #15246
I tried another parallel mechanism and a similar error appears. The new code:
The error message:
I use
I'm glad you were able to get the issue resolved in your case! That said, it does look like you're highlighting a real issue with using cudf.pandas objects in multiprocessing, so I'm going to reopen this issue for now. Here's an MWE for future investigation indicating that the behavior also depends on how the worker process is created. Since fork works while spawn does not, we're probably relying on some implicit state being preserved that is lost when a new process is spawned.
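To make the "implicit state" point concrete, here is a stdlib-only sketch (no cudf needed). It assumes the relevant state is an entry on sys.meta_path: `MarkerFinder` is a hypothetical no-op stand-in for an import hook, and `install_marker`, `check_in_fork_worker` and `check_in_fresh_interpreter` are illustrative helpers. A fresh interpreter launched with subprocess is used to approximate what the spawn start method does, since spawn also starts a new interpreter rather than copying the parent.

```python
import multiprocessing as mp
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor


class MarkerFinder:
    """Hypothetical no-op import hook; it never claims a module."""

    def find_spec(self, fullname, path=None, target=None):
        return None


def hook_installed():
    # True if a MarkerFinder is present on this process's sys.meta_path.
    return any(type(f).__name__ == "MarkerFinder" for f in sys.meta_path)


def install_marker():
    # Mutates interpreter state in the current (parent) process only.
    if not hook_installed():
        sys.meta_path.insert(0, MarkerFinder())


def check_in_fork_worker():
    # A forked worker starts as a copy of the parent, sys.meta_path included.
    ctx = mp.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        return pool.submit(hook_installed).result()


def check_in_fresh_interpreter():
    # A brand-new interpreter only has its default finders.
    code = ("import sys; "
            "print(any(type(f).__name__ == 'MarkerFinder' for f in sys.meta_path))")
    out = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip() == "True"


if __name__ == "__main__":
    install_marker()
    print("parent:", hook_installed())
    print("fork worker:", check_in_fork_worker())
    print("fresh interpreter:", check_in_fresh_interpreter())
```

The forked worker sees the hook because it inherits a copy of the parent's memory; the fresh interpreter does not, which mirrors the fork-works, spawn-fails pattern described above.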
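This problem occurs because not every kind of worker inherits the cudf.pandas import hook: with the spawn and forkserver start methods, workers run in a freshly started interpreter, so the loader that `python -m cudf.pandas` placed on sys.meta_path in the parent is missing there. Consider:

```python
import sys
from concurrent.futures import ProcessPoolExecutor as Pool
from multiprocessing import set_start_method


def f():
    # Show which import finders are active in the worker process.
    print(sys.meta_path)


def main():
    for method in ['fork', 'spawn', 'forkserver']:
        print(method)
        set_start_method(method, force=True)
        with Pool(max_workers=1) as pool:
            result = pool.submit(f).result()


if __name__ == "__main__":
    main()
```

When run with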
One way to work around this is to use the functional interface to cudf.pandas and call install() manually at the start of the file. Note that this must be done before pandas is imported. So:
This will work for all three cases.
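The original snippet is not preserved here, so the following is only a stdlib-only sketch of why installing the hook at the top of the script covers all three start methods: spawn workers re-import the main script, and the forkserver preloads it, so top-level code also runs there. With cudf available, the first lines of the script would instead be `import cudf.pandas` followed by `cudf.pandas.install()`, and only then `import pandas as pd`. `MarkerFinder`, `WORKAROUND_SCRIPT`, `run_workaround_script` and the file name `workaround_demo.py` are hypothetical; the script is written to a temporary file because spawn needs a real file it can re-import.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical demo script; MarkerFinder stands in for the cudf.pandas loader.
WORKAROUND_SCRIPT = textwrap.dedent("""
    import sys


    class MarkerFinder:
        def find_spec(self, fullname, path=None, target=None):
            return None


    # Top-level code: it runs in the parent and again wherever this file
    # is re-imported (spawn workers, the forkserver process).
    if not any(type(f).__name__ == "MarkerFinder" for f in sys.meta_path):
        sys.meta_path.insert(0, MarkerFinder())

    import multiprocessing as mp
    from concurrent.futures import ProcessPoolExecutor


    def hook_installed():
        return any(type(f).__name__ == "MarkerFinder" for f in sys.meta_path)


    if __name__ == "__main__":
        for method in ["fork", "spawn", "forkserver"]:
            ctx = mp.get_context(method)
            with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
                print(method, pool.submit(hook_installed).result())
""")


def run_workaround_script():
    """Run the demo script in a fresh interpreter and return
    {start_method: hook_present_in_worker}."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "workaround_demo.py"
        path.write_text(WORKAROUND_SCRIPT)
        out = subprocess.run([sys.executable, str(path)],
                             capture_output=True, text=True, check=True)
    results = {}
    for line in out.stdout.splitlines():
        method, flag = line.split()
        results[method] = flag == "True"
    return results


if __name__ == "__main__":
    print(run_workaround_script())
```

The design choice is simply placement: because the install lives at module top level rather than behind `if __name__ == "__main__":`, every process that imports the script gets the hook before anything else is imported.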
We should probably add this as a known limitation in the FAQ.
We need to arrange that cudf.pandas.install() is run on the workers; this requires that we programmatically install the metapath loader in our script. Unfortunately, passing an initializer function to the pool startup is not sufficient if any part of the script transitively loads pandas at the top level.
- Closes #15246

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #15940
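A likely reason the initializer route falls short is ordering: as far as I understand multiprocessing's spawn bootstrap, a worker re-imports the main script before it calls the pool initializer, and once a module sits in sys.modules, later imports are served from that cache without consulting sys.meta_path. The stdlib-only sketch below shows just the cache step. `RecordingFinder` and `demo` are hypothetical, `json` stands in for a top-level `import pandas`, and `colorsys` is an arbitrary not-yet-imported module.

```python
import importlib
import sys


class RecordingFinder:
    """Hypothetical import hook that records the names it is asked to find.
    It returns None, so the regular finders still perform every import."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path=None, target=None):
        self.seen.append(fullname)
        return None


def demo():
    # Stands in for a top-level import that runs before any hook exists.
    importlib.import_module("json")

    # Stands in for an initializer that installs the hook afterwards.
    finder = RecordingFinder()
    sys.meta_path.insert(0, finder)
    try:
        # Already cached in sys.modules: the finder is never consulted.
        importlib.import_module("json")
        # Not cached: this lookup goes through sys.meta_path.
        sys.modules.pop("colorsys", None)
        importlib.import_module("colorsys")
    finally:
        sys.meta_path.remove(finder)
    return finder.seen


if __name__ == "__main__":
    print(demo())
```

In the cudf.pandas case, the same effect would mean the worker keeps using the real pandas it imported at top level, which is why installing the loader programmatically at the very start of the script is preferred over an initializer.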
When running my code with cudf, I got TypeError: a bytes-like object is required, not 'dict' in the multi-thread returning part.
-m cudf.pandas option is fine.
This is the error message:
Here is my code.
Relevant dependencies: