Viz hook is broken with ParallelRunner [Blocked by Framework] #1801
Comments
This is very similar to #1797 |
leave a comment here, this is a specific case for |
To clarify, is this a fundamental limitation of |
I don't think it's a […]. So I'd say it's a hook implementation problem, but it's also a general case, because I think most Kedro plugins would break with […]. So it's an interesting problem; we should probably fix it in […]. See also: […] (edited: or hey, let's wait for GIL removal and pray Python works well with multiprocessing in the future) |
Hi @noklam, thanks for raising the issue. In the steps to reproduce - create a new project and run - do you have any starter project where we can run the pipeline completely using […]?
INFO Running node: train_model_node: train_model([X_train;y_train]) -> [regressor] node.py:340
ERROR Node train_model_node: train_model([X_train;y_train]) -> failed with error: node.py:365
cannot set WRITEABLE flag to True of this array
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/parallel_runner.py", line 91, in _run_node_synchronization
return run_node(node, catalog, hook_manager, is_async, session_id)
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/runner.py", line 331, in run_node
node = _run_node_sequential(node, catalog, hook_manager, session_id)
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/runner.py", line 424, in _run_node_sequential
outputs = _call_node_run(
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/runner.py", line 390, in _call_node_run
raise exc
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/runner.py", line 380, in _call_node_run
outputs = node.run(inputs)
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/pipeline/node.py", line 371, in run
raise exc
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/pipeline/node.py", line 357, in run
outputs = self._run_with_list(inputs, self._inputs)
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/pipeline/node.py", line 402, in _run_with_list
return self._func(*(inputs[item] for item in node_inputs))
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/spaceflights-pandas/src/spaceflights_pandas/pipelines/data_science/nodes.py", line 38, in train_model
regressor.fit(X_train, y_train)
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/base.py", line 1474, in wrapper
return fit_method(estimator, *args, **kwargs)
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/linear_model/_base.py", line 578, in fit
X, y = self._validate_data(
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/base.py", line 650, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/utils/validation.py", line 1279, in check_X_y
y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/utils/validation.py", line 1289, in _check_y
y = check_array(
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/sklearn/utils/validation.py", line 1097, in check_array
array.flags.writeable = True
ValueError: cannot set WRITEABLE flag to True of this array
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/bin/kedro", line 8, in <module>
sys.exit(main())
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/framework/cli/cli.py", line 233, in main
cli_collection()
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/framework/cli/cli.py", line 130, in main
super().main(
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/framework/cli/project.py", line 225, in run
session.run(
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/framework/session/session.py", line 395, in run
run_result = runner.run(
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/runner.py", line 117, in run
self._run(pipeline, catalog, hook_or_null_manager, session_id) # type: ignore[arg-type]
File "/Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro/kedro/runner/parallel_runner.py", line 314, in _run
node = future.result()
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/concurrent/futures/_base.py", line 433, in result
return self.__get_result()
File "/Users/Ravi_Kumar_Pilla/opt/anaconda3/envs/kedro-viz-py39/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
ValueError: cannot set WRITEABLE flag to True of this array |
There is no specific starter; it should work with any of them. I believe the CI also runs this as an end-to-end test. This may be a scikit-learn problem; can you try downgrading the library? |
Hi @noklam, as discussed, I will move this ticket to the backlog: we cannot access the SyncManager instance from the hooks to register a shared dict with the manager that ParallelRunner starts. We need some way of exposing the manager (either through the catalog or the runner in Kedro) and making it mutable for custom hooks. Note: for now, the DatasetStatsHook in Kedro-Viz works with SequentialRunner. Thank you |
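For readers unfamiliar with the limitation described above, here is a minimal standard-library sketch of why a hook that writes into an ordinary dict loses its data under process-based parallelism, while a dict registered with a manager does not. It is not Kedro or Kedro-Viz code: `record_plain`, `record_shared`, `collect`, and the dataset names are hypothetical, and the manager is created locally here, which is exactly what a hook cannot currently do with the manager owned by ParallelRunner.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager

# Hypothetical stand-in for per-dataset stats kept on a hook instance.
plain_stats = {}


def record_plain(name):
    # Runs in a worker process: mutates the worker's own copy of plain_stats.
    plain_stats[name] = {"rows": len(name)}
    return name


def record_shared(shared_stats, name):
    # shared_stats is a manager proxy, so the write goes to the manager process.
    shared_stats[name] = {"rows": len(name)}
    return name


def collect(names):
    """Record stats for each name both ways and return (plain, shared) as dicts."""
    with Manager() as manager:
        shared_stats = manager.dict()
        with ProcessPoolExecutor(max_workers=2) as pool:
            # Wait for every task so all writes have happened before reading.
            list(pool.map(record_plain, names))
            futures = [pool.submit(record_shared, shared_stats, n) for n in names]
            for future in futures:
                future.result()
        # Copy out of the proxy before the manager shuts down.
        return dict(plain_stats), dict(shared_stats)


if __name__ == "__main__":
    plain, shared = collect(["companies", "shuttles"])
    print("plain dict seen by parent:", plain)
    print("manager dict seen by parent:", shared)
```

In this sketch the parent process should see an empty plain dict and a populated manager dict, which matches the symptom in this issue: stats gathered inside worker processes never reach the process that writes them out.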
Opened a discussion about that: kedro-org/kedro#3776 |
We discussed a similar ticket in the framework grooming (kedro-org/kedro#4078). We decided that it requires more investigation on the Framework side. For the time being it was suggested we can lower the logging level to |
Description
Context
Left: ParallelRunner
Right: SequentialRunner
Steps to Reproduce
create a new project and run
kedro run --runner=ParallelRunner
Expected Result
No warnings from viz
Actual Result
Warnings: datasets does not exist
Your Environment