For some versioned datasets, Kedro raises kedro.io.core.DatasetError: Cannot save versioned dataset, even though no non-versioned dataset with the same name exists in the expected path. Kedro does create the folder in which the versioned dataset should be saved. The image shows the created folder that triggers the error, next to a similar versioned dataset.
Context
This error prevents me from being able to save some versioned datasets.
The containing folder is created, but an error is raised instead of the dataset being created.
INFO Saving data to X_train_batched_energy_pca_as_target_dataset_preprocessed (PickleDataset)... data_catalog.py:525
Traceback (most recent call last):
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\io\core.py", line 614, in save
super().save(data)
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\io\core.py", line 214, in save
self._save(data)
File "C:\Path\to\my\windows\python\lib\site-packages\kedro_datasets\pickle\pickle_dataset.py", line 225, in _save
with self._fs.open(save_path, **self._fs_open_args_save) as fs_file:
File "C:\Path\to\my\windows\python\lib\site-packages\fsspec\spec.py", line 1295, in open
f = self._open(
File "C:\Path\to\my\windows\python\lib\site-packages\fsspec\implementations\local.py", line 180, in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
File "C:\Path\to\my\windows\python\lib\site-packages\fsspec\implementations\local.py", line 302, in __init__
self._open()
File "C:\Path\to\my\windows\python\lib\site-packages\fsspec\implementations\local.py", line 307, in _open
self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Path/to/my/kedroproject/energy-market-forecast/data/05_model_input/X_train_batched_energy_pca_as_target_dataset_preprocessed.pkl/2024-02-19T08.33.20.180Z/X_train_batched_energy_pca_as_target_dataset_preprocessed.pkl'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\runner\sequential_runner.py", line 75, in _run
run_node(node, catalog, hook_manager, self._is_async, session_id)
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\runner\runner.py", line 331, in run_node
node = _run_node_sequential(node, catalog, hook_manager, session_id)
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\runner\runner.py", line 444, in _run_node_sequential
catalog.save(name, data)
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\io\data_catalog.py", line 532, in save
dataset.save(data)
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\io\core.py", line 618, in save
raise DatasetError(
kedro.io.core.DatasetError: Cannot save versioned dataset 'X_train_batched_energy_pca_as_target_dataset_preprocessed.pkl' to 'C:/Path/to/my/kedroproject/energy-market-forecast/data/05_model_input' because a file with the same name already exists in the directory. This is likely because versioning was enabled on a dataset already saved previously. Either remove 'X_train_batched_energy_pca_as_target_dataset_preprocessed.pkl' from the directory or manually convert it into a versioned dataset by placing it in a versioned directory (e.g. with default versioning format 'C:/Path/to/my/kedroproject/energy-market-forecast/data/05_model_input/X_train_batched_energy_pca_as_target_dataset_preprocessed.pkl/YYYY-MM-DDThh.mm.ss.sssZ/X_train_batched_energy_pca_as_target_dataset_preprocessed.pkl').
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Path\to\my\windows\python\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Path\to\my\windows\python\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Path\to\my\windows\python\Scripts\kedro.exe\__main__.py", line 7, in <module>
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\framework\cli\cli.py", line 198, in main
cli_collection()
File "C:\Path\to\my\windows\python\lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\framework\cli\cli.py", line 127, in main
super().main(
File "C:\Path\to\my\windows\python\lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
File "C:\Path\to\my\windows\python\lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Path\to\my\windows\python\lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Path\to\my\windows\python\lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\framework\cli\project.py", line 225, in run
session.run(
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\framework\session\session.py", line 392, in run
run_result = runner.run(
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\runner\runner.py", line 117, in run
self._run(pipeline, catalog, hook_or_null_manager, session_id) # type: ignore[arg-type]
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\runner\sequential_runner.py", line 78, in _run
self._suggest_resume_scenario(pipeline, done_nodes, catalog)
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\runner\runner.py", line 206, in _suggest_resume_scenario
start_p_persistent_ancestors = _find_persistent_ancestors(
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\runner\runner.py", line 249, in _find_persistent_ancestors
if _has_persistent_inputs(current_node, catalog):
File "C:\Path\to\my\windows\python\lib\site-packages\kedro\runner\runner.py", line 290, in _has_persistent_inputs
if isinstance(catalog._datasets[node_input], MemoryDataset):
KeyError: 'pca_target_regression.trained_pca_target_regression'
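The three tracebacks above are chained: the final KeyError was raised while handling the DatasetError, which in turn was raised from the original FileNotFoundError. As a minimal sketch of how to get at the innermost exception programmatically, the helper below walks Python's standard `__cause__` and `__context__` attributes. The names `root_cause` and `simulate_chain` are hypothetical and not part of Kedro; the simulated chain only mimics the shape of the tracebacks with built-in exception types.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ (explicit `raise ... from`) or __context__
    (implicit chaining) until the innermost exception is reached."""
    seen = {id(exc)}  # guard against accidental cycles in the chain
    while True:
        nxt = exc.__cause__ if exc.__cause__ is not None else exc.__context__
        if nxt is None or id(nxt) in seen:
            return exc
        seen.add(id(nxt))
        exc = nxt


def simulate_chain() -> BaseException:
    """Hypothetical stand-in for the chain in the tracebacks above."""
    try:
        try:
            try:
                raise FileNotFoundError(2, "No such file or directory")
            except FileNotFoundError as err:
                # Explicit chaining: "direct cause of the following exception"
                raise RuntimeError("Cannot save versioned dataset") from err
        except RuntimeError:
            # Implicit chaining: "During handling of the above exception"
            raise KeyError("some_dataset")
    except KeyError as final:
        return final


if __name__ == "__main__":
    print(type(root_cause(simulate_chain())).__name__)
```

In this report the innermost exception is the FileNotFoundError on the versioned path, so that is the one worth investigating first.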
Your Environment
Kedro version used (pip show kedro or kedro -V): kedro, version 0.19.2
Python version used (python -V): Python 3.10.13
Operating system and version: Microsoft Windows [Version 10.0.22621.1928]
Thank you for your help and your work, I really like Kedro!
I am not 100% sure, but the error seems to be related to the length of the filename: saving works when I switch to short names and fails in the same way with random long names. Perhaps the error message should be more explicit about this.
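To check the filename-length hypothesis, here is a minimal sketch that rebuilds the versioned save path in the layout shown in the error message (filepath/version/filename) and compares its length against the classic Windows MAX_PATH limit of 260 characters. That this limit is the actual cause is an assumption, not something confirmed in this report; the helper names `versioned_save_path` and `exceeds_max_path` are hypothetical and not part of Kedro.

```python
from pathlib import PurePosixPath

# Classic Win32 limit; it includes the terminating NUL character.
# Assumption: long-path support is not enabled on the machine.
WINDOWS_MAX_PATH = 260


def versioned_save_path(filepath: str, version: str) -> str:
    """Build <filepath>/<version>/<filename>, the layout shown in the error message."""
    path = PurePosixPath(filepath)
    return str(path / version / path.name)


def exceeds_max_path(filepath: str, version: str, limit: int = WINDOWS_MAX_PATH) -> bool:
    """Return True if the versioned path would not fit within `limit` characters."""
    return len(versioned_save_path(filepath, version)) >= limit


if __name__ == "__main__":
    version = "2024-02-19T08.33.20.180Z"
    # Hypothetical path; substitute the real project path to test it.
    filepath = "C:/some/project/data/05_model_input/my_dataset.pkl"
    full = versioned_save_path(filepath, version)
    print(len(full), exceeds_max_path(filepath, version))
```

Because the filename appears twice in the versioned path (once as the folder, once as the file), a long dataset name counts double toward the limit, which would fit the observation that short names work.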
Steps to Reproduce
Expected Result
The dataset is created and no error is raised.