[WIP] Azureml v2-datasets local execution using fsspec #61
Hi all,
as mentioned by @tomasvanpottelbergh in kedro-org/kedro-plugins#60, here is a proposal for enabling local execution of Kedro pipelines that have AML datasets in the catalog as input and intermediate datasets. In essence, it converts every AML dataset that is not a root input dataset of the pipeline run into a Pickle dataset and saves it in a `local-run` folder. It currently depends on "activating" `azureml-fsspec` in `kedro-datasets` (see this comment: kedro-org/kedro#4314), but other than that it works for me in some local tests.

As mentioned in the other PR, the guiding idea here is that during local execution there is only `read` and no `write` to AML datasets. This is to "guarantee" proper metadata flow and traceability. There could be an option to also upload from local runs, but this would be more involved and should maybe be controlled via a CLI argument? I still probably prefer that local executions not overwrite AML data.

This can easily be adapted to work with @tomasvanpottelbergh's PR for AML folder datasets as well. Looking forward to some thoughts.