Bug using AzureMLAssetDataset locally #147
Comments
An alternative for local development would be to set local catalogs under a local configuration environment.
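The thread doesn't preserve the exact suggestion, but the usual Kedro pattern would be a catalog override in the local configuration environment that swaps the Azure ML dataset for a plain local one. A minimal sketch, reusing the test_raw dataset name and path from the reproducible example further down (the override itself is an assumption, not code from this issue):

```yaml
# conf/local/catalog.yml - hypothetical local override for development runs
test_raw:
  type: pandas.CSVDataset
  filepath: data/00_azurelocals/test_raw/local/test_raw.csv
```

With an override like this in place, local runs would read and write an ordinary CSV file, while deployed runs keep using the AzureMLAssetDataset entry from the base catalog.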
After a bit more digging into why this happens when running locally: kedro-azureml/kedro_azureml/cli.py, lines 41 to 55 (at d5c2011), sets the catalog environment location.
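The referenced snippet isn't reproduced here, but the pattern being discussed - a single --env option declared on the plugin's click command group, which then determines the configuration environment the catalog is loaded from - can be sketched roughly as follows. The names commands and run and the default value are illustrative; this is not the plugin's actual code:

```python
# Rough sketch of a group-level --env option, not kedro-azureml's real CLI code.
from pathlib import Path

import click
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project


@click.group()
@click.option("--env", "-e", default="local", help="Kedro configuration environment.")
@click.pass_context
def commands(ctx, env):
    # The environment is captured once on the group and shared with every
    # subcommand through the click context object.
    ctx.obj = {"env": env}


@commands.command()
@click.pass_obj
def run(obj):
    # The session (and therefore the catalog) is created for the environment
    # chosen at the group level.
    bootstrap_project(Path.cwd())
    with KedroSession.create(env=obj["env"]) as session:
        session.run()
```

If the plugin follows this pattern, the environment has to be passed before the subcommand (e.g. kedro azureml --env <env> run) rather than as an option of the subcommand itself, which would explain the confusion described below.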
Regarding the original question, and the workaround of using local catalogs:
Hi Tomas, thanks for the response. For the workaround, I do not seem to be able to specify the environment using the --env option.
I guess this should be clarified in the docs, but the --env option belongs to the kedro azureml command group rather than to the individual subcommands, so it has to be passed before the subcommand.
Aha, I see. Yes, I agree this should definitely be clarified - that is unusual functionality. Currently there isn't even a mention of --env as an option in the docs, apart from a small mention elsewhere. Could I ask why the --env argument is set up in this way? Couldn't it be added as an argument to the subcommands instead?
You'd have to ask @marrrcin as he implemented this, but I'd guess it was because it's the easiest way of adding the option to every command.
👆🏻👆🏻👆🏻 It's most likely that, but TBH it was a super long time ago and I don't remember exactly :D This implementation is consistent across Getindata's Kedro plugins, e.g. https://github.com/getindata/kedro-vertexai/blob/5ee3304054dc1f913fb962ed1424d0fb42c7c08c/kedro_vertexai/cli.py#L36. BTW @robertmcleod2, there's an unwritten approach among Kedro users that the local environment is used for local development overrides.
When using the AzureMLAssetDataset, everything works fine when deployed. However, I get an error locally when one pipeline outputs an AzureMLAssetDataset and another pipeline tries to consume that asset. Here is a reproducible example.
The first pipeline:
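The code block from the original issue isn't preserved; what follows is a minimal sketch of what the first pipeline plausibly looked like, using the dataset name test_raw implied by the paths reported below. The node function is purely illustrative:

```python
# pipelines/test/pipeline.py - sketch, not the reporter's exact code
import pandas as pd
from kedro.pipeline import Pipeline, node, pipeline


def make_test_raw() -> pd.DataFrame:
    # Illustrative: produce some data to be persisted as the test_raw asset.
    return pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=make_test_raw,
                inputs=None,
                outputs="test_raw",  # written through the AzureMLAssetDataset entry
                name="make_test_raw",
            )
        ]
    )
```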
The second pipeline:
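Again a sketch rather than the original code: the second pipeline simply consumes test_raw and writes it somewhere else. The output name test_copy is hypothetical:

```python
# pipelines/copy_test/pipeline.py - sketch, not the reporter's exact code
import pandas as pd
from kedro.pipeline import Pipeline, node, pipeline


def copy_test_raw(test_raw: pd.DataFrame) -> pd.DataFrame:
    # Illustrative: pass the consumed asset through unchanged.
    return test_raw.copy()


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=copy_test_raw,
                inputs="test_raw",  # read back through the same catalog entry
                outputs="test_copy",  # hypothetical output dataset
                name="copy_test_raw",
            )
        ]
    )
```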
and the catalog:
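The catalog entry itself is also missing from the scrape. A plausible reconstruction, consistent with the local path data/00_azurelocals/test_raw/local/test_raw.csv mentioned below; the exact field names and values (azureml_dataset, root_dir, versioned) are assumptions and should be checked against the kedro-azureml documentation:

```yaml
# conf/base/catalog.yml - reconstructed guess, not the reporter's exact entry
test_raw:
  type: kedro_azureml.datasets.AzureMLAssetDataset
  azureml_dataset: test_raw
  root_dir: data/00_azurelocals
  versioned: true
  dataset:
    type: pandas.CSVDataset
    filepath: test_raw.csv
```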
When running the first pipeline locally with kedro run --pipeline test, it creates a local file at data/00_azurelocals/test_raw/local/test_raw.csv. Then, when running the second pipeline with kedro run --pipeline copy_test, I get a stack trace: it seems like it is trying to find a version of the file on Azure rather than using the local copy. When a version of the dataset does exist on Azure, it puts that Azure version number in the directory path instead of local, i.e. it will look for a file at data/00_azurelocals/test_raw/4/test_raw.csv.

I'm not sure why it is trying to find the dataset on Azure; I would expect the behaviour to be to just look at the local files instead. This error only happens when using an AzureMLAssetDataset as an input locally. Any help is appreciated, thanks.