Retrieving pipeline id in cicd pipelines #76

Closed
Gabriel2409 opened this issue Sep 22, 2023 · 4 comments

Comments

@Gabriel2409
Contributor

I am having a bit of trouble registering a model after launching a pipeline with kedro azureml.
To find the actual id of the pipeline that trained the model, I currently take the most recent one (with a few additional filters to make sure I get the correct one), but I don't think that is a great solution.

I noticed that the pipeline id is available in AzureMLPipelinesClient.run (it is just pipeline_job.name).

Is there an intelligent way to retrieve it?

I could imagine logging it to the output and then doing something like
pipeline_text=$(kedro azureml run ...)
and then filtering pipeline_text to get the actual value, but that seems very ugly.
Plus, you would need to make sure your pipeline log level is set correctly, so I don't think it is ideal.

@marrrcin
Contributor

There's a callback that lets you plug in some behaviour after the job is scheduled (on_job_scheduled):

lambda job: click.echo(job.studio_url),

I think you can write your own CLI for Kedro that replicates what is done in our kedro azureml run; from there you can use this callback to e.g. save the id you want to have.
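As a rough illustration of the "save the id" idea, here is a minimal sketch of a callback that writes the job name and studio URL to a JSON file that a later CI/CD step could read. It only assumes the job object exposes `name` and `studio_url`, as discussed in this thread. The helper `make_job_recorder`, the file name and the stand-in job object are hypothetical, and how the callback gets wired into a custom CLI is not shown.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def make_job_recorder(output_path):
    """Return a callback that saves the scheduled job's name and studio URL as JSON.

    Hypothetical helper: it is not part of kedro-azureml.
    """

    def record_job(job):
        # Only relies on the two attributes mentioned in this thread.
        info = {"name": job.name, "studio_url": job.studio_url}
        Path(output_path).write_text(json.dumps(info), encoding="utf-8")
        return info

    return record_job


# Demo with a stand-in object instead of a real AzureML job.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "job_info.json"
    fake_job = SimpleNamespace(
        name="example_job_name",
        studio_url="https://ml.azure.com/runs/example_job_name",
    )
    make_job_recorder(target)(fake_job)
    print(target.read_text(encoding="utf-8"))
```

A later step in the CI/CD pipeline could then load the JSON file and use the `name` field when registering the model.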

Another way would be to save the pipeline id from within the pipeline (as a Kedro dataset) and read it later from the CI/CD pipeline. I think it's available in some AzureML-set environment variable.
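A minimal sketch of that second option might look like the following: a plain function, usable as a Kedro node, that reads the id from the environment and returns it so it can be saved as a dataset. The variable name `AZUREML_ROOT_RUN_ID` is an assumption and is not confirmed anywhere in this thread, so check the environment of a running job before relying on it. The function names are hypothetical as well.

```python
import os

# ASSUMPTION: this variable name is a guess, not confirmed in this thread.
# Inspect the environment of a running AzureML job to find the right one.
ASSUMED_RUN_ID_VAR = "AZUREML_ROOT_RUN_ID"


def get_pipeline_id(var_name=ASSUMED_RUN_ID_VAR, environ=None):
    """Read the pipeline id from an environment variable, failing loudly if absent."""
    environ = os.environ if environ is None else environ
    value = environ.get(var_name)
    if not value:
        raise RuntimeError(f"Environment variable {var_name!r} is not set")
    return value


def pipeline_id_node():
    """Hypothetical Kedro node body: returns a dict that can be saved as a dataset."""
    return {"pipeline_id": get_pipeline_id()}
```

Passing the variable name as a parameter keeps the guess in one place, so it is easy to swap once the actual variable has been verified.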

@Gabriel2409
Contributor Author

Thanks @marrrcin, this seems like a great solution.

I think I can do something even simpler, as the job studio URL contains the pipeline name (which I did not notice before):
https://ml.azure.com/runs/<pipeline_name>?wsid=/subscriptions/.... so we can extract it from there.
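For reference, a small sketch of that extraction using only the standard library. It assumes the URL always has the `/runs/<pipeline_name>` shape shown above; the function name and the example URL are made up for illustration.

```python
from urllib.parse import urlparse


def pipeline_name_from_studio_url(studio_url):
    """Extract the pipeline name from a URL shaped like .../runs/<pipeline_name>?wsid=..."""
    # urlparse puts everything after the first "?" in the query, so the
    # slashes inside the wsid value do not end up in the path.
    path_parts = [part for part in urlparse(studio_url).path.split("/") if part]
    if len(path_parts) < 2 or path_parts[0] != "runs":
        raise ValueError(f"Unexpected studio URL format: {studio_url!r}")
    return path_parts[1]


# Made-up URL following the pattern above.
example_url = "https://ml.azure.com/runs/example_job_name?wsid=/subscriptions/example"
print(pipeline_name_from_studio_url(example_url))
```

The function raises instead of guessing when the path does not start with `runs`, which seems safer for a CI/CD step than silently returning the wrong segment.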

However, would you also be open to a modification of the callback so that the intent is a bit clearer? I was thinking of something like

lambda job: click.echo(f"AzureML Studio URL: {job.studio_url}"), 

or maybe using a more detailed callback function such as:

def echo_job_info(job):
    click.echo(f"Job studio url: {job.studio_url}")
    click.echo(f"Azure ML Pipeline name: {job.name}")
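With labelled lines like these, a CI/CD script could pick out the pipeline name from the captured output without depending on the URL format. A minimal sketch, assuming the output contains a line starting with the exact prefix `Azure ML Pipeline name: ` as proposed above (the format that actually gets merged may differ); the function name is hypothetical:

```python
# ASSUMPTION: prefix copied from the callback proposed in this comment.
PIPELINE_NAME_PREFIX = "Azure ML Pipeline name: "


def find_pipeline_name(cli_output, prefix=PIPELINE_NAME_PREFIX):
    """Return the value after the first line starting with `prefix`, or None."""
    for line in cli_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            return stripped[len(prefix):].strip()
    return None


# Made-up captured output for illustration.
captured = (
    "Job studio url: https://ml.azure.com/runs/example_job_name\n"
    "Azure ML Pipeline name: example_job_name\n"
)
print(find_pipeline_name(captured))
```

Returning `None` when the line is missing lets the calling script decide whether that should fail the build.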

@marrrcin
Contributor

Sure :)

@Gabriel2409
Contributor Author

Closing the issue following merge of #78
