In this code pattern, we will use AutoAI to automatically generate a Jupyter notebook that contains the Python code for a machine learning model. We will explore, modify, and retrain this model pipeline using Python code. Lastly, we will deploy the model to Watson Machine Learning using the WML APIs.
AutoAI is a graphical tool available within Watson Studio that analyzes your dataset, generates several model pipelines, and ranks them based on the metric chosen for the problem. This code pattern shows extended features of AutoAI. A more basic AutoAI exploration of the same dataset is covered in the article "Generate machine learning model pipelines to choose the best model for your problem."
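To make "ranks them based on the metric" concrete, here is a minimal plain-Python sketch of leaderboard ordering. It is not AutoAI's implementation: the pipeline names and scores are hypothetical placeholders, and the `higher_is_better` flag is an assumption added to cover error-style metrics (such as log loss) where lower values are better.

```python
from dataclasses import dataclass


@dataclass
class PipelineResult:
    name: str
    metric: float  # holdout score for the optimization metric chosen for the experiment


def rank_pipelines(results, higher_is_better=True):
    """Return the pipelines ordered best-first by their metric value."""
    return sorted(results, key=lambda r: r.metric, reverse=higher_is_better)


# Hypothetical placeholder scores, not real AutoAI output.
candidates = [
    PipelineResult("Pipeline_1", 0.71),
    PipelineResult("Pipeline_2", 0.74),
    PipelineResult("Pipeline_3", 0.69),
]

leaderboard = rank_pipelines(candidates)
for rank, result in enumerate(leaderboard, start=1):
    print(rank, result.name, result.metric)
```

The first entry of the sorted list plays the role of the leaderboard's top pipeline.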
When you have completed this code pattern, you will understand how to:
- Run an AutoAI experiment.
- Generate and save a Python notebook.
- Execute the notebook and analyze the results.
- Make changes to the model and retrain it using the Watson Machine Learning SDK.
- Deploy the model to Watson Machine Learning from within the notebook.
The flow of this code pattern is as follows:
- The user submits an AutoAI experiment using the default settings.
- Multiple pipeline models are generated. A pipeline model of choice from the leaderboard is saved as a Jupyter notebook.
- The Jupyter notebook is executed, and a modified pipeline model is generated within the notebook.
- The pipeline model is deployed to Watson Machine Learning using the WML APIs.
This code pattern uses the following components and technologies:
- IBM Watson Studio - IBM Watson® Studio helps data scientists and analysts prepare data and build models at scale across any cloud.
- IBM Watson Machine Learning - IBM Watson® Machine Learning helps data scientists and developers accelerate AI and machine learning deployment.
- Machine Learning - The science of predicting values by analyzing historical data.
- Python - Python is an interpreted, object-oriented, high-level programming language.
- Jupyter notebook - An open-source web application for creating and sharing documents that contain live code.
- scikit-learn - A Python-based machine learning library.
- lale - A Python library, compatible with scikit-learn, for semi-automated data science; it is used in the AutoAI SDK.
- IBM Cloud account: This code pattern assumes you have an IBM Cloud account. Sign up for a no-charge trial account; no credit card is required.
Instructions for completing the prerequisites listed below are covered in this prequel.
- Create a Cloud Object Storage service instance.
- Create a Watson Studio service instance.
- Create a Watson Machine Learning service instance.
- Create a Watson Studio project and load data.
- Open the project you created within Watson Studio. Click the Add to project + button at the top right, and then click AutoAI Experiment.
- Give the experiment a name (Credit Risk Analysis), associate a Watson Machine Learning service from the drop-down, and click Create.
- On the Add data source screen, click Select from project, check german_credit_data.csv, and click Select asset.
- Under the Configure details section, click the What do you want to predict? drop-down and select Result from the list. If you are using a different dataset, select the column that you want AutoAI to predict. Click Run experiment at the bottom right.
You will see a notification indicating that the AutoAI experiment has started. Depending on the size of the dataset, this step takes a few minutes to complete.
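Choosing Result under What do you want to predict? tells AutoAI which column is the label and which columns are the features. The plain-Python sketch below illustrates that split on a tiny in-memory CSV. Apart from Result, the column names and values are hypothetical placeholders rather than the actual contents of german_credit_data.csv, and this is not how AutoAI reads data internally.

```python
import csv
import io


def split_target(csv_text, target):
    """Split CSV rows into feature dicts and a list of target labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail early if the chosen prediction column is missing from the header.
    if not reader.fieldnames or target not in reader.fieldnames:
        raise KeyError(f"column {target!r} not found in {reader.fieldnames}")
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(target))  # the prediction column becomes the label
        features.append(row)            # every remaining column is a feature
    return features, labels


# Hypothetical two-row sample; only the "Result" column name comes from this guide.
sample = "LoanDuration,LoanAmount,Result\n12,1500,No Risk\n36,7200,Risk\n"
X, y = split_target(sample, "Result")
print(y)
```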
The experiment notebook provides annotated code so that you can:
- Interact with trained model pipelines
- Access model details programmatically (including feature importance and machine learning metrics)
- Visualize each pipeline as a graph, with each node documented, to provide transparency
- Download selected pipelines and test locally
- Create a deployment and score the model
- Get the experiment configuration, which you can use for automation or integration with other applications
To generate an experiment notebook, perform the following steps:
- Once the AutoAI experiment completes, click the Save experiment code button, indicated by the floppy disk icon.
- In the Save experiment code prompt, modify the default name if needed and click Save. A pop-up confirms that the notebook was saved successfully. You will now see this notebook under the Notebooks section of the Assets tab.
Spend some time looking through the sections of the notebook to get an overview. A notebook is composed of text (markdown or heading) cells and code cells. The markdown cells provide comments on what the code is designed to do.
Run the cells individually by selecting each cell and then either clicking the Run button at the top of the notebook or pressing the keyboard shortcut (Shift + Enter, though it can vary by platform). While a cell is running, an asterisk ([*]) appears to its left. When the cell has finished executing, a sequential number appears instead (e.g. [17]).
The generated notebook is prefilled with Python code and is divided into four main sections, as follows.
This section contains the Cloud Object Storage credentials through which the current AutoAI pipeline is retrieved. The cell is prefilled with code that extracts the training data used to create the pipeline, as well as the pipeline results.
This section also contains the metadata that was used to run the experiment that produced the current pipeline.
To access the WML instance, you need to generate an API key from your IBM Cloud account and paste it into the cell, as shown below. The instructions for acquiring the API key are given in the markdown section shown in the screenshot below.
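If you prefer not to paste the API key directly into a notebook cell, one option is to read it from an environment variable. This is a minimal sketch under stated assumptions: the environment variable name `IBM_CLOUD_API_KEY` and the helper `build_wml_credentials` are hypothetical, and the `us-south` endpoint is only an example; use the URL for the region where your Watson Machine Learning instance lives. The resulting dict uses the `apikey` and `url` keys that the Watson Machine Learning Python client accepts as credentials.

```python
import os

# Example regional endpoint (assumption); replace it with your instance's region URL.
WML_URL = "https://us-south.ml.cloud.ibm.com"


def build_wml_credentials(env_var="IBM_CLOUD_API_KEY", url=WML_URL):
    """Build a WML credentials dict from an environment variable (hypothetical helper)."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail loudly instead of sending an empty key to the service.
        raise RuntimeError(f"Set the {env_var} environment variable to your IBM Cloud API key")
    return {"apikey": api_key, "url": url}
```

In the notebook you would then write `wml_credentials = build_wml_credentials()` in place of the hard-coded key, which keeps the key out of saved notebook versions.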
To compare all the generated pipelines, call the summary() method on the pipeline object. The best-performing model is stored in the best_pipeline_name variable.
By passing this variable to the get_pipeline() method, you can list the feature importance computed for that particular pipeline.
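Once feature importances are in hand, a common next step is to look only at the strongest features. The sketch below assumes the importances have already been extracted into a plain dict mapping feature name to score; how to extract them from the pipeline object depends on the SDK version and is not shown. The feature names and values are hypothetical placeholders, not real pipeline output.

```python
def top_features(importances, k=3):
    """Return the k (feature, importance) pairs with the largest importance, highest first."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Hypothetical importance values for illustration only.
importances = {"LoanDuration": 0.21, "CheckingStatus": 0.35, "LoanAmount": 0.18, "Age": 0.05}

for feature, score in top_features(importances, k=2):
    print(f"{feature}: {score:.2f}")
```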
This section of the notebook contains code to visualize the stages of the model as a graph using Watson Machine Learning's AutoAI APIs.
This section also contains code that extracts the current model and prints it as Python code.
This section of the notebook contains code that deploys the pipeline model as a web service using Watson Machine Learning. It requires you to enter credentials so that the code can identify the correct WML instance and deployment space.
To create a deployment space and get the target_space_id:
- Click the hamburger menu at the top-left corner of the Watson Studio home page.
- Click Deployment Spaces in the list and select View all spaces.
- Click New deployment space and select the Create an empty space option.
- Provide a name, select the Machine Learning service that you created previously, and click Create.
- Click View new space, switch to the Settings tab, and copy the space ID.
Paste the target_space_id obtained in the steps above into the create deployment section of the notebook. The Watson Machine Learning API uses wml_credentials and target_space_id to deploy the machine learning model as a web service.
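A pasted space ID with stray whitespace or a missing character only fails later, at deployment time. A small guard can catch this earlier. This sketch assumes that deployment space IDs are UUIDs (which is how they appear in the Settings tab, as far as we know); `validate_space_id` is a hypothetical helper, not part of the WML SDK.

```python
import uuid


def validate_space_id(space_id):
    """Return the space ID in canonical UUID form, or raise ValueError if it is malformed."""
    try:
        # uuid.UUID rejects IDs of the wrong length or with non-hex characters.
        return str(uuid.UUID(space_id.strip()))
    except ValueError as err:
        raise ValueError(f"target_space_id does not look like a UUID: {space_id!r}") from err
```

Calling `target_space_id = validate_space_id(target_space_id)` just before the deployment cell trims whitespace and lowercases the hex digits, so a malformed copy-and-paste is reported immediately.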
Once the cells have run, the model is promoted to the deployment space and is available as a web service, which you can verify from within the UI as shown below.
You can score the web service from the UI by switching to the Test tab shown in the screenshot above. Alternatively, you can use the score() method from the WML API to submit a sample test payload. The results are returned as shown in the screenshot below.
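A sample test payload for a WML v4 online deployment is, as we understand the v4 scoring format, a JSON object with an `input_data` list whose entries carry parallel `fields` (column names) and `values` (rows). The sketch below builds that structure with the standard library only. The helper name and the feature names and values are hypothetical; the generated notebook's own `score()` call may accept a pandas DataFrame instead, so check its markdown cells before relying on this shape.

```python
import json


def build_scoring_payload(fields, rows):
    """Build a v4-style scoring payload with one input_data entry (hypothetical helper)."""
    for row in rows:
        # Each row must supply exactly one value per field, in the same order.
        if len(row) != len(fields):
            raise ValueError(f"row has {len(row)} values, expected {len(fields)}")
    return {"input_data": [{"fields": list(fields), "values": [list(r) for r in rows]}]}


# Hypothetical feature names and values; use the training columns minus the target.
payload = build_scoring_payload(["LoanDuration", "LoanAmount"], [[12, 1500], [36, 7200]])
print(json.dumps(payload))
```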
The pipeline notebook provides annotated code that allows you to:
- View the scikit-learn pipeline definition
- See the transformations applied for pipeline training
- Preview the hyperparameter values found in the HPO (hyperparameter optimization) phase
- Review the pipeline evaluation
- Refine the pipeline definition
- Re-fit and re-evaluate
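Refining, re-fitting, and re-evaluating follow the usual scikit-learn pattern: change a hyperparameter on the pipeline, call `fit` again, and score on held-out data. The sketch below is not the pipeline AutoAI generates; it assumes a hypothetical two-step pipeline (scaler plus logistic regression) and uses synthetic data from `make_classification` in place of the credit dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the generated notebook uses the experiment's training data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hypothetical pipeline; AutoAI's generated pipeline will have different steps.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))
pipeline.fit(X_train, y_train)
baseline = pipeline.score(X_test, y_test)  # accuracy on the held-out split

# Refine: make_pipeline names steps after their lowercased class names,
# so LogisticRegression's C is addressed as "logisticregression__C".
pipeline.set_params(logisticregression__C=0.1)
pipeline.fit(X_train, y_train)
refined = pipeline.score(X_test, y_test)

print(f"accuracy before: {baseline:.3f}, after: {refined:.3f}")
```

Comparing the two scores on the same held-out split keeps the before-and-after evaluation fair.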
To generate a pipeline notebook, perform the following steps:
- Hover over the pipeline that you want to save as a notebook, click the Save as drop-down on the right side, and select Notebook.
- In the Save as prompt, notice that two types of assets can be generated: Model and Notebook. Select the Notebook option.
- In the Define details section on the right, change the default name if needed and click Create. A pop-up confirms that the notebook was saved successfully. You will now see this notebook under the Notebooks section of the Assets tab.
Edit and execute each of the cells as described in section 2.0. The notebook contains more information in its markdown cells.
This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.