Final Submission: Executable Tutorial - CI for ML using CML library #1879

vishalned · 2022-04-30T12:20:08Z

Assignment Proposal

Title

CI for ML using CML library

Names and KTH ID

Simone Bonato ([email protected])
Vishal Nedungadi ([email protected])

Deadline

Task 3

Description

CI is increasingly part of deploying ML models into production, where it makes the whole process smoother.
In this tutorial we show how the CML library, built specifically to implement CI/CD in ML projects,
can improve CI by implementing a pipeline that automatically trains the model when someone opens a PR and posts a report
directly on the GitHub page, with graphs of the metrics needed to judge how well the training of the model is proceeding.

Final Submission

Katacoda link

Github Repo

dd2482-bot · 2022-04-30T12:21:00Z

Readme is not correctly formatted
Need exactly: ['Assignment Proposal', 'Title', 'Names and KTH ID', 'Deadline', 'Category', 'Description']

Got: ['Assignment Proposal', 'Title', 'Names and KTH ID', 'Deadline', 'Category', 'Description', 'Final Submission']

vishalned · 2022-04-30T12:41:33Z

Feedback is still incoming for this.

Neproxx · 2022-05-03T09:17:33Z

@vishalned May we start working on the feedback?

vishalned · 2022-05-03T11:35:48Z

Yup please do.

Neproxx · 2022-05-03T17:01:42Z

Feedback

High level feedback

Positive

Highly useful, interesting and easy-to-implement topic.
Easy to follow, as the tutorial focuses only on the aspects it wants to teach. As a consequence, the main point comes across and sticks.
Good and comprehensive explanations of the flags and other aspects encountered in a workflow file.
Minimal effort for users who just want to understand and copy the code, although they may choose to type it themselves if they feel the need to.

Negative

Typos and bugs
Using Katacoda is overkill, as the tutorial never uses the terminal. A README in the GitHub repository would have been sufficient.

Typos

Although your English and grammar are obviously very good, we encountered several typos, which we highlight below:

Step 2: "which are lies of code that..."
Step 2: "...what matters is that the extension is json.". Note: the extension is in fact .yaml
Step 3: "...it is triggered anytime a some changes are pushed..."
Step 4: "...wheter or not you want you edits to be merged..."

Bugs

In step 4, we are given the following code snippet:

Echo "Model Metrics"
cat metrics.txt

There are two problems with it that cause the workflow to fail. Firstly, echo must not be capitalized. Secondly, main.py creates a file results.txt while the above code tries to print metrics.txt. Therefore, the code snippet should be changed to:

echo "Model Metrics"
cat results.txt
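
A slightly more defensive version of this step could look like the sketch below. It assumes the metrics file is results.txt, as stated above, and collects the report in a file called report.md, which is a name chosen only for this illustration. The helper name write_report is hypothetical, and the accuracy value is a placeholder so that the snippet runs on its own; it is not a result from the tutorial.

```shell
# Hypothetical helper: build the report from the metrics file.
write_report() {
    results_file="$1"
    report_file="$2"

    # Fail early with a clear message instead of letting `cat` abort the job.
    if [ ! -f "$results_file" ]; then
        echo "error: $results_file not found; did the training step run?" >&2
        return 1
    fi

    # `echo` must be lowercase: shell command names are case-sensitive.
    echo "Model Metrics" > "$report_file"
    cat "$results_file" >> "$report_file"
}

# Stand-in for the file main.py would write (placeholder value, not real data).
printf 'accuracy: 0.90\n' > results.txt

write_report results.txt report.md
cat report.md
```

Checking for the file first turns a confusing mid-job failure into a message that names the missing file.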

In step 5, the workflow failed for me because of a permission problem related to the GitHub token. I fixed it by adding another property to the job:

jobs:
  run:
    permissions: write-all
    ...

You can find more info on this problem and how to solve it more elegantly here

Improvements

We think it would fit the tutorial well if you provided a bigger picture of how GitHub Actions, workflows and the Marketplace work together. Such an overview could have looked as follows: "In a workflow, GitHub provides a server (a runner) that executes the code specified in the workflow file. You may either write the code there or create a reusable action in another repository. You can publish this action for others on the GitHub Marketplace and also use other people's actions."
A short explanation of GitHub environment variables and how they come into play in this tutorial would have been great, as the line repo_token: ${{ secrets.GITHUB_TOKEN }} was not further elaborated on.
In step 4, one of us got a little lost in what we were supposed to do. Please make it clearer that we should open a PR without merging it, and that we would then see the workflow being executed. The jump from the PR explanation to the "You will probably see something like this:" is where you can easily get lost.
When opening a PR, the default base repository is the original repo, i.e. yours, Vishal and Simone. Please give a hint that users have to select their own main branch as the base.
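
On the token point above: GitHub creates a GITHUB_TOKEN secret for every workflow run, and the expression ${{ secrets.GITHUB_TOKEN }} hands it to a step, for example as an environment variable. The sketch below shows a guard that a step could run before posting the report. It assumes the token is exposed under the variable name repo_token, taken from the line quoted above; the helper name require_token and the dummy value are hypothetical.

```shell
# Hypothetical helper: check that the token variable is populated
# before any command that needs to talk to the GitHub API.
require_token() {
    if [ -z "${repo_token:-}" ]; then
        echo "error: repo_token is empty; map secrets.GITHUB_TOKEN to it in the workflow" >&2
        return 1
    fi
    # Never print the token itself; only report that it is present.
    echo "repo_token is set"
}

# Local demo with a dummy value (in a real run GitHub supplies the token).
repo_token="dummy-token-for-local-testing"
require_token
```

A check like this would not fix the permission problem from step 5, but it separates "the token never reached the step" from "the token lacks the required permissions".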

Additional resources

To trigger workflows, we had to commit and push code that is really just debug code, which feels like bad practice. To debug without cluttering the commit history, there is a valuable tool called act that can run workflows locally.
We recommend looking into data versioning approaches like dvc, as ML workflows can involve very large data sets that should not be stored in GitHub itself. A good approach is to put the data into a cloud bucket (e.g. on AWS) and add data versioning to it so that every training run is reproducible.
We also found that, instead of reporting the .png files inside the pull request, it is possible to integrate Tensorboard with CML as in (this example). This approach gives a cleaner and more complete view of the model's results on a proper Tensorboard webpage.
In your example, the training takes a relatively short time to execute. In most cases, however, training may require more time and computational power, so it needs to run on a cloud computing instance with several GPUs. For this, CML has a great feature called Advanced GPU Case, which lets you connect your workflow to your preferred cloud computing provider and run the training there, leading to much faster computation of the results.
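
As a rough illustration of the act suggestion above, the sketch below wraps the call in a small helper so that it degrades gracefully on machines where act is not installed. The helper name run_workflow_locally is hypothetical; `act <event>` is the tool's usual way of simulating an event such as pull_request.

```shell
# Hypothetical helper: run the workflows for a given event locally with act.
run_workflow_locally() {
    event="${1:-pull_request}"

    # Skip instead of failing when act is not available on this machine.
    if ! command -v act >/dev/null 2>&1; then
        echo "act not found; skipping local run of '$event' workflows"
        return 0
    fi

    # act reads the workflow files under .github/workflows in the current repo.
    act "$event"
}
```

Calling `run_workflow_locally pull_request` from the repository root would then exercise the PR-triggered workflow without pushing any debug commits. Note that act runs jobs in containers, so Docker needs to be available.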

vishalned · 2022-05-03T19:31:13Z

Thanks for the feedback. We have implemented the feedback as far as possible.

javierron · 2022-05-06T11:10:57Z

Thanks for the submission!

vishalned added 7 commits April 16, 2022 23:40

added Final essay submission

fabf523

Delete devops-essay-MLOps-past-present-future.pdf

8d7af3d

Added Final essay submission

04a018f

Delete devops-essay-MLOps-past-present-future.pdf

a00fbef

Add files via upload

b5c9c65

Merge branch 'KTH:2022' into 2022

cc17b0f

Update README.md

a4f6a4f

Update README.md

ad7a023

Neproxx mentioned this pull request May 1, 2022

Add feedback proposal for tutorial "CI for ML using CML library" #1880

Merged

javierron self-assigned this May 6, 2022

javierron added the labels final_submission (The final submission of a task) and tutorial (One of the task categories listed in README.md) May 6, 2022

javierron merged commit f746252 into KTH:2022 May 6, 2022
