Final Submission: Executable Tutorial - CI for ML using CML library #1879

Merged 8 commits on May 6, 2022

Conversation

vishalned

Assignment Proposal

Title

CI for ML using CML library

Names and KTH ID

Deadline

Task 3

Category

Executable tutorial

Description

CI is also becoming part of deploying ML models to production, because it makes the whole process smoother.
In this tutorial we show how the CML library, built specifically to implement CI/CD in ML projects,
can improve CI with a pipeline that automatically trains the model when someone opens a PR and posts a report
directly on the GitHub PR page, with graphs of the metrics needed to judge how well training is going.

Final Submission

Katacoda link

Github Repo

@dd2482-bot
Collaborator

Readme is not correctly formatted
Need exactly: ['Assignment Proposal', 'Title', 'Names and KTH ID', 'Deadline', 'Category', 'Description']

Got: ['Assignment Proposal', 'Title', 'Names and KTH ID', 'Deadline', 'Category', 'Description', 'Final Submission']

@vishalned
Author

Feedback is still incoming for this.

@Neproxx

Neproxx commented May 3, 2022

@vishalned May we start working on the feedback?

@vishalned
Author

Yup please do.

@Neproxx

Neproxx commented May 3, 2022

Feedback

High level feedback

Positive

  • Highly useful, interesting and easy to implement topic.
  • Easy to follow, as the tutorial focuses only on the aspects it wants to teach. As a consequence, the main point comes across and sticks.
  • Good and comprehensive explanations of the flags and other aspects encountered in a workflow file.
  • Minimal effort for users who just want to understand and copy the code, although they may type it themselves if they prefer.

Negative

  • Typos and bugs
  • Using Katacoda is somewhat overkill, as you do not use the terminal. A README in the GitHub repository would have been sufficient.

Typos

Although your English and grammar are obviously very good, we found several typos, which we highlight below:

  • Step 2: "which are lies of code that..."
  • Step 2: "...what matters is that the extension is json.". Note: the extension is in fact .yaml
  • Step 3: "...it is triggered anytime a some changes are pushed..."
  • Step 4: "...wheter or not you want you edits to be merged..."

Bugs

In step 4, we are given the following code snippet:

Echo "Model Metrics"
cat metrics.txt

There are two problems with it that cause the workflow to fail. Firstly, echo must not be capitalized. Secondly, main.py creates a file results.txt while the above code tries to print metrics.txt. Therefore, the code snippet should be changed to:

echo "Model Metrics"
cat results.txt
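The corrected snippet can also be made more defensive, so that a wrong file name fails with a readable message rather than a bare cat error. The following is only a minimal sketch: the function name print_metrics is hypothetical, and it assumes main.py writes its metrics to results.txt as described above.

```shell
# Hypothetical helper: print the heading and the metrics file, and fail with a
# clear message if the file that main.py is expected to write is missing.
print_metrics() {
  metrics_file="${1:-results.txt}"
  if [ ! -f "$metrics_file" ]; then
    echo "error: $metrics_file not found (did main.py run?)" >&2
    return 1
  fi
  echo "Model Metrics"
  cat "$metrics_file"
}
```

In the workflow step this would be called as print_metrics results.txt, with the output redirected to wherever the report is assembled.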

In step 5, the workflow failed for me because of a permission problem related to the GitHub token. I fixed it by adding a permissions property to the job:

jobs:
  run:
    permissions: write-all
    ...

You can find more info on this problem and how to solve it more elegantly here

Improvements

  • The tutorial would benefit from a bigger picture of how GitHub Actions, workflows and the Marketplace fit together. Such an overview could read as follows: "In a workflow, GitHub provides a server that executes the code specified in the workflow file. You may either write that code directly in the workflow file or package it as a reusable action in another repository. You can publish this action for others on the GitHub Marketplace and also use actions published by other people."
  • A short explanation of GitHub secrets and environment variables, and how they come into play in this tutorial, would have been great, as the line repo_token: ${{ secrets.GITHUB_TOKEN }} was not elaborated on.
  • In step 4, one of us got a little lost about what to do. Please make it clearer that we should open a PR without merging it, and that we will then see the workflow being executed. The jump from the PR explanation to "You will probably see something like this:" is where it is easy to get lost.
  • When opening a PR, the default base repository is the original repo, i.e. yours, Vishal and Simone. Please add a hint that users have to select the main branch of their own repository as the base.
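On the GITHUB_TOKEN point: GitHub creates this secret automatically for every workflow run, and the workflow file decides which steps receive it. As an illustration of how a step's script could check that the token actually arrived, here is a minimal sketch. It assumes the workflow maps secrets.GITHUB_TOKEN to an environment variable named REPO_TOKEN; both that variable name and the function name are assumptions for illustration, not taken from the tutorial.

```shell
# Hypothetical guard for a step that posts a report: fail early with a clear
# message when the token was not passed into the step's environment.
# REPO_TOKEN is an assumed variable name, mapped in the workflow file from
# secrets.GITHUB_TOKEN.
require_repo_token() {
  if [ -z "${REPO_TOKEN:-}" ]; then
    echo "error: REPO_TOKEN is empty; map it from secrets.GITHUB_TOKEN in the workflow" >&2
    return 1
  fi
}
```

Calling require_repo_token before the reporting commands makes a missing token visible as a clear error rather than as a later failure when the report is posted. It does not detect a token that is present but lacks write permission, which is the step 5 problem described above.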

Additional resources

  • To trigger workflows, we commit and push code that is really just debug code, which is intuitively bad practice. To debug without cluttering the commit history, there is a valuable tool called act that runs workflows locally.
  • We recommend looking into data versioning tools like DVC, as ML workflows can involve very large data sets that should not be stored in GitHub itself. A good approach is to put the data into a cloud bucket (e.g. on AWS) and version it so that every training run is reproducible.
  • We also found that, instead of reporting the .png files inside the pull request, it is possible to integrate TensorBoard with CML as in (this example). This would give a cleaner and more complete view of the model's results on a proper TensorBoard web page.
  • In your example, training takes a relatively short time. In most cases, however, training requires more time and computational power and therefore has to run on a cloud instance, possibly with several GPUs. CML covers this with a feature it calls Advanced GPU Case, which lets you connect your workflow to your preferred cloud provider and run the training there. This can make computing the results much faster.
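For the act suggestion, a local run could look roughly like the sketch below. It assumes act and Docker are installed on the machine; the wrapper function name is hypothetical, while pull_request is the event name that act is asked to simulate.

```shell
# Sketch: run the pull_request workflows locally with act instead of pushing
# throwaway commits. Assumes act and Docker are installed; the wrapper name
# is hypothetical.
run_workflow_locally() {
  if ! command -v act >/dev/null 2>&1; then
    echo "act is not installed; cannot run the workflow locally" >&2
    return 127
  fi
  # "pull_request" is the event to simulate; extra arguments go straight to act.
  act pull_request "$@"
}
```

For example, run_workflow_locally -j run would limit the run to the job named run, as in the permissions snippet above.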

@vishalned
Author

Thanks for the feedback. We have implemented the feedback as far as possible.

@javierron self-assigned this on May 6, 2022
@javierron added the labels final_submission (The final submission of a task) and tutorial (One of the task categories listed in README.md) on May 6, 2022
@javierron
Collaborator

Thanks for the submission!

@javierron merged commit f746252 into KTH:2022 on May 6, 2022