blog: Converting a Jupyter Notebook to a DVC Project #3624

Closed
flippedcoder wants to merge 21 commits from blog/vscode-dvc-tutorial

@flippedcoder commented Jun 6, 2022

Tentative publish date: 07/19/22 07/28/22 08/24/22

Things left to update:

  • image (make sure to use https://tinypng.com/)
  • comments link
  • technical style (code blocks formatted correctly with dvc, python, yaml, etc)
  • update with main branch

@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj June 6, 2022 20:41 Inactive
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj June 14, 2022 17:04 Inactive
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj July 6, 2022 20:11 Inactive
@flippedcoder force-pushed the blog/vscode-dvc-tutorial branch from b37a154 to 9bc45ea July 6, 2022 20:14
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj July 6, 2022 20:15 Inactive
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj July 18, 2022 21:11 Inactive
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj July 19, 2022 13:53 Inactive
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj July 19, 2022 18:57 Inactive
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj July 19, 2022 19:00 Inactive
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj July 21, 2022 19:37 Inactive
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj July 21, 2022 19:52 Inactive
@flippedcoder marked this pull request as ready for review July 21, 2022 19:54

github-actions bot commented Jul 21, 2022

Link Check Report

All 6 links passed!

@jendefig added the C: blog label (TEMPORARY Content of /blog) Jul 27, 2022
metrics, and then save the model. That's what we're doing in the
`bicycle_experiments.ipynb` file.

![Jupyter notebook cells](/uploads/images/2022-07-28/jupyter-notebook.png)

What do you think about starting with the parameters inside the notebook instead of reading them from params.yaml? I know it's harder to then convert to DVC, but it feels a bit unrealistic to be starting in a notebook but already have a parameters file configured.

It looks like this was not addressed. And I'm finding that for some reason when the notebook opens, all the cells are in Markdown. I thought this was a problem in VS Code, but when I open in Jupyter in a browser it's the same. @RCdeWit we are going to have to figure out how to fix this notebook to make it more representative of real life.

Next, we'll make an `evaluate.py` file that will take a saved model and get the
metrics for how well it performs. This file will have the `Set test variables`,
`Load model and test data`, `Get model predictions`,
`Calculate model performance metrics`, and `Save model performance metrics`

Similar to params.yaml above, it seems unlikely that someone is already saving the model and especially the performance metrics to a file if they are working in a notebook.

This one also needs to be addressed in the notebook.
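
For context on the step this thread is discussing, saving performance metrics to a file might look roughly like the sketch below once the notebook code is moved into a script. This is only an illustration: the helper name `save_metrics`, the JSON format, and any metric names passed to it are assumptions, not taken from the post or its notebook.

```python
import json
from pathlib import Path


def save_metrics(metrics, path):
    """Write a flat dict of metric names and values to a JSON file.

    Hypothetical helper for illustration; the post's own script may differ.
    """
    path = Path(path)
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2))
    return path
```

A notebook user would typically only print these values in a cell, which is the gap the comments above point out.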

@dberenbaum

Great subject! I think it's worth spending a little time iterating on this to make it as helpful as possible since it's a common topic.

@rogermparent temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj August 1, 2022 19:34 Inactive
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj August 8, 2022 20:59 Inactive
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj August 9, 2022 20:55 Inactive
@casperdcl left a comment

might be worth mentioning (as an alternative) https://papermill.readthedocs.io/en/latest/usage-execute.html#execute-via-cli, i.e. instead of:

python src/train.py ./data/ ./models/model.pkl

you do:

papermill src/train.ipynb -r data ./data/ -r model ./models/model.pkl

with a param cell in the src/train.ipynb notebook:

# parameters
data = "./data/"
model = "./models/model.pkl"
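
For comparison, the script-based command above assumes that `src/train.py` reads two positional arguments. A minimal sketch of that argument handling could look like the following; the function name `parse_args` and the argument names `data` and `model` are assumptions chosen to mirror the parameter cell, not code from the post. With papermill, the parameter cell is found through a `parameters` cell tag, and `-r` passes each value as a raw string.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two positional arguments used in the command above.

    Hypothetical sketch; the real train.py may handle arguments differently.
    """
    parser = argparse.ArgumentParser(description="Train a model (illustrative sketch).")
    parser.add_argument("data", type=Path, help="directory containing the training data")
    parser.add_argument("model", type=Path, help="path where the trained model is saved")
    return parser.parse_args(argv)


# Equivalent to: python src/train.py ./data/ ./models/model.pkl
args = parse_args(["./data/", "./models/model.pkl"])
print(args.data, args.model)
```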

@shcheklein had a problem deploying to dvc-org-blog-vscode-dvc-8nffaj August 16, 2022 19:27 Failure
@RCdeWit left a comment

Minor comments. Overall I think it's a good post and clear to follow! 😄

extension, we can show you how to make those experiments reproducible with the
addition of the DVC VS Code extension.
picture: 2022-07-28/jupyter-to-dvc.png
pictureComment: Using the DVC VS Code Extension with a Jupyter Notebook

Suggested change
- pictureComment: Using the DVC VS Code Extension with a Jupyter Notebook
+ pictureComment: Using the DVC Extension for VS Code with a Jupyter Notebook


Each of these stages has a `cmd` that executes the Python scripts we wrote with
the required arguments. They both have defined dependencies in `deps` that let
DVC know what needs to be available for a stage to execute before it starts

To make it even more explicit, maybe add something along these lines here:

"As you can see, for example, the training stage is listed as a requirement for the evaluation stage. This ensures that the latter will only start once the former has completed."

And maybe add an admonition somewhere saying that when using dvc exp run, only the stages downstream from your changes are triggered?
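
The ordering and "downstream only" behavior described in these two comments can be illustrated with a toy dependency graph. This is a sketch of the idea only, not how DVC implements it: the stage names `train` and `evaluate` follow the post, while `STAGES`, `execution_order`, and `stages_to_rerun` are hypothetical names introduced here.

```python
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the stages it depends on.
STAGES = {"train": set(), "evaluate": {"train"}}


def execution_order(stages):
    """Return the stages in an order that respects their dependencies."""
    return list(TopologicalSorter(stages).static_order())


def stages_to_rerun(stages, changed):
    """Return the changed stage plus everything downstream of it, in run order."""
    order = execution_order(stages)
    affected = {changed}
    # Walking in dependency order lets changes propagate through long chains.
    for stage in order:
        if stages[stage] & affected:
            affected.add(stage)
    return [stage for stage in order if stage in affected]


print(execution_order(STAGES))
print(stages_to_rerun(STAGES, "evaluate"))
```

In this toy model a change to `evaluate` leaves `train` untouched, while a change to `train` also marks `evaluate` for a rerun.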

@shcheklein had a problem deploying to dvc-org-blog-vscode-dvc-8nffaj August 19, 2022 21:20 Failure
@shcheklein had a problem deploying to dvc-org-blog-vscode-dvc-8nffaj August 19, 2022 21:22 Failure
@shcheklein temporarily deployed to dvc-org-blog-vscode-dvc-8nffaj August 19, 2022 21:24 Inactive
@yathomasi

Closing in favor of https://github.com/iterative/iterative.ai/pull/550.
We are thinking of migration once the blog is complete. It looks like main is already merged so the blog will not appear on the preview deployment.

@yathomasi closed this Aug 22, 2022
@jendefig

@yathomasi I noticed as I was going over comments and checking Milecia's repo that her link to her blog post in this repo is broken. I thought they would redirect to the new site.

@yathomasi

The link was missing. I have added the link. Readme

@yathomasi deleted the blog/vscode-dvc-tutorial branch July 11, 2023 07:14