blog: Converting a Jupyter Notebook to a DVC Project #3624
Link Check Report
All 6 links passed!
content/blog/2022-07-28-switching-to-dvc-from-jupyter-vscode.md
metrics, and then save the model. That's what we're doing in the
`bicycle_experiments.ipynb` file.

![Jupyter notebook cells](/uploads/images/2022-07-28/jupyter-notebook.png)
What do you think about starting with the parameters inside the notebook instead of reading them from `params.yaml`? I know it's harder to then convert to DVC, but it feels a bit unrealistic to be starting in a notebook but already have a parameters file configured.
It looks like this was not addressed. And I'm finding that for some reason when the notebook opens, all the cells are in Markdown. I thought this was a problem in VS Code, but when I open in Jupyter in a browser it's the same. @RCdeWit we are going to have to figure out how to fix this notebook to make it more representative of real life.
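To make the parameters suggestion concrete, here is a minimal sketch of what that starting point and its conversion could look like. The parameter names and values are hypothetical, not taken from the notebook, and the helper only handles a flat set of numeric values:

```python
# Step 1: parameters start out as plain values in a notebook cell.
# These names and values are hypothetical placeholders.
notebook_params = {"n_estimators": 100, "max_depth": 5, "test_size": 0.2}


def render_params_yaml(params):
    """Render a flat dict of numeric parameters as YAML text."""
    return "".join(f"{name}: {value}\n" for name, value in params.items())


# Step 2: during the conversion to DVC, the same values move into params.yaml.
params_yaml = render_params_yaml(notebook_params)
print(params_yaml)
```

Showing the post in this order would let readers see the parameters file appear as part of the conversion rather than as a precondition.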
Next, we'll make a `evaluate.py` file that will take a saved model and get the
metrics for how well it performs. This file will have the `Set test variables`,
`Load model and test data`, `Get model predictions`,
`Calculate model performance metrics`, and `Save model performance metrics`
Similar to `params.yaml` above, it seems unlikely that someone is already saving the model and especially the performance metrics to a file if they are working in a notebook.
This one also needs to be addressed in the notebook.
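One way to picture the change being asked for: in the notebook the metric is simply computed and displayed, and writing it to a file is a step added during conversion. The sketch below assumes a hypothetical `metrics.json` file name and a hand-rolled mean absolute error; neither is taken from the post:

```python
import json
import tempfile
from pathlib import Path


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def save_metrics(metrics, path):
    """Write a metrics dict to a JSON file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2))


# Notebook stage: the value is only computed and inspected in a cell.
mae = mean_absolute_error([10, 20, 30], [12, 18, 33])

# Conversion stage: the same value is persisted so a pipeline can track it.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "metrics.json"  # hypothetical file name
    save_metrics({"mae": mae}, out)
    loaded = json.loads(out.read_text())
```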
Great subject! I think it's worth spending a little time iterating on this to make it as helpful as possible since it's a common topic.
Co-authored-by: Rob de Wit <[email protected]>
Might be worth mentioning (as an alternative) https://papermill.readthedocs.io/en/latest/usage-execute.html#execute-via-cli, i.e. instead of:

    python src/train.py ./data/ ./models/model.pkl

you do:

    papermill src/train.ipynb -r data ./data/ -r model ./models/model.pkl

with a param cell in the `src/train.ipynb` notebook:

    # parameters
    data = "./data/"
    model = "./models/model.pkl"
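For comparison with the papermill variant, here is a minimal sketch of how the script form might read the same two values. The actual `src/train.py` is not shown in this thread, so the argument names (chosen to mirror the param cell) and the parser itself are assumptions:

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments used in the script invocation."""
    parser = argparse.ArgumentParser(description="Train a model (hypothetical sketch).")
    parser.add_argument("data", help="directory containing the training data")
    parser.add_argument("model", help="path where the trained model is written")
    return parser.parse_args(argv)


# Equivalent to: python src/train.py ./data/ ./models/model.pkl
args = parse_args(["./data/", "./models/model.pkl"])
print(f"data={args.data} model={args.model}")
```

Either way the same two values reach the training code; the difference is whether they arrive as command-line arguments or are injected into the notebook's parameter cell.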
Minor comments. Overall I think it's a good post and clear to follow! 😄
extension, we can show you how to make those experiments reproducible with the
addition of the DVC VS Code extension.
picture: 2022-07-28/jupyter-to-dvc.png
pictureComment: Using the DVC VS Code Extension with a Jupyter Notebook
Suggested change:
- pictureComment: Using the DVC VS Code Extension with a Jupyter Notebook
+ pictureComment: Using the DVC Extension for VS Code with a Jupyter Notebook
Each of these stages has a `cmd` that executes the Python scripts we wrote with
the required arguments. They both have defined dependencies in `deps` that let
DVC know what needs to be available for a stage to execute before it starts
To make it even more explicit, maybe add something along these lines here:
"As you can see, for example, the training stage is listed as a requirement for the evaluation stage. This ensures that the latter will only start once the first has been completed."
And maybe add an admonition somewhere saying that, when using `dvc exp run`, only the stages downstream from your changes are triggered?
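A toy illustration of the idea behind that admonition, in case it helps with the wording. This is a sketch of dependency propagation in general, not how DVC itself is implemented; the stage layout, the file names, and the exact-string path matching are all assumptions:

```python
# Hypothetical two-stage pipeline mirroring the train/evaluate split.
STAGES = {
    "train": {
        "deps": ["data", "src/train.py"],
        "outs": ["models/model.pkl"],
    },
    "evaluate": {
        "deps": ["models/model.pkl", "src/evaluate.py"],
        "outs": ["metrics.json"],
    },
}


def stages_to_rerun(stages, changed):
    """Return the set of stages affected, directly or indirectly, by changed paths."""
    dirty_paths = set(changed)
    rerun = set()
    progress = True
    while progress:
        progress = False
        for name, stage in stages.items():
            if name not in rerun and dirty_paths.intersection(stage["deps"]):
                rerun.add(name)
                # Outputs of a rerun stage count as changed for later stages.
                dirty_paths.update(stage["outs"])
                progress = True
    return rerun


print(stages_to_rerun(STAGES, ["src/evaluate.py"]))
print(stages_to_rerun(STAGES, ["data"]))
```

Changing only the evaluation script leaves the training stage alone, while changing the data affects both stages because the model produced by training is a dependency of evaluation.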
Closing in favor of https://github.com/iterative/iterative.ai/pull/550.
@yathomasi I noticed as I was going over comments and checking Milecia's repo that her link to her blog post in this repo is broken. I thought they would redirect to the new site.
The link was missing. I have added the link. Readme
Tentative publish date: 07/19/22, 07/28/22, 08/24/22
Things left to update: