Update starter template #12
Conversation
…motion logic, added two training runs, etc.
Walkthrough
The project underwent a significant streamlining process, focusing on a simpler ZenML starter template and refining its associated GitHub workflows and actions. It removed extraneous integrations to emphasize …
Changes
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to filter (5)
- copier.yaml
- template/configs/feature_engineering.yaml
- template/configs/inference.yaml
- template/configs/training_rf.yaml
- template/configs/training_sgd.yaml
Files selected for processing (30)
- .github/actions/starter_template_test/action.yml (1 hunks)
- .github/workflows/ci.yml (3 hunks)
- .github/workflows/image-optimizer.yml (1 hunks)
- .gitignore (2 hunks)
- README.md (4 hunks)
- requirements.txt (1 hunks)
- template/README.md (1 hunks)
- template/license_header (1 hunks)
- template/pipelines/__init__.py (1 hunks)
- template/pipelines/feature_engineering.py (1 hunks)
- template/pipelines/inference.py (1 hunks)
- template/pipelines/training.py (1 hunks)
- template/quickstart.ipynb (1 hunks)
- template/requirements.txt (1 hunks)
- template/run.py (1 hunks)
- template/steps/__init__.py (1 hunks)
- template/steps/data_loader.py (1 hunks)
- template/steps/data_preprocessor.py (1 hunks)
- template/steps/data_splitter.py (1 hunks)
- template/steps/inference_predict.py (1 hunks)
- template/steps/inference_preprocessor.py (1 hunks)
- template/steps/model_evaluator.py (1 hunks)
- template/steps/model_promoter.py (1 hunks)
- template/steps/model_trainer.py (1 hunks)
- template/utils/__init__.py (1 hunks)
- template/utils/preprocess.py (1 hunks)
- template/{% if open_source_license %}LICENSE{% endif %} (1 hunks)
- template/{{ _copier_conf.answers_file }} (1 hunks)
- tests/conftest.py (1 hunks)
- tests/test_starter_template.py (3 hunks)
Files skipped from review due to trivial changes (6)
- .gitignore
- template/license_header
- template/pipelines/__init__.py
- template/requirements.txt
- template/utils/__init__.py
- template/{{ _copier_conf.answers_file }}
Additional comments: 26
.github/actions/starter_template_test/action.yml (1)
- 69-73: The "Concatenate requirements" step has been simplified to only include sklearn in the requirements. Ensure that this aligns with the project's dependency simplification strategy and that no other integrations are needed.
.github/workflows/ci.yml (3)
- 5-13: Input descriptions for ref-template and ref-zenml have been added to the workflow_dispatch section, improving the clarity of the workflow file.
- 40-40: The fail-fast attribute has been set to false under the strategy section, allowing all jobs to run even if one fails. This change could be beneficial for identifying multiple failures in a single workflow run.
- 58-59: The ref-zenml and ref-template inputs now have default values, which could be useful for running the workflow with default references without specifying them each time.
.github/workflows/image-optimizer.yml (1)
- 1-26: A new GitHub Actions workflow named "Compress Images" has been added, which is triggered on pull requests that include image files. This is a good practice for optimizing repository assets and reducing the size of the codebase.
README.md (1)
- 3-15: The README.md file has been updated to reflect the repository's shift from a collection of templates to a single starter template for ZenML projects. This change should be communicated clearly to users who may be familiar with the previous structure.
requirements.txt (1)
- 1-5: The requirements.txt file has been updated with a new version constraint for scikit-learn and additional dependencies zenml[server]>=0.52.0 and notebook. Ensure that these changes are compatible with the project's requirements and do not introduce any version conflicts.
template/README.md (1)
- 1-212: A comprehensive guide for building MLOps pipelines with ZenML has been added to the template/README.md file. This guide includes an overview, instructions, and detailed explanations, which can be very helpful for new users.
template/pipelines/feature_engineering.py (1)
- 1-59: The feature_engineering pipeline is well-defined with clear documentation and parameterization. It's structured to load data, process it, and split it into train and test sets, which is a common pattern in MLOps pipelines.
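As a rough sketch of that load → preprocess → split pattern (not the template's actual code; the step names, parameters, and imports are assumptions), a ZenML pipeline along these lines could look like:

```python
from zenml import pipeline

# Hypothetical step imports; the real template defines these in its own steps package.
from steps import data_loader, data_preprocessor, data_splitter


@pipeline
def feature_engineering(test_size: float = 0.2, normalize: bool = True):
    """Load the raw data, clean it, and split it into train and test sets."""
    raw_data = data_loader()
    processed = data_preprocessor(dataset=raw_data, normalize=normalize)
    dataset_trn, dataset_tst = data_splitter(dataset=processed, test_size=test_size)
    return dataset_trn, dataset_tst
```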
template/pipelines/inference.py (1)
- 1-46: The inference pipeline is well-defined with clear documentation and parameterization. It's structured to load inference data, process it with a preprocessing pipeline, and run inference with a trained model.
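A minimal sketch of such an inference pipeline, assuming hypothetical step names and a stand-in trained_artifacts_loader step in place of however the template actually fetches the fitted preprocessing pipeline and trained model:

```python
from zenml import pipeline

# Hypothetical step imports; `trained_artifacts_loader` is a placeholder for the
# template's real mechanism of resolving artifacts from the training run.
from steps import data_loader, inference_preprocessor, inference_predict, trained_artifacts_loader


@pipeline
def inference():
    """Prepare unseen data with the fitted preprocessing pipeline and predict with the trained model."""
    preprocess_pipeline, model = trained_artifacts_loader()
    df_inference = data_loader(is_inference=True)
    df_inference = inference_preprocessor(
        dataset_inf=df_inference, preprocess_pipeline=preprocess_pipeline
    )
    inference_predict(model=model, dataset_inf=df_inference)
```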
template/pipelines/training.py (1)
- 1-58: The training pipeline is well-defined with clear documentation and parameterization. It's structured to load data from a preprocessing pipeline, train a model on it, and evaluate the model.
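A hedged sketch of a training pipeline composed with the feature-engineering pipeline described above; the names, defaults, and threshold are illustrative rather than the template's exact code:

```python
from zenml import pipeline

# Hypothetical imports; names are illustrative only.
from pipelines import feature_engineering
from steps import model_trainer, model_evaluator, model_promoter


@pipeline
def training(model_type: str = "sgd", min_accuracy: float = 0.8):
    """Engineer features, train a classifier, evaluate it, and decide on promotion."""
    # Pipelines can be composed: the feature-engineering pipeline supplies the splits.
    dataset_trn, dataset_tst = feature_engineering()
    model = model_trainer(dataset_trn=dataset_trn, model_type=model_type)
    test_accuracy = model_evaluator(
        model=model, dataset_trn=dataset_trn, dataset_tst=dataset_tst
    )
    model_promoter(accuracy=test_accuracy, min_accuracy=min_accuracy)
```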
template/quickstart.ipynb (1)
- 1-1117: The quickstart.ipynb Jupyter notebook has been added to provide a hands-on introduction to MLOps using ZenML. It demonstrates the setup and execution of ML workflows, which can be very beneficial for new users to get started with ZenML.
template/run.py (1)
- 1-221: The template/run.py file introduces a command-line interface for running different pipelines, enhancing the usability of the project. The CLI is well-structured with clear options and help messages.
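For illustration, a launcher of this kind could be built with Click; the flag names and pipeline imports below are assumptions, not the template's actual CLI:

```python
"""Minimal sketch of a pipeline launcher CLI."""
import click

from pipelines import feature_engineering, training, inference  # hypothetical imports


@click.command()
@click.option("--feature-pipeline", is_flag=True, help="Run the feature engineering pipeline.")
@click.option("--training-pipeline", is_flag=True, help="Run the training pipeline.")
@click.option("--inference-pipeline", is_flag=True, help="Run the batch inference pipeline.")
def main(feature_pipeline: bool, training_pipeline: bool, inference_pipeline: bool):
    if feature_pipeline:
        feature_engineering()
    if training_pipeline:
        training()
    if inference_pipeline:
        inference()


if __name__ == "__main__":
    main()
```

With such a layout, `python run.py --training-pipeline` would trigger a single training run on the active ZenML stack.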
template/steps/__init__.py (1)
- 1-26: The template/steps/__init__.py file is well-organized, importing various step modules that define the different stages of the MLOps pipelines. This centralizes the step definitions and makes them easily accessible.
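Such an __init__.py typically just re-exports the step functions so callers can write `from steps import data_loader`; the function names below are assumed to match the module names and may differ from the template:

```python
from .data_loader import data_loader
from .data_preprocessor import data_preprocessor
from .data_splitter import data_splitter
from .inference_predict import inference_predict
from .inference_preprocessor import inference_preprocessor
from .model_evaluator import model_evaluator
from .model_promoter import model_promoter
from .model_trainer import model_trainer
```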
template/steps/data_loader.py (1)
- 1-47: The data_loader step is well-defined with clear documentation and parameterization. It's structured to load the Breast Cancer dataset and prepare it for further processing, which is a common requirement in MLOps pipelines.
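A minimal sketch of such a loader step, assuming scikit-learn's built-in Breast Cancer dataset and an is_inference flag; the exact signature is an assumption:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from typing_extensions import Annotated
from zenml import step


@step
def data_loader(random_state: int = 42, is_inference: bool = False) -> Annotated[pd.DataFrame, "dataset"]:
    """Load the Breast Cancer dataset as a DataFrame with a `target` column."""
    data = load_breast_cancer(as_frame=True)
    # Shuffle so downstream splits are not order-dependent.
    df = data.frame.sample(frac=1.0, random_state=random_state).reset_index(drop=True)
    if is_inference:
        # Keep a small, label-free slice to mimic unseen production data.
        df = df.drop(columns=["target"]).head(50)
    return df
```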
template/steps/data_preprocessor.py (1)
- 1-74: The data_preprocessor step is well-defined with clear documentation and parameterization. It's structured to prepare the data for model training, including options to drop NA values, normalize data, and drop specific columns.
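A sketch of a preprocessing step with those options (drop NAs, drop columns, min-max normalization); the real step may additionally return a fitted preprocessing pipeline, which is omitted here:

```python
from typing import List, Optional

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from zenml import step


@step
def data_preprocessor(
    dataset: pd.DataFrame,
    drop_na: bool = True,
    normalize: bool = True,
    drop_columns: Optional[List[str]] = None,
    target: str = "target",
) -> pd.DataFrame:
    """Clean the dataset: optionally drop NAs, drop columns, and min-max scale the features."""
    if drop_na:
        dataset = dataset.dropna()
    if drop_columns:
        dataset = dataset.drop(columns=drop_columns, errors="ignore")
    if normalize:
        feature_cols = [c for c in dataset.columns if c != target]
        dataset[feature_cols] = MinMaxScaler().fit_transform(dataset[feature_cols])
    return dataset
```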
template/steps/data_splitter.py (1)
- 1-45: The data_splitter step is well-defined with clear documentation and parameterization. It's structured to split the dataset into train and test sets, which is a standard procedure in preparing data for machine learning models.
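A sketch of the splitting step using scikit-learn's train_test_split; the output names are assumptions:

```python
from typing import Tuple

import pandas as pd
from sklearn.model_selection import train_test_split
from typing_extensions import Annotated
from zenml import step


@step
def data_splitter(
    dataset: pd.DataFrame, test_size: float = 0.2, random_state: int = 42
) -> Tuple[Annotated[pd.DataFrame, "dataset_trn"], Annotated[pd.DataFrame, "dataset_tst"]]:
    """Split the dataset into train and test subsets."""
    dataset_trn, dataset_tst = train_test_split(
        dataset, test_size=test_size, random_state=random_state, shuffle=True
    )
    return dataset_trn.reset_index(drop=True), dataset_tst.reset_index(drop=True)
```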
template/steps/inference_predict.py (1)
- 1-56: The inference_predict step is well-defined with clear documentation and parameterization. It's structured to take a trained model and inference dataset to produce predictions.
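A sketch of a prediction step along those lines; the return type and output name are assumptions:

```python
import pandas as pd
from sklearn.base import ClassifierMixin
from typing_extensions import Annotated
from zenml import step


@step
def inference_predict(
    model: ClassifierMixin, dataset_inf: pd.DataFrame
) -> Annotated[pd.Series, "predictions"]:
    """Run the trained classifier on the prepared inference data."""
    predictions = model.predict(dataset_inf)
    return pd.Series(predictions, name="predicted_label")
```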
template/steps/inference_preprocessor.py (1)
- 1-49: The inference_preprocessor step is well-defined with clear documentation and parameterization. It's structured to prepare the inference dataset using a pretrained preprocessing pipeline.
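A sketch assuming the fitted scikit-learn Pipeline is passed in as a step input and that it preserves the column set; the template's actual artifact handling may differ:

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from typing_extensions import Annotated
from zenml import step


@step
def inference_preprocessor(
    dataset_inf: pd.DataFrame, preprocess_pipeline: Pipeline
) -> Annotated[pd.DataFrame, "inference_dataset"]:
    """Apply the preprocessing pipeline fitted during training to unseen data."""
    # Only transform here; the pipeline was already fitted on the training data.
    transformed = preprocess_pipeline.transform(dataset_inf)
    # A generic Pipeline may return a NumPy array, so cast back to a DataFrame
    # (assuming the transformers keep the same columns).
    return pd.DataFrame(transformed, columns=dataset_inf.columns, index=dataset_inf.index)
```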
template/steps/model_evaluator.py (1)
- 1-86: The model_evaluator step is well-defined with clear documentation and parameterization. It's structured to evaluate a trained model's performance on the train and test datasets and log the model's accuracy.
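A sketch of an evaluator that scores both splits and logs the results; the target column name and the choice to return the test accuracy are assumptions:

```python
import pandas as pd
from sklearn.base import ClassifierMixin
from zenml import step
from zenml.logger import get_logger

logger = get_logger(__name__)


@step
def model_evaluator(
    model: ClassifierMixin,
    dataset_trn: pd.DataFrame,
    dataset_tst: pd.DataFrame,
    target: str = "target",
) -> float:
    """Score the trained model on the train and test splits and log both accuracies."""
    trn_acc = model.score(dataset_trn.drop(columns=[target]), dataset_trn[target])
    tst_acc = model.score(dataset_tst.drop(columns=[target]), dataset_tst[target])
    logger.info(f"Train accuracy={trn_acc:.2%}, test accuracy={tst_acc:.2%}")
    return tst_acc
```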
template/steps/model_promoter.py (1)
- 1-61: The model_promoter step is well-defined with clear documentation and parameterization. It's structured to conditionally promote a model based on its accuracy, which is a critical step in the model deployment lifecycle.
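A sketch of a promotion gate based on a minimum accuracy; the actual step would also update the model's stage in ZenML's model tracking, which is only hinted at in a comment here:

```python
from zenml import step
from zenml.logger import get_logger

logger = get_logger(__name__)


@step
def model_promoter(accuracy: float, stage: str = "production", min_accuracy: float = 0.8) -> bool:
    """Decide whether the freshly trained model should be promoted to the target stage."""
    if accuracy < min_accuracy:
        logger.info(f"Accuracy {accuracy:.2%} is below {min_accuracy:.2%}; not promoting.")
        return False
    # The real step would also move the model to `stage` in ZenML's model tracking here.
    logger.info(f"Promoting model to stage '{stage}'.")
    return True
```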
template/steps/model_trainer.py (1)
- 1-54: The model_trainer step is well-defined with clear documentation and parameterization. It's structured to configure and train a model on the training dataset, supporting different types of models.
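A sketch of a trainer supporting the two model types suggested by the training_sgd.yaml and training_rf.yaml configs; hyperparameters are omitted and the signature is an assumption:

```python
import pandas as pd
from sklearn.base import ClassifierMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from typing_extensions import Annotated
from zenml import step


@step
def model_trainer(
    dataset_trn: pd.DataFrame, model_type: str = "sgd", target: str = "target"
) -> Annotated[ClassifierMixin, "model"]:
    """Train either an SGD or a random forest classifier on the training split."""
    if model_type == "sgd":
        model = SGDClassifier()
    elif model_type == "rf":
        model = RandomForestClassifier()
    else:
        raise ValueError(f"Unknown model type: {model_type}")
    model.fit(dataset_trn.drop(columns=[target]), dataset_trn[target])
    return model
```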
template/utils/preprocess.py (1)
- 1-41: The template/utils/preprocess.py file adds support classes for data preprocessing, which are likely to be used in scikit-learn Pipelines. These utility classes are well-documented and provide functionality for dropping NA values, specific columns, and casting data types.
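Duck-typed transformers like the following would provide that behavior inside a scikit-learn Pipeline; the class names are assumptions, not necessarily those used in the template:

```python
import pandas as pd


class NADropper:
    """Drop rows containing missing values; follows the sklearn transformer protocol."""

    def fit(self, X, y=None):
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        return X.dropna()


class ColumnsDropper:
    """Drop a fixed list of columns from the incoming DataFrame."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        return X.drop(columns=self.columns, errors="ignore")


class DataFrameCaster:
    """Cast an array-like back into a DataFrame with a known set of columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X) -> pd.DataFrame:
        return pd.DataFrame(X, columns=self.columns)
```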
template/{% if open_source_license %}LICENSE{% endif %} (1)
- 1-1: The template for including a license file is a standard practice for open-source projects. Ensure that the correct license is included based on the project's licensing strategy.
tests/conftest.py (1)
- 31-36: The configure_stack function in tests/conftest.py has been updated to remove configurations for MLflow and Evidently components. Ensure that this change aligns with the updated testing strategy and that the necessary components are still being tested.
tests/test_starter_template.py (1)
- 55-104: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [16-123]
The test_starter_template.py file has been updated with functions to generate and run a project with different options, including a custom product name. Ensure that these tests cover the new functionality introduced in the PR and that they are passing.
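A hedged sketch of what such a test could look like, driving the copier CLI through subprocess; the flags, the product_name data key, and the asserted paths are assumptions rather than the repository's actual test code:

```python
import pathlib
import subprocess

import pytest

# Assumed layout: the copier template lives at the repository root.
TEMPLATE_DIR = pathlib.Path(__file__).resolve().parent.parent


@pytest.mark.parametrize("product_name", ["starter_project", "custom_product"])
def test_template_generates_project(tmp_path: pathlib.Path, product_name: str):
    """Generate the template with a custom product name and check the output layout."""
    subprocess.run(
        [
            "copier", "copy", "--defaults",
            "--data", f"product_name={product_name}",
            str(TEMPLATE_DIR), str(tmp_path),
        ],
        check=True,
    )
    assert (tmp_path / "run.py").exists()
    assert (tmp_path / "pipelines").is_dir()
```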
Reborn of #11 due to branch rename
Summary by CodeRabbit

New Features

Improvements
- Updated requirements.txt to specify newer versions of dependencies and added additional required packages.
- Added the fail-fast strategy option to the CI workflow.

Documentation
- Added a guide for the new starter template in template/README.md.

Bug Fixes
- Updated tests/conftest.py to align with updated stack setup.

Refactor
- Simplified the template to focus on the sklearn integration.

Chores
- Updated .gitignore to include additional directories and file types for better development experience.

Tests