
Simplify champion-challenger approach #7

Merged: 55 commits, May 9, 2023
Changes from 44 commits (55 commits total)
74ea424
Remove TFDV and add model monitoring for XGBoost
felix-datatonic Apr 26, 2023
cf248f9
Add custom batch prediction component
felix-datatonic Apr 26, 2023
be4171f
Add custom wait GCP component
felix-datatonic Apr 27, 2023
6d318bc
Persist training on GCS in model folder
felix-datatonic Apr 27, 2023
01199dc
Add model monitoring for tensorflow
felix-datatonic Apr 28, 2023
c6398ba
Only sync assets if folder exists
felix-datatonic Apr 28, 2023
fd40562
Only sync assets if folder exists
felix-datatonic Apr 28, 2023
96c5122
Update XGBoost prediction pipeline to match new component inputs
felix-datatonic Apr 28, 2023
0cebdc5
Update docstrings
felix-datatonic Apr 28, 2023
9eff36b
Update and remove outdated docs
felix-datatonic Apr 28, 2023
115e35e
Fix unit tests
felix-datatonic Apr 28, 2023
fcace39
Remove unused helper components
felix-datatonic Apr 30, 2023
e63d644
Remove unused helper components
felix-datatonic Apr 30, 2023
262b587
Update E2E tests
felix-datatonic Apr 30, 2023
ed9c19c
Remove unused helper components
felix-datatonic Apr 30, 2023
f41c871
Resolve minor issue
felix-datatonic May 2, 2023
fff51e0
Remove unused container
felix-datatonic May 2, 2023
dbcceb9
Remove unused container
felix-datatonic May 2, 2023
7d0ee5a
Update load_dataset_to_bq
felix-datatonic May 2, 2023
b403e2f
Update pipelines pip dependencies
felix-datatonic May 2, 2023
b8ab093
Merge model_batch_predict and wait_gcp_resources
felix-datatonic May 2, 2023
817043c
Restore fail_on_model_not_found in lookup_model
felix-datatonic May 2, 2023
550d8e4
Update training pipelines with new lookup_model outputs
felix-datatonic May 2, 2023
67f9539
Create and fix unit tests
felix-datatonic May 2, 2023
4696770
Restore asset folders
felix-datatonic May 2, 2023
785eb45
Add missing dep in aiplatform components
felix-datatonic May 2, 2023
5e82559
Update e2e tests
felix-datatonic May 2, 2023
7468755
Minor fixes
felix-datatonic May 2, 2023
da47c1b
Minor fixes
felix-datatonic May 2, 2023
de789eb
Reduce assertions in e2e tests
felix-datatonic May 2, 2023
66dd5e2
Add working train pipelines with new champion-challenger approach
felix-datatonic May 1, 2023
41f9a72
Update xgboost training pipeline
felix-datatonic May 1, 2023
fdada28
Minor fixes in XGBoost training
felix-datatonic May 2, 2023
10e40af
Update tensorflow training
felix-datatonic May 3, 2023
36ce2e3
Remove unused tensorflow components
felix-datatonic May 3, 2023
b0ff1ef
Add tensorflow training script
felix-datatonic May 3, 2023
6d9b3df
Add prediction pipelines
felix-datatonic May 3, 2023
60a204d
Remove debug pipelines
felix-datatonic May 3, 2023
3f2178e
Update docs
felix-datatonic May 3, 2023
d5e55d6
Add unit test for update_best_model component
felix-datatonic May 3, 2023
5a96065
Update model names in pipelines
felix-datatonic May 3, 2023
7799263
Fix test_dataset_uri condition in training pipelines
felix-datatonic May 3, 2023
f705b89
Fix test_dataset_uri condition in training pipelines
felix-datatonic May 3, 2023
408c073
Address PR review comments
felix-datatonic May 3, 2023
52ae013
Remove hard-coded staging bucket
felix-datatonic May 3, 2023
cb965c9
Fix assets destination in e2e cloudbuild trigger
felix-datatonic May 4, 2023
7fde5d2
Fix assets destination in e2e cloudbuild trigger
felix-datatonic May 4, 2023
c981e2e
Use lookup model in training pipelines
felix-datatonic May 4, 2023
4ee3bcb
Update display names of pipeline steps
felix-datatonic May 4, 2023
df7c9aa
Fix model batch predict
felix-datatonic May 4, 2023
01581ac
Change tensorflow prediction pipeline from JSONL to BQ inputs/outputs
felix-datatonic May 5, 2023
1cd5efc
Enable caching when exporting to GCS
felix-datatonic May 5, 2023
f556484
Update tensorflow training pipeline
felix-datatonic May 5, 2023
efd8b9f
Address minor comments and remove TODOs
felix-datatonic May 5, 2023
2486246
Fix tensorflow training pipeline
felix-datatonic May 5, 2023
8 changes: 2 additions & 6 deletions CONTRIBUTING.md
@@ -116,16 +116,15 @@ We use End-to-end (E2E) pipeline tests to ensure that our pipelines are running
- That common tasks (components), which are stored in a dictionary object (`common_tasks`), occurred in the pipeline
- That if any task in a conditional tasks dictionary object occurred in the pipeline, all remaining tasks in that conditional group occurred as well
- That these pipeline tasks output the correct artifacts, by checking whether they have been saved to a GCS URI or generated successfully in Vertex AI.

Note:
These dictionary objects (`common_tasks`, `conditional_tasks`) are defined in `test_e2e.py` in each pipeline folder, e.g. `./pipelines/tests/xgboost/training/test_e2e.py`.
The E2E test allows only one common task group, but the number of conditional task groups is not limited. To define the correct task groups, inspect the pipeline job in Vertex AI.
For example, the XGBoost training pipeline has two conditional task groups, each enclosed in a dashed frame.
Thus, in `./pipelines/tests/xgboost/training/test_e2e.py`, there are two dictionaries, one per conditional task group.


![Conditional tasks in XGB](docs/images/conditional_tasks_snippet.png)
- Optionally check for executed tasks and created output artifacts.
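
A hypothetical sketch of how such task dictionaries and the resulting check might look (the task names, artifact lists, and the `check_tasks` helper are illustrative assumptions, not the repository's actual code):

```python
# Hypothetical sketch: task dictionaries as used by the E2E tests.
# Task names and artifact lists below are illustrative assumptions.
common_tasks = {
    "extract-bq-to-dataset": ["dataset"],  # task name -> expected output artifacts
    "train-model": ["model"],
}

conditional_tasks = [
    # Each dict is one conditional group: if any task in it ran,
    # every other task in the same group must have run too.
    {"upload-model": ["model"], "update-best-model": []},
    {"skip-upload-log": []},
]

def check_tasks(executed, common, conditional):
    """Return True iff all common tasks ran and every triggered conditional group completed."""
    if not all(name in executed for name in common):
        return False
    for group in conditional:
        ran = [name for name in group if name in executed]
        if ran and len(ran) != len(group):
            return False  # group was triggered but only partially executed
    return True

print(check_tasks({"extract-bq-to-dataset": {}, "train-model": {}}, common_tasks, conditional_tasks))  # True
```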

#### How to run end-to-end (E2E) pipeline tests
E2E tests are run on each PR that is merged to the main branch. You can also run them on your local machine:
@@ -282,6 +281,3 @@ To make sure that assets are available while running the ML pipelines, `make run
### Common assets

Within the [assets](./assets/) folder, there are common files stored which need to be uploaded to Google Cloud Storage so that the pipelines running on Vertex AI can consume them, namely:

- TFDV schema for [detecting input data anomalies](https://www.tensorflow.org/tfx/guide/tfdv#schema_based_example_validation): This schema file can be created using a [sample notebook](pipelines/schema_creation.ipynb) to ensure that new training data complies with our data assumptions and constraints as part of the training pipeline.
- TFDV schema for [detecting data skew](https://www.tensorflow.org/tfx/guide/tfdv#training-serving_skew_detection): This schema file is used to detect training-serving skew in the prediction pipeline. It can be created similarly to other schema files. However, it will need to include [skew detection settings](https://www.tensorflow.org/tfx/data_validation/get_started#checking_data_skew_and_drift).
4 changes: 3 additions & 1 deletion Makefile
@@ -63,7 +63,9 @@ test-all-components: ## Run unit tests for all pipeline components
done

sync-assets: ## Sync assets folder to GCS. Must specify pipeline=<training|prediction>
@gsutil -m rsync -r -d ./pipelines/pipelines/${PIPELINE_TEMPLATE}/$(pipeline)/assets ${PIPELINE_FILES_GCS_PATH}/$(pipeline)/assets
if [ -d "./pipelines/pipelines/${PIPELINE_TEMPLATE}/$(pipeline)/assets/" ] ; then \
gsutil -m rsync -r -d ./pipelines/pipelines/${PIPELINE_TEMPLATE}/$(pipeline)/assets ${PIPELINE_FILES_GCS_PATH}/$(pipeline)/assets ; \
fi ;
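
The guard above only syncs when the local assets folder exists, so pipelines without assets no longer fail. A minimal standalone shell sketch of the same pattern (bucket and directory names are hypothetical; the real `gsutil` call is commented out since it requires gcloud credentials):

```shell
#!/bin/sh
# Mirrors the Makefile guard: rsync assets to GCS only if the local folder exists.
sync_assets() {
    assets_dir="$1"
    gcs_dest="$2"
    if [ -d "$assets_dir" ]; then
        echo "syncing $assets_dir -> $gcs_dest"
        # gsutil -m rsync -r -d "$assets_dir" "$gcs_dest"
    else
        echo "skipping: $assets_dir does not exist"
    fi
}

mkdir -p /tmp/demo_assets
sync_assets /tmp/demo_assets gs://example-bucket/training/assets      # syncing ...
sync_assets /tmp/no_such_dir gs://example-bucket/prediction/assets    # skipping ...
```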

run: ## Compile pipeline, copy assets to GCS, and run pipeline in sandbox environment. Must specify pipeline=<training|prediction>. Optionally specify enable_pipeline_caching=<true|false> (defaults to default Vertex caching behaviour)
@ $(MAKE) compile-pipeline && \
11 changes: 2 additions & 9 deletions README.md
@@ -191,7 +191,6 @@ When triggering ad hoc runs in your dev/sandbox environment, or when running the
### Assets

In each pipeline folder, there is an `assets` directory (`pipelines/pipelines/<xgboost|tensorflow>/<training|prediction>/assets/`). This can be used for any additional files that may be needed during execution of the pipelines.
For the example pipelines, it may contain data schemata (for Data Validation) or training scripts. This [notebook](pipelines/schema_creation.ipynb) gives an example of schema generation.
This directory is rsync'd to Google Cloud Storage when running a pipeline in the sandbox environment or as part of the CD pipeline (see [CI/CD setup](cloudbuild/README.md)).

## Testing
@@ -254,11 +253,11 @@ Below is a diagram of how the files are published in each environment in the `e2
└── TAG_NAME or GIT COMMIT HASH <-- Git tag used for the release (release.yaml) OR git commit hash (e2e-test.yaml)
├── prediction
│ ├── assets
│ │ └── tfdv_schema_prediction.pbtxt
│ │ └── some_useful_file.json
│ └── prediction.json <-- compiled prediction pipeline
└── training
├── assets
│ └── tfdv_schema_training.pbtxt
│ └── training_task.py
└── training.json <-- compiled training pipeline
```

@@ -268,9 +267,3 @@
For more details on setting up CI/CD, see the [separate README](cloudbuild/README.md).

For a full walkthrough of the journey from changing the ML pipeline code to having it scheduled and running in production, please see the guide [here](docs/PRODUCTION.md).

### Using Dataflow

The `generate_statistics` pipeline component generates statistics about a given dataset (using the [`generate_statistics_from_csv`](https://www.tensorflow.org/tfx/data_validation/api_docs/python/tfdv/generate_statistics_from_csv) function in the [TensorFlow Data Validation](https://www.tensorflow.org/tfx/guide/tfdv) package) and can optionally be run using [Dataflow](https://cloud.google.com/dataflow/) to scale to huge datasets.

For instructions on how to do this, see the [README](pipeline_components/_tfdv/generate_statistics.md) for this component.
22 changes: 0 additions & 22 deletions containers/tfdv/Dockerfile

This file was deleted.

32 changes: 0 additions & 32 deletions containers/tfdv/README.md

This file was deleted.

Binary files removed (content not shown):
- docs/images/conditional_tasks_snippet.png
- docs/images/prediction_pipeline_example.png
- docs/images/tensorflow_component_championmodel.png
- docs/images/tensorflow_component_schema.png
- docs/images/tensorflow_component_training.png
- docs/images/tensorflow_prediction_component_skew.png
- docs/images/training_pipeline_example.png
- docs/images/xgboost_component_training.png
- four further binary files (names not shown)
1 change: 0 additions & 1 deletion pipeline_components/_tensorflow/.python-version

This file was deleted.

18 changes: 0 additions & 18 deletions pipeline_components/_tensorflow/Pipfile

This file was deleted.
