Simplify champion-challenger approach #7

felix-datatonic · 2023-05-02T16:31:50Z

Description

Add Vertex AI services:

custom training job
model evaluation import
model versioning

Change of logic:

upload train script before triggering training pipeline
always upload a new model version after training
don't re-evaluate an existing champion; use its historic evaluation results instead
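
The last point could look roughly like the sketch below. It is an illustration only, not the code in `update_best_model`: the helper name `challenger_wins`, the `higher_is_better` flag, and the metric values are assumptions made for this example. Only the idea of comparing stored metrics for a chosen `eval_metric` comes from this PR.

```python
from typing import Mapping


def challenger_wins(
    metrics_champion: Mapping[str, float],
    metrics_challenger: Mapping[str, float],
    eval_metric: str,
    higher_is_better: bool = True,
) -> bool:
    """Compare stored evaluation metrics without re-running evaluation.

    Hypothetical helper: both mappings are assumed to hold previously
    imported evaluation results keyed by metric name.
    """
    champion = metrics_champion[eval_metric]
    challenger = metrics_challenger[eval_metric]
    if higher_is_better:
        return challenger > champion
    return challenger < champion


# Illustrative values only, not results from this PR.
historic_champion = {"auRoc": 0.91}
new_challenger = {"auRoc": 0.93}
promote = challenger_wins(historic_champion, new_challenger, "auRoc")
```

With a strict comparison like this, a tie keeps the existing champion; whether that is the desired behaviour is a design choice for the component.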

How has this been tested?

make test-all-components
make e2e-tests

Checklist

I have commented my code, particularly in hard-to-understand areas
I have successfully run the E2E tests, and have included the links to the pipeline runs below
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have updated any relevant documentation to reflect my changes
I have assigned a reviewer and messaged them

Pipeline run links:

README.md

pipeline_components/aiplatform/aiplatform/custom_train_job/component.py

pipeline_components/aiplatform/aiplatform/import_model_evaluation/component.py

pipeline_components/aiplatform/aiplatform/model_batch_predict/component.py

pipeline_components/aiplatform/aiplatform/update_best_model/component.py

pipeline_components/bigquery/bigquery/upload_prediction/component.py

pipelines/Pipfile

pipelines/pipelines/tensorflow/training/assets/task_tf.py

pipelines/pipelines/xgboost/training/assets/task.py

pipelines/tests/tensorflow/prediction/test_e2e.py

Pipfile

Pipfile.lock

CONTRIBUTING.md

ghost · 2023-05-04T09:07:50Z

pipeline_components/aiplatform/aiplatform/custom_train_job/component.py

+        requirements=requirements,
+        model_serving_container_image_uri=serving_container_uri,
+    )
+    cmd_args = [


I think we should be less prescriptive with the command-line arguments. Ideally these would be provided as a list by the user as input to the component (but I see that is tricky given that it contains artifact paths). Could we output the paths as strings from previous components, construct the args in the pipeline definition, and then pass them into this component as args: List[str]?

Unfortunately we can't use op outputs (or pipeline params) in nested objects; doing so leads to a compilation error.

tested:

    train_dataset_uri = (
        extract_bq_to_dataset(
            # ...
        )
        .after(data_cleaning)
        .set_display_name("Extract train data to storage")
    ).outputs['dataset_gcs_uri']

    train_args = [
        "--train_data", train_dataset_uri,
        # ...
    ]

    train_model = custom_train_job(
        train_script_uri=train_script_uri,
        args=train_args,
        # ...
    ).set_display_name("Train model")

Error: TypeError: Object of type PipelineParam is not JSON serializable

Another option: pass in args as a single string instead of List[str]? I think that would then work.
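
A minimal sketch of the two ideas in this thread, assuming nothing about KFP internals: `FakePipelineParam` is a hypothetical stand-in for a pipeline placeholder object (not the real class), and `split_args` is a hypothetical helper the component could use to turn a single string back into a list. It only reproduces the plain `json` failure mode and the parsing side; whether the compiler accepts a string built from op outputs is not verified here.

```python
import json
import shlex
from typing import List


class FakePipelineParam:
    """Hypothetical stand-in for a pipeline placeholder; not the real KFP class."""

    def __init__(self, name: str) -> None:
        self.name = name


def can_serialize(value) -> bool:
    """Return True if `value` survives json.dumps."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        # Same error type as reported above: the object inside the list
        # has no JSON representation.
        return False


def split_args(args: str) -> List[str]:
    """Inside the component: turn one shell-style string back into a list."""
    return shlex.split(args)


# A placeholder nested in a list cannot be serialized as plain JSON.
nested = ["--train_data", FakePipelineParam("dataset_gcs_uri")]
# A single string can; the bucket and label names are made up.
flat = "--train_data gs://example-bucket/train.csv --label total_fare"
```

Using `shlex.split` rather than `str.split` keeps quoted values with spaces intact, at the cost of requiring callers to quote such values.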

ghost · 2023-05-04T09:10:31Z

pipeline_components/aiplatform/aiplatform/update_best_model/component.py

+    eval_challenger = aip.model_evaluation.ModelEvaluation(challenger_evaluation)
+    metrics_champion = MessageToDict(eval_champion._gca_resource._pb)["metrics"]
+    metrics_challenger = MessageToDict(eval_challenger._gca_resource._pb)["metrics"]
+    metrics_challenger[eval_metric] -= 0.001  # TODO fake


don't forget this one!

ghost · 2023-05-04T09:20:53Z

cloudbuild/e2e-test.yaml

@@ -26,7 +26,7 @@ steps:
        mkdir -p ${COMMIT_SHA}/prediction/assets && \
        cp -r pipelines/pipelines/${_PIPELINE_TEMPLATE}/training/assets ${COMMIT_SHA}/training/ && \
        cp -r pipelines/pipelines/${_PIPELINE_TEMPLATE}/prediction/assets ${COMMIT_SHA}/prediction/ && \
-        gsutil cp -r ${COMMIT_SHA} ${_PIPELINE_PUBLISH_GCS_PATH}
+        gsutil cp -r ${COMMIT_SHA} ${_PIPELINE_PUBLISH_GCS_PATH}/${COMMIT_SHA}


@felix-datatonic are you sure about this? Won't this create a nested directory?

Yes, tested successfully. In the prior case only the contents of ${COMMIT_SHA} were copied to ${_PIPELINE_PUBLISH_GCS_PATH}. Feel free to test it with different combinations and values in Cloud Build though.
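
For reference, a small pure-Python model of the naming rule in question. It does not call gsutil. It assumes the Unix-cp-like behaviour described in the gsutil documentation: if the destination already exists as a "directory", the source directory name is kept beneath it; otherwise the destination takes the place of the source directory. The function name, bucket, and file names are made up for the example.

```python
import posixpath
from typing import List


def gsutil_cp_r_object_names(
    src_dir: str, files: List[str], dest: str, dest_exists: bool
) -> List[str]:
    """Model the object names produced by `gsutil cp -r src_dir dest`.

    Assumed rule: an existing destination "directory" receives
    src_dir as a subdirectory; a non-existing one replaces src_dir.
    """
    base = posixpath.basename(src_dir.rstrip("/"))
    prefix = posixpath.join(dest, base) if dest_exists else dest
    return [posixpath.join(prefix, f) for f in files]


files = ["training/assets/task.py"]
# Destination does not exist yet: no nested directory is created.
fresh = gsutil_cp_r_object_names("abc123", files, "gs://example-bucket/pub/abc123", False)
# Destination already exists: the source directory is nested inside it.
rerun = gsutil_cp_r_object_names("abc123", files, "gs://example-bucket/pub/abc123", True)
```

Under this assumed rule, nesting would only appear if the same ${COMMIT_SHA} prefix already existed before the copy, which is worth keeping in mind for re-triggered builds.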

felix-datatonic added 27 commits April 26, 2023 19:58

Remove TFDV and add model monitoring for XGBoost

74ea424

Add custom batch prediction component

cf248f9

Add custom wait GCP component

be4171f

Persist training on GCS in model folder

6d318bc

Add model monitoring for tensorflow

01199dc

Only sync assets if folder exists

c6398ba

Only sync assets if folder exists

fd40562

Update XGBoost prediction pipeline to match new component inputs

96c5122

Update docstrings

0cebdc5

Update and remove outdated docs

9eff36b

Fix unit tests

115e35e

Remove unused helper components

fcace39

Remove unused helper components

e63d644

Update E2E tests

262b587

Remove unused helper components

ed9c19c

Resolve minor issue

f41c871

Remove unused container

fff51e0

Remove unused container

dbcceb9

Update load_dataset_to_bq

7d0ee5a

Update pipelines pip dependencies

b403e2f

Merge model_batch_predict and wait_gcp_resources

b8ab093

Restore fail_on_model_not_found in lookup_model

817043c

Update training pipelines with new lookup_model outputs

550d8e4

Create and fix unit tests

67f9539

Restore asset folders

4696770

Add missing dep in aiplatform components

785eb45

Update e2e tests

5e82559

felix-datatonic changed the title ~~Feature/simple cc approach~~ Simplify champion-challenger approach May 2, 2023

felix-datatonic added 2 commits May 2, 2023 18:54

Minor fixes

7468755

Minor fixes

da47c1b

felix-datatonic requested review from a user May 3, 2023 12:59

felix-datatonic self-assigned this May 3, 2023

felix-datatonic added the enhancement New feature or request label May 3, 2023

felix-datatonic added 2 commits May 3, 2023 15:05

Fix test_dataset_uri condition in training pipelines

7799263

Fix test_dataset_uri condition in training pipelines

f705b89

felix-datatonic commented May 3, 2023

felix-datatonic marked this pull request as ready for review May 3, 2023 13:29

ghost suggested changes May 3, 2023

Address PR review comments

408c073

felix-datatonic commented May 3, 2023

pipeline_components/aiplatform/aiplatform/custom_train_job/component.py

felix-datatonic added 3 commits May 3, 2023 16:54

Remove hard-coded staging bucket

52ae013

Fix assets destination in e2e cloudbuild trigger

cb965c9

Fix assets destination in e2e cloudbuild trigger

7fde5d2

ghost suggested changes May 4, 2023

felix-datatonic added 2 commits May 4, 2023 13:24

Use lookup model in training pipelines

c981e2e

Update display names of pipeline steps

4ee3bcb

ghost reviewed May 4, 2023

pipeline_components/aiplatform/aiplatform/custom_train_job/component.py

ghost reviewed May 4, 2023

pipeline_components/aiplatform/aiplatform/custom_train_job/component.py

efbbrown-dt reviewed May 4, 2023

pipeline_components/aiplatform/aiplatform/lookup_model/component.py

Fix model batch predict

df7c9aa

ghost reviewed May 4, 2023

pipeline_components/aiplatform/aiplatform/update_best_model/component.py

felix-datatonic added 3 commits May 5, 2023 08:43

Change tensorflow prediction pipeline from JSONL to BQ inputs/outputs

01581ac

Enable caching when exporting to GCS

1cd5efc

Update tensorflow training pipeline

f556484

ghost suggested changes May 5, 2023

felix-datatonic added 2 commits May 5, 2023 10:55

Address minor comments and remove TODOs

efd8b9f

Fix tensorflow training pipeline

2486246

felix-datatonic merged commit 484ea38 into main May 9, 2023

felix-datatonic mentioned this pull request Nov 15, 2023

Release 2.0 GoogleCloudPlatform/vertex-pipelines-end-to-end-samples#69

Open

6 tasks
