feat: Added AutoMLTablesTrainingJob and tests #62

ivanmkc · 2020-11-13T05:29:59Z

Added support for AutoMLTablesTraining.

Fixes https://b.corp.google.com/issues/172282518
Depends on #49

Could possibly add more client-side validation, but currently deferring validation to the backend services pending more discussion.

ivanmkc · 2020-11-17T10:36:53Z

google/cloud/aiplatform/training_jobs.py

+            "transformations": self._column_transformations,
+            "trainBudgetMilliNodeHours": budget_milli_node_hours,
+            # optional inputs
+            "weightColumnName": weight_column,


In google3/google/cloud/aiplatform/publicfiles/trainingjob/definition/automl_tables.proto, this column is referred to as "weight_column_name".

However, in gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_tabular_1.0.0.yaml, it gives the name as "weightColumn". This is incorrect and results in an error at training time.

@sasha-gitg why is there a discrepancy here?

I imagine the protos are the source of truth and the yaml is just out of date? If so, what's our plan to shield users from this, since they don't have access to the protos AFAIK?

Following up on this.

Synced with the team. The yaml is incorrect and will be updated. The field should be weightColumnName. If users are using our Model Builder SDK then hopefully we have caught these issues early during our development. This should become less of an issue for the service as we move forward because it will support all previously versioned yaml schemas.
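
For illustration, a minimal sketch of how the task inputs could be assembled with the corrected key. The helper name `build_training_task_inputs` is hypothetical and not part of this PR; only the three dictionary keys come from the diff above. Unlike the diff, this sketch omits the weight key when no weight column is given, which is an assumption about what the backend accepts.

```python
from typing import Any, Dict, List, Optional


def build_training_task_inputs(
    column_transformations: List[Dict[str, Any]],
    budget_milli_node_hours: int,
    weight_column: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: maps snake_case arguments to camelCase task inputs."""
    inputs: Dict[str, Any] = {
        "transformations": column_transformations,
        "trainBudgetMilliNodeHours": budget_milli_node_hours,
    }
    # Per this thread the key is "weightColumnName", not the "weightColumn"
    # listed in the 1.0.0 yaml schema.
    if weight_column is not None:
        inputs["weightColumnName"] = weight_column
    return inputs
```

Keeping the key name in one place like this means a later schema rename would only need to be fixed once.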

ivanmkc · 2020-11-17T13:51:42Z

google/cloud/aiplatform/training_jobs.py

+    def __init__(
+        self,
+        display_name: str,
+        optimization_objective: str,


According to the protos and yaml, the optimization_objective is optional since there is a default if it's not supplied.

Also, the design doc doesn't have the optimization_prediction_type parameter either. Sure, we can support this by inferring that info based on the optimization_objective, but is that what you intended?

My personal preference is to create an OptimizationObjective abstract class with one subclass per optimization_objective type. That would encapsulate the optimization_objective, optimization_prediction_type, optimization_objective_recall_value and optimization_objective_precision_value parameters into one object.

However, from our other convos it seems like you might prefer just passing in a string for optimization_objective and having the other parameters be optional.

Let me know what you prefer regarding the type for optimization_objective and whether or not to include an optimization_prediction_type parameter.

@sasha-gitg

With respect to optimization_prediction_type, we received feedback to include it as the top-level argument because it requires less knowledge overhead than optimization_objective. Additionally, we should include all the values flattened in the API surface and validate or ignore arguments based on valid combinations. There's precedent for this pattern:

https://github.com/scikit-learn/scikit-learn/blob/0fb307bf3/sklearn/linear_model/_logistic.py#L1011

So the current state of the input arguments LGTM. Just elevate optimization_prediction_type over optimization_objective, and make optimization_objective optional, but add comments explaining the defaults that apply for each optimization_prediction_type.

Thanks for the context, sgtm!
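
A rough sketch of the flattened-argument pattern agreed on above: optimization_prediction_type comes first, optimization_objective is optional with a per-type default, and the recall/precision values are validated or ignored depending on the objective. The function name `resolve_objective`, the objective strings, and the defaults are assumptions for illustration only; the real allowed values and defaults are defined by the service schema, not by this sketch.

```python
from typing import Any, Dict, Optional

# Assumed values for illustration; not taken from this PR.
_DEFAULT_OBJECTIVE = {
    "classification": "minimize-log-loss",
    "regression": "minimize-rmse",
}
_VALID_OBJECTIVES = {
    "classification": {
        "minimize-log-loss",
        "maximize-au-roc",
        "maximize-precision-at-recall",
        "maximize-recall-at-precision",
    },
    "regression": {"minimize-rmse", "minimize-mae"},
}


def resolve_objective(
    optimization_prediction_type: str,
    optimization_objective: Optional[str] = None,
    optimization_objective_recall_value: Optional[float] = None,
    optimization_objective_precision_value: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical validator for the flattened optimization arguments."""
    if optimization_prediction_type not in _VALID_OBJECTIVES:
        raise ValueError(f"Unknown prediction type: {optimization_prediction_type!r}")

    # Fall back to the per-type default when no objective is supplied.
    objective = optimization_objective or _DEFAULT_OBJECTIVE[optimization_prediction_type]
    if objective not in _VALID_OBJECTIVES[optimization_prediction_type]:
        raise ValueError(
            f"{objective!r} is not valid for {optimization_prediction_type!r}"
        )

    resolved: Dict[str, Any] = {
        "prediction_type": optimization_prediction_type,
        "objective": objective,
    }
    # The recall/precision values only apply to two objectives;
    # for every other objective they are silently ignored.
    if objective == "maximize-precision-at-recall":
        if optimization_objective_recall_value is None:
            raise ValueError("optimization_objective_recall_value is required")
        resolved["recall_value"] = optimization_objective_recall_value
    elif objective == "maximize-recall-at-precision":
        if optimization_objective_precision_value is None:
            raise ValueError("optimization_objective_precision_value is required")
        resolved["precision_value"] = optimization_objective_precision_value
    return resolved
```

This keeps every argument a plain string or float on the public surface, at the cost of pushing the valid-combination logic into one validation function.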

ivanmkc · 2020-11-18T12:40:42Z

google/cloud/aiplatform/__init__.py

@@ -31,4 +34,11 @@
 """
 init = initializer.global_config.init

-__all__ = ("gapic", "CustomTrainingJob", "Model", "Dataset", "Endpoint")
+__all__ = (


The linter did this

looks good.

sasha-gitg

LGTM! Minor comments.

google/cloud/aiplatform/training_jobs.py

ivanmkc · 2020-11-24T13:19:05Z

google/cloud/aiplatform/training_jobs.py

+        validation_fraction_split: float,
+        test_fraction_split: float,
+        weight_column: Optional[str] = None,
+        budget_milli_node_hours: int = 1000,


I'm guessing setting this to less than 1000 milli node hours (i.e. 1 node hour) will cause the backend to use 1000 instead. Probably the same for the maximum value.

I have no idea if this is true though and this seems opaque to users.

Ideally, the backend should respond with a warning if the parameters are out of bounds.
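
Until the backend does that, a client-side warning is one possible stopgap. A minimal sketch, assuming a lower bound of 1000 and an upper bound of 72000 milli node hours; both bounds and the `check_budget` name are assumptions here, and nothing in this thread confirms what the backend actually enforces. The value is returned unchanged rather than clamped, since the backend's behavior is unknown.

```python
import warnings

# Assumed bounds for illustration; not confirmed by the backend team.
MIN_BUDGET_MILLI_NODE_HOURS = 1000
MAX_BUDGET_MILLI_NODE_HOURS = 72000


def check_budget(budget_milli_node_hours: int) -> int:
    """Hypothetical check: warn, but do not modify, an out-of-range budget."""
    if not (
        MIN_BUDGET_MILLI_NODE_HOURS
        <= budget_milli_node_hours
        <= MAX_BUDGET_MILLI_NODE_HOURS
    ):
        warnings.warn(
            f"budget_milli_node_hours={budget_milli_node_hours} is outside the "
            f"assumed range [{MIN_BUDGET_MILLI_NODE_HOURS}, "
            f"{MAX_BUDGET_MILLI_NODE_HOURS}]; the backend may substitute "
            "a different value.",
            UserWarning,
        )
    return budget_milli_node_hours
```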

Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>

google-cla bot added the cla: yes label Nov 13, 2020

ivanmkc force-pushed the imkc--trainingjob-automl-tables-training-job branch 2 times, most recently from 338063f to 2802dc8 Compare November 17, 2020 07:30

ivanmkc force-pushed the imkc--trainingjob-automl-tables-training-job branch from 2759662 to 36d6b7c Compare November 17, 2020 10:38

ivanmkc force-pushed the imkc--trainingjob-automl-tables-training-job branch 2 times, most recently from 5831100 to c175714 Compare November 18, 2020 12:29

sirtorry self-requested a review November 18, 2020 21:30

ivanmkc force-pushed the imkc--trainingjob-automl-tables-training-job branch 2 times, most recently from c12d2b9 to 7858dfd Compare November 19, 2020 18:53

sasha-gitg approved these changes Nov 19, 2020

ivanmkc force-pushed the imkc--trainingjob-automl-tables-training-job branch 3 times, most recently from 2e4dc7b to caf7dc4 Compare November 24, 2020 13:52

feat: Added AutoMLTablesTrainingJob

072850f

Added the training job subclass for tables

ivanmkc force-pushed the imkc--trainingjob-automl-tables-training-job branch from caf7dc4 to 072850f Compare November 24, 2020 14:07

ivanmkc merged commit aa5e15d into googleapis:dev Nov 24, 2020

dizcology pushed a commit to dizcology/python-aiplatform that referenced this pull request Nov 30, 2020

feat: Added AutoMLTablesTrainingJob (googleapis#62)

e1a7845

Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>

dizcology pushed a commit to dizcology/python-aiplatform that referenced this pull request Nov 30, 2020

feat: Added AutoMLTablesTrainingJob (googleapis#62)

ee3af38

Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>

dizcology pushed a commit to dizcology/python-aiplatform that referenced this pull request Nov 30, 2020

feat: Added AutoMLTablesTrainingJob (googleapis#62)

5c692c2

Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>

dizcology pushed a commit to dizcology/python-aiplatform that referenced this pull request Nov 30, 2020

feat: Added AutoMLTablesTrainingJob (googleapis#62)

633bb18

Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>

dizcology pushed a commit to dizcology/python-aiplatform that referenced this pull request Nov 30, 2020

feat: Added AutoMLTablesTrainingJob (googleapis#62)

cccf486

Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>

dizcology pushed a commit to dizcology/python-aiplatform that referenced this pull request Dec 22, 2020

feat: Added AutoMLTablesTrainingJob (googleapis#62)

239fedb

Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>
