-
Notifications
You must be signed in to change notification settings - Fork 354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Added AutoMLTablesTrainingJob and tests #62
feat: Added AutoMLTablesTrainingJob and tests #62
Conversation
338063f
to
2802dc8
Compare
"transformations": self._column_transformations, | ||
"trainBudgetMilliNodeHours": budget_milli_node_hours, | ||
# optional inputs | ||
"weightColumnName": weight_column, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In google3/google/cloud/aiplatform/publicfiles/trainingjob/definition/automl_tables.proto, this column is referred to as "weight_column_name".
However, in gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_tabular_1.0.0.yaml
, it gives the name as "weightColumn". This is incorrect and results in an error at training time.
@sasha-gitg why is there a discrepancy here?
I imagine the protos are the source-of-truth and the yaml is just out-of-date? If so, what's our plan to mitigate the users from dealing with this, since they don't have access to the protos AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synced with the team. The yaml is incorrect and will be updated. The field should be weightColumnName
. If users are using our Model Builder SDK then hopefully we have caught these issues early during our development. This should become less of an issue for the service as we move forward because it will support all previously versioned yaml schemas.
2759662
to
36d6b7c
Compare
def __init__( | ||
self, | ||
display_name: str, | ||
optimization_objective: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the protos and yaml, the optimization_objective
is optional since there is a default if it's not supplied.
Also, the design doc doesn't have the optimization_prediction_type
parameter either. Sure, we can support this by inferring that info based on the optimization_objective
, but is that what you intended?
My personal preferred way is to create an OptimizationObjective abstract class and create subclasses of each for each optimization_objective
type. That would encapsulate the optimization_objective
, optimization_prediction_type
, optimization_objective_recall_value
and optimization_objective_precision_value
parameters together into one.
However, from our other convos it seems like you might prefer just passing in a string for optimization_objective
and having the other parameters be optional.
Let me know what you prefer, regarding the type for optimization_objective
and whether or not to include a optimization_prediction_type
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With respect to optimization_prediction_type
we received feedback to include that as the top level argument because it requires less knowledge overhead than optimization_objective
. Additionally, we should include all the values flattened in the API surface and validate or ignore arguments based on valid combinations. There's precedence for this pattern:
So the current state of the input arguments LGTM just elevate optimization_prediction_type
over optimization_objective
and make optimization_objective
optional but add comments to explain the defaults when selecting optimization_prediction_type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context, sgtm!
5831100
to
c175714
Compare
google/cloud/aiplatform/__init__.py
Outdated
@@ -31,4 +34,11 @@ | |||
""" | |||
init = initializer.global_config.init | |||
|
|||
__all__ = ("gapic", "CustomTrainingJob", "Model", "Dataset", "Endpoint") | |||
__all__ = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linter did this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good.
c12d2b9
to
7858dfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Minor comments.
validation_fraction_split: float, | ||
test_fraction_split: float, | ||
weight_column: Optional[str] = None, | ||
budget_milli_node_hours: int = 1000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing setting this to less than 1000 hours will cause the backend to use 1000 hours instead. Probably same with the maximum value.
I have no idea if this is true though and this seems opaque to users.
Ideally, the backend should respond with a warning if the parameters are out of bounds.
2e4dc7b
to
caf7dc4
Compare
Added the training job subclass for tables
caf7dc4
to
072850f
Compare
Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>
Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>
Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>
Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>
Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>
Added the training job subclass for tables Co-authored-by: Ivan Cheung <[email protected]>
Added support for AutoMLTablesTraining.
Fixes https://b.corp.google.com/issues/172282518
Depends on #49
Could possibly add more client-side validation, but currently deferring validation to the backend services pending more discussion.