allowed_model_families doesn't work in AutoMLSearch initialisation #4437

Closed
enfeizhan opened this issue May 30, 2024 · 2 comments
Labels
bug Issues tracking problems with existing features.

@enfeizhan

This argument is supposed to restrict which model families are searched. However, all model families are searched regardless of the value passed to it.

Version: 0.83.0

Code Sample, a copy-pastable example to reproduce your bug.

import evalml

# Load the demo binary-classification dataset
X, y = evalml.demos.load_breast_cancer()

# Request only the linear model family, with ensembling enabled
automl_with_ensembling = evalml.AutoMLSearch(
    X_train=X,
    y_train=y,
    problem_type="binary",
    allowed_model_families=["linear_model"],
    max_batches=4,
    ensembling=True,
)
print(automl_with_ensembling.allowed_model_families)
automl_with_ensembling.search()

automl_with_ensembling.allowed_model_families is an empty list instead of the list of linear models.
automl_with_ensembling.search() returns timings for 5 batches, and the pipelines it runs are not limited to linear models:

{1: {'Random Forest Classifier w/ Label Encoder + Imputer + RF Classifier Select From Model': 2.085165023803711,
     'Total time of batch': 2.204948902130127},
 2: {'Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler + Select Columns Transformer': 1.0889050960540771,
     'Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler + Select Columns Transformer': 3.366680145263672,
     'Total time of batch': 4.698328256607056},
 3: {'Stacked Ensemble Classification Pipeline': 2.254487991333008,
     'Total time of batch': 2.36545991897583},
 4: {'Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler + Select Columns Transformer': 1.7583808898925781,
     'Random Forest Classifier w/ Label Encoder + Imputer + Select Columns Transformer': 4.039682149887085,
     'Total time of batch': 248.39868783950806},
 5: {'Stacked Ensemble Classification Pipeline': 2.4489309787750244,
     'Total time of batch': 2.5626749992370605}}
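To see at a glance which estimators ran in each batch, the pipeline names in the timing dictionary above can be reduced to their estimator. This is a minimal sketch that uses only the pipeline names copied from that output. It assumes, based solely on the keys shown, that the estimator is the part of each name before " w/ "; the helper names estimator_name and estimators_per_batch are hypothetical and not part of evalml.

```python
from __future__ import annotations

# Pipeline names per batch, copied from the search() output above
# (timing values and the 'Total time of batch' entries are omitted).
batches = {
    1: ["Random Forest Classifier w/ Label Encoder + Imputer + RF Classifier Select From Model"],
    2: ["Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler + Select Columns Transformer",
        "Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler + Select Columns Transformer"],
    3: ["Stacked Ensemble Classification Pipeline"],
    4: ["Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler + Select Columns Transformer",
        "Random Forest Classifier w/ Label Encoder + Imputer + Select Columns Transformer"],
    5: ["Stacked Ensemble Classification Pipeline"],
}


def estimator_name(pipeline_name: str) -> str:
    """Hypothetical helper: return the text before ' w/ ' (assumed to be the estimator).

    Names without ' w/ ' (e.g. the stacked ensemble) are returned unchanged.
    """
    return pipeline_name.split(" w/ ")[0]


def estimators_per_batch(batches: dict[int, list[str]]) -> dict[int, list[str]]:
    """Hypothetical helper: map each batch number to its estimator names, in order."""
    return {n: [estimator_name(p) for p in names] for n, names in batches.items()}


for batch, estimators in estimators_per_batch(batches).items():
    print(batch, estimators)
# 1 ['Random Forest Classifier']
# 2 ['Elastic Net Classifier', 'Logistic Regression Classifier']
# 3 ['Stacked Ensemble Classification Pipeline']
# 4 ['Logistic Regression Classifier', 'Random Forest Classifier']
# 5 ['Stacked Ensemble Classification Pipeline']
```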

@enfeizhan enfeizhan added the bug Issues tracking problems with existing features. label May 30, 2024
@enfeizhan
Author

Is there anyone still active here?

@eccabay
Contributor

eccabay commented Jun 13, 2024

Hi @enfeizhan, this is the correct behavior. From the documentation:

        allowed_model_families (list(str, ModelFamily)): The model families to search. ... For default algorithm, this only applies to estimators in the non-naive batches.

The example you provided uses the default algorithm, so the naive batch containing the Random Forest Classifier still runs; that is the first batch in your output. The second batch is the first non-naive batch, and it includes only estimators from the linear model family. The allowed model families are kept in automl_with_ensembling.automl_algorithm.allowed_model_families.

Note that both Elastic Net and Logistic Regression are linear models.
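The claim that the first non-naive batch contains only linear models can be checked directly against the pipeline names from the output. This is a minimal sketch under stated assumptions: LINEAR_ESTIMATORS is a hypothetical set holding only the two estimators named as linear in this reply, the helper only_linear is hypothetical, and the estimator is assumed to be the text before " w/ " in each pipeline name. It does not call evalml or reflect how evalml classifies model families internally.

```python
# Hypothetical set: only the estimators identified as linear models in this
# thread (Elastic Net and Logistic Regression); not evalml's own family list.
LINEAR_ESTIMATORS = {"Elastic Net Classifier", "Logistic Regression Classifier"}

# Pipelines in batch 2 of the search() output, the first non-naive batch.
batch_2 = [
    "Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler + Select Columns Transformer",
    "Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler + Select Columns Transformer",
]


def only_linear(pipeline_names):
    """Hypothetical check: True if every estimator (text before ' w/ ') is in LINEAR_ESTIMATORS."""
    return all(name.split(" w/ ")[0] in LINEAR_ESTIMATORS for name in pipeline_names)


print(only_linear(batch_2))  # True
# The naive first batch's Random Forest pipeline fails the check.
print(only_linear(["Random Forest Classifier w/ Label Encoder + Imputer + RF Classifier Select From Model"]))  # False
```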

@eccabay eccabay closed this as completed Jun 13, 2024