Remove pipeline_parameters and custom_hyperparameters and replace with search_parameters #3373
Conversation
Codecov Report
@@           Coverage Diff           @@
##            main   #3373     +/-  ##
=======================================
- Coverage   99.7%   99.6%    -0.0%
=======================================
  Files        329     329
  Lines      32405   32380      -25
=======================================
- Hits       32276   32249      -27
- Misses       129     131       +2
Continue to review full report at Codecov.
Just have some nitpicky doc comments for now; I'll come back and do a full review later!
for (
    name,
    component_instance,
) in pipeline.component_graph.component_instances.items():
@bchen1116 This code block is doing two things:
- Getting random values from the skopt spaces so that the parameters used in the first batch are in the space the tuner is tuning over
- Making sure the _pipeline_parameters are correctly added to the parameters so that Drop Columns etc. get the right parameters

I think this would be simpler if 1 was a tuner method, like get_starting_parameters?
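For illustration, a tuner method along those lines might look like the sketch below. This is a hypothetical shape, not EvalML's actual API; a tiny stand-in `Categorical` class is used in place of real skopt spaces so the example is self-contained.

```python
import random


class Categorical:
    """Stand-in for a skopt Categorical space (illustrative only)."""

    def __init__(self, choices):
        self.choices = list(choices)

    def rvs(self, random_state=None):
        rng = random.Random(random_state)
        return rng.choice(self.choices)


class Tuner:
    def __init__(self, search_space):
        # search_space: {component name -> {parameter name -> space}}
        self.search_space = search_space

    def get_starting_parameters(self, pipeline_parameters=None, random_state=0):
        # 1. Sample a value from every space so the first batch's
        #    parameters lie inside the region the tuner searches over.
        parameters = {
            component: {
                name: space.rvs(random_state=random_state)
                for name, space in spaces.items()
            }
            for component, spaces in self.search_space.items()
        }
        # 2. Overlay the fixed pipeline parameters (e.g. the columns for
        #    a Drop Columns transformer) on top of the sampled values.
        for component, overrides in (pipeline_parameters or {}).items():
            parameters.setdefault(component, {}).update(overrides)
        return parameters
```

With this shape, the loop over `component_instances` in AutoMLSearch could collapse into a single call per pipeline, keeping the sampling logic next to the tuner that owns the spaces.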
@bchen1116 Thank you for your work on this! I left some suggestions for testing improvements. This is looking pretty good though.
@@ -652,12 +646,15 @@ def __init__(
         self.sampler_method,
         self.sampler_balanced_ratio,
     )
-    if self._sampler_name not in parameters and self._sampler_name is not None:
-        parameters[self._sampler_name] = {
+    if (
Can this be moved to the AutoMLAlgorithm? It's kind of awkward that these parameters are set in AutoMLSearch while the rest are set in the AutoMLAlgorithm.
@freddyaboulton I think this would be a weird move. We use a lot of information that isn't passed to the AutoMLAlgorithm to determine whether we use a sampler and which sampler to use. We would need to pass all of this relevant data to the AutoMLAlgorithm in order to move this logic, and I'm not sure if that's worth it.
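The pattern being debated is a defaulting check: sampler parameters are filled in only when the user has not already supplied them. A minimal sketch, assuming hypothetical names that mirror the diff above rather than EvalML's real implementation:

```python
def add_sampler_parameters(parameters, sampler_name, sampler_balanced_ratio):
    """Add default parameters for the chosen sampler without
    overwriting any user-provided ones.

    parameters: {component name -> {parameter name -> value}}
    sampler_name: name of the sampler component, or None if no sampler.
    """
    if sampler_name is not None and sampler_name not in parameters:
        # Only fill in defaults when the user gave no sampler config.
        parameters[sampler_name] = {
            "sampling_ratio": sampler_balanced_ratio,
        }
    return parameters
```

Wherever this check ends up living, the inputs it needs (sampler name, balanced ratio, user parameters) would have to be available there, which is the crux of the AutoMLSearch-vs-AutoMLAlgorithm question.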
I'm on the fence on this: on one side, I made the decision to move pipeline building into the algorithms, and this certainly falls under that category. On the other side, I do understand @bchen1116's concern about bloat in AutoMLAlgorithm. @bchen1116 can you file an issue and use this discussion as context for it?
Yeah, if the long-term plan is to move pipeline building logic to the algorithms, then I think the logic for determining whether or not to add a sampler should move to the algorithms too. I think there are some unused parameters in the automl algos right now that can be cleaned up as well, e.g. number_features. We can do that in a separate issue.
Filed issue here
thank you!
assert aml._tuners.keys() == aml_add_pipelines._tuners.keys()
assert aml._tuner_class == aml_add_pipelines._tuner_class
aml.next_batch()
aml._transform_parameters(None, None)
What does this line do?
Codecov would raise errors if I didn't have calls to the next_batch and _transform_parameters methods. This was to satisfy that.
Really great work @bchen1116, a big value add in cleaning up the internal API as well as the external parameters API. Appreciate the cleanup in DefaultAlgo as well! Just left some general comments.
Fixes #3153 and fixes #3150.
Design doc in Confluence.