Remove pipeline_parameters and custom_hyperparameters and replace with search_parameters #3373

Merged
merged 42 commits into from
Mar 24, 2022

Conversation

bchen1116
Contributor

fix #3153 and fix #3150

Design doc in confluence

@bchen1116 bchen1116 self-assigned this Mar 14, 2022
@codecov

codecov bot commented Mar 14, 2022

Codecov Report

Merging #3373 (2847634) into main (da8f266) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3373     +/-   ##
=======================================
- Coverage   99.7%   99.6%   -0.0%     
=======================================
  Files        329     329             
  Lines      32405   32380     -25     
=======================================
- Hits       32276   32249     -27     
- Misses       129     131      +2     
Impacted Files Coverage Δ
...sts/test_automl_search_classification_iterative.py 100.0% <ø> (ø)
evalml/automl/automl_algorithm/automl_algorithm.py 100.0% <100.0%> (ø)
...valml/automl/automl_algorithm/default_algorithm.py 100.0% <100.0%> (ø)
...lml/automl/automl_algorithm/iterative_algorithm.py 97.4% <100.0%> (-1.0%) ⬇️
evalml/automl/automl_search.py 99.6% <100.0%> (-0.1%) ⬇️
...ts/automl_tests/parallel_tests/test_automl_dask.py 96.3% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py 99.5% <100.0%> (+0.1%) ⬆️
evalml/tests/automl_tests/test_automl_algorithm.py 98.6% <100.0%> (+0.5%) ⬆️
...ts/automl_tests/test_automl_iterative_algorithm.py 100.0% <100.0%> (ø)
.../automl_tests/test_automl_search_classification.py 96.5% <100.0%> (ø)
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update da8f266...2847634. Read the comment docs.

@bchen1116 bchen1116 marked this pull request as ready for review March 14, 2022 20:52
@bchen1116 bchen1116 requested a review from a team March 14, 2022 20:52
Contributor

@eccabay eccabay left a comment

Just have some nitpicky doc comments for now, I'll come back and do a full review later!

docs/source/release_notes.rst (resolved)
docs/source/user_guide/automl.ipynb (outdated, resolved)
docs/source/user_guide/automl.ipynb (outdated, resolved)
for (
    name,
    component_instance,
) in pipeline.component_graph.component_instances.items():
Contributor

@bchen1116 This code block is doing two things:

  1. Getting random values from the skopt spaces so that the parameters used in the first batch are in the space the tuner is tuning over
  2. Making sure the _pipeline_parameters are correctly added to the parameters so that Drop Columns etc. get the right parameters

I think this would be simpler if 1 were a tuner method, like get_starting_parameters?
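
A rough sketch of what such a helper could look like, to make the suggestion concrete. Everything here is an assumption for illustration: `IntRange` and `Choice` are hypothetical stand-ins for the skopt dimension types, and the signature of `get_starting_parameters` is invented, not the tuner API in this PR. The idea is that values which are search ranges get sampled, while fixed values (such as the columns for Drop Columns) pass through unchanged.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, Sequence


@dataclass(frozen=True)
class IntRange:
    """Hypothetical stand-in for an integer search dimension."""
    low: int
    high: int

    def sample(self, rng: random.Random) -> int:
        # randint is inclusive on both ends
        return rng.randint(self.low, self.high)


@dataclass(frozen=True)
class Choice:
    """Hypothetical stand-in for a categorical search dimension."""
    options: Sequence[Any]

    def sample(self, rng: random.Random) -> Any:
        return rng.choice(list(self.options))


SEARCH_DIMENSIONS = (IntRange, Choice)


def get_starting_parameters(
    search_parameters: Dict[str, Dict[str, Any]], random_seed: int = 0
) -> Dict[str, Dict[str, Any]]:
    """Sample one value per search range; pass fixed values through as-is."""
    rng = random.Random(random_seed)
    starting: Dict[str, Dict[str, Any]] = {}
    for component_name, params in search_parameters.items():
        starting[component_name] = {
            name: value.sample(rng) if isinstance(value, SEARCH_DIMENSIONS) else value
            for name, value in params.items()
        }
    return starting


# Example input: component and parameter names are illustrative only.
search_parameters = {
    "Random Forest Classifier": {
        "n_estimators": IntRange(50, 200),
        "max_depth": Choice([4, 6, 8]),
    },
    "Drop Columns Transformer": {"columns": ["id"]},
}
starting = get_starting_parameters(search_parameters, random_seed=0)
```

With something like this on the tuner, the first batch would always start inside the space being tuned, and the algorithm would only need to handle step 2.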

Contributor

@freddyaboulton freddyaboulton left a comment

@bchen1116 Thank you for your work on this! I left some suggestions for testing improvements. This is looking pretty good though.

docs/source/release_notes.rst (outdated, resolved)
evalml/automl/automl_algorithm/automl_algorithm.py (outdated, resolved)
@@ -652,12 +646,15 @@ def __init__(
self.sampler_method,
self.sampler_balanced_ratio,
)
if self._sampler_name not in parameters and self._sampler_name is not None:
parameters[self._sampler_name] = {
if (
Contributor

Can this be moved to the AutoMLAlgorithm? It's kind of awkward that these parameters are set in AutoMLSearch while the rest are set in the AutoMLAlgorithm.

Contributor Author

@bchen1116 bchen1116 Mar 22, 2022

@freddyaboulton I think this would be a weird move. We use a lot of information that isn't passed to the AutoMLAlgorithm to determine whether we use a sampler and which sampler to use. We would need to pass all of this relevant data to the AutoMLAlgorithm in order to move this logic, and I'm not sure that's worth it.

Collaborator

I'm on the fence on this:
On one side, I made the decision to move pipeline building into the algorithms and this certainly falls under that category. On the other side, I do understand @bchen1116's concern about bloat in AutoMLAlgorithm. @bchen1116 can you file an issue and use this discussion as context for it?

Contributor

Yeah, if the long-term plan is to move pipeline building logic to the algorithms, then I think the logic for determining whether or not to add a sampler should move to the algorithms. I think there are some unused parameters in the automl algos right now that can be cleaned up too, e.g. number_features. We can do that in a separate issue.

Contributor Author

Filed issue here

Collaborator

thank you!

evalml/tests/automl_tests/test_automl.py (resolved)
evalml/tuners/tuner.py (resolved)
evalml/automl/automl_algorithm/iterative_algorithm.py (outdated, resolved)
evalml/tests/automl_tests/test_automl.py (resolved)
docs/source/user_guide/automl.ipynb (resolved)
assert aml._tuners.keys() == aml_add_pipelines._tuners.keys()
assert aml._tuner_class == aml_add_pipelines._tuner_class
aml.next_batch()
aml._transform_parameters(None, None)
Contributor

What does this line do?

Contributor Author

Codecov would raise errors if I didn't have calls to the next_batch and _transform_parameters methods, so these calls are only here to satisfy the coverage check.

evalml/tests/automl_tests/test_automl.py (resolved)
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Really great work @bchen1116, a big value add in cleaning up the internal API as well as the external parameters API. Appreciate the cleanup in DefaultAlgo as well! Just left some general comments.

evalml/automl/automl_algorithm/default_algorithm.py (outdated, resolved)
evalml/automl/automl_search.py (resolved)
evalml/tests/automl_tests/test_automl.py (outdated, resolved)
@bchen1116 bchen1116 merged commit b442453 into main Mar 24, 2022
@chukarsten chukarsten mentioned this pull request Mar 25, 2022
chukarsten added a commit that referenced this pull request Mar 28, 2022
… replace with `search_parameters` (#3373)"

This reverts commit b442453.
freddyaboulton pushed a commit that referenced this pull request Mar 28, 2022
… replace with `search_parameters`" (#3410)

* Revert "Remove `pipeline_parameters` and `custom_hyperparameters` and replace with `search_parameters` (#3373)"

This reverts commit b442453.

* Release notes.
@chukarsten chukarsten mentioned this pull request Apr 12, 2022