Remove pipeline_parameters and custom_hyperparameters and replace with search_parameters #3373

Merged: 42 commits from bc_search_parameters into main, Mar 24, 2022
Changes from 8 commits

Commits (42)
02d717e  initial impl: (bchen1116, Mar 14, 2022)
85e0513  update release notes (bchen1116, Mar 14, 2022)
ddc5bb6  fix notebook (bchen1116, Mar 14, 2022)
5cd9917  fix docs (bchen1116, Mar 14, 2022)
8c47ffb  add test (bchen1116, Mar 14, 2022)
70f8e5c  update test (bchen1116, Mar 15, 2022)
e7fcb92  update impl (bchen1116, Mar 15, 2022)
85b5374  Merge branch 'main' into bc_search_parameters (bchen1116, Mar 15, 2022)
f21f012  update docs (bchen1116, Mar 15, 2022)
2d97b9f  Merge branch 'main' into bc_search_parameters (bchen1116, Mar 15, 2022)
ad0e0fb  update implementation to use tuner to suggest (bchen1116, Mar 15, 2022)
e770926  Merge branch 'bc_search_parameters' of github.com:alteryx/evalml into… (bchen1116, Mar 15, 2022)
828a8f4  make changes to how automl algo handles pipelines (bchen1116, Mar 16, 2022)
e1dda1f  update test (bchen1116, Mar 16, 2022)
eaa0663  Merge branch 'main' into bc_search_parameters (bchen1116, Mar 16, 2022)
ce0db41  rerun test (bchen1116, Mar 16, 2022)
844b042  update docstring (bchen1116, Mar 16, 2022)
cb26e74  fix import (bchen1116, Mar 16, 2022)
6e4a4f7  address codecov (bchen1116, Mar 16, 2022)
204c0d8  Merge branch 'main' into bc_search_parameters (bchen1116, Mar 17, 2022)
74094c0  Merge branch 'main' into bc_search_parameters (bchen1116, Mar 21, 2022)
a760c8a  Merge branch 'main' into bc_search_parameters (bchen1116, Mar 21, 2022)
b975315  merging changes (bchen1116, Mar 22, 2022)
4050ed8  pipeline parameters (bchen1116, Mar 22, 2022)
49432a2  lint (bchen1116, Mar 22, 2022)
d68c940  testing (bchen1116, Mar 22, 2022)
26860c6  test commit hook (bchen1116, Mar 22, 2022)
9b2d682  Merge branch 'main' into bc_search_parameters (bchen1116, Mar 22, 2022)
e953b06  Merge branch 'main' into bc_search_parameters (bchen1116, Mar 22, 2022)
a7545cf  Merge branch 'main' of github.com:alteryx/evalml (bchen1116, Mar 22, 2022)
be7b26a  update with comments (bchen1116, Mar 22, 2022)
e557fdf  lint (bchen1116, Mar 22, 2022)
fc1d36a  update tuner (bchen1116, Mar 23, 2022)
a34d027  update (bchen1116, Mar 23, 2022)
3a39f0e  Merge branch 'main' of github.com:alteryx/evalml (bchen1116, Mar 23, 2022)
5703bbe  merge (bchen1116, Mar 23, 2022)
e75fec9  fix docs (bchen1116, Mar 23, 2022)
e053263  update with comments (bchen1116, Mar 23, 2022)
b52d9d1  lint (bchen1116, Mar 23, 2022)
73952c8  fix test (bchen1116, Mar 23, 2022)
2993bcb  rerun test (bchen1116, Mar 24, 2022)
2847634  Merge branch 'main' into bc_search_parameters (bchen1116, Mar 24, 2022)
2 changes: 2 additions & 0 deletions docs/source/release_notes.rst
@@ -5,9 +5,11 @@
* Enhancements
* Added ``TimeSeriesFeaturizer`` into ARIMA-based pipelines :pr:`3313`
* Added caching capability for ensemble training during ``AutoMLSearch`` :pr:`3257`
* Replaced ``pipeline_parameters`` and ``custom_hyperparameters`` with ``search_parameters`` in ``AutoMLSearch`` :pr:`3373`
* Added new error code for zero unique values in ``NoVarianceDataCheck`` :pr:`3372`
* Fixes
* Fixed ``get_pipelines`` to reset pipeline threshold for binary cases :pr:`3360`
* Simplified internal ``AutoMLSearch`` API to rely on ``search_parameters`` :pr:`3373`
* Changes
* Update maintainers :pr:`3365`
* Documentation Changes
56 changes: 39 additions & 17 deletions docs/source/user_guide/automl.ipynb
@@ -477,11 +477,11 @@
"metadata": {},
"source": [
"## Limiting the AutoML Search Space\n",
"The AutoML search algorithm first trains each component in the pipeline with their default values. After the first iteration, it then tweaks the parameters of these components using the pre-defined hyperparameter ranges that these components have. To limit the search over certain hyperparameter ranges, you can specify a `custom_hyperparameters` argument with your `AutoMLSearch` parameters. These parameters will limit the hyperparameter search space. \n",
"The AutoML search algorithm first trains each component in the pipeline with their default values. After the first iteration, it then tweaks the parameters of these components using the pre-defined hyperparameter ranges that these components have. To limit the search over certain hyperparameter ranges, you can specify a `search_parameters` argument with your `AutoMLSearch` parameters. These parameters will limit the hyperparameter search space or pipeline parameter space. \n",
"\n",
"Hyperparameter ranges can be found through the [API reference](https://evalml.alteryx.com/en/stable/api_reference.html) for each component. Parameter arguments must be specified as dictionaries, but the associated values can be single values or `skopt.space` Real, Integer, Categorical values.\n",
"Hyperparameter ranges can be found through the [API reference](https://evalml.alteryx.com/en/stable/api_reference.html) for each component. Parameter arguments must be specified as dictionaries, but the associated values must be `skopt.space` Real, Integer, Categorical values for setting hyperparameter values.\n",
"\n",
"If however you'd like to specify certain values for the initial batch of the AutoML search algorithm, you can use the `pipeline_parameters` argument. This will set the initial batch's component parameters to the values passed by this argument."
"If however you'd like to specify certain values for the initial batch of the AutoML search algorithm, you can use the `search_parameters` argument with non `skopt.space` objects. This will set the initial batch's component parameters to the values passed by this argument."
]
},
{
@@ -499,30 +499,52 @@
"X, y = load_fraud(n_rows=1000)\n",
"\n",
"# example of setting parameter to just one value\n",
"custom_hyperparameters = {'Imputer': {\n",
"search_parameters = {'Imputer': {\n",
" 'numeric_impute_strategy': 'mean'\n",
"}}\n",
"\n",
"\n",
"# limit the numeric impute strategy to include only `median` and `most_frequent`\n",
"# `mean` is the default value for this argument, but it doesn't need to be included in the specified hyperparameter range for this to work\n",
"custom_hyperparameters = {'Imputer': {\n",
"search_parameters = {'Imputer': {\n",
" 'numeric_impute_strategy': Categorical(['median', 'most_frequent'])\n",
"}}\n",
"# set the initial batch numeric impute strategy strategy to 'median'\n",
"pipeline_parameters = {'Imputer': {\n",
" 'numeric_impute_strategy': 'median'\n",
"}}\n",
"\n",
"# using this custom hyperparameter means that our Imputer components in these pipelines will only search through\n",
"# 'median' and 'most_frequent' strategies for 'numeric_impute_strategy', and the initial batch parameter will be\n",
"# set to 'median'\n",
"# 'median' and 'most_frequent' strategies for 'numeric_impute_strategy'\n",
"automl_constrained = AutoMLSearch(X_train=X, y_train=y, problem_type='binary', \n",
" pipeline_parameters=pipeline_parameters,\n",
" custom_hyperparameters=custom_hyperparameters, \n",
" search_parameters=search_parameters,\n",
" verbose=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`search_parameters` can set both hyperparameter ranges and pipeline parameters. To set the hyperparameter space, an `skopt.space` Integer, Real, or Categorical object must be used. All other values will be associated to setting the pipeline parameters directly.\n",
"\n",
"Let's walk through some examples to explain this. For instance,\n",
"```python\n",
"search_parameters = {'Imputer': {\n",
" 'numeric_impute_strategy': 'mean'\n",
"}}\n",
"```\n",
"then in the initial search, the algorithm would use `mean` as the impute strategy in batch 1. However, since `Imputer.numeric_impute_strategy` has a valid hyperparameter range, if the algorithm suggests a different strategy, it can and will change this value. To limit this to using `mean` only for the duration of the search, it is necessary to use the `skopt.space`:\n",
"```python\n",
"search_parameters = {'Imputer': {\n",
" 'numeric_impute_strategy': Categorical(['mean'])\n",
"}}\n",
"```\n",
"\n",
"However, if a value has no hyperparameter range associated, then the algorithm will use this value as the only parameter. For instance,\n",
"```python\n",
"search_parameters = {'Label Encoder': {\n",
" 'positive_label': True\n",
"}}\n",
"```\n",
"Since `Label Encoder.positive_label` has no associated hyperparameter range, the algorithm will use this parameter for the entire duration of the search."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -579,16 +601,16 @@
"# for the oversampler, we don't want to oversample this class, so class 0 (majority) will have a ratio of 1 to itself\n",
"# for the minority class 1, we want to oversample it to have a minority/majority ratio of 0.5, which means we want minority to have 1/2 the samples as the minority\n",
"sampler_ratio_dict = {0: 1, 1: 0.5}\n",
"pipeline_parameters = {\"Oversampler\": {\"sampler_balanced_ratio\": sampler_ratio_dict}}\n",
"automl_auto_ratio_dict = AutoMLSearch(X_train=X, y_train=y, problem_type='binary', pipeline_parameters=pipeline_parameters, automl_algorithm='iterative')\n",
"search_parameters = {\"Oversampler\": {\"sampler_balanced_ratio\": sampler_ratio_dict}}\n",
"automl_auto_ratio_dict = AutoMLSearch(X_train=X, y_train=y, problem_type='binary', search_parameters=search_parameters, automl_algorithm='iterative')\n",
"automl_auto_ratio_dict.allowed_pipelines[-1]\n",
"\n",
"# Undersampler case\n",
"# we don't want to undersample this class, so class 1 (minority) will have a ratio of 1 to itself\n",
"# for the majority class 0, we want to undersample it to have a minority/majority ratio of 0.5, which means we want majority to have 2x the samples as the minority\n",
"# sampler_ratio_dict = {0: 0.5, 1: 1}\n",
"# pipeline_parameters = {\"Oversampler\": {\"sampler_balanced_ratio\": sampler_ratio_dict}}\n",
"# automl_auto_ratio_dict = AutoMLSearch(X_train=X, y_train=y, problem_type='binary', pipeline_parameters=pipeline_parameters)\n"
"# search_parameters = {\"Oversampler\": {\"sampler_balanced_ratio\": sampler_ratio_dict}}\n",
"# automl_auto_ratio_dict = AutoMLSearch(X_train=X, y_train=y, problem_type='binary', search_parameters=search_parameters)\n"
]
},
{
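The notebook cells above fold the two old arguments into a single dict whose value types select the behavior. Below is a minimal, self-contained sketch of the merged API (an editorial illustration, not part of the PR diff), using the evalml and skopt imports the notebook already relies on:

```python
from skopt.space import Categorical

from evalml import AutoMLSearch
from evalml.demos import load_fraud

X, y = load_fraud(n_rows=1000)

# One dict now drives both behaviors:
# - skopt.space objects (Categorical/Integer/Real) constrain the tuner's search space
# - plain values pin a parameter: for the first batch if the parameter has a
#   hyperparameter range, or for the whole search if it has none (e.g. positive_label)
search_parameters = {
    "Imputer": {"numeric_impute_strategy": Categorical(["median", "most_frequent"])},
    "Label Encoder": {"positive_label": True},
}

automl = AutoMLSearch(
    X_train=X,
    y_train=y,
    problem_type="binary",
    search_parameters=search_parameters,
)
```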
86 changes: 75 additions & 11 deletions evalml/automl/automl_algorithm/automl_algorithm.py
@@ -1,6 +1,9 @@
"""Base class for the AutoML algorithms which power EvalML."""
import inspect
from abc import ABC, abstractmethod

from skopt.space import Categorical, Integer, Real

from evalml.exceptions import PipelineNotFoundError
from evalml.pipelines.utils import _make_stacked_ensemble_pipeline
from evalml.problem_types import is_multiclass
@@ -22,7 +25,7 @@ class AutoMLAlgorithm(ABC):

Args:
allowed_pipelines (list(class)): A list of PipelineBase subclasses indicating the pipelines allowed in the search. The default of None indicates all pipelines for this problem type are allowed.
custom_hyperparameters (dict): Custom hyperparameter ranges specified for pipelines to iterate over.
search_parameters (dict): Search parameter ranges specified for pipelines to iterate over.
tuner_class (class): A subclass of Tuner, to be used to find parameters for each pipeline. The default of None indicates the SKOptTuner will be used.
text_in_ensembling (boolean): If True and ensembling is True, then n_jobs will be set to 1 to avoid downstream sklearn stacking issues related to nltk. Defaults to None.
random_seed (int): Seed for the random number generator. Defaults to 0.
@@ -31,7 +34,7 @@ def __init__(
def __init__(
self,
allowed_pipelines=None,
custom_hyperparameters=None,
search_parameters=None,
tuner_class=None,
text_in_ensembling=False,
random_seed=0,
@@ -45,9 +48,27 @@ def __init__(
self.text_in_ensembling = text_in_ensembling
self.n_jobs = n_jobs
self._selected_cols = None
self.search_parameters = search_parameters or {}
self._hyperparameters = {}
self._pipeline_parameters = {}

# separate out the parameter and hyperparameter values
for key, value in self.search_parameters.items():
hyperparam = {}
param = {}
for name, parameters in value.items():
if isinstance(parameters, (Integer, Categorical, Real)):
hyperparam[name] = parameters
else:
param[name] = parameters
if hyperparam:
self._hyperparameters[key] = hyperparam
if param:
self._pipeline_parameters[key] = param

for pipeline in self.allowed_pipelines:
pipeline_hyperparameters = pipeline.get_hyperparameter_ranges(
custom_hyperparameters
self._hyperparameters
)
self._tuners[pipeline.name] = self._tuner_class(
pipeline_hyperparameters, random_seed=self.random_seed
@@ -64,14 +85,57 @@ def next_batch(self):
list[PipelineBase]: A list of instances of PipelineBase subclasses, ready to be trained and evaluated.
"""

@abstractmethod
def _transform_parameters(self, pipeline, proposed_parameters):
"""Given a pipeline parameters dict, make sure pipeline_params, custom_hyperparameters, n_jobs are set properly.
"""Given a pipeline parameters dict, make sure pipeline_parameters, custom_hyperparameters, n_jobs are set properly.

Arguments:
pipeline (PipelineBase): The pipeline object to update the parameters.
proposed_parameters (dict): Parameters to use when updating the pipeline.
"""
parameters = {}
if "pipeline" in self._pipeline_parameters:
parameters["pipeline"] = self._pipeline_parameters["pipeline"]

for (
name,
component_instance,
) in pipeline.component_graph.component_instances.items():
Review comment (Contributor):

@bchen1116 This code block is doing two things:

  1. Getting random values from the skopt spaces so that the parameters used in the first batch are in the space the tuner is tuning over
  2. Making sure the _pipeline_parameters are correctly added to the parameters so that Drop Columns etc. get the right parameters

I think this would be simpler if 1 was a tuner method, like get_starting_parameters? (A sketch of this idea follows the diff below.)

component_class = type(component_instance)
component_parameters = proposed_parameters.get(name, {})
init_params = inspect.signature(component_class.__init__).parameters
# For first batch, pass the pipeline params to the components that need them
if name in self.search_parameters and name not in component_parameters:
# only write the value if the name does not already exist in the proposed parameters
for param_name, value in self.search_parameters[name].items():
if isinstance(value, (Integer, Real)):
# get a random value in the space
component_parameters[param_name] = value.rvs(
random_state=self.random_seed
)[0]
elif isinstance(value, Categorical):
# Categorical
component_parameters[param_name] = value.rvs(
random_state=self.random_seed
)
else:
# we set the pipeline parameter value directly
component_parameters[param_name] = value
# Inspects each component and adds the following parameters when needed
if "n_jobs" in init_params:
component_parameters["n_jobs"] = self.n_jobs
try:
if "number_features" in init_params:
component_parameters["number_features"] = self.number_features
except AttributeError:
continue
if "pipeline" in self.search_parameters:
for param_name, value in self.search_parameters["pipeline"].items():
if param_name in init_params:
component_parameters[param_name] = value
parameters[name] = component_parameters
return parameters

def add_result(self, score_to_minimize, pipeline, trained_pipeline_results):
"""Register results from evaluating a pipeline.
@@ -131,31 +195,31 @@ def _create_ensemble(self):

def _set_additional_pipeline_params(self):
drop_columns = (
self._pipeline_params["Drop Columns Transformer"]["columns"]
if "Drop Columns Transformer" in self._pipeline_params
self.search_parameters["Drop Columns Transformer"]["columns"]
if "Drop Columns Transformer" in self.search_parameters
else None
)
index_and_unknown_columns = list(
self.X.ww.select(["index", "unknown"], return_schema=True).columns
)
unknown_columns = list(self.X.ww.select("unknown", return_schema=True).columns)
if len(index_and_unknown_columns) > 0 and drop_columns is None:
self._pipeline_params["Drop Columns Transformer"] = {
self.search_parameters["Drop Columns Transformer"] = {
"columns": index_and_unknown_columns
}
if len(unknown_columns):
self.logger.info(
f"Removing columns {unknown_columns} because they are of 'Unknown' type"
)
kina_columns = self._pipeline_params.get("pipeline", {}).get(
kina_columns = self.search_parameters.get("pipeline", {}).get(
"known_in_advance", []
)
if kina_columns:
no_kin_columns = [c for c in self.X.columns if c not in kina_columns]
kin_name = "Known In Advance Pipeline - Select Columns Transformer"
no_kin_name = "Not Known In Advance Pipeline - Select Columns Transformer"
self._pipeline_params[kin_name] = {"columns": kina_columns}
self._pipeline_params[no_kin_name] = {"columns": no_kin_columns}
self.search_parameters[kin_name] = {"columns": kina_columns}
self.search_parameters[no_kin_name] = {"columns": no_kin_columns}

def _filter_estimators(
self,
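The review comment above proposes moving the space-sampling logic into the tuner. Below is a hypothetical sketch of that refactor; `get_starting_parameters` is the reviewer's suggested name, not an existing Tuner method, and the sampling mirrors the `rvs` logic in the diff. As a standalone function it would be adopted as a method on the Tuner class:

```python
from skopt.space import Categorical, Integer, Real


def get_starting_parameters(search_parameters, random_seed=0):
    """Draw first-batch values from the search space (would live on Tuner).

    Plain values pass through unchanged; skopt.space dimensions are sampled so
    that the first batch starts inside the space the tuner will explore.
    """
    starting_parameters = {}
    for component_name, params in search_parameters.items():
        component_params = {}
        for param_name, value in params.items():
            if isinstance(value, (Integer, Real)):
                # numeric dimensions: rvs returns a length-1 sequence, take the sample
                component_params[param_name] = value.rvs(random_state=random_seed)[0]
            elif isinstance(value, Categorical):
                # Categorical.rvs returns a single category by default
                component_params[param_name] = value.rvs(random_state=random_seed)
            else:
                # plain value: use it directly as the pipeline parameter
                component_params[param_name] = value
        starting_parameters[component_name] = component_params
    return starting_parameters


# usage sketch
params = get_starting_parameters(
    {"Imputer": {"numeric_impute_strategy": Categorical(["median", "most_frequent"])}}
)
```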