
YAHPO Gym always requires full configuration, also in case of forbidden hyperparameters #94

Open
LukasFehring opened this issue Mar 5, 2025 · 2 comments

Comments

LukasFehring commented Mar 5, 2025

I observed this behavior, for example, in the case of "rbv2_ranger" on instance '470'.

The following example was created in a fresh environment with yahpogym. check=False is required because ConfigSpace 0.6.1 does not contain the needed check_valid_configuration method:

from yahpo_gym import benchmark_set

benchmark = benchmark_set.BenchmarkSet(scenario="rbv2_ranger", check=False)
benchmark.set_instance(value="470")

config = {
    "min.node.size": 50,
    "mtry.power": 0.0,
    "num.impute.selected.cpo": "impute.mean",
    "num.trees": 1000,
    "respect.unordered.factors": "ignore",
    "sample.fraction": 0.55,
    "splitrule": "gini",
    "task_id": "470",
}

print(benchmark.objective_function(config))
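For illustration, the gap can be reproduced without YAHPO Gym itself. This is a hypothetical stdlib-only sketch (hyperparameter names taken from the rbv2_ranger optimization space shown later in this thread; its only condition is that num.random.splits is active when splitrule == 'extratrees') that computes which active hyperparameters the config above is missing:

```python
# Hypothetical stdlib-only sketch, not YAHPO Gym code: which active
# hyperparameters of the rbv2_ranger optimization space are missing
# from the config above?
OPT_SPACE = {
    "min.node.size", "mtry.power", "num.impute.selected.cpo",
    "num.random.splits", "num.trees", "repl", "replace",
    "respect.unordered.factors", "sample.fraction", "splitrule",
    "task_id", "trainsize",
}

def active_params(config: dict) -> set:
    """All parameters are active except num.random.splits, which is
    only active when splitrule == 'extratrees' (the single condition
    of this space)."""
    active = set(OPT_SPACE)
    if config.get("splitrule") != "extratrees":
        active.discard("num.random.splits")
    return active

config = {
    "min.node.size": 50,
    "mtry.power": 0.0,
    "num.impute.selected.cpo": "impute.mean",
    "num.trees": 1000,
    "respect.unordered.factors": "ignore",
    "sample.fraction": 0.55,
    "splitrule": "gini",
    "task_id": "470",
}

missing = active_params(config) - config.keys()
print(sorted(missing))  # ['repl', 'replace', 'trainsize']
```

So even though the forbidden (inactive) parameter num.random.splits may be omitted, three active parameters are absent, which is what the evaluation stumbles over.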
@sumny changed the title from "YahpoGym always requries full configuraiton, also in case of forbidden hyperparameters" to "YAHPO Gym always requries full configuration, also in case of forbidden hyperparameters" on Mar 5, 2025
sumny (Collaborator) commented Mar 5, 2025

Thanks for opening this issue.
Can you maybe elaborate a bit on what exactly the issue is and what the expected behavior should be?
In the meantime, let me explain:

benchmark.get_opt_space()
Configuration space object:
  Hyperparameters:
    min.node.size, Type: UniformInteger, Range: [1, 100], Default: 50
    mtry.power, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.0
    num.impute.selected.cpo, Type: Categorical, Choices: {impute.mean, impute.median, impute.hist}, Default: impute.mean
    num.random.splits, Type: UniformInteger, Range: [1, 100], Default: 1
    num.trees, Type: UniformInteger, Range: [1, 2000], Default: 1000
    repl, Type: Categorical, Choices: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, Default: 10
    replace, Type: Categorical, Choices: {TRUE, FALSE}, Default: TRUE
    respect.unordered.factors, Type: Categorical, Choices: {ignore, order, partition}, Default: ignore
    sample.fraction, Type: UniformFloat, Range: [0.1, 1.0], Default: 0.55
    splitrule, Type: Categorical, Choices: {gini, extratrees}, Default: gini
    task_id, Type: Constant, Value: 470
    trainsize, Type: UniformFloat, Range: [0.03, 1.0], Default: 0.525
  Conditions:
    num.random.splits | splitrule == 'extratrees'

tells you what a configuration should look like based on the optimization space (which sets the instance value to a constant and potentially drops fidelity parameters, fixing them at their highest value). I.e., benchmark.config_space is not necessarily the search space that is optimized over; it contains more parameters than the actual optimization space, which you get with get_opt_space().
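As a rough dict-based sketch of that relationship (illustrative only, not YAHPO Gym's actual implementation; the instance ids other than "470" are hypothetical): the optimization space can be thought of as the full configuration space with the instance parameter pinned to a constant:

```python
# Illustrative sketch only: the optimization space as the full
# configuration space with the instance parameter pinned to a constant.
# Parameter names follow the rbv2_ranger output below; extra instance
# ids here are hypothetical placeholders.
full_space = {
    "num.trees": ("int", 1, 2000),
    "trainsize": ("float", 0.03, 1.0),
    "task_id": ("categorical", ["470", "3", "31"]),  # all instances
}

def opt_space(space: dict, instance: str) -> dict:
    """Return a copy of the space with the instance fixed to a constant."""
    pinned = dict(space)
    pinned["task_id"] = ("constant", instance)
    return pinned

print(opt_space(full_space, "470")["task_id"])  # ('constant', '470')
```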

If we sample a point from the optimization space, we can also see what the benchmark expects a configuration to contain:

benchmark.get_opt_space().sample_configuration(1)
Configuration(values={
  'min.node.size': 71,
  'mtry.power': 0.9155386229529013,
  'num.impute.selected.cpo': 'impute.hist',
  'num.random.splits': 34,
  'num.trees': 1485,
  'repl': '1',
  'replace': 'TRUE',
  'respect.unordered.factors': 'partition',
  'sample.fraction': 0.16011881922566848,
  'splitrule': 'extratrees',
  'task_id': '470',
  'trainsize': 0.5538487728906515,
})

I.e., a configuration must always contain all parameters that are active.
This is because the surrogate model is trained over all instances and points for a given scenario and is able to handle missing-value imputation. Therefore, if you disable checks, the surrogate will still return a prediction, even for an incomplete configuration (because it can handle missing values), but the output will likely not be sensible.
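A toy illustration of that failure mode (a hypothetical stand-in, not YAHPO Gym's actual surrogate): a model that imputes a neutral placeholder for any missing feature will always return a number, but for an incomplete configuration that number is computed from placeholders rather than values the caller chose:

```python
# Toy stand-in for a surrogate that tolerates missing features by
# imputing a neutral value. It always returns *a* prediction, but for
# an incomplete configuration the result reflects imputed placeholders,
# not the caller's intent.
FEATURES = ["num.trees", "sample.fraction", "trainsize", "repl"]
IMPUTE_VALUE = 0.0  # placeholder used for any missing feature

def toy_predict(config: dict) -> float:
    row = [float(config.get(name, IMPUTE_VALUE)) for name in FEATURES]
    # Stand-in "model": a fixed weighted sum; the real surrogate is learned.
    return sum(0.1 * x for x in row)

full = {"num.trees": 1000, "sample.fraction": 0.55,
        "trainsize": 0.525, "repl": 10}
partial = {"num.trees": 1000, "sample.fraction": 0.55}  # trainsize, repl missing

print(toy_predict(full))     # uses the caller's values
print(toy_predict(partial))  # silently imputed: a number, but not sensible
```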

If you keep check = True:

benchmark = benchmark_set.BenchmarkSet(scenario="rbv2_ranger", check=True)

config = {
    "min.node.size": 50,
    "mtry.power": 0.0,
    "num.impute.selected.cpo": "impute.mean",
    "num.trees": 1000,
    "respect.unordered.factors": "ignore",
    "sample.fraction": 0.55,
    "splitrule": "gini",
    "task_id": "470",
}

print(benchmark.objective_function(config))

you will actually be told that your point is not fully specified:

ValueError: Active hyperparameter 'repl' not specified!

So in general, unless you always specify points fully, setting check = False can be misleading: the surrogate will still try to predict values for a configuration that it has never seen and that does not actually exist in this space (since the space requires full specification of all active parameters).

What I am not sure about is the "check=False is required because ConfigSpace 0.6.1 does not contain the needed check_valid_configuration method" part of your question.
Can you provide more details here? As far as I recall, check = True performs an internal check of the provided configuration prior to evaluating it with the surrogate model, and this is done with the check_configuration method of the config_space object itself (of class ConfigSpace.configuration_space.ConfigurationSpace).
Admittedly, YAHPO Gym is still in need of some overhaul to work with newer versions of ConfigSpace (which will hopefully be done with a v2), but the check itself should work.

LukasFehring (Author) commented Mar 6, 2025

Hi, thank you for your swift answer :)

Yes, the issue appears to be caused by us using the new ConfigSpace. Because of the old ConfigSpace pin, SMAC and other libraries cannot, by default, be used with yahpogym. For that reason, we use an updated version created by @benjamc and patch local references. This was done for CARP-S.

git clone https://github.com/benjamc/yahpo_gym.git lib/yahpo_gym
$CONDA_RUN_COMMAND $PIP install -e lib/yahpo_gym/yahpo_gym
cd $CARPS_ROOT/carps
mkdir benchmark_data
cd benchmark_data
git clone https://github.com/slds-lmu/yahpo_data.git
cd ../..
$CONDA_RUN_COMMAND python $CARPS_ROOT/scripts/patch_yahpo_configspace.py
$CONDA_RUN_COMMAND $PIP install ConfigSpace --upgrade

In order to still be able to use the library, we start from a default configuration and replace all optimized parameters, as indicated below. Would you suggest setting those parameters differently? We would assume that the surrogate was trained with the defaults?

def _train(self, config: Configuration, seed: int = 0):
    # Start with the default configuration and overwrite the optimized
    # values; otherwise YAHPO Gym fails because of missing parameters.
    final_config = self.benchmark._get_config_space().get_default_configuration()
    for name, value in config.items():
        final_config[name] = value

    res = self.benchmark.objective_function(configuration=final_config)
    return res
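As a stdlib sketch of that "start from defaults, overwrite optimized values" pattern (plain dicts instead of ConfigSpace objects; default values copied from the rbv2_ranger space shown above): the caveat from the earlier discussion is that any default you do not overwrite, including conditional parameters such as num.random.splits, still reaches the surrogate, so defaults should only fill parameters you deliberately fix.

```python
# Stdlib sketch of the fill-from-defaults pattern, using plain dicts.
# Defaults mirror the rbv2_ranger optimization space printed earlier.
# Caveat: defaults for parameters you never optimize (e.g. the
# conditional num.random.splits) are still fed to the surrogate.
DEFAULTS = {
    "min.node.size": 50, "mtry.power": 0.0,
    "num.impute.selected.cpo": "impute.mean", "num.random.splits": 1,
    "num.trees": 1000, "repl": "10", "replace": "TRUE",
    "respect.unordered.factors": "ignore", "sample.fraction": 0.55,
    "splitrule": "gini", "task_id": "470", "trainsize": 0.525,
}

def complete(config: dict) -> dict:
    """Return DEFAULTS with the optimized values layered on top."""
    return {**DEFAULTS, **config}

final = complete({"num.trees": 1500, "splitrule": "extratrees"})
print(final["num.trees"], final["repl"])  # 1500 10
```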
