Regarding the cv_results_ #503

Closed
kaiweiang opened this issue Jun 28, 2018 · 4 comments

kaiweiang commented Jun 28, 2018

Hi,

I was reading the result from cv_results_ and trying to understand

  1. If the holdout resampling strategy is used, how are the mean_test_score and mean_fit_time computed, since the holdout method only has one train and validation set?
  2. If the resampling strategy is CV with 5 folds and the status is timeout, does it mean that one of the 5 folds ran over the time limit, or that the total time taken to run all 5 folds exceeded the limit?
  3. I'm aware the default ml_memory_limit is ~3GB (3072 MB), which is quite large. I'm wondering why some of the algorithm fits take more than that and hence cause a memout, even though my dataset is less than 50 MB. Can you provide some of the scenarios?
  4. For the time and memory limits, is the time taken for data and feature preprocessing included?

Thank you


kaiweiang commented Jun 28, 2018

The other question is: is it possible to estimate per_run_time_limit and ml_memory_limit based on the size of the dataset, to minimize the chance of timeouts and memouts occurring?


mfeurer commented Jul 2, 2018

> If the holdout resampling strategy is used, how are the mean_test_score and mean_fit_time computed, since the holdout method only has one train and validation set?

It's the mean over a single repetition, i.e. simply the score on the holdout set.
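For illustration, here is a minimal sketch (not from the original answer) of inspecting these values; it assumes the AutoSklearnClassifier interface and that cv_results_ follows the scikit-learn dict-of-arrays convention, with mean_test_score, mean_fit_time and status among the keys:

```python
import pandas as pd
import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Holdout is the default resampling strategy: a single train/validation split,
# so each "mean" is computed over exactly one evaluation.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    resampling_strategy='holdout',
)
automl.fit(X_train, y_train)

# cv_results_ is a dict of arrays, one entry per evaluated configuration.
results = pd.DataFrame(automl.cv_results_)
print(results[['mean_test_score', 'mean_fit_time', 'status']].head())
```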

> If the resampling strategy is CV with 5 folds and the status is timeout, does it mean that one of the 5 folds ran over the time limit, or that the total time taken to run all 5 folds exceeded the limit?

If you're using cv, the time limit covers all five folds. If you use partial-cv, the time limit is per fold (but this disables the use of the ensemble).
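A minimal sketch of the two settings, assuming the resampling_strategy and resampling_strategy_arguments parameters of AutoSklearnClassifier (parameter names may differ between versions):

```python
import autosklearn.classification

# 'cv': per_run_time_limit covers all five folds of one configuration.
automl_cv = autosklearn.classification.AutoSklearnClassifier(
    per_run_time_limit=60,
    resampling_strategy='cv',
    resampling_strategy_arguments={'folds': 5},
)

# 'partial-cv': the limit applies to each fold separately, but ensemble
# building is not available with this strategy.
automl_partial_cv = autosklearn.classification.AutoSklearnClassifier(
    per_run_time_limit=60,
    resampling_strategy='partial-cv',
    resampling_strategy_arguments={'folds': 5},
    ensemble_size=0,  # assumption: disable the ensemble explicitly
)
```

With a cross-validation strategy you typically also need to call refit() on the full training data before predicting, since the individual models were only fitted on folds.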

> I'm aware the default ml_memory_limit is ~3GB (3072 MB), which is quite large. I'm wondering why some of the algorithm fits take more than that and hence cause a memout, even though my dataset is less than 50 MB. Can you provide some of the scenarios?

Possible reasons for running over the memory limit are one-hot encoding and feature expansion mechanisms such as random kitchen sinks or the Nyström kernel approximation.

> For the time and memory limits, is the time taken for data and feature preprocessing included?

Yes, the time and memory limits apply to the execution of the complete pipeline.

> The other question is: is it possible to estimate per_run_time_limit and ml_memory_limit based on the size of the dataset, to minimize the chance of timeouts and memouts occurring?

Potentially yes, but we're not doing this.
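In other words, the limits are set by hand. A minimal sketch with assumed values (the parameter names come from this thread, the numbers are placeholders):

```python
import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=3600,  # overall optimization budget in seconds
    per_run_time_limit=360,        # budget per pipeline run, incl. preprocessing
    ml_memory_limit=6144,          # per-run memory limit in MB (default is 3072)
)
```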


kaiweiang commented Jul 4, 2018

@mfeurer Thanks for answering all my earlier questions.

I've found that target algorithms like random forest, whose max_depth parameter is set to None (which makes the trees expand until all leaves are pure or contain fewer than min_samples_split samples), are more likely to hit the memory or time limit. So, is there any way to limit max_depth to a certain depth? I'm thinking of set_params but am not sure how to use it correctly.

Apart from that, when initial_configurations_via_metalearning is set to 25, are these 25 configurations of target algorithms randomly chosen by the metalearner?

Thank you


mfeurer commented Jul 19, 2018

> So, is there any way to limit max_depth to a certain depth?

Not really. You could either change the code or create a new component with this hyperparameter activated and then deactivate the original random forest.
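A minimal sketch of the second option; exclude_estimators and add_classifier are my assumptions about the extension API, and the custom component CappedRandomForest is hypothetical (it would have to be implemented following the "extending auto-sklearn" examples):

```python
import autosklearn.classification
import autosklearn.pipeline.components.classification

# Hypothetical custom component that exposes max_depth as a hyperparameter;
# it would need to be implemented as an auto-sklearn classification algorithm
# and registered before constructing the estimator, e.g.:
# autosklearn.pipeline.components.classification.add_classifier(CappedRandomForest)

automl = autosklearn.classification.AutoSklearnClassifier(
    exclude_estimators=['random_forest'],  # deactivate the built-in random forest
)
```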

> when initial_configurations_via_metalearning is set to 25, are these 25 configurations of target algorithms randomly chosen by the metalearner?

No, they are chosen according to a k-nearest-neighbors procedure, as described in Initializing Bayesian Hyperparameter Optimization via Meta-Learning.
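For completeness, a minimal sketch of where this is controlled (25 is the default; setting it to 0 should disable the meta-learning warmstart, as far as I understand):

```python
import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier(
    # number of k-NN-selected configurations used to warmstart the optimizer
    initial_configurations_via_metalearning=25,
)
```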
