[tests][python-package] Modify tests using boston housing dataset #4793

Closed
29 tasks done
Tracked by #5153
jmoralez opened this issue Nov 12, 2021 · 1 comment · Fixed by #5581

Comments

@jmoralez
Collaborator

jmoralez commented Nov 12, 2021

Description

It was recently brought to light that the Boston housing dataset has an ethical problem (described more thoroughly in scikit-learn/scikit-learn#16155), so it will be removed from scikit-learn in version 1.2 (see Olivier Grisel's explanation). Some of the tests for the Python package rely on this dataset (see the sample logs with the warning), so they should be changed to use a different dataset.

Tests currently using the Boston dataset:

  • test_engine::test_regression
  • test_engine::test_continue_train
  • test_engine::test_continue_train_reused_dataset
  • test_engine::test_continue_train_dart
  • test_engine::test_cv
  • test_engine::test_feature_name
  • test_engine::test_save_load_copy_pickle
  • test_engine::test_mape_rf
  • test_engine::test_mape_dart
  • test_engine::test_model_size
  • test_engine::test_get_split_value_histogram
  • test_engine::test_early_stopping_for_only_first_metric
  • test_engine::test_extra_trees
  • test_engine::test_path_smoothing
  • test_engine::test_interaction_constraints
  • test_engine::test_predict_with_start_iteration
  • test_engine::test_force_split_with_feature_fraction
  • test_sklearn::test_regression
  • test_sklearn::test_objective_aliases
  • test_sklearn::test_regression_with_custom_objective
  • test_sklearn::test_dart
  • test_sklearn::test_stacking_regressor
  • test_sklearn::test_clone_and_property
  • test_sklearn::test_joblib
  • test_sklearn::test_non_serializable_objects_in_callbacks
  • test_sklearn::test_evaluate_train_set
  • test_sklearn::test_metrics
  • test_sklearn::test_first_metric_only
  • test_sklearn::test_training_succeeds_when_data_is_dataframe_and_label_is_column_array

@StrikerRUS
Collaborator

Thanks for writing this up!
I just converted the bulleted list into a TODO list to track progress.

Also, we should stop accepting PRs with tests that use this dataset.
