When loading a binary file, take feature penalty from config if given there. #1881

Merged
StrikerRUS merged 6 commits into microsoft:master on Jan 16, 2019

Conversation

remcob-gr (Contributor)

This is #1862, rebased on top of master.

This enables users to override the feature penalty when loading a dataset from a binary file.
For instance, features can be disabled or enabled via:

dataset = lgbm.Dataset('data.bin', params={'feature_contri': [0, 1, 1, 0]})
....

This is especially useful when the dataset is large, since recreating it from the source data is expensive.
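
For context, a minimal sketch of the full workflow this is aimed at (the file name, data, and penalty values below are illustrative, not taken from the PR):

import numpy as np
import lightgbm as lgbm

# Build the dataset once from the raw data and save it in LightGBM's binary format.
X = np.random.rand(1000, 4)
y = np.random.randint(0, 2, size=1000)
lgbm.Dataset(X, label=y).save_binary('data.bin')

# Later runs reload the binary file and override the per-feature penalty without
# re-reading or re-binning the source data; a penalty of 0 effectively disables a
# feature, while 1 leaves it unchanged.
dataset = lgbm.Dataset('data.bin', params={'feature_contri': [0, 1, 1, 0]})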

@guolinke (Collaborator)

Thanks, this is indeed useful.
BTW, there are other similar parameters in Dataset, like monotone_constraints.
It would be better to support them as well.

@StrikerRUS (Collaborator)

@remcob-gr Can you please update this PR and include the same logic for other similar params?

@remcob-gr (Contributor, Author)

@StrikerRUS: I'll take care of the other parameters now.
The commit I just pushed is a small bug fix; it ensures that this change correctly handles the distinction between num_features_ and num_total_features_.

@remcob-gr (Contributor, Author)

remcob-gr commented Jan 14, 2019

Commit 204de56 adds support for monotone_constraints, as requested by @guolinke.
@StrikerRUS: Could you review and let me know if you need any further changes?
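
As a sketch of what that commit enables, assuming the same illustrative 'data.bin' file as in the example above (the constraint values are made up):

# Reload the binary dataset with monotone constraints taken from params:
# one entry per feature, where 1 = increasing, -1 = decreasing, 0 = unconstrained.
dataset = lgbm.Dataset('data.bin', params={'monotone_constraints': [1, -1, 0, 0]})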

@StrikerRUS (Collaborator) left a comment

@remcob-gr Many thanks!
Please fix indents below.

src/io/dataset_loader.cpp: two inline review comments (outdated, resolved)
@StrikerRUS requested a review from guolinke on January 14, 2019.
@remcob-gr (Contributor, Author)

@StrikerRUS: The indents should be fixed now; could you check?

@StrikerRUS (Collaborator) left a comment

Thanks a lot @remcob-gr for your contribution!

@StrikerRUS merged commit 6152785 into microsoft:master on Jan 16, 2019.
@StrikerRUS (Collaborator)

This PR introduced warnings with both gcc and Clang:

[ 40%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/dataset_loader.cpp.o
/home/travis/build/Microsoft/LightGBM/src/io/dataset_loader.cpp:374:40: warning: 
      comparison of integers of different signs: 'int' and
      'std::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    CHECK(dataset->num_total_features_ == config_.monotone_constraints.size());
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/travis/build/Microsoft/LightGBM/include/LightGBM/utils/log.h:22:9: note: 
      expanded from macro 'CHECK'
  if (!(condition)) Log::Fatal("Check failed: " #condition \
        ^~~~~~~~~
/home/travis/build/Microsoft/LightGBM/src/io/dataset_loader.cpp:397:40: warning: 
      comparison of integers of different signs: 'int' and
      'std::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    CHECK(dataset->num_total_features_ == config_.feature_contri.size());
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/travis/build/Microsoft/LightGBM/include/LightGBM/utils/log.h:22:9: note: 
      expanded from macro 'CHECK'
  if (!(condition)) Log::Fatal("Check failed: " #condition \

https://travis-ci.org/Microsoft/LightGBM/jobs/480404567#L737

@guolinke (Collaborator)

It needs a static_cast of num_total_features_ to size_t in those CHECKs.
