This project's continuous integration (CI) should include a job that tests that LightGBM Dataset binary files produced by previous versions can be successfully loaded and used in newer versions.
Specifically, it should test the following claim:
Binary Dataset files produced in LightGBM version (N).x.x should be readable and usable in all versions in the same major version series.
It should also include tests of expected compatibility between other versions. For example, if 4.0.0 does not include breaking changes to saving / loading of Dataset files, then a test should be added verifying that such a file created in LightGBM 3.2.1 can be loaded in LightGBM 4.0.0.
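One way to turn the claim above into a CI test matrix is to enumerate (writer, reader) version pairs. The following is only a sketch: the function names, the release list, and the reading of "compatible" as "a newer release in the same major series can read files from an older one" are assumptions made for illustration, not part of LightGBM or its CI.

```python
from itertools import combinations


def parse_version(version):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints so versions sort correctly."""
    return tuple(int(part) for part in version.split("."))


def compatibility_pairs(releases, extra_pairs=()):
    """Return (writer, reader) pairs that a compatibility job could exercise.

    A pair is included when both versions share a major version and the
    reader is newer than the writer. `extra_pairs` holds cross-major pairs
    that maintainers have explicitly declared compatible.
    """
    ordered = sorted(releases, key=parse_version)
    pairs = [
        (writer, reader)
        for writer, reader in combinations(ordered, 2)
        if parse_version(writer)[0] == parse_version(reader)[0]
    ]
    pairs.extend(extra_pairs)
    return pairs


# Illustrative release list; a real job would read this from release metadata.
releases = ["3.0.0", "3.1.0", "3.2.1", "4.0.0"]
pairs = compatibility_pairs(releases, extra_pairs=[("3.2.1", "4.0.0")])
for writer, reader in pairs:
    print(f"write with {writer}, read with {reader}")
```

Each pair would then map to one CI step: install the writer version, save a Dataset, install the reader version, and load and train on the file.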
Motivation
LightGBM uses semantic versioning for releases. As a result, users expect that there will not be breaking changes within a major release series. For example, they expect that a Dataset saved to a binary file using LightGBM 3.1.0 will be readable in any other LightGBM 3.x.x release.
Explicit tests of that expectation would provide greater confidence that releases are not introducing such changes.
Description
LightGBM performs several preprocessing steps on training data before beginning the boosting process. Those steps are performed in the construction of a Dataset object, a LightGBM-specific format for training data.
To support use cases like hyperparameter tuning, where users want to train many models using the same Dataset, LightGBM can save a constructed Dataset to a binary file.
This issue has been added to #2302 with other feature requests. I'd like to leave it open for a few days in case others want to add comments, since I just locked discussion on #4228.
After a few days, this issue will be closed until someone leaves a comment saying they'd like to work on it.
OK, now that this has been open for a few days, I am going to close it. If you're reading this and would like to work on it, please comment below and it can be re-opened!
A Dataset saved to a binary file can then be loaded back and used for training, without repeating those preprocessing steps.
Code references: saving is at LightGBM/src/c_api.cpp, line 1435 in 0701a32; loading is at LightGBM/src/c_api.cpp, line 897 in 0701a32.
References
Created based on #4228 (comment).