
Releases: jinlow/forust

Release v0.2.0

20 Apr 00:32

This release is a major refactor of how splitting is handled internally; the external API and Python API remain the same. These changes will make it easier to treat missing values explicitly while training. Future releases will implement the ability to split missing out into its own separate branch.

v0.1.7

20 Aug 19:35
86c368c

This release adds the following changes to the package:

  • Support for monotonic constraints. Features can now be supplied with a constraint so that they are forced to have a monotonically increasing, decreasing, or unconstrained relationship with the target variable. This can be adjusted using the monotone_constraints parameter.
  • Experimental support for handling missing values in different ways. This includes the ability to disallow splits on missing or non-missing values alone, as well as to skip automatic imputation of missing values and instead always send them down a default branch, rather than learning the best direction to send them. See the documentation on the allow_missing_splits parameter.
  • The default value of the min_leaf_weight parameter was changed from 0.0 to 1.0.
  • Additional refactoring of the code to better align with modern Python type hints, adding pre-commit support for development, and adjusting some module names for clarity.
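
How a monotone constraint interacts with split finding can be illustrated with a small, self-contained sketch. This is not forust's actual implementation; it assumes the common gradient-boosting convention that a leaf's value is -G/(H + l2) from the child's gradient and hessian sums, and that an increasing constraint rejects any split where the left child (lower feature values) would receive a larger value than the right child.

```python
def leaf_weight(grad_sum: float, hess_sum: float, l2: float = 1.0) -> float:
    # Standard gradient-boosting leaf value from summed gradients/hessians.
    return -grad_sum / (hess_sum + l2)


def split_allowed(left: tuple, right: tuple, constraint: int) -> bool:
    """Check a candidate split against a monotone constraint.

    left/right are (grad_sum, hess_sum) for each child;
    constraint: 1 = increasing, -1 = decreasing, 0 = unconstrained.
    """
    if constraint == 1:
        return leaf_weight(*left) <= leaf_weight(*right)
    if constraint == -1:
        return leaf_weight(*left) >= leaf_weight(*right)
    return True


# Left child value 0.5, right child value 1.5: an increasing
# constraint keeps this split, a decreasing one rejects it.
assert split_allowed((-2.0, 3.0), (-6.0, 3.0), 1)
assert not split_allowed((-2.0, 3.0), (-6.0, 3.0), -1)
assert split_allowed((-2.0, 3.0), (-6.0, 3.0), 0)
```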

v0.1.6

19 Aug 14:19
6fef27c

This release fixes a bug where the README and LICENSE files were not included in the source distribution.

v0.1.5

31 Jul 20:11
  • Added support and documentation for calculating partial dependence information for a model feature. This allows users to get an estimate of how a given feature is being used in the model.
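
Partial dependence itself is model-agnostic, so the idea can be sketched without forust: for each grid value, overwrite the feature of interest in every row and average the model's predictions. The `predict` callable and toy data below are illustrative assumptions, not forust's API.

```python
import numpy as np


def partial_dependence(predict, X: np.ndarray, feature: int, grid) -> list:
    """Average prediction over the data, with one feature forced to each grid value."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v  # pin the feature of interest for every row
        out.append(float(predict(Xv).mean()))
    return out


# Toy linear "model": 2 * feature0 + feature1.
predict = lambda X: 2.0 * X[:, 0] + X[:, 1]
X = np.array([[0.0, 1.0], [0.0, 3.0]])

# The recovered curve is 2*v + mean(feature1) = 2*v + 2.
assert partial_dependence(predict, X, 0, [0.0, 1.0, 2.0]) == [2.0, 4.0, 6.0]
```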

v0.1.4

18 Jun 19:15
d288c31
  • Fix issue where sample weight would become misaligned when binning a feature with missing values.
  • Update links in README to point to correct URLs.

v0.1.3

17 Jun 00:05
e8d6fb6

This release introduces many additional optimizations, leading to a speedup of more than 7X on data with more than 300K rows.

  • All internal statistics (histograms, gradient/hessian sums) have been converted to the f32 data type. For any summing aggregations, however, these values are cast to f64 before being summed, to ensure that higher precision is maintained.
  • All gradients are aligned in memory before calculating feature histograms. This accounted for about half of the performance improvement.
  • The data is realigned in memory prior to each tree being constructed, which accounted for most of the remaining speed gain.
  • The histograms, which were originally a hashmap of vectors, have been converted to a jagged matrix, a data structure with faster access.
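
Why casting f32 statistics to f64 before summing matters can be shown with a small numerical experiment (a conceptual illustration, not forust code): naive accumulation in f32 drifts as the running sum grows, while widening each value to f64 before adding keeps the total close to the exact sum of the stored f32 values.

```python
import numpy as np

# 100,000 copies of 0.1 stored as f32, standing in for f32 histogram statistics.
vals = np.full(100_000, 0.1, dtype=np.float32)

# Naive accumulation entirely in f32: rounding error builds up each step.
acc32 = np.float32(0.0)
for v in vals:
    acc32 = np.float32(acc32 + v)

# Cast each f32 value to f64 before summing, as the release notes describe.
acc64 = 0.0
for v in vals:
    acc64 += float(v)

# Exact sum of the stored f32 values (0.1 as f32 is slightly above 0.1).
exact = 100_000 * float(np.float32(0.1))

assert abs(acc64 - exact) < 1e-5                      # f64 accumulation stays tight
assert abs(acc64 - exact) < abs(float(acc32) - exact)  # and beats pure f32
```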

By aligning the data in memory, the overall number of cache misses is reduced, which drastically increases performance.
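
The cache argument also motivates the jagged-matrix change above: one contiguous buffer plus per-row end offsets keeps all histogram bins adjacent in memory, unlike a hashmap of separately allocated vectors. A minimal sketch of the idea (names and layout are illustrative, not forust's actual structure):

```python
class JaggedMatrix:
    """Rows of varying length stored in one contiguous buffer."""

    def __init__(self, rows):
        # Flatten all rows into a single buffer; record where each row ends.
        self.data = [x for row in rows for x in row]
        self.ends = []
        n = 0
        for row in rows:
            n += len(row)
            self.ends.append(n)

    def row(self, i):
        # A row is just a slice of the shared buffer.
        start = self.ends[i - 1] if i > 0 else 0
        return self.data[start:self.ends[i]]


# Histograms with a different number of bins per feature.
hists = JaggedMatrix([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])
assert hists.row(0) == [1.0, 2.0]
assert hists.row(1) == [3.0]
assert hists.row(2) == [4.0, 5.0, 6.0]
```

In Rust this would typically be a single `Vec<f32>` plus a `Vec<usize>` of offsets, so walking all bins touches one allocation sequentially.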

v0.1.2

09 Jun 00:15


v0.0.1

08 Jun 04:21
681af3f
