Releases · jinlow/forust
v0.2.0
This release is a major refactor of how splitting is handled internally; the external API and Python API remain the same. These changes will make it easier to treat missing values explicitly while training. Future releases will implement the ability to split missing out into its own separate branch.
v0.1.7
This release adds the following changes to the package:
- Support for monotonic constraints. Features can now be supplied with a constraint so that they are forced to have a monotonically increasing, monotonically decreasing, or unconstrained relationship with the target variable. This can be adjusted using the `monotone_constraints` parameter (see the usage sketch after this list).
- Experimental support for dealing with missing values in different ways. This includes the ability to disallow splits that send only missing (or only non-missing) values down a branch, as well as to skip automatic imputation of missing values and instead always send them down a default branch rather than learning the best direction to send them. See the documentation on the `allow_missing_splits` parameter.
- The default value of the `min_leaf_weight` parameter was changed from 0.0 to 1.0.
- Additional refactoring of the code to better align with modern Python type hints, along with pre-commit support for development and clearer naming of some modules.
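As a rough illustration, here is a minimal sketch of how the new parameters might be passed through the Python API. The parameter names (`monotone_constraints`, `allow_missing_splits`, `min_leaf_weight`) come from the notes above; the `GradientBooster` class name, constructor defaults, and `fit`/`predict` signatures are assumptions and may differ from the actual API.

```python
# Usage sketch -- class name and fit()/predict() signatures are assumed;
# only the parameter names come from these release notes.
import numpy as np
import pandas as pd
from forust import GradientBooster  # assumed import path

rng = np.random.default_rng(0)
X = pd.DataFrame(
    {
        "age": rng.uniform(18, 80, 1_000),
        "income": rng.uniform(10_000, 200_000, 1_000),
    }
)
# Knock out ~10% of one feature so missing-value handling comes into play.
X.loc[X.sample(frac=0.1, random_state=0).index, "income"] = np.nan
y = (X["age"] + rng.normal(0, 10, 1_000) > 50).astype(float)

model = GradientBooster(
    # Force a monotonically increasing relationship with the target for
    # "age" (1); leave "income" unconstrained (0); -1 would force decreasing.
    monotone_constraints={"age": 1, "income": 0},
    # Experimental missing handling: disallow splits that would send only
    # missing (or only non-missing) values down a branch.
    allow_missing_splits=False,
    # New default is 1.0 (was 0.0); set explicitly here for clarity.
    min_leaf_weight=1.0,
)
model.fit(X, y)
preds = model.predict(X)
```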
v0.1.6
v0.1.5
v0.1.4
v0.1.3
This release introduces many additional optimizations, leading to a speedup of more than 7X on data with more than 300K rows.
- All internal statistics (histograms, gradient/hessian sums) have been converted to the `f32` data type. However, for any summing aggregations these values are cast to `f64` and then summed, ensuring that higher precision is maintained (see the sketch after this list).
- All gradients are aligned in memory before calculating feature histograms. This accounted for about half of the performance improvement.
- The data is realigned in memory prior to each tree being constructed, which accounted for most of the remaining speed gain.
- The histograms, which were originally a hashmap of vectors, have been converted to a jagged matrix, a data structure with faster access.

By aligning the data in memory, the overall number of cache misses is reduced, which drastically increases performance.
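To make the first and last points concrete, here is a short numpy sketch. This is a toy demonstration of the two techniques named above, not the library's actual Rust internals: casting `f32` statistics to `f64` before summing to preserve precision, and replacing a hashmap of vectors with a jagged matrix stored as one contiguous buffer plus row offsets.

```python
import numpy as np

# --- f32 storage, f64 accumulation -------------------------------------
# Statistics are stored as f32 to halve memory traffic, but summed in f64
# so that accumulated rounding error stays small.
rng = np.random.default_rng(0)
grads = rng.normal(0, 1, 300_000).astype(np.float32)

low = grads.sum(dtype=np.float32)            # accumulate in f32
high = grads.astype(np.float64).sum()        # cast up, then sum in f64
print(low, high)                             # the f64 sum is more precise

# --- jagged matrix as flat buffer + offsets ----------------------------
# Instead of {feature: vec_of_bins}, store every feature's bins in one
# contiguous allocation and keep an offsets array marking row boundaries.
bins_per_feature = [4, 2, 3]                 # uneven row lengths
offsets = np.concatenate(([0], np.cumsum(bins_per_feature)))
flat = np.zeros(offsets[-1], dtype=np.float32)

def feature_bins(f: int) -> np.ndarray:
    """View of one feature's histogram bins: no hashing or pointer
    chasing, and neighboring rows share cache lines."""
    return flat[offsets[f]:offsets[f + 1]]

feature_bins(1)[:] = [1.5, 2.5]              # write feature 1's bins in place
print(feature_bins(1))                       # [1.5 2.5]
```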