Add quantized training (CPU part) #5800

Merged on May 5, 2023 (31 commits)
Commits
8187759 add quantized training (first stage) (shiyu1994, Dec 22, 2022)
8480873 Merge remote-tracking branch 'origin/master' into quantized-training (shiyu1994, Mar 23, 2023)
9e5d46b add histogram construction functions for integer gradients (shiyu1994, Mar 23, 2023)
dd2a3b4 add stochastic rounding (shiyu1994, Mar 23, 2023)
41c6c79 update docs (shiyu1994, Mar 23, 2023)
dfb5bc4 fix compilation errors by adding template instantiations (shiyu1994, Mar 23, 2023)
d830128 update files for compilation (shiyu1994, Mar 23, 2023)
e82675b fix compilation of gpu version (shiyu1994, Mar 23, 2023)
1d68e97 initialize gradient discretizer before share states (shiyu1994, Mar 30, 2023)
4ccdf34 Merge remote-tracking branch 'origin/master' into quantized-training (shiyu1994, Mar 30, 2023)
27dbf8c add a test case for quantized training (shiyu1994, Apr 5, 2023)
5c8aac1 add quantized training for data distributed training (shiyu1994, Apr 5, 2023)
1fd115a Delete origin.pred (shiyu1994, Apr 6, 2023)
197b394 Delete ifelse.pred (shiyu1994, Apr 6, 2023)
7140bb8 Delete LightGBM_model.txt (shiyu1994, Apr 6, 2023)
1f142d5 remove useless changes (shiyu1994, Apr 6, 2023)
22a98b7 Merge remote-tracking branch 'origin/master' into quantized-training (shiyu1994, Apr 19, 2023)
bc848a0 Merge branch 'quantized-training' of https://github.com/Microsoft/Lig… (shiyu1994, Apr 19, 2023)
d5fc93d fix lint error (shiyu1994, Apr 19, 2023)
ed066d0 remove debug loggings (shiyu1994, Apr 19, 2023)
06826f0 fix mismatch of vector and allocator types (shiyu1994, Apr 24, 2023)
025ad39 remove changes in main.cpp (shiyu1994, Apr 25, 2023)
baef468 fix bugs with uninitialized gradient discretizer (shiyu1994, Apr 25, 2023)
ce93015 initialize ordered gradients in gradient discretizer (shiyu1994, Apr 25, 2023)
2b1118c disable quantized training with gpu and cuda (shiyu1994, Apr 25, 2023)
487f2c4 fix bug in data parallel tree learner (shiyu1994, Apr 26, 2023)
8c0e67b make quantized training test deterministic (shiyu1994, Apr 26, 2023)
6a76fde make quantized training in test case more accurate (shiyu1994, Apr 26, 2023)
0812403 refactor test_quantized_training (shiyu1994, Apr 26, 2023)
9c8894b fix leaf splits initialization with quantized training (shiyu1994, May 4, 2023)
788e1aa check distributed quantized training result (shiyu1994, May 5, 2023)
1 change: 1 addition & 0 deletions R-package/src/Makevars.in
@@ -48,6 +48,7 @@ OBJECTS = \
treelearner/data_parallel_tree_learner.o \
treelearner/feature_parallel_tree_learner.o \
treelearner/gpu_tree_learner.o \
treelearner/gradient_discretizer.o \
treelearner/linear_tree_learner.o \
treelearner/serial_tree_learner.o \
treelearner/tree_learner.o \
1 change: 1 addition & 0 deletions R-package/src/Makevars.win.in
@@ -49,6 +49,7 @@ OBJECTS = \
treelearner/data_parallel_tree_learner.o \
treelearner/feature_parallel_tree_learner.o \
treelearner/gpu_tree_learner.o \
treelearner/gradient_discretizer.o \
treelearner/linear_tree_learner.o \
treelearner/serial_tree_learner.o \
treelearner/tree_learner.o \
32 changes: 32 additions & 0 deletions docs/Parameters.rst
@@ -658,6 +658,38 @@ Learning Control Parameters

- **Note**: can be used only in CLI version

- ``use_quantized_grad`` :raw-html:`<a id="use_quantized_grad" title="Permalink to this parameter" href="#use_quantized_grad">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

- whether to use gradient quantization when training

- enabling this will discretize (quantize) the gradients and hessians into bins of ``num_grad_quant_bins``

- with quantized training, most arithmetics in the training process will be integer operations

- gradient quantization can accelerate training, with little accuracy drop in most cases

- **Note**: can be used only with ``device_type = cpu``

- ``num_grad_quant_bins`` :raw-html:`<a id="num_grad_quant_bins" title="Permalink to this parameter" href="#num_grad_quant_bins">&#x1F517;&#xFE0E;</a>`, default = ``4``, type = int

- number of bins to quantization gradients and hessians

- with more bins, the quantized training will be closer to full precision training

- **Note**: can be used only with ``device_type = cpu``

- ``quant_train_renew_leaf`` :raw-html:`<a id="quant_train_renew_leaf" title="Permalink to this parameter" href="#quant_train_renew_leaf">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

- whether to renew the leaf values with original gradients when quantized training

- renewing is very helpful for good quantized training accuracy for ranking objectives

- **Note**: can be used only with ``device_type = cpu``

- ``stochastic_rounding`` :raw-html:`<a id="stochastic_rounding" title="Permalink to this parameter" href="#stochastic_rounding">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool

- whether to use stochastic rounding in gradient quantization

IO Parameters
-------------

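The parameter descriptions above say that gradients and hessians are discretized into num_grad_quant_bins bins, optionally with stochastic rounding. The following is a minimal C++ sketch of that idea, not the PR's gradient discretizer: the scale choices (max |g| / (num_bins / 2) for signed gradients, max h / num_bins for nonnegative hessians) and the helper names GradientScale, HessianScale and Quantize are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Assumed scale for signed gradients: num_bins / 2 integer levels on each side of zero.
double GradientScale(const std::vector<double>& grads, int num_bins) {
  assert(num_bins >= 2);
  double max_abs = 0.0;
  for (double g : grads) max_abs = std::max(max_abs, std::fabs(g));
  return max_abs > 0.0 ? max_abs / (num_bins / 2) : 1.0;
}

// Assumed scale for nonnegative hessians: num_bins integer levels above zero.
double HessianScale(const std::vector<double>& hessians, int num_bins) {
  assert(num_bins >= 1);
  double max_h = 0.0;
  for (double h : hessians) max_h = std::max(max_h, h);
  return max_h > 0.0 ? max_h / num_bins : 1.0;
}

// Map each value to an integer level value / scale, clamped to [lo, hi].
// Stochastic rounding rounds x = k + f (0 <= f < 1) up to k + 1 with
// probability f and down to k otherwise, so the level is an unbiased
// estimate of x. Without it, values are rounded to the nearest level.
std::vector<int8_t> Quantize(const std::vector<double>& values, double scale, int lo, int hi,
                             bool stochastic_rounding, std::mt19937* rng) {
  assert(scale > 0.0 && lo <= hi && lo >= INT8_MIN && hi <= INT8_MAX);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  std::vector<int8_t> levels(values.size());
  for (size_t i = 0; i < values.size(); ++i) {
    const double x = values[i] / scale;
    double level = std::round(x);
    if (stochastic_rounding) {
      const double floor_x = std::floor(x);
      level = floor_x + (uniform(*rng) < x - floor_x ? 1.0 : 0.0);
    }
    // Clamp guards against tiny floating-point overshoot at the range ends.
    level = std::min(std::max(level, static_cast<double>(lo)), static_cast<double>(hi));
    levels[i] = static_cast<int8_t>(level);
  }
  return levels;
}
```

With num_grad_quant_bins = 4, gradients spanning [-1, 1] get scale 0.5 and levels -2 to 2. With stochastic rounding, a gradient of 0.3 at scale 1.0 becomes 1 about 30% of the time and 0 otherwise, so its average is preserved even though each sample is coarse.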
121 changes: 121 additions & 0 deletions include/LightGBM/bin.h
@@ -30,11 +30,14 @@ enum MissingType {
};

typedef double hist_t;
typedef int32_t int_hist_t;
typedef uint64_t hist_cnt_t;
// check at compile time
static_assert(sizeof(hist_t) == sizeof(hist_cnt_t), "Histogram entry size is not correct");

const size_t kHistEntrySize = 2 * sizeof(hist_t);
const size_t kInt32HistEntrySize = 2 * sizeof(int_hist_t);
const size_t kInt16HistEntrySize = 2 * sizeof(int16_t);
const int kHistOffset = 2;
const double kSparseThreshold = 0.7;
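The new constants kInt32HistEntrySize and kInt16HistEntrySize size a histogram entry as a gradient sum plus a hessian sum stored in 32-bit or 16-bit integers. Narrower entries are only safe while the per-bin sums cannot overflow. The sketch below shows one plausible way to pick a width from the leaf size; the rule and the names ChooseHistBits and HistEntrySize are hypothetical, and this diff does not show how the PR actually chooses HIST_BITS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical rule: choose the narrowest signed width (8, 16 or 32 bits)
// whose range holds the worst-case per-bin sum, i.e. every row of the leaf
// landing in one bin with the largest possible quantized magnitude.
int ChooseHistBits(int64_t num_data_in_leaf, int64_t max_abs_quantized_value) {
  assert(num_data_in_leaf >= 0 && max_abs_quantized_value >= 0);
  const int64_t worst_case = num_data_in_leaf * max_abs_quantized_value;
  if (worst_case <= std::numeric_limits<int8_t>::max()) return 8;
  if (worst_case <= std::numeric_limits<int16_t>::max()) return 16;
  // Wider accumulators are outside the scope of this sketch.
  assert(worst_case <= std::numeric_limits<int32_t>::max());
  return 32;
}

// Bytes per histogram bin: one gradient sum plus one hessian sum.
constexpr size_t HistEntrySize(int bits) { return 2 * static_cast<size_t>(bits / 8); }
```

HistEntrySize(32) and HistEntrySize(16) give 8 and 4 bytes, matching 2 * sizeof(int_hist_t) and 2 * sizeof(int16_t) in the diff.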

@@ -56,6 +59,28 @@ inline static void HistogramSumReducer(const char* src, char* dst, int type_size
}
}

inline static void Int32HistogramSumReducer(const char* src, char* dst, int type_size, comm_size_t len) {
const int64_t* src_ptr = reinterpret_cast<const int64_t*>(src);
int64_t* dst_ptr = reinterpret_cast<int64_t*>(dst);
const comm_size_t steps = (len + (type_size * 2) - 1) / (type_size * 2);
const int num_threads = OMP_NUM_THREADS();
#pragma omp parallel for schedule(static) num_threads(num_threads)
for (comm_size_t i = 0; i < steps; ++i) {
dst_ptr[i] += src_ptr[i];
}
}

inline static void Int16HistogramSumReducer(const char* src, char* dst, int type_size, comm_size_t len) {
const int32_t* src_ptr = reinterpret_cast<const int32_t*>(src);
int32_t* dst_ptr = reinterpret_cast<int32_t*>(dst);
const comm_size_t steps = (len + (type_size * 2) - 1) / (type_size * 2);
const int num_threads = OMP_NUM_THREADS();
#pragma omp parallel for schedule(static) num_threads(num_threads)
for (comm_size_t i = 0; i < steps; ++i) {
dst_ptr[i] += src_ptr[i];
}
}

/*! \brief This class used to convert feature values into bin,
* and store some meta information for bin*/
class BinMapper {
@@ -332,6 +357,33 @@ class Bin {
const score_t* ordered_gradients, const score_t* ordered_hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt8(
const data_size_t* data_indices, data_size_t start, data_size_t end,
const score_t* ordered_gradients, const score_t* ordered_hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt8(data_size_t start, data_size_t end,
const score_t* ordered_gradients, const score_t* ordered_hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt16(
const data_size_t* data_indices, data_size_t start, data_size_t end,
const score_t* ordered_gradients, const score_t* ordered_hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt16(data_size_t start, data_size_t end,
const score_t* ordered_gradients, const score_t* ordered_hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt32(
const data_size_t* data_indices, data_size_t start, data_size_t end,
const score_t* ordered_gradients, const score_t* ordered_hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt32(data_size_t start, data_size_t end,
const score_t* ordered_gradients, const score_t* ordered_hessians,
hist_t* out) const = 0;

/*!
* \brief Construct histogram of this feature,
* Note: We use ordered_gradients and ordered_hessians to improve cache hit chance
@@ -351,6 +403,24 @@ class Bin {
virtual void ConstructHistogram(data_size_t start, data_size_t end,
const score_t* ordered_gradients, hist_t* out) const = 0;

virtual void ConstructHistogramInt8(const data_size_t* data_indices, data_size_t start, data_size_t end,
const score_t* ordered_gradients, hist_t* out) const = 0;

virtual void ConstructHistogramInt8(data_size_t start, data_size_t end,
const score_t* ordered_gradients, hist_t* out) const = 0;

virtual void ConstructHistogramInt16(const data_size_t* data_indices, data_size_t start, data_size_t end,
const score_t* ordered_gradients, hist_t* out) const = 0;

virtual void ConstructHistogramInt16(data_size_t start, data_size_t end,
const score_t* ordered_gradients, hist_t* out) const = 0;

virtual void ConstructHistogramInt32(const data_size_t* data_indices, data_size_t start, data_size_t end,
const score_t* ordered_gradients, hist_t* out) const = 0;

virtual void ConstructHistogramInt32(data_size_t start, data_size_t end,
const score_t* ordered_gradients, hist_t* out) const = 0;

virtual data_size_t Split(uint32_t min_bin, uint32_t max_bin,
uint32_t default_bin, uint32_t most_freq_bin,
MissingType missing_type, bool default_left,
@@ -464,6 +534,57 @@ class MultiValBin {
const score_t* ordered_hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt32(const data_size_t* data_indices,
data_size_t start, data_size_t end,
const score_t* gradients,
const score_t* hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt32(data_size_t start, data_size_t end,
const score_t* gradients,
const score_t* hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramOrderedInt32(const data_size_t* data_indices,
data_size_t start, data_size_t end,
const score_t* ordered_gradients,
const score_t* ordered_hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt16(const data_size_t* data_indices,
data_size_t start, data_size_t end,
const score_t* gradients,
const score_t* hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt16(data_size_t start, data_size_t end,
const score_t* gradients,
const score_t* hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramOrderedInt16(const data_size_t* data_indices,
data_size_t start, data_size_t end,
const score_t* ordered_gradients,
const score_t* ordered_hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt8(const data_size_t* data_indices,
data_size_t start, data_size_t end,
const score_t* gradients,
const score_t* hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramInt8(data_size_t start, data_size_t end,
const score_t* gradients,
const score_t* hessians,
hist_t* out) const = 0;

virtual void ConstructHistogramOrderedInt8(const data_size_t* data_indices,
data_size_t start, data_size_t end,
const score_t* ordered_gradients,
const score_t* ordered_hessians,
hist_t* out) const = 0;

virtual void FinishLoad() = 0;

virtual bool IsSparse() = 0;
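Bin and MultiValBin gain ConstructHistogramInt8/16/32 (plus ConstructHistogramOrderedInt* variants on MultiValBin), mirroring the floating-point overloads with and without data_indices. The sketch below illustrates the inner loop for one dense feature under stated assumptions: quantized values arrive as separate int8 arrays already in leaf ("ordered") order, and each bin has two plain int32 slots. The real methods take score_t pointers and write to hist_t* out, and their internal layout is not shown here; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical integer histogram construction for one dense feature.
// bin_of_row[r] is the bin of row r. grad/hess are in "ordered" form: entry i
// belongs to row data_indices[i] (or row i when no indices are used), so both
// arrays are read sequentially, which is friendlier to the cache.
// hist has two int32 slots per bin: [2*b] = gradient sum, [2*b+1] = hessian sum.
template <bool USE_INDICES>
void ConstructHistogramIntSketch(const std::vector<uint32_t>& bin_of_row,
                                 const int32_t* data_indices, int32_t start, int32_t end,
                                 const int8_t* grad, const int8_t* hess, int32_t* hist) {
  for (int32_t i = start; i < end; ++i) {
    const int32_t row = USE_INDICES ? data_indices[i] : i;
    assert(row >= 0 && static_cast<size_t>(row) < bin_of_row.size());
    const uint32_t bin = bin_of_row[row];
    hist[2 * bin] += grad[i];  // integer additions replace floating-point ones
    hist[2 * bin + 1] += hess[i];
  }
}
```

The USE_INDICES template flag follows the same pattern as the existing USE_INDICES parameters in dataset.h: the branch is resolved at compile time, so the per-row loop carries no runtime check.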
24 changes: 24 additions & 0 deletions include/LightGBM/config.h
@@ -592,6 +592,30 @@ struct Config {
// desc = **Note**: can be used only in CLI version
int snapshot_freq = -1;

// [no-save]
// desc = whether to use gradient quantization when training
// desc = enabling this will discretize (quantize) the gradients and hessians into bins of ``num_grad_quant_bins``
// desc = with quantized training, most arithmetics in the training process will be integer operations
// desc = gradient quantization can accelerate training, with little accuracy drop in most cases
// desc = **Note**: can be used only with ``device_type = cpu``
bool use_quantized_grad = false;

// [no-save]
// desc = number of bins to quantization gradients and hessians
// desc = with more bins, the quantized training will be closer to full precision training
// desc = **Note**: can be used only with ``device_type = cpu``
int num_grad_quant_bins = 4;

// [no-save]
// desc = whether to renew the leaf values with original gradients when quantized training
// desc = renewing is very helpful for good quantized training accuracy for ranking objectives
// desc = **Note**: can be used only with ``device_type = cpu``
bool quant_train_renew_leaf = false;

// [no-save]
// desc = whether to use stochastic rounding in gradient quantization
bool stochastic_rounding = true;

#ifndef __NVCC__
#pragma endregion

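quant_train_renew_leaf is described as renewing leaf values with the original gradients. One reading, sketched below: after the tree structure is chosen from quantized histograms, each leaf's Newton-step value -G / (H + lambda_l2) is recomputed from the full-precision gradients of its rows. The function names, and leaving out L1 regularization and max_delta_step, are simplifications for illustration rather than the PR's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Newton-step leaf value -G / (H + lambda_l2). L1 regularization,
// max_delta_step and path smoothing are deliberately left out.
double LeafOutput(double sum_grad, double sum_hess, double lambda_l2) {
  assert(sum_hess + lambda_l2 > 0.0);
  return -sum_grad / (sum_hess + lambda_l2);
}

// Leaf value from quantized sums: convert integer sums back to gradient units first.
double LeafOutputQuantized(int64_t sum_qgrad, int64_t sum_qhess,
                           double grad_scale, double hess_scale, double lambda_l2) {
  return LeafOutput(static_cast<double>(sum_qgrad) * grad_scale,
                    static_cast<double>(sum_qhess) * hess_scale, lambda_l2);
}

// "Renewed" leaf value: same formula, summed over the original full-precision
// gradients and hessians of the rows that landed in the leaf.
double RenewLeafOutput(const std::vector<int>& leaf_rows, const std::vector<double>& grads,
                       const std::vector<double>& hessians, double lambda_l2) {
  double sum_grad = 0.0;
  double sum_hess = 0.0;
  for (int row : leaf_rows) {
    sum_grad += grads[row];
    sum_hess += hessians[row];
  }
  return LeafOutput(sum_grad, sum_hess, lambda_l2);
}
```

Splits still come from the cheap integer histograms; only the final leaf values pay for a pass over the original gradients, which is where rounding error in the sums would otherwise show up directly.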
19 changes: 12 additions & 7 deletions include/LightGBM/dataset.h
@@ -598,10 +598,11 @@ class Dataset {
 
 MultiValBin* GetMultiBinFromAllFeatures(const std::vector<uint32_t>& offsets) const;
 
+template <bool USE_QUANT_GRAD, int HIST_BITS>
 TrainingShareStates* GetShareStates(
 score_t* gradients, score_t* hessians,
 const std::vector<int8_t>& is_feature_used, bool is_constant_hessian,
-bool force_col_wise, bool force_row_wise) const;
+bool force_col_wise, bool force_row_wise, const int num_grad_quant_bins) const;
 
 LIGHTGBM_EXPORT void FinishLoad();
 
@@ -636,7 +637,7 @@ class Dataset {
 void InitTrain(const std::vector<int8_t>& is_feature_used,
 TrainingShareStates* share_state) const;
 
-template <bool USE_INDICES, bool USE_HESSIAN>
+template <bool USE_INDICES, bool USE_HESSIAN, bool USE_QUANT_GRAD, int HIST_BITS>
 void ConstructHistogramsInner(const std::vector<int8_t>& is_feature_used,
 const data_size_t* data_indices,
 data_size_t num_data, const score_t* gradients,
@@ -646,14 +647,15 @@ class Dataset {
 TrainingShareStates* share_state,
 hist_t* hist_data) const;
 
-template <bool USE_INDICES, bool ORDERED>
+template <bool USE_INDICES, bool ORDERED, bool USE_QUANT_GRAD, int HIST_BITS>
 void ConstructHistogramsMultiVal(const data_size_t* data_indices,
 data_size_t num_data,
 const score_t* gradients,
 const score_t* hessians,
 TrainingShareStates* share_state,
 hist_t* hist_data) const;
 
+template <bool USE_QUANT_GRAD, int HIST_BITS>
 inline void ConstructHistograms(
 const std::vector<int8_t>& is_feature_used,
 const data_size_t* data_indices, data_size_t num_data,
@@ -666,21 +668,21 @@ class Dataset {
 bool use_indices = data_indices != nullptr && (num_data < num_data_);
 if (share_state->is_constant_hessian) {
 if (use_indices) {
-ConstructHistogramsInner<true, false>(
+ConstructHistogramsInner<true, false, USE_QUANT_GRAD, HIST_BITS>(
 is_feature_used, data_indices, num_data, gradients, hessians,
 ordered_gradients, ordered_hessians, share_state, hist_data);
 } else {
-ConstructHistogramsInner<false, false>(
+ConstructHistogramsInner<false, false, USE_QUANT_GRAD, HIST_BITS>(
 is_feature_used, data_indices, num_data, gradients, hessians,
 ordered_gradients, ordered_hessians, share_state, hist_data);
 }
 } else {
 if (use_indices) {
-ConstructHistogramsInner<true, true>(
+ConstructHistogramsInner<true, true, USE_QUANT_GRAD, HIST_BITS>(
 is_feature_used, data_indices, num_data, gradients, hessians,
 ordered_gradients, ordered_hessians, share_state, hist_data);
 } else {
-ConstructHistogramsInner<false, true>(
+ConstructHistogramsInner<false, true, USE_QUANT_GRAD, HIST_BITS>(
 is_feature_used, data_indices, num_data, gradients, hessians,
 ordered_gradients, ordered_hessians, share_state, hist_data);
 }
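ConstructHistograms now takes USE_QUANT_GRAD and HIST_BITS as template parameters and forwards them to ConstructHistogramsInner, so the choice between floating-point and integer kernels is fixed at compile time. A caller holding runtime settings therefore needs a dispatcher over a fixed set of instantiations, roughly as in this hypothetical sketch. KernelName and Dispatch are illustrative stand-ins, and using HIST_BITS = 0 for the non-quantized path is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for a kernel templated like ConstructHistograms<USE_QUANT_GRAD, HIST_BITS>;
// it only reports which instantiation was selected.
template <bool USE_QUANT_GRAD, int HIST_BITS>
std::string KernelName() {
  if (!USE_QUANT_GRAD) {
    return "float";
  }
  return "int" + std::to_string(HIST_BITS);
}

// Hypothetical dispatcher: runtime settings select one of a fixed set of
// compile-time instantiations; unsupported bit widths are rejected.
std::string Dispatch(bool use_quant_grad, int hist_bits) {
  if (!use_quant_grad) {
    return KernelName<false, 0>();
  }
  switch (hist_bits) {
    case 8: return KernelName<true, 8>();
    case 16: return KernelName<true, 16>();
    case 32: return KernelName<true, 32>();
    default: throw std::invalid_argument("unsupported histogram bit width");
  }
}
```

Keeping the flags as template parameters lets the compiler drop unused branches from the per-row loops; the cost is one instantiation per combination, each of which must be compiled (or explicitly instantiated when the template body lives in a .cpp file).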
@@ -689,6 +691,9 @@ class Dataset {
 
 void FixHistogram(int feature_idx, double sum_gradient, double sum_hessian, hist_t* data) const;
 
+template <typename PACKED_HIST_BIN_T, typename PACKED_HIST_ACC_T, int HIST_BITS_BIN, int HIST_BITS_ACC>
+void FixHistogramInt(int feature_idx, int64_t sum_gradient_and_hessian, hist_t* data) const;
+
 inline data_size_t Split(int feature, const uint32_t* threshold,
 int num_threshold, bool default_left,
 const data_size_t* data_indices,
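FixHistogramInt receives the leaf totals as a single int64_t sum_gradient_and_hessian, which suggests the gradient and hessian totals travel as one packed integer. Assuming, as with the floating-point FixHistogram, that one bin per feature is skipped during construction and recovered from the totals, the recovery can be done with one packed subtraction. The packing layout and all names below are hypothetical, and the sketch uses one width for bins and accumulator, whereas FixHistogramInt lets them differ (PACKED_HIST_BIN_T vs PACKED_HIST_ACC_T).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: signed gradient in the high 32 bits, nonnegative hessian
// in the low 32 bits.
int64_t Pack(int32_t grad, uint32_t hess) {
  return static_cast<int64_t>((static_cast<uint64_t>(static_cast<uint32_t>(grad)) << 32) | hess);
}
int32_t Grad(int64_t packed) { return static_cast<int32_t>(static_cast<uint64_t>(packed) >> 32); }
uint32_t Hess(int64_t packed) { return static_cast<uint32_t>(static_cast<uint64_t>(packed) & 0xFFFFFFFFu); }

// Rebuild the skipped bin as (leaf total) - (sum of all other bins). The
// hessian half of the difference is nonnegative, so no borrow crosses into
// the gradient half and one packed subtraction fixes both values.
void FixPackedHistogram(int64_t leaf_total, size_t skipped_bin, std::vector<int64_t>* hist) {
  assert(skipped_bin < hist->size());
  int64_t others = 0;
  for (size_t b = 0; b < hist->size(); ++b) {
    if (b != skipped_bin) others += (*hist)[b];
  }
  (*hist)[skipped_bin] = leaf_total - others;
}
```

For instance, with bins (2, 3), skipped, (-4, 1) and a leaf total of (-1, 6), the skipped bin is recovered as (1, 2).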