Use less memory when decreasing parameter max_bin
#6319
Thanks for using LightGBM. Can you share the code you used to estimate that memory usage? For example, is that the memory usage of just the `Dataset`, or of the whole training process? I ask because it's possible that for a sufficiently large model (in terms of the number of trees and leaves), the model itself accounts for a significant share of the memory.
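For reference, here is one minimal way GPU memory usage can be observed during training; this is only an illustration and assumes the `pynvml` package (nvidia-ml-py), not necessarily how it was measured in this issue.

```python
# Sketch: query total GPU memory in use via NVML (assumes the pynvml package).
# This measures everything allocated on the device, not just LightGBM's share.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

def gpu_memory_used_mb() -> float:
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    return info.used / 1024**2

print(f"GPU memory in use: {gpu_memory_used_mb():.0f} MB")
```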
Great, thanks for that! So to me, it looks like the statement from the issue summary, that a smaller `max_bin` does not decrease the memory footprint, is not true: it seems that you did observe a smaller memory footprint using a smaller `max_bin`. And the size of the model is the dominant source of memory usage in your application, not the `Dataset`.
I recommend trying some combination of settings that reduce the size of the model, i.e. fewer and smaller trees (a rough sketch follows below).
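As a minimal sketch, these are the kinds of parameters that shrink the model by producing fewer and smaller trees; the specific values are placeholders, not tuned recommendations.

```python
# Sketch: LightGBM parameters that reduce the size of the trained model.
# Values are illustrative placeholders only.
params = {
    "objective": "regression",
    "num_iterations": 500,    # fewer boosting rounds -> fewer trees
    "num_leaves": 31,         # fewer leaves -> smaller trees
    "max_depth": 8,           # cap the depth of each tree
    "min_data_in_leaf": 100,  # larger leaves -> fewer splits per tree
}
```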
You can also try quantized training, which is available in the CUDA version since #5933. See https://lightgbm.readthedocs.io/en/latest/Parameters.html#use_quantized_grad. With quantized training, the gradients and hessians are represented with smaller data types. That allows you to trade some precision in exchange for lower memory usage.
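A minimal sketch of what enabling quantized training might look like; `use_quantized_grad` and `num_grad_quant_bins` are the documented parameters, everything else here is illustrative.

```python
import lightgbm as lgb

# Sketch: quantized training on the CUDA version.
# Gradients/hessians are discretized into a small number of integer bins,
# trading some precision for lower memory usage.
params = {
    "objective": "regression",
    "device_type": "cuda",
    "use_quantized_grad": True,
    "num_grad_quant_bins": 4,   # number of gradient bins (4 is the documented default)
    "max_bin": 63,
}
# booster = lgb.train(params, train_set)  # train_set: a lightgbm.Dataset
```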
Thanks for taking the time. Although I am not sure it is the model taking up this amount of space, I am starting to realize that other necessary data structures are consuming memory as well (like the gradients and hessians you mention). I will experiment with quantized training. Thanks for the tip.
You are totally right! It was a bit imprecise for me to say "the model". The training-time memory usage has these 4 main sources:

- the raw training data
- the constructed `Dataset` (the binned representation of the data)
- the model itself
- other data structures used during training (e.g. gradients and hessians)

You can avoid the memory usage for the raw data by constructing a `Dataset` directly from a data file instead of from an in-memory array, or by deleting the raw array once the `Dataset` has been constructed. You can reduce the memory usage of the `Dataset` by using a smaller `max_bin`. You can reduce the memory usage of the model with the parameters mentioned above. You can reduce the memory usage of the other data structures by trying quantized training. If you have a lot of rows and any are identical or very similar, you could also try collapsing those into a single row and using weighted training to capture the relative representation of those samples in the whole training dataset (see the sketch below). We should get more of this information into the docs, sorry 😅
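A rough sketch of the first and last suggestions (dropping the raw arrays once the `Dataset` is constructed, and collapsing duplicate rows into weighted samples); the data here is synthetic and the exact approach is only illustrative.

```python
import numpy as np
import lightgbm as lgb

# Synthetic data where many rows are exact duplicates of each other.
base = np.random.rand(1_000, 88).astype(np.float32)
X = np.repeat(base, 10, axis=0)                         # 10_000 rows, heavy duplication
y = np.repeat(np.random.rand(1_000), 10).astype(np.float32)

# Collapse identical rows into one weighted sample.
# (Simplification: assumes duplicated rows also share the same label.)
X_unique, idx, counts = np.unique(X, axis=0, return_index=True, return_counts=True)
y_unique = y[idx]

# Build the binned Dataset, then drop the raw float32 arrays so only the
# compact binned representation stays in memory.
train_set = lgb.Dataset(
    X_unique,
    label=y_unique,
    weight=counts.astype(np.float64),  # weight = number of original rows represented
    params={"max_bin": 63},
)
train_set.construct()
del X, y, base, X_unique, y_unique
```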
The other complication here in your case is which of these data structures are stored in host memory, in the GPU's memory, or in both. That's an area of active development in LightGBM right now. If you're familiar with CUDA and want to look through the code here, we'd welcome contributions that identify ways to cut out any unnecessary copies being held in both places.
Summary
Smaller `max_bin` should decrease the memory footprint used during training. In my tests, it does not.

Motivation
Less memory requirement makes it possible to train on larger datasets. This is especially important in `gpu` and `cuda` mode, where VRAM is scarce.

Description
It is recommended to test different `max_bin` settings for `gpu` and `cuda`, like `15`, `63`, and `255`, to speed up training. While testing different settings, there was no significant change in the memory usage of the GPU. This is weird, as each value in the training array should require fewer bits (4 bits for `15`, 6 bits for `63`, and 8 bits for `255`). I can appreciate that it is hard to do, given that all of these sizes are equal to, or less than, 1 byte. Is it possible?
References
Test results from my particular dataset (running `mse` regression): data shape (41_865_312, 88) and 14.0 GB (float32) size in numpy before constructing the LightGBM dataset.

(results table: GPU memory usage for each `max_bin` setting)
Finally, the GPU memory usage is more than half of the numpy memory usage (which uses single-precision floats). Shouldn't it be about a quarter of that (roughly 3500 MB)?
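A quick back-of-the-envelope check of that expectation:

```python
# Rough arithmetic behind the "about a quarter of numpy" expectation.
rows, cols = 41_865_312, 88
float32_bytes = rows * cols * 4   # raw numpy array, 4 bytes per value
binned_bytes = rows * cols * 1    # max_bin=255 -> at most 1 byte per binned value
print(float32_bytes / 1024**3)    # ~13.7 GiB, in the ballpark of the reported 14.0 GB
print(binned_bytes / 1024**2)     # ~3514 MiB, i.e. roughly the expected 3500 MB
```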
Btw, the recently added `cuda` support is a tremendous improvement over the old `gpu`.