device=cuda_exp is slower than device=cuda on lightgbm.cv #5693
I was able to reproduce this with just plain `lightgbm.cv`, on commit 9954bc4.
@ninist Thanks for the detailed benchmarking. I think cross validation requires more rounds of data loading, and currently the data loading part for `cuda_exp` may be slower. Is the training time of `cuda_exp` (training alone, without cross validation) also slower than `cuda`?
Yes, still the same outcome: `cuda_exp` is slower than `cuda`.
Larger training data
I tried changing the start year of the range from 2019 to 1800 to see if more training data changes the outcome (1940641 rows now) -- still the same outcome.
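For scale, the row count above is consistent with a synthetic dataset built on an hourly date range (this is a guess for illustration; the generation script is shown at the end of the post, and the exact endpoint and frequency here are placeholders):

```python
import pandas as pd

def n_rows(start_year: int, end_year: int = 2023) -> int:
    """Number of hourly timestamps from Jan 1 of start_year to Jan 1 of end_year."""
    return len(pd.date_range(f"{start_year}-01-01", f"{end_year}-01-01", freq="h"))

small = n_rows(2019)  # the original start year
large = n_rows(1800)  # widened range; on the order of ~1.9M rows
print(small, large)
```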
Sorry for the long delay in response. The implementation that used to be called `cuda_exp` is now called `cuda` (and the older `cuda` implementation has been removed). That implementation has also received significant improvements in the 14 months since there was last activity on this discussion.
Could you please try again, hopefully with a smaller reproducible example? I'm marking this *awaiting response*.
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
Edit 2023-02-09
I tried a simplified case without using RFECV in a followup comment, and the issue is reproducible just using `lightgbm.cv`.

Description
I built two different versions of lightgbm - the first with `cuda_exp` and the second with `cuda`.

I do feature selection with `sklearn.RFECV`. I instantiate a `lightgbm.LGBMRegressor`, and `N` times I call `RFECV(model, n_jobs=None, ...)`, each time with a slightly different subset of the training data (simply selecting a subset of 95% of the data each call). The idea behind performing several (e.g. 25) RFECV runs is to eliminate variability in the returned selected features. The result is a `collections.Counter` object that counts how many times each feature was selected out of the `N` runs.

The issue is that `device="cuda_exp"` is much slower than `device="cuda"`. Specifically, if I import the module compiled with `cuda_exp`, both `device="cuda_exp"` and `device="gpu"` are much slower than what they are if I import the module compiled with the older `cuda`.
Reproducible example
See code at the end of the post.

Environment info
LightGBM version or commit hash: commit 9954bc4
Command(s) you used to install LightGBM:
Hardware:
Code to run a particular model
Please note that the dataset below is 100% bogus. I unfortunately cannot share the real dataset. I made a lazy attempt to make the example dataset below have the same number of features and rows and approximate range/collection of values as the real dataset.

To run a model, pass in one of the options: `cuda`, `cuda_gpu`, `cuda_exp`, `cuda_exp_gpu`, `cpu`. The invocation will produce a logfile with the time it took to run. I have included a log file from my own invocations below.
Timing results
Below, each record is one execution of the above program. The identifier `ARG[DEVICE]` serves to show which library was imported and which device was passed. The number after it is the time in seconds, and following the time is a textual representation of the model that was fitted. For `cpu` the library is the older cuda (no runs were performed with `device="cpu"` on the cuda_exp library).

I note that CPU is faster in this case, though cuda_exp is almost 4x as slow as cuda.
Is this caused by cuda_exp having higher overhead than cuda?
I tried some tweaking of the suggested options: using double precision, changing the metric from l1 to l2 so that the metric is computed on the GPU, tweaking max bins, and trying to get rid of sparseness warnings thrown by some models. I had no luck improving the performance of cuda_exp this way.
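For reference, the kinds of tweaks just mentioned would look roughly like this as LightGBM parameters (names from the LightGBM parameter docs; whether each one, e.g. `gpu_use_dp`, actually applies to the `cuda_exp` device is part of what was being tested, and `min_data_in_leaf` is only one guess at silencing the sparseness warnings):

```python
params = {
    "objective": "regression",
    "device": "cuda_exp",
    "metric": "l2",          # l2 instead of l1, so the metric is computed on the GPU
    "gpu_use_dp": True,      # double precision (documented for the OpenCL "gpu" device)
    "max_bin": 63,           # smaller bin counts are often suggested for GPU training
    "min_data_in_leaf": 50,  # one possible way to quiet sparseness warnings
}
```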
Lastly, I would be inclined to agree that RFECV may interact poorly with CUDA/GPU-computing.