model can't be reproduced when data is big with tree_method = "gpu_hist"? #3921
Comments
@hcho3 I'm not sure whether the seed has an effect on ColumnSampler; simple grepping of the C++ source code doesn't turn up many mentions of it.
@trivialfis This line sets the seed globally: Line 300 in 0a0d423
@hcho3 Thanks!
@trivialfis I think it might not be related to the random seed, because the number of samples affects the model result. Testing with CUDA 9.2 and CUDA 9.1 on the latest master branch, the problem remains on CUDA 9.2, while results are the same on CUDA 9.1. I tested another real dataset; although the difference still exists everywhere, it is smaller on CUDA 9.1, which seems more robust. Maybe it is related to the CUDA version? Could you reproduce my problem?
@joegaotao I'm running CUDA 9.2 and I can reproduce your issue in R, but not in Python, which makes things even weirder. I will try to instrument the CUDA code when I have the time.
@trivialfis I wonder if it has to do with the fact that XGBoost-R uses its own random generator. See #3781.
@hcho3 @trivialfis Yes, parameter …
Reopening. We will look into this eventually.
I have been running into the same issue of non-reproducible results with Python xgboost when I use 'tree_method': 'gpu_hist'. I set np.random.seed() just before running xgb.cv, and passed 'seed' in the parameters to xgb.cv as well, but I can't get the same result when re-running the model with the same data and the same parameters.
@thanish Interesting, could you post the relevant section of your script and describe the data shape and sparsity?
My data is of shape (1600000, 26). Below is my code:
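(The original snippet was not preserved in the thread. What follows is a minimal sketch of the kind of setup described, assuming placeholder data of the reported shape and hypothetical parameter values.)

```python
# Hypothetical reconstruction of the setup described above -- the original
# snippet was not preserved, so data and parameter values are placeholders.
import numpy as np
import xgboost as xgb

np.random.seed(42)  # seed set just before running xgb.cv, as described

# Placeholder data matching the reported shape (1600000, 26).
X = np.random.rand(1600000, 26)
y = np.random.rand(1600000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "reg:squarederror",  # illustrative objective
    "tree_method": "gpu_hist",        # the setting that triggers the issue
    "max_depth": 6,
    "eta": 0.1,
    "seed": 42,                       # seed also passed in the parameters
}

# Running this twice in a row gives different cross-validation results on
# GPU, even though every seed above is fixed.
cv_results = xgb.cv(params, dtrain, num_boost_round=100, nfold=5, seed=42)
print(cv_results.tail())
```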
The output of the first run and the output of the 2nd run, without any changes, do not match.
@RAMitchell I tried to print the values of the inputs and output of …
It's not related to the CUDA version. We reproduced your problem* on yet another, even older CUDA version, 9.0.176. There seems to be a positive connection between the bug's occurrence and data size, but we haven't tested it thoroughly. *the problem being instability of XGBoost GPU (histogram) predictions between program runs, despite fixed seeds and unchanged predictions on the CPU
Using …
Closing in favour of #5023
Original issue description:

I did some tests in R (xgboost 0.81.0.1). When data N is big, I found that the models trained with the same parameters are not the same on GPU (tree_method = "gpu_hist"); when N is relatively small, the models are the same. But when I use tree_method = "hist" to train the model repeatedly on CPU, all the model results are the same. I don't know what happens in GPU training; is it due to precision?

GPU test code, big data:
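(The original R snippet was not preserved. Here is a rough Python equivalent of the test being described, with N and all training parameters as illustrative placeholders.)

```python
# Hypothetical Python equivalent of the original R test -- the R snippet was
# not preserved. N and all training parameters are illustrative placeholders.
import numpy as np
import xgboost as xgb

N = 1000000  # "big" N; reportedly, N = 80000 makes the runs identical again
rng = np.random.RandomState(0)
dtrain = xgb.DMatrix(rng.rand(N, 10), label=rng.rand(N))

params = {
    "objective": "reg:squarederror",
    "tree_method": "gpu_hist",  # switch to "hist" for reproducible CPU runs
    "max_depth": 6,
    "eta": 0.1,
    "seed": 0,
}

def train_dump():
    # Train with a fixed seed and dump the trees as text for comparison.
    booster = xgb.train(params, dtrain, num_boost_round=50)
    return booster.get_dump()

# With "hist" the two dumps match; with "gpu_hist" and large N, the issue
# report says they differ between runs.
print(train_dump() == train_dump())
```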
The difference is big, but change N to 80000 or replace tree_method = "gpu_hist" with tree_method = "hist", and the results are the same.