lightgbm run twice with the same parameters, but got different result in validation #564
Comments
@wxh001qq would you be able to send a sample? I believe that in the Spark distributed case the order of the rows can differ from run to run, e.g. see this:
The sample: 1. load the data, 2. pass it through VectorAssembler, 3. train LightGBM.
@imatiach-msft Hello. I am facing the same issue. Is there a way to fix the random_state, or any other parameter?
@KeertiBafna what version of lightgbm are you using in Python? I wonder if it's a Python version difference. It may also be that the incoming dataset differs between Spark and pandas (e.g. the precision of the values may be different).
@imatiach-msft Thanks for the quick reply.
@imatiach-msft I am using mmlspark version 0.17. The workspace is Azure Databricks.
@imatiach-msft Any updates on this issue? How do I get consistent results using mmlspark? The version is 0.17.
I have the same question. With the same training dataset fed in a different row order, and the same test dataset, the NDCG is different. Does the input order matter?
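One plausible reason row order can matter (an illustration only, not a diagnosis of this issue): floating-point addition is not associative, so accumulating the same values in a different order can give slightly different sums. If per-row quantities such as gradients are summed in arrival order, a tiny difference like this could in principle tip a split decision. The sketch below uses only plain Python; `total` is a hypothetical helper, not part of LightGBM or mmlspark.

```python
def total(values):
    """Accumulate left to right, the way a simple streaming sum would."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


# The same three values, presented in two different row orders.
rows_run_1 = [0.1, 0.2, 0.3]
rows_run_2 = [0.3, 0.2, 0.1]

# The sums differ in the last bits because of rounding at each step.
print(total(rows_run_1) == total(rows_run_2))

# Putting rows into a canonical order before accumulating removes the difference.
print(total(sorted(rows_run_1)) == total(sorted(rows_run_2)))
```

In a distributed setting the equivalent would be to give the rows a stable, deterministic order (and partitioning) before training; how to do that depends on the pipeline and is not shown here.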
I hit the same problem in version 0.18.1. The only seed parameter I can see is baggingSeed, which controls the randomness of bagging, so a parameter like "feature_fraction_seed" seems to be missing. Is this issue resolved in the latest version?
Also facing this issue right now. I'm providing a specific train set and test set, and the test set evaluation metric is different every time I train the model (LightGBM regressor). Any ideas, @imatiach-msft?
In R, we have to call set.seed() before running machine learning algorithms, including GBM, because these algorithms involve stochastic processes.
Closing, as LightGBM is now deterministic with merged PR #1387, as long as the seed and deterministic=True parameters are set.
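A minimal sketch of what that parameter set might look like, written with the parameter names from native LightGBM's Python parameter docs (`seed`, `deterministic`, `force_row_wise`). The helper `deterministic_params` is hypothetical, and the exact setter names on the Spark estimator are an assumption to check against your installed version.

```python
def deterministic_params(seed=42):
    """Build a LightGBM-style parameter dict aimed at reproducible runs.

    Hypothetical helper for illustration; the keys follow native LightGBM naming.
    """
    return {
        # Master seed: LightGBM derives the other seeds (bagging,
        # feature fraction, and so on) from it unless they are set explicitly.
        "seed": seed,
        # Ask LightGBM for reproducible results on the same data and parameters.
        "deterministic": True,
        # The LightGBM docs suggest forcing one histogram layout together with
        # deterministic=True to avoid numerical instability.
        "force_row_wise": True,
    }


params = deterministic_params(seed=7)
print(params)

# With native LightGBM this dict would be passed as, for example:
#   import lightgbm as lgb
#   booster = lgb.train(params, train_set)
```

Even with these set, the training data itself has to be identical between runs, including row order in the distributed case discussed earlier in this thread.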
I ran LightGBM twice with the same parameters but got different validation results. The only random seed parameter I can find is baggingSeed. After fixing baggingSeed, the problem still occurs. Should I fix any other parameters? Thanks.
AB#1751559