
LightGBM run twice with the same parameters, but got different results in validation #564

Closed
wxh001qq opened this issue May 9, 2019 · 12 comments

@wxh001qq

wxh001qq commented May 9, 2019

I ran LightGBM twice with the same parameters but got different results in validation. The only random-seed parameter I can find is baggingSeed, and even after fixing baggingSeed the problem still occurs. Should I fix any other parameters? Thanks.


@imatiach-msft

imatiach-msft commented May 9, 2019

@wxh001qq would you be able to send a sample? I believe that in the Spark distributed case, the order of the rows can differ from run to run; see https://issues.apache.org/jira/browse/SPARK-16207. For example, one of the comments from a Spark committer on that issue:

> "Generally, things like RDD and DataFrame don't guarantee any order at all, unless they are product of an ordering operation like sort. I don't think blogs/SO are relevant as much as Spark docs, and they do cover this in places"

@wxh001qq
Copy link
Author

The sample:

```python
from pyspark.ml.feature import VectorAssembler
from pyspark.sql.functions import col
from mmlspark import LightGBMClassifier  # import path for mmlspark 0.17-era releases

# 1. Load the data ('xxx/' is the poster's path placeholder)
trainDF = spark.read.format('csv').options(header='true', inferSchema='true').load('xxx/')

# 2. Assemble the feature columns into a single vector column
# (feature_cols: list of feature column names, defined elsewhere by the poster)
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
train_data = assembler.transform(trainDF).select('features', col('Label').alias('label'))

# 3. Train the LightGBM classifier
classifier = LightGBMClassifier(learningRate=0.05, numIterations=100, numLeaves=70, maxDepth=10).fit(train_data)
```

And when we train XGBoost (ml.dmlc.xgboost4j on Spark) in the same way, the results are reproducible.

@kbafna-antuit

@imatiach-msft Hello, I am facing the same issue. Is there a way to fix the random_state, or any other parameter?

@imatiach-msft

@KeertiBafna what version of lightgbm are you using in Python? I wonder if it's a Python version difference. It may also be that the dataset coming in differs between Spark and pandas (e.g., the precision of the values may be different).

@kbafna-antuit

> @KeertiBafna what version of lightgbm are you using in Python? I wonder if it's a Python version difference. It may also be that the dataset coming in differs between Spark and pandas (e.g., the precision of the values may be different).

@imatiach-msft Thanks for the quick reply.
I am using mmlspark version 0.17 on Azure Databricks.

@kbafna-antuit

@imatiach-msft I am using mmlspark version 0.17; the workspace is Azure Databricks.
Even if I set the bagging seed to a constant and re-run the model, I get different accuracies on the test set each time. Is this inherent to Databricks because of the parallelism?
My objective is to tune the hyperparameters. Could you kindly suggest a way to do this?
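A sketch (not from the thread) of one way to do seeded tuning with Spark ML's CrossValidator. The `mmlspark` import path, the `baggingSeed` value, and the grid values are assumptions for illustration, and, per the rest of this thread, the underlying run-to-run nondeterminism can still make fold metrics noisy between runs.

```python
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from mmlspark import LightGBMClassifier  # 0.17-era import path (assumption)

lgb = LightGBMClassifier(baggingSeed=42)

# Illustrative grid over two hyperparameters.
grid = (ParamGridBuilder()
        .addGrid(lgb.numLeaves, [31, 70])
        .addGrid(lgb.learningRate, [0.05, 0.1])
        .build())

cv = CrossValidator(estimator=lgb,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(labelCol='label'),
                    numFolds=3,
                    seed=42)  # seeds the fold assignment, not LightGBM itself

cvModel = cv.fit(train_data)  # train_data as assembled in the sample above
```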

@kbafna-antuit

@imatiach-msft Any updates on this issue? How do I get consistent results using mmlspark 0.17?
I am still getting different results each time I run the model with the same parameters and data.
Thanks.

@LIkensust

I have the same question: same training dataset fed in a different row order, and the same test dataset, yet the NDCG differs. Does the input order matter?

@yangbingjiao

I have the same problem in version 0.18.1. I see there is only a parameter called baggingSeed, which seeds the bagging, so an equivalent of feature_fraction_seed may be missing. Is this issue resolved in the latest version?

@andrew-arkhipov

andrew-arkhipov commented Sep 8, 2021

Also facing this issue right now. I'm providing a specific train set and test set, and the test-set evaluation metric is different every time I train the model (LightGBM regressor). Any ideas, @imatiach-msft?

@shenglaiyin

In R, we have to call set.seed() before running machine learning algorithms, including GBM, because these algorithms involve stochastic processes.

@imatiach-msft

Closing, as LightGBM is now deterministic with merged PR #1387, as long as the seed and deterministic=True parameters are set.
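For reference, a sketch of what that resolution looks like in code. The `synapse.ml.lightgbm` import path is the post-rename package name and is an assumption here (older releases used `mmlspark`):

```python
from synapse.ml.lightgbm import LightGBMClassifier  # post-rename path (assumption)

classifier = LightGBMClassifier(
    learningRate=0.05,
    numIterations=100,
    numLeaves=70,
    maxDepth=10,
    seed=42,             # one seed covering LightGBM's RNG streams
    deterministic=True,  # required, per the merged PR, for reproducible runs
)
model = classifier.fit(train_data)
```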
