
Inconsistent results from LGBMRegressor between versions 3.2.1 and 3.3.5 #5913

Closed
mayashaked opened this issue Jun 8, 2023 · 6 comments
mayashaked commented Jun 8, 2023

Description

I have noticed a discrepancy in the output of the LGBMRegressor model when using version 3.2.1 vs. version 3.3.5 of LightGBM. Even when trained on the same data and with identical parameters, the model yields a different R-squared score. The discrepancy appears to be due to the colsample_bytree parameter.

Reproducible example

import sys
import lightgbm as lgbm
import numpy as np
import pandas as pd

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

x1 = np.random.uniform(1, 5, size=1000)
x2 = np.random.uniform(-1, 1, size=1000)
y = 3.4 * np.sin(1.3 + x1) + 2.2 * x2
samples = pd.DataFrame({"x1": x1, "x2": x2})

print(f"Python version {sys.version}")
print(f"LightGBM version {lgbm.__version__}")
print(f"NumPy version {np.__version__}")
print(f"pandas version {pd.__version__}")

for colsample_bytree in [0.4, 0.6, 0.8, 1.0]:
    model = lgbm.LGBMRegressor(colsample_bytree=colsample_bytree, random_state=RANDOM_SEED).fit(samples, y)
    
    score = model.score(samples, y)
    print(f"colsample_bytree {colsample_bytree}")
    print(f"score {score}")
    print(f"feature importance {model.feature_importances_}")

On version 3.3.5, the above yields:

Python version 3.9.16 (main, May 30 2023, 14:12:59) 
[Clang 14.0.3 (clang-1403.0.22.14.1)]
LightGBM version 3.3.5
NumPy version 1.24.3
pandas version 1.5.3
colsample_bytree 0.4
score 0.9989950421955228
feature importance [1500 1500]
colsample_bytree 0.6
score 0.9989950421955228
feature importance [1500 1500]
colsample_bytree 0.8
score 0.9993677373164579
feature importance [1627 1373]
colsample_bytree 1.0
score 0.9993677373164579
feature importance [1627 1373]

On version 3.2.1, it yields:

Python version 3.9.16 (main, Dec  7 2022, 10:15:43) 
[Clang 14.0.0 (clang-1400.0.29.202)]
LightGBM version 3.2.1
NumPy version 1.24.3
pandas version 1.5.3
colsample_bytree 0.4
score 0.723396182441457
feature importance [3000    0]
colsample_bytree 0.6
score 0.723396182441457
feature importance [3000    0]
colsample_bytree 0.8
score 0.9993677373164579
feature importance [1627 1373]
colsample_bytree 1.0
score 0.9993677373164579
feature importance [1627 1373]

The discrepancy also exists when using more than two features. For example, I also tried using a sample dataset with three features defined as follows:

x1 = np.random.uniform(1, 5, size=1000)
x2 = np.random.uniform(-1, 1, size=1000)
x3 = np.random.uniform(-2, 2, size=1000) # added a third feature
y = 3.4 * np.sin(1.3 + x1) + 2.2 * x2
samples = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

On 3.3.5, the code yields:

Python version 3.9.16 (main, May 30 2023, 14:12:59) 
[Clang 14.0.3 (clang-1403.0.22.14.1)]
LightGBM version 3.3.5
NumPy version 1.24.3
pandas version 1.5.3
colsample_bytree 0.4
score 0.9949667353110078
feature importance [ 990 990 1020]
colsample_bytree 0.6
score 0.9908235442414114
feature importance [ 485 1373 1142]
colsample_bytree 0.8
score 0.9908235442414114
feature importance [ 485 1373 1142]
colsample_bytree 1.0
score 0.9994666215619484
feature importance [1410 1152 438]

On 3.2.1, it yields:

Python version 3.9.16 (main, Dec 7 2022, 10:15:43) 
[Clang 14.0.0 (clang-1400.0.29.202)]
LightGBM version 3.2.1
NumPy version 1.24.3
pandas version 1.5.3
colsample_bytree 0.4
score 0.9989950421955228
feature importance [1500 1500 0]
colsample_bytree 0.6
score 0.8132613678479168
feature importance [1478 0 1522]
colsample_bytree 0.8
score 0.8132613678479168
feature importance [1478 0 1522]
colsample_bytree 1.0
score 0.9994666215619484
feature importance [1410 1152 438]

Environment info

The results differed when using the following versions:

  • LightGBM version 3.3.5 on macOS Ventura 13.3.1 with an Apple M1 chip
  • LightGBM version 3.2.1 on macOS Monterey 12.3 with a 2.3 GHz Quad-Core Intel Core i7 processor
mayashaked changed the title from "Inconsistent results from LGBMRegressor between Versions 3.2.1 and 3.3.5" to "Inconsistent results from LGBMRegressor between versions 3.2.1 and 3.3.5" on Jun 8, 2023
jmoralez (Collaborator) commented Jun 8, 2023

Hi @mayashaked, thanks for using LightGBM. Could you run these on the same LightGBM version? There are two variables at the moment: the LightGBM version and the CPU.

Having said that, I think this issue is probably related to multithreading and having a different number of CPU cores. You can try setting force_col_wise=True, num_threads=2 in both runs to rule that out.
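The suggested settings can be gathered into a dict and passed straight through to the scikit-learn wrapper; `force_col_wise` and `num_threads` are documented LightGBM parameters, but the snippet below is only a sketch of how to apply them to the reproducible example above, not verified to change its output:

```python
# Sketch: the parameters suggested above, ready to splat into the
# LGBMRegressor constructor from the reproducible example.
stability_params = {
    "force_col_wise": True,  # pin the histogram-building strategy
    "num_threads": 2,        # pin the thread count on both machines
}

# model = lgbm.LGBMRegressor(
#     colsample_bytree=colsample_bytree,
#     random_state=RANDOM_SEED,
#     **stability_params,
# ).fit(samples, y)
print(stability_params)
```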

mayashaked (Author) commented Jun 9, 2023

Hi @jmoralez, thanks for the speedy response! I tried again after adding the force_col_wise=True, num_threads=2 parameters. Version 3.3.5 yielded roughly the same result (scores differed slightly at the third decimal place for colsample_bytree=0.4 and colsample_bytree=0.6) on both the Intel processor and M1 chip machines:

Python version 3.9.16 (main, Dec  7 2022, 10:15:43) 
[Clang 14.0.0 (clang-1400.0.29.202)]
LightGBM version 3.3.5
NumPy version 1.24.3
pandas version 1.5.3
[LightGBM] [Warning] num_threads is set=2, n_jobs=-1 will be ignored. Current value: num_threads=2
colsample_bytree 0.4
score 0.9989950421955228
feature importance [1500 1500]
[LightGBM] [Warning] num_threads is set=2, n_jobs=-1 will be ignored. Current value: num_threads=2
colsample_bytree 0.6
score 0.9989950421955228
feature importance [1500 1500]
[LightGBM] [Warning] num_threads is set=2, n_jobs=-1 will be ignored. Current value: num_threads=2
colsample_bytree 0.8
score 0.9993677373164579
feature importance [1627 1373]
[LightGBM] [Warning] num_threads is set=2, n_jobs=-1 will be ignored. Current value: num_threads=2
colsample_bytree 1.0
score 0.9993677373164579
feature importance [1627 1373]

However, there is still a discrepancy with 3.2.1. Version 3.2.1 on the Intel processor machine yielded the following:

Python version 3.9.16 (main, Dec  7 2022, 10:15:43) 
[Clang 14.0.0 (clang-1400.0.29.202)]
LightGBM version 3.2.1
NumPy version 1.24.3
pandas version 1.5.3
[LightGBM] [Warning] num_threads is set=2, n_jobs=-1 will be ignored. Current value: num_threads=2
colsample_bytree 0.4
score 0.723396182441457
feature importance [3000    0]
[LightGBM] [Warning] num_threads is set=2, n_jobs=-1 will be ignored. Current value: num_threads=2
colsample_bytree 0.6
score 0.723396182441457
feature importance [3000    0]
[LightGBM] [Warning] num_threads is set=2, n_jobs=-1 will be ignored. Current value: num_threads=2
colsample_bytree 0.8
score 0.9993677373164579
feature importance [1627 1373]
[LightGBM] [Warning] num_threads is set=2, n_jobs=-1 will be ignored. Current value: num_threads=2
colsample_bytree 1.0
score 0.9993677373164579
feature importance [1627 1373]

I am having trouble running 3.2.1 on my M1 machine altogether, but I think that's a separate issue related to my setup and outside the scope of this (potential) bug. I am still puzzled as to why running the same code on the same machine, which removes the CPU as a factor, would return different results.

jameslamb (Collaborator) commented Jun 9, 2023

trouble with running 3.2.1 on my M1 machine altogether

There has not yet been a LightGBM release that supports the M1/M2 Macs. See #4843 (comment).

Sorry for the inconvenience, we are working on it.

why running the same code on the same machine would return different results

Please see #5887 (comment).

Briefly:

  • Multithreading can lead to results from multiple threads being returned in a non-deterministic order, which can lead to differences due to floating-point precision when those results are combined
    • solution: set deterministic=true and num_threads=1
  • During Dataset construction, LightGBM checks how long it takes to bin the first few features, then uses those timings to decide whether to parallelize Dataset construction by row or by column. These two approaches can yield slightly different results due to numerical precision, and which will be faster is non-deterministic, since it depends on (among other things) what other work the machine's CPUs are doing outside of LightGBM
    • solution: set force_row_wise=true or force_col_wise=true
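A minimal, LightGBM-free illustration of the first point: IEEE-754 addition is not associative, so partial results combined in a thread-dependent order can differ in the last bits.

```python
# Pure-Python demonstration that summation order matters in floating
# point; this is the mechanism behind the thread-ordering differences
# described above, not LightGBM code.
vals = [0.1, 0.2, 0.3]

one_order = (vals[0] + vals[1]) + vals[2]      # e.g. thread A's partial result lands first
another_order = vals[0] + (vals[1] + vals[2])  # e.g. thread B's lands first

print(one_order == another_order)  # False: 0.6000000000000001 vs 0.6
```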

this (potential) bug

We don't consider these sources of non-deterministic behavior to be bugs. They tend to have a small impact on the results for larger datasets, in exchange for faster training time.

mayashaked (Author) commented

Understood. Thanks! You can close out the issue.

jameslamb (Collaborator) commented

Sure, thanks very much for working with us and for the report.

github-actions (bot) commented

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions bot locked as resolved and limited conversation to collaborators on Sep 13, 2023