Standard outputs infos even when disabled [+ Usage question] #987

Closed

elDan101 opened this issue Oct 12, 2017 · 3 comments
elDan101 commented Oct 12, 2017

Sorry, I didn't realise I was on an older version (2.0.6). After updating to 2.0.7 the issue is resolved. I leave the original text below anyway.

Usage questions:

What does "Finished loading 120 models" mean? I get this output after incremental training.
Are these the individual trees (= models) in the boosted tree ensemble? The number increases every time I train incrementally (it never stays the same or decreases).

When training incrementally I get many warnings: [WARNING] No further splits with positive gain, best gain: -inf. Is there a "best practice" for dealing with this, so that these warnings can be avoided?
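For reference, a minimal sketch of what I assume such a "best practice" would look like: if the warning just means a tree ran out of splits with positive gain (e.g. num_leaves too large for the amount of data in one phase), then constraining tree growth and/or lowering the verbosity should reduce it. The parameter values below are purely illustrative.

import numpy as np
import lightgbm as lgb

# Small synthetic dataset standing in for one training phase.
X = np.random.rand(200, 5)
y = np.random.rand(200)
ds = lgb.Dataset(data=X, label=y)

# Assumption: the warning appears when a tree cannot find a split with
# positive gain, so smaller trees and a larger minimum leaf size should
# trigger it less often; verbose=-1 additionally hides the log lines.
params = {
    "learning_rate": 0.05,
    "num_leaves": 7,          # far fewer leaves than the default 31
    "min_data_in_leaf": 20,   # do not split leaves below this size
    "verbose": -1,
}

booster = lgb.train(params=params, train_set=ds, num_boost_round=100)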

Thanks a lot. I will leave the issue open for the usage questions and close it afterwards.

Obsolete issue:

Hi,

my target function is a time series. I split my dataset into multiple training/testing phases.

At each training phase I want to continue training from previous models (incremental learning).

Inside a training phase I do cross-validation with TimeSeriesSplit, testing multiple parameter settings; a simplified sketch of one tuning step follows.
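The sketch below is simplified, and the parameter grid, number of splits, and error metric are placeholders rather than my real settings.

import numpy as np
import lightgbm as lgb
from sklearn.model_selection import ParameterGrid, TimeSeriesSplit

def tune_phase(X, y, init_model=None):
    # Try every parameter combination with time-series cross-validation and
    # keep the one with the lowest mean RMSE. init_model is the booster from
    # the previous phase (None in phase 1).
    grid = ParameterGrid({"learning_rate": [0.01, 0.02], "num_leaves": [10, 20]})
    tscv = TimeSeriesSplit(n_splits=3)
    best_params, best_error = None, np.inf
    for params in grid:
        fold_errors = []
        for train_idx, test_idx in tscv.split(X):
            ds = lgb.Dataset(data=X[train_idx], label=y[train_idx])
            booster = lgb.train(params=params, train_set=ds,
                                num_boost_round=100, init_model=init_model)
            pred = booster.predict(X[test_idx])
            fold_errors.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
        if np.mean(fold_errors) < best_error:
            best_params, best_error = dict(params), np.mean(fold_errors)
    return best_params, best_error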

At the moment I am having trouble with the console output. Here is a sample:

Phase 1 refers to the first training phase (there is no existing model yet).


******************************************************
INFO: fitting LightGBM in phase 1

--------------------------------------------

INFO: Start parameter tuning

INFO: Handling level0
INFO: **improved cv_error** with params={'verbose': 0, 'learning_rate': 0.01, 'num_leaves': 10, 'max_bin': 255} || current best result: 8207.489200311997
INFO: **improved cv_error** with params={'verbose': 0, 'learning_rate': 0.01, 'num_leaves': 20, 'max_bin': 255} || current best result: 8090.180625524847
INFO: tested for params={'verbose': 0, 'learning_rate': 0.01, 'num_leaves': 30, 'max_bin': 255} || result: 8252.27902379209
INFO: tested for params={'verbose': 0, 'learning_rate': 0.02, 'num_leaves': 10, 'max_bin': 255} || result: 8195.467582308242
INFO: tested for params={'verbose': 0, 'learning_rate': 0.02, 'num_leaves': 20, 'max_bin': 255} || result: 8136.15878262733
[...]

But when I enter phase 2, I get the following:

******************************************************
INFO: fitting LightGBM in phase 2
--------------------------------------------
INFO: Start parameter tuning
INFO: Handling level0
[LightGBM] [Info] Finished loading 120 models
[LightGBM] [Info] Trained a tree with leaves=10 and max_depth=6
[LightGBM] [Info] Trained a tree with leaves=10 and max_depth=6
[LightGBM] [Info] Trained a tree with leaves=10 and max_depth=6
[LightGBM] [Info] Trained a tree with leaves=10 and max_depth=6
[... Many more of these outputs...]
[LightGBM] [Info] Trained a tree with leaves=20 and max_depth=9
[LightGBM] [Info] Trained a tree with leaves=20 and max_depth=7
[LightGBM] [Info] Trained a tree with leaves=20 and max_depth=7
[LightGBM] [Info] Trained a tree with leaves=20 and max_depth=11
[LightGBM] [Info] Trained a tree with leaves=20 and max_depth=8
INFO: tested for params={'verbose': 0, 'learning_rate': 0.01, 'num_leaves': 20, 'max_bin': 255} || result: 5529.753655247278

When I set init_model=None, the "Trained a tree with leaves=XX and max_depth=XX" logs in round 2 disappear, so I think this is a bug.

Environment info

Operating System: Windows 7 Professional Service Pack 1
CPU: Intel(R) Core(TM) i7-2640M CPU @ 2.8 GHz
Python version: 3.6.1, Anaconda 4.4.0, 64 bit
LightGBM: 2.0.6
Further note: I just tried the code below on Linux Mint as well, and I do not see the extra standard output there, so this may be Windows-specific.

Reproducible code:

import numpy as np
import lightgbm as lgb

# Random training data for two consecutive training phases.
X1 = np.random.rand(100, 5)
Y1 = np.random.rand(100)

X2 = np.random.rand(100, 5)
Y2 = np.random.rand(100)

# Random validation data used for early stopping in each phase.
vX1 = np.random.rand(50, 5)
vY1 = np.random.rand(50)
vX2 = np.random.rand(50, 5)
vY2 = np.random.rand(50)

ds1 = lgb.Dataset(data=X1, label=Y1, silent=True)
ds2 = lgb.Dataset(data=X2, label=Y2, silent=True)

vds1 = lgb.Dataset(data=vX1, label=vY1, silent=True)
vds2 = lgb.Dataset(data=vX2, label=vY2, silent=True)

params = {"learning_rate": 0.1, "num_leaves": 20}

def feval(pred, target):
    # Custom evaluation metric (RMSE); either argument may arrive as a Dataset.
    if isinstance(pred, lgb.Dataset):
        pred = pred.label

    if isinstance(target, lgb.Dataset):
        target = target.label

    diff = pred - target
    n_samples = pred.shape[0]
    rmse = np.sqrt(np.sum(diff * diff) / n_samples)
    return "rmse", rmse, False

# Phase 1: train from scratch (init_model=None).
m1 = lgb.train(params=params, train_set=ds1, num_boost_round=1000,
               early_stopping_rounds=30, valid_names=["early_stopping_valid"],
               valid_sets=[vds1], feval=feval, init_model=None, verbose_eval=False,
               keep_training_booster=True)

print("FINISHED training model 1")

# Phase 2: continue training from m1 (init_model=m1); this is where the
# extra [LightGBM] [Info] lines appear.
m2 = lgb.train(params=params, train_set=ds2, num_boost_round=1000,
               early_stopping_rounds=30, valid_names=["early_stopping_valid"],
               valid_sets=[vds2], feval=feval, init_model=m1, verbose_eval=False,
               keep_training_booster=True)

print("FINISHED training model 2")

Output:

[LightGBM] [Info] Total Bins 105
[LightGBM] [Info] Number of data: 100, number of used features: 5
FINISHED training model 1
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=4 and max_depth=3
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=3 and max_depth=2
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=3 and max_depth=2
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[...]
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=4 and max_depth=2
FINISHED training model 2
@guolinke
Collaborator

We reduced this output in the latest version; please try it.

@elDan101
Author

Yes, the output was reduced after updating (sorry, I didn't realise that I was on an old version).
But I still get the "Finished loading X models" message. Can you give me a hint what it means? I get it after each incremental training run, and the number seems to keep increasing.

@guolinke
Collaborator

@elDan101 it is output when continued training is used.
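In other words (a minimal sketch, assuming the 2.x Python API; the printed counts are what one would expect rather than verified output): every call that passes init_model first loads the trees of the previous booster, which is what the "Finished loading N models" line reports, and then appends the new boosting rounds on top, so the count grows with each phase.

import numpy as np
import lightgbm as lgb

X, y = np.random.rand(100, 5), np.random.rand(100)
ds = lgb.Dataset(data=X, label=y)
params = {"learning_rate": 0.1, "num_leaves": 20, "verbose": -1}

# Phase 1: train 40 trees from scratch.
m1 = lgb.train(params, ds, num_boost_round=40)
print(m1.num_trees())  # expected: 40

# Phase 2: continued training loads the existing 40 trees
# ("Finished loading 40 models") and appends up to 40 more.
m2 = lgb.train(params, ds, num_boost_round=40, init_model=m1)
print(m2.num_trees())  # expected: around 80 (loaded trees plus new rounds)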
