UserWarning when using lightgbm.Dataset from pandas.DataFrame if both train and valid_sets are specified #2157

Closed
PGijsbers opened this issue May 8, 2019 · 2 comments

@PGijsbers

First off, I am not sure if this is related/hinted at in #960.
I am sorry if (you feel) this is a duplicate, feel free to close it.

As per the title, when using lightgbm.Dataset constructed from pandas.DataFrame for both train and validation data, I get the warning UserWarning: categorical_feature in param dict is overridden. even if categorical_feature is never specified.

And a question, if I may: I am no longer sure whether lightgbm actually uses the categorical features despite this warning. Does this behavior have any effect on the trained models?

MWE:

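from sklearn.datasets import load_iris
import pandas as pd
import lightgbm as lgbm

X, y = load_iris(return_X_y=True)
X = pd.DataFrame(X)
# Turn the first feature into a pandas category column.
X[0] = X[0].astype(int).astype('category')

# 100 random rows for training, the remaining 50 for evaluation.
X_train = X.sample(100)
y_train = y[X_train.index]
X_eval = X[~X.index.isin(X_train.index)]
y_eval = y[X_eval.index]

# categorical_feature is not passed anywhere below.
d_train = lgbm.Dataset(X_train, y_train)
d_eval = lgbm.Dataset(X_eval, y_eval)

params = {}
# The UserWarning is raised during this call.
lgbm.train(params, d_train, valid_sets=d_eval)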

Environment info

OS: Debian GNU/Linux 8 (jessie) in a docker container hosted on Windows 10 Enterprise, 64 bit.
Python: 3.6.4
Lightgbm: 2.2.3

@StrikerRUS
Collaborator

@PGijsbers First of all, thanks a lot for the reproducible example! We will use it as one of the use cases in test_pandas.py to ensure the quality of the logging routine.

Everything is OK with your code. By default, LightGBM treats unordered category columns from a pandas DataFrame as categorical features. You may ignore this warning: it was introduced as confirmation that LightGBM is using categorical features (in contrast to the case where a user explicitly specifies categorical features but LightGBM doesn't use them). I completely agree that this warning is very annoying, and we'll definitely refactor it as part of the work on #960 and #1021.

I have added a link to this issue in our TODO.
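
Since the warning can be ignored, one way to keep it out of the logs until it is refactored is to filter it with Python's standard warnings module. This is only a minimal sketch, not something LightGBM provides: the helper name ignore_categorical_override_warning is hypothetical, and the message pattern is copied from the warning text quoted in this issue, so it is an assumption that other LightGBM versions use the same wording.

```python
import warnings
from contextlib import contextmanager

# Assumption: message text copied from the warning quoted in this issue;
# the exact wording may differ between LightGBM versions.
OVERRIDE_MSG = r"categorical_feature in param dict is overridden"


@contextmanager
def ignore_categorical_override_warning():
    """Hypothetical helper: hide only the categorical_feature override warning."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text,
        # so other UserWarnings still get through.
        warnings.filterwarnings("ignore", message=OVERRIDE_MSG, category=UserWarning)
        yield


# Usage sketch (requires lightgbm; names taken from the MWE above):
# with ignore_categorical_override_warning():
#     booster = lgbm.train(params, d_train, valid_sets=d_eval)
```

The filter is scoped to the with block and restored afterwards, so warnings raised elsewhere in the program are unaffected.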

@PGijsbers
Author

Thanks for the clarification!

lock bot locked this issue as resolved and limited conversation to collaborators on Mar 11, 2020