[Python] Confusing error when trying to predict on empty dataframe slice #1395

Closed
Ingvar-Y opened this issue May 24, 2018 · 4 comments

Comments

@Ingvar-Y

Hi!

While iterating over different slices of a pandas.DataFrame, I accidentally called Booster.predict on an empty slice. This resulted in the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-b12cf753de2c> in <module>()
----> 1 bst.predict(get_goods(3335353, data_pred).drop(columns=['Reserves_lasting_days_log']))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\lightgbm\sklearn.py in predict(self, X, raw_score, num_iteration)
    513                              "input n_features is %s "
    514                              % (self._n_features, n_features))
--> 515         return self.booster_.predict(X, raw_score=raw_score, num_iteration=num_iteration)
    516 
    517     def apply(self, X, num_iteration=0):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\lightgbm\basic.py in predict(self, data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape, pred_parameter)
   1782         if num_iteration <= 0:
   1783             num_iteration = self.best_iteration
-> 1784         return predictor.predict(data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape)
   1785 
   1786     def get_leaf_output(self, tree_id, leaf_id):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\lightgbm\basic.py in predict(self, data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape)
    401         if isinstance(data, Dataset):
    402             raise TypeError("Cannot use Dataset instance for prediction, please use raw data instead")
--> 403         data = _data_from_pandas(data, None, None, self.pandas_categorical)[0]
    404         predict_type = C_API_PREDICT_NORMAL
    405         if raw_score:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\lightgbm\basic.py in _data_from_pandas(data, feature_name, categorical_feature, pandas_categorical)
    268 
    269             msg = """DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields """
--> 270             raise ValueError(msg + ', '.join(bad_fields))
    271         data = data.values.astype('float')
    272     else:

ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields RC, Goods_code, Holiday, Day_num, Measure, VAT, Price_range, Region, Price_region, Month

This is confusing because the problem stems from the empty input, not from the categorical dtypes of the dataframe columns, which are supported.

@guolinke
Collaborator

ping @StrikerRUS

@StrikerRUS
Collaborator

@Ingvar-Y Thanks for creating the issue. Could you please provide a minimal reproducible example?

@Ingvar-Y
Author

@StrikerRUS Here is an example. The following code:

import pandas as pd
from lightgbm import LGBMRegressor

li = list(range(50))
di = {'A': li}
X = pd.DataFrame.from_dict(di)
X['B'] = (-1) ** X['A']
X['C'] = X['A'] * X['B']
X['B'] = X['B'].astype('category')
X1 = X.loc[:24,:]
X2 = X.loc[25:,:]
bst = LGBMRegressor(n_estimators=5, min_data=1, min_data_in_bin=1)
bst.fit(X1.drop(columns = ['C']), X1['C'])
bst.predict(X2[X2['A'] > 100].drop(columns=['C']))

results in the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-fa4305a852e6> in <module>()
----> 1 bst.predict(X2[X2['A'] > 100].drop(columns=['C']))

~\AppData\Local\Continuum\miniconda3\envs\work\lib\site-packages\lightgbm\sklearn.py in predict(self, X, raw_score, num_iteration)
    513                              "input n_features is %s "
    514                              % (self._n_features, n_features))
--> 515         return self.booster_.predict(X, raw_score=raw_score, num_iteration=num_iteration)
    516 
    517     def apply(self, X, num_iteration=0):

~\AppData\Local\Continuum\miniconda3\envs\work\lib\site-packages\lightgbm\basic.py in predict(self, data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape, pred_parameter)
   1782         if num_iteration <= 0:
   1783             num_iteration = self.best_iteration
-> 1784         return predictor.predict(data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape)
   1785 
   1786     def get_leaf_output(self, tree_id, leaf_id):

~\AppData\Local\Continuum\miniconda3\envs\work\lib\site-packages\lightgbm\basic.py in predict(self, data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape)
    401         if isinstance(data, Dataset):
    402             raise TypeError("Cannot use Dataset instance for prediction, please use raw data instead")
--> 403         data = _data_from_pandas(data, None, None, self.pandas_categorical)[0]
    404         predict_type = C_API_PREDICT_NORMAL
    405         if raw_score:

~\AppData\Local\Continuum\miniconda3\envs\work\lib\site-packages\lightgbm\basic.py in _data_from_pandas(data, feature_name, categorical_feature, pandas_categorical)
    268 
    269             msg = """DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields """
--> 270             raise ValueError(msg + ', '.join(bad_fields))
    271         data = data.values.astype('float')
    272     else:

ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields B

@StrikerRUS
Collaborator

@Ingvar-Y Thank you for the example!

At present, pandas support is partial and limited, and it has some bugs. Your case is another one of them.
We plan to rewrite pandas support completely: #960.

lock bot locked this issue as resolved and limited conversation to collaborators on Mar 11, 2020.