[Python] Confusing error when trying to predict on empty dataframe slice #1395

Ingvar-Y · 2018-05-24T08:59:32Z

Hi!

While iterating through different slices of a pandas.DataFrame, I accidentally called Booster.predict on an empty slice. This resulted in the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-b12cf753de2c> in <module>()
----> 1 bst.predict(get_goods(3335353, data_pred).drop(columns=['Reserves_lasting_days_log']))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\lightgbm\sklearn.py in predict(self, X, raw_score, num_iteration)
    513                              "input n_features is %s "
    514                              % (self._n_features, n_features))
--> 515         return self.booster_.predict(X, raw_score=raw_score, num_iteration=num_iteration)
    516 
    517     def apply(self, X, num_iteration=0):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\lightgbm\basic.py in predict(self, data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape, pred_parameter)
   1782         if num_iteration <= 0:
   1783             num_iteration = self.best_iteration
-> 1784         return predictor.predict(data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape)
   1785 
   1786     def get_leaf_output(self, tree_id, leaf_id):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\lightgbm\basic.py in predict(self, data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape)
    401         if isinstance(data, Dataset):
    402             raise TypeError("Cannot use Dataset instance for prediction, please use raw data instead")
--> 403         data = _data_from_pandas(data, None, None, self.pandas_categorical)[0]
    404         predict_type = C_API_PREDICT_NORMAL
    405         if raw_score:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\lightgbm\basic.py in _data_from_pandas(data, feature_name, categorical_feature, pandas_categorical)
    268 
    269             msg = """DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields """
--> 270             raise ValueError(msg + ', '.join(bad_fields))
    271         data = data.values.astype('float')
    272     else:

ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields RC, Goods_code, Holiday, Day_num, Measure, VAT, Price_range, Region, Price_region, Month

This is confusing because the problem stems from the empty input, not from the categorical dtypes of the DataFrame columns, which are supported.
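Until the error message is improved, a caller-side guard is one possible workaround. The sketch below is only an illustration: `predict_or_empty` is a hypothetical helper, and `_EchoModel` is a stand-in for a fitted estimator so that the snippet runs without LightGBM installed.

```python
import numpy as np
import pandas as pd


def predict_or_empty(model, frame):
    """Call model.predict only when the frame has rows (hypothetical helper)."""
    if frame.shape[0] == 0:
        # Nothing to score: return an empty float array instead of calling the model.
        return np.empty(0, dtype=float)
    return model.predict(frame)


class _EchoModel:
    """Stand-in for a fitted estimator; it simply echoes column 'A' as floats."""

    def predict(self, frame):
        return frame["A"].to_numpy(dtype=float)


frame = pd.DataFrame({"A": [1, 2, 3]})
empty_slice = frame[frame["A"] > 100]  # no row matches, so this slice is empty

full_result = predict_or_empty(_EchoModel(), frame)
empty_result = predict_or_empty(_EchoModel(), empty_slice)
```

Returning an empty array keeps a loop over slices running; raising an explicit error instead may be preferable when an empty slice indicates a bug upstream.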


guolinke · 2018-05-25T07:30:15Z

ping @StrikerRUS

StrikerRUS · 2018-05-25T17:38:27Z

@Ingvar-Y Thanks for creating the issue. Could you please provide a minimal reproducible example?

Ingvar-Y · 2018-05-27T12:06:30Z

@StrikerRUS Example: the following code

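import pandas as pd
from lightgbm import LGBMRegressor

# Build a 50-row frame: integer 'A', alternating +1/-1 'B', target 'C' = A * B.
li = list(range(50))
di = {'A': li}
X = pd.DataFrame.from_dict(di)
X['B'] = (-1) ** X['A']
X['C'] = X['A'] * X['B']
# Make 'B' categorical; this is the dtype named in the error message.
X['B'] = X['B'].astype('category')
# Rows 0..24 are used for training, rows 25..49 for prediction.
X1 = X.loc[:24,:]
X2 = X.loc[25:,:]
bst = LGBMRegressor(n_estimators=5, min_data=1, min_data_in_bin=1)
bst.fit(X1.drop(columns = ['C']), X1['C'])
# No row has A > 100, so predict() receives an empty DataFrame.
bst.predict(X2[X2['A'] > 100].drop(columns=['C']))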

results in the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-fa4305a852e6> in <module>()
----> 1 bst.predict(X2[X2['A'] > 100].drop(columns=['C']))

~\AppData\Local\Continuum\miniconda3\envs\work\lib\site-packages\lightgbm\sklearn.py in predict(self, X, raw_score, num_iteration)
    513                              "input n_features is %s "
    514                              % (self._n_features, n_features))
--> 515         return self.booster_.predict(X, raw_score=raw_score, num_iteration=num_iteration)
    516 
    517     def apply(self, X, num_iteration=0):

~\AppData\Local\Continuum\miniconda3\envs\work\lib\site-packages\lightgbm\basic.py in predict(self, data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape, pred_parameter)
   1782         if num_iteration <= 0:
   1783             num_iteration = self.best_iteration
-> 1784         return predictor.predict(data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape)
   1785 
   1786     def get_leaf_output(self, tree_id, leaf_id):

~\AppData\Local\Continuum\miniconda3\envs\work\lib\site-packages\lightgbm\basic.py in predict(self, data, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape)
    401         if isinstance(data, Dataset):
    402             raise TypeError("Cannot use Dataset instance for prediction, please use raw data instead")
--> 403         data = _data_from_pandas(data, None, None, self.pandas_categorical)[0]
    404         predict_type = C_API_PREDICT_NORMAL
    405         if raw_score:

~\AppData\Local\Continuum\miniconda3\envs\work\lib\site-packages\lightgbm\basic.py in _data_from_pandas(data, feature_name, categorical_feature, pandas_categorical)
    268 
    269             msg = """DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields """
--> 270             raise ValueError(msg + ', '.join(bad_fields))
    271         data = data.values.astype('float')
    272     else:

ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields B
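The input passed to predict in this example can be inspected with pandas alone. The snippet below is a small sketch, not part of the original report: it rebuilds the same columns 'A' and 'B' and shows that the filtered slice has zero rows while 'B' keeps its category dtype.

```python
import pandas as pd

# Rebuild the feature columns from the example above (target 'C' is omitted).
X = pd.DataFrame({"A": list(range(50))})
X["B"] = (-1) ** X["A"]
X["B"] = X["B"].astype("category")
X2 = X.loc[25:, :]

# No row satisfies A > 100, so the boolean filter yields an empty frame.
empty_slice = X2[X2["A"] > 100]

print(empty_slice.shape)   # number of rows and columns in the slice
print(empty_slice.dtypes)  # column dtypes survive the filtering
```

So the frame that reaches the prediction code is both empty and categorical, which is consistent with the report that the message points at the dtype of 'B' rather than at the missing rows.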

StrikerRUS · 2018-05-27T13:28:26Z

@Ingvar-Y Thank you for the example!

At present, pandas support is partial and has some known bugs; your case is another one.
We plan to rewrite pandas support completely: #960.
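One way a case like this could be reported more clearly is an up-front check on the DataFrame's dimensions. The following is only a sketch under that assumption: `check_dataframe_not_empty` is a hypothetical helper name and the message wording is invented for illustration; it is not LightGBM's actual code.

```python
import pandas as pd


def check_dataframe_not_empty(data):
    """Raise a descriptive error for a DataFrame with no rows or no columns.

    Hypothetical helper for illustration; inputs that are not DataFrames pass through.
    """
    if isinstance(data, pd.DataFrame) and (data.shape[0] == 0 or data.shape[1] == 0):
        raise ValueError("Input DataFrame is empty: shape is %s" % (data.shape,))


frame = pd.DataFrame({"A": [1, 2, 3]})
check_dataframe_not_empty(frame)  # a non-empty frame passes silently

try:
    check_dataframe_not_empty(frame[frame["A"] > 100])
    message = None
except ValueError as err:
    message = str(err)
```

Running the check before any dtype handling means the user sees the real cause (empty input) rather than a message about unsupported column types.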

wxchan mentioned this issue May 27, 2018

[python] refine pandas support #960

Closed

6 tasks

StrikerRUS mentioned this issue May 27, 2018

[python] added check for pandas DataFrame dimensions #1402

Merged

guolinke closed this as completed Jun 12, 2018

lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020
