Support different data types (when load data from Python) #3459

Closed
etveritas opened this issue Oct 16, 2020 · 6 comments


@etveritas

Summary

Allow specifying a data type per column when loading data from Python, e.g. col1 -> float16, col2 -> int8.

Motivation

Save memory.

Description

We usually use the Python API of LightGBM and keep our data in a pandas DataFrame. Sometimes we store columns as int8 or float16, but when the data is converted into a LightGBM Dataset, every column is converted to float64 or float32, which increases memory usage. How can we specify the data type when loading data from Python?
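To put rough numbers on the memory concern, here is a minimal NumPy-only sketch. The column names and row count are made up for illustration, and the two arrays merely stand in for DataFrame columns; this does not call LightGBM and says nothing about its internal storage. It only compares the compact per-column footprint with a single float32 or float64 matrix holding the same values.

```python
import numpy as np

n_rows = 1_000_000  # hypothetical dataset size
rng = np.random.default_rng(0)

# Two columns stored compactly, as in the request (col1 -> float16, col2 -> int8).
col1 = rng.random(n_rows).astype(np.float16)
col2 = rng.integers(0, 100, size=n_rows, dtype=np.int8)

# Per-column storage: 2 bytes + 1 byte per row.
compact_bytes = col1.nbytes + col2.nbytes

# One homogeneous matrix, as a loader that needs a single dtype would require.
as_float32 = np.column_stack([col1.astype(np.float32), col2.astype(np.float32)])
as_float64 = as_float32.astype(np.float64)

print("per-column dtypes:", compact_bytes, "bytes")
print("float32 matrix:   ", as_float32.nbytes, "bytes")
print("float64 matrix:   ", as_float64.nbytes, "bytes")
```

With these two columns the float32 copy needs 8 bytes per row and the float64 copy 16, against 3 bytes per row for the compact columns.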

@guolinke
Collaborator

This feature is not supported yet. LightGBM requires the same type for all features.

@aldanor

aldanor commented Nov 1, 2020

On a side note, it would be nice if LightGBM supported feature types like int16. For example, you may have a huge matrix of int16 features, but you have to convert it to float32/float64 just so that LightGBM can consume it...

@etveritas
Author

@guolinke Hello, will this feature be added in the future?

@guolinke
Collaborator

Currently, the data loader in LightGBM is designed for row-wise data. Supporting column-wise data needs more refactoring work.
But I think this is quite important, as it can save memory when loading from many column-wise objects, like pandas/datatables.
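As a rough illustration of the row-wise versus column-wise distinction, the NumPy sketch below uses hypothetical column names and sizes. It shows general NumPy type promotion, not LightGBM's loader: a single 2-D array can hold only one dtype, so mixed columns get promoted to a common type, while separately stored columns keep their own.

```python
import numpy as np

n_rows = 1000  # hypothetical size
columns = {
    "a": np.zeros(n_rows, dtype=np.int8),
    "b": np.zeros(n_rows, dtype=np.int16),
    "c": np.zeros(n_rows, dtype=np.float32),
}

# Column-wise: every column keeps its own dtype (1 + 2 + 4 bytes per row).
columnar_bytes = sum(col.nbytes for col in columns.values())

# Row-wise: stacking into one 2-D array promotes all columns to a common dtype.
row_matrix = np.column_stack(list(columns.values()))

print("column-wise:", columnar_bytes, "bytes")
print("row-wise:   ", row_matrix.nbytes, "bytes, dtype", row_matrix.dtype)
```

Here the common dtype is float32, so the stacked matrix takes 12 bytes per row instead of 7.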

@etveritas
Author

I see. Most ML systems that use C++ as a backend can run into this problem, so maybe some changes are needed.
Thanks~
Feel free to close this issue or leave it open.

@StrikerRUS
Collaborator

Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.

Contributions implementing this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing this feature.

@StrikerRUS StrikerRUS changed the title How to specify data type when load data from Python? Support different data types (when load data from Python) Jan 12, 2021