[hdfs] support parquet file #1286

Closed
janelu9 opened this issue Mar 26, 2018 · 7 comments

Comments

janelu9 commented Mar 26, 2018

Can it only read a single text file on HDFS?
I suggest supporting Parquet files. We generally write files in Parquet format directly from Spark after feature engineering.

janelu9 changed the title from "it seems that lightgbm con't read the path of parquet files on hdfs" to "【BUG】it seems that lightgbm con't read the path of parquet files on hdfs" on Mar 27, 2018
@guolinke
Collaborator

@janelu9 Sorry, it only supports text files for now.
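
Since only text input is supported, one possible workaround is to convert the Parquet data to a text file first. The following is a minimal sketch, not part of LightGBM: the helper names `write_lightgbm_tsv` and `parquet_to_lightgbm_tsv` are hypothetical, and it assumes `pyarrow` (version 7 or later, for `Table.to_pylist`) is installed and that the data fits in memory. It relies on the LightGBM CLI accepting header-less TSV with the label in the first column by default.

```python
import csv
from typing import Iterable, Mapping, Sequence


def write_lightgbm_tsv(
    rows: Iterable[Mapping[str, object]],
    label: str,
    features: Sequence[str],
    out_path: str,
) -> int:
    """Write rows as header-less TSV with the label in the first column.

    Returns the number of rows written.
    """
    count = 0
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for row in rows:
            # Label first, then the features in a fixed order.
            writer.writerow([row[label], *(row[name] for name in features)])
            count += 1
    return count


def parquet_to_lightgbm_tsv(parquet_path: str, label: str, out_path: str) -> int:
    """Hypothetical helper: read a Parquet file and dump it as TSV."""
    # Assumption: pyarrow is installed; imported lazily so the writer
    # above stays usable without it.
    import pyarrow.parquet as pq

    table = pq.read_table(parquet_path)
    features = [name for name in table.column_names if name != label]
    # to_pylist() materializes every row in memory, so this only suits
    # data that fits on one machine.
    return write_lightgbm_tsv(table.to_pylist(), label, features, out_path)
```

For example, `parquet_to_lightgbm_tsv("part-00000.parquet", "label", "train.tsv")` (both file names and the column name are placeholders) would produce a file that can be passed as `data=train.tsv` to the CLI. For data that does not fit on one machine, this approach is probably not practical.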

guolinke changed the title from "【BUG】it seems that lightgbm con't read the path of parquet files on hdfs" to "[hdfs] support parquet file" on Mar 28, 2018
guolinke closed this as completed on Aug 1, 2019
@imatiach-msft
Contributor

@janelu9 @guolinke
You can run LightGBM on mmlspark, which can handle Parquet files once they are loaded into a Spark DataFrame.

@StrikerRUS
Collaborator

Closed in favor of tracking it in #2302. We decided to keep all feature requests in one place.

Contributions of this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing this feature.

@imatiach-msft
Contributor

@StrikerRUS Yep, I was just saying that the requested feature already exists in mmlspark, which seems to fit the user's scenario above (running LightGBM on a Parquet file from Spark). Since it exists, it doesn't even need to be in #2302 and can be closed: it is an actual existing feature, as opposed to a not-yet-existing requested feature.

@StrikerRUS
Collaborator

@imatiach-msft Are you sure that this feature doesn't need to be implemented in pure LightGBM (like HDFS support), independently from mmlspark package?

@imatiach-msft
Contributor

@StrikerRUS It certainly could be. However, given the user's use case ("generally we write file as parquet format from spark"), it seems that running LightGBM in Spark is the best solution. Maybe we can leave the feature open, but with low priority (if there is a way to assign priorities to tasks).

@StrikerRUS
Collaborator

StrikerRUS commented Aug 1, 2019

@imatiach-msft

> it seems that running lightgbm in spark is the best solution.

I agree with you! I think we can re-open it if a concrete request comes up in the future.
