[RFC] [python-package] remove h2o datatable support? #6662

Open
jameslamb opened this issue Oct 5, 2024 · 7 comments

@jameslamb
Collaborator

Summary

Support for h2o's datatable library was added to LightGBM 5.5+ years ago, in #1970.

Proposing here that lightgbm:

  • issue a deprecation warning for the next 2-3 releases whenever datatable is used
  • permanently remove datatable support 2-3 releases from now
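
A minimal sketch of what the warning step could look like, not the actual LightGBM implementation. The names `DatatableFrameStandIn` and `coerce_input` are hypothetical, the stand-in class models only a `.to_numpy()` method, and the choice of `FutureWarning` as the category is an assumption:

```python
import warnings


class DatatableFrameStandIn:
    """Hypothetical stand-in for a datatable Frame; only to_numpy() is modeled."""

    def __init__(self, rows):
        self._rows = rows

    def to_numpy(self):
        # A real Frame would return a numpy array; plain lists keep this sketch dependency-free.
        return [list(row) for row in self._rows]


def coerce_input(data):
    """Hypothetical helper: warn on datatable-like input, then convert it."""
    if isinstance(data, DatatableFrameStandIn):
        warnings.warn(
            "Support for datatable input is deprecated and will be removed in a "
            "future release. Convert the data yourself first, e.g. with .to_numpy().",
            category=FutureWarning,  # assumption: a category shown to end users by default
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return data.to_numpy()
    # Any other input type passes through untouched and triggers no warning.
    return data
```

After the removal release, the `isinstance` branch would simply be deleted, and callers who still hold such a frame would convert it themselves before passing it in.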

Motivation

That project seems to be abandoned:

In those 5.5 years since #1970, the only bug reports / feature requests received about datatable support have been from one person working for h2o... and the last of those was 4 years ago:

And in all that time, I don't think we have ever tested against datatable in CI.

Description

Doing this would simplify the Python package, making it easier for others to contribute.

It'd also make it more manageable to add support for newer, more popular input formats like polars (#6204).

See @trivialfis's summary of the current state of supporting data frame libraries at dmlc/xgboost#10554 (comment) ... I agree with it.

References

I am not proposing here that lightgbm should support H2OFrame... Dask doesn't, XGBoost doesn't, scikit-learn doesn't... and I think our limited time and attention here would be better spent on more widely-used input formats, like polars.

@jameslamb
Collaborator Author

@guolinke @shiyu1994 @StrikerRUS @jmoralez @borchero @btrotta please let me know what you think whenever you have time

@StrikerRUS
Collaborator

I'm +1 for dropping support for datatable. Especially given that the so-called "support" is a simple .to_numpy() method call 🙃

@trivialfis

Thank you for the ping. Sounds good to me, considering that there are no new commits to the project now.

@jmoralez
Collaborator

jmoralez commented Oct 6, 2024

I'm +1 as well

@guolinke
Collaborator

guolinke commented Oct 8, 2024

I am +1

@borchero
Collaborator

borchero commented Oct 8, 2024

I'm in favor of removing as well ✅

@jameslamb
Collaborator Author

Thank you all for the quick responses! I'll put up a PR adding a deprecation warning.

6 participants