LinearRegression: add support for multiple targets #4988
Closes #3850
Thanks Allard for the PR! It looks great overall, I just have a few comments.
a668a6e
to
5a44fa4
Compare
Thanks for catching the skipped tests. That was an oversight on my part.
Thanks Allard for the update, the PR looks good to me!
84f1cf5
to
677a08b
Compare
Forgot to mention in my review, I think the cuml/python/cuml/linear_model/base.pyx Line 64 in 677a08b
@ahendriksen Thanks a lot for tackling this! 👍 I have provided a few comments. My main concern is with potentially unnecessary conversions; I would appreciate it if those could be addressed.
On a representative benchmark, this PR speeds up LinearRegression by 20-50x compared to looping over the targets in Python, which is how multi-target linear regression has been done in practice so far. Results on Volta for the script below show that the new code is faster both when the number of targets is large and when it is small.
import cupy as cp
from cuml.linear_model import LinearRegression
from time import perf_counter as timer
from contextlib import contextmanager


@contextmanager
def time(name):
    # Record the start time, run the wrapped block, then report the elapsed time.
    start = timer()
    yield
    duration = timer() - start
    print(f"{name}: {duration:0.2f} seconds")


n_features = 3
n_samples = 91_000
n_targets = 5_000

X = cp.random.normal(size=(n_samples, n_features))
y = cp.random.normal(size=(n_samples, n_targets))
out1 = cp.zeros(y.shape)
out2 = cp.zeros(y.shape)

# Create linear regression instance that can be reused.
lr = LinearRegression(fit_intercept=False, output_type="cupy", algorithm="svd")

# Old approach: fit and predict one target column at a time.
with time("loop"):
    for i in range(n_targets):
        lr.fit(X, y[:, i])
        out1[:, i] = lr.predict(X)

# New approach: fit and predict all target columns in one call.
with time("new"):
    lr.fit(X, y)
    out2[:] = lr.predict(X)
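The loop and the single multi-target call in the script above are expected to produce the same coefficients, since each target column defines an independent least-squares problem. A minimal CPU sketch of that equivalence follows. It uses NumPy's `np.linalg.lstsq` as a stand-in, which is an assumption for illustration only and not the solver cuML uses; the small problem sizes are likewise made up.

```python
import numpy as np

# Small, made-up problem sizes (not the benchmark sizes from the PR).
rng = np.random.default_rng(0)
n_samples, n_features, n_targets = 200, 3, 5
X = rng.normal(size=(n_samples, n_features))
Y = rng.normal(size=(n_samples, n_targets))

# Loop: one least-squares solve per target column, stacked column-wise.
coef_loop = np.column_stack(
    [np.linalg.lstsq(X, Y[:, i], rcond=None)[0] for i in range(n_targets)]
)

# Batched: a single solve with a 2-D right-hand side.
coef_batched, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Both approaches yield one coefficient column per target.
assert coef_loop.shape == (n_features, n_targets)
assert np.allclose(coef_loop, coef_batched)
```

The batched form factorizes `X` once and reuses it for every column, which is one plausible reason a multi-target call can beat a Python loop; the actual speedup reported in this PR was measured on GPU with cuML.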
Codecov Report
@@            Coverage Diff             @@
##           branch-22.12    #4988   +/-   ##
===============================================
  Coverage              ?   79.44%
===============================================
  Files                 ?      184
  Lines                 ?    11698
  Branches              ?        0
===============================================
  Hits                  ?     9293
  Misses                ?     2405
  Partials              ?        0
I think we should also update the tags of the estimator class by adding

    @staticmethod
    def _more_static_tags():
        return {"multioutput": True}

to the base class.
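To illustrate why adding the tag to a base class is enough for its subclasses, here is a minimal, hypothetical sketch of a tag-collection scheme. The classes `HypotheticalBase`, `HypotheticalLinearModel`, and `HypotheticalRidge` and the helper `collect_static_tags` are invented for this example and are not cuML's actual implementation; only the `_more_static_tags` name comes from the comment above.

```python
class HypotheticalBase:
    """Invented root class with a default tag set (not cuML code)."""

    @staticmethod
    def _more_static_tags():
        return {"multioutput": False}

    @classmethod
    def collect_static_tags(cls):
        # Walk the MRO from the most generic class to the most specific one,
        # so that tags declared on subclasses override inherited defaults.
        tags = {}
        for klass in reversed(cls.__mro__):
            more = klass.__dict__.get("_more_static_tags")
            if more is not None:
                # Entries in a class __dict__ are raw staticmethod objects;
                # __func__ is the underlying plain function.
                tags.update(more.__func__())
        return tags


class HypotheticalLinearModel(HypotheticalBase):
    """Invented shared base for linear models that declares the tag once."""

    @staticmethod
    def _more_static_tags():
        return {"multioutput": True}


class HypotheticalRidge(HypotheticalLinearModel):
    """Inherits the multioutput tag without declaring anything itself."""


print(HypotheticalRidge.collect_static_tags())
```

Under this assumed scheme, every estimator deriving from the shared base reports `multioutput` as true without repeating the declaration.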
88402bf
to
c18d7ff
Compare
c18d7ff
to
37261b3
Compare
Thank you for the review! With the conversions removed, the code became faster by another factor of 2. Apologies for the WIP commits. I have a bit of a hobbled workflow when working with pyx files.
Can this PR be merged?
This LGTM. Thanks Allard!
@gpucibot merge
LinearRegression did not have support for target vectors with multiple columns previously. This PR adds support.

Authors:
- Allard Hendriksen (https://github.com/ahendriksen)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4988