[FEA] Evaluate/confirm completeness of coverage of built-in scorers #242
This is also necessary for building sklearn pipelines.
I wanted to also note that there are two options for tackling this issue: 1. implement the scoring algorithms natively in cuml, or 2. make cuml's outputs numpy-compatible enough that sklearn's existing scorers can be used directly.
Both options, I believe, are equally reasonable, but each has drawbacks. For example, with option 1 it will be time consuming to build out many scoring algorithms with proper testing (though perhaps we only start with a handful, 3-5?). With option 2, cuml would need a fair amount of support from cudf to implement much of the numpy interface (unary and binary ops are on the way); more importantly, cuml would need to build more sklearn-comparable methods like …
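For concreteness, here is a minimal sketch of the code path option 2 relies on, using plain sklearn objects (the toy data and model are illustrative assumptions, not cuml code). sklearn's built-in scorers call estimator.predict() and then apply numpy operations to the result, so a cuml estimator's outputs would need to behave like numpy arrays for this to work unmodified:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

# Toy data, purely illustrative.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# get_scorer("accuracy") wraps accuracy_score; calling the scorer runs
# model.predict(X) and then computes the metric on the numpy result.
scorer = get_scorer("accuracy")
print(scorer(model, X, y))  # should be 1.0 on this separable toy set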
My vote is definitely a +1 on this, as I would eventually like to see all the scoring & metrics be exposed through cuml. Most of these scores involve a massively parallel operation with a simple reduction at the end, which makes them a perfect fit for the CUDA design. I would also prefer that these were implemented in the C++ layer and exposed through Cython, as we do with all of our algorithms, so that they can be ported easily to other distributed frameworks (e.g. Spark). My vote would be to start with option #2 and evolve to #1 over time. Starting with #2 would enable us to leverage the path of least resistance for finishing the hyper-param tuning feature for now. As the metrics & scores become available within cuml, we can swap them out in our hyper-param tuning framework. I have entries in our algorithms tracker to support these.
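To make the map-plus-reduce shape concrete, here is a minimal CuPy sketch (illustrative only, not the proposed C++/Cython implementation), using accuracy as the simplest such score:

import cupy as cp

def cupy_accuracy_score(y, y_pred):
    # The elementwise comparison is the massively parallel part;
    # .mean() is the single reduction at the end.
    return float((y == y_pred).mean())

# Example:
# cupy_accuracy_score(cp.array([0, 1, 1, 0]), cp.array([0, 1, 0, 0]))  -> 0.75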
I agree with Corey. We can add that at the CUDA level whenever we have time.
A set of initial evaluation metrics / scores is being planned.
Similar to #1522, this could be a starting point for a CuPy version of recall score. It is quite a bit faster than sklearn at low millions of rows, though with many classes it will begin to take a hit due to the kernel calls in a loop.

import cupy as cp

def cupy_recall_score(y, y_pred, average='binary'):
    """
    Assumes class labels are encoded as integers 0..nclasses-1.

    TODO: Handle the following
    - average=micro (slightly more annoying)
    - average=weighted (slightly more annoying)
    """
    nclasses = len(cp.unique(y))
    if average == 'binary' and nclasses > 2:
        raise ValueError("average='binary' requires a binary target")
    if nclasses < 2:
        raise ValueError("Single class recall is not yet supported")
    res = cp.zeros(nclasses)
    for i in range(nclasses):
        pos_pred_ix = cp.where(y_pred == i)[0]
        # Short circuit: nothing was predicted as class i, so recall is 0
        # for this class; move on to the next one.
        if len(pos_pred_ix) == 0:
            res[i] = 0
            continue
        neg_pred_ix = cp.where(y_pred != i)[0]
        # True positives: predicted i where the label really is i.
        tp_sum = (y_pred[pos_pred_ix] == y[pos_pred_ix]).sum()
        # False negatives: label is i but the prediction was something else.
        fn_sum = (y[neg_pred_ix] == i).sum()
        res[i] = (tp_sum / (tp_sum + fn_sum)).item()
    if not average:
        return res.get()
    elif average == 'binary':
        return res[nclasses - 1].item()
    elif average == 'macro':
        return res.mean().item()
    return res.get()
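A quick, illustrative sanity check of the sketch above against sklearn (assuming integer labels 0..n-1, as the function expects):

import cupy as cp
from sklearn.metrics import recall_score

y = cp.array([0, 1, 2, 2, 1, 0])
y_pred = cp.array([0, 2, 2, 2, 0, 0])

print(cupy_recall_score(y, y_pred, average='macro'))         # ~0.6667
print(recall_score(y.get(), y_pred.get(), average='macro'))  # should match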
I believe this issue is complete now. Closing.
All sklearn estimators have a builtin score method. When performing hyperparameter optimization, this builtin method is extremely useful so one doesn't also have to build a custom metric for scoring.
It would be nice if cuml also exposed such a method on all estimators.
cc @dantegd
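For context on what the request implies: sklearn's convention is that classifiers score with mean accuracy and regressors with R^2. A hypothetical mixin-style sketch of the classifier case (ClassifierScoreMixin is an illustrative name, not cuml's actual API) might look like:

import cupy as cp

class ClassifierScoreMixin:
    # Hypothetical sketch: mirrors sklearn's convention of score() returning
    # the mean accuracy of self.predict(X) against y, computed on the GPU.
    def score(self, X, y):
        y_pred = self.predict(X)
        return float((cp.asarray(y_pred) == cp.asarray(y)).mean())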