[FEA] Evaluate/confirm completeness of coverage of built-in scorers #242
This is also necessary for building sklearn pipelines.
I wanted to also note that there are two options for tackling this issue: 1. implement the scoring algorithms natively in cuml, or 2. make cuml's outputs numpy-compatible enough that sklearn's existing scorers can be used directly.
Both options, I believe, are equally reasonable, but each has drawbacks. For example, with option 1 it will be time consuming to build out many scoring algorithms with proper testing (though perhaps we only start with a handful, 3-5?). With option 2, cuml would need a fair amount of support from cudf to implement much of the numpy interface (unary and binary ops are on the way); more importantly, cuml would need to build more sklearn-comparable methods like …
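For concreteness, here is a minimal sketch of the code path option 2 relies on, using plain sklearn objects (the toy data and model are illustrative assumptions, not cuml code). sklearn's built-in scorers call estimator.predict() and then apply numpy operations to the result, so a cuml estimator's outputs would need to behave like numpy arrays for this to work unmodified:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

# Toy data, purely illustrative.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# get_scorer("accuracy") wraps accuracy_score; calling the scorer runs
# model.predict(X) and then computes the metric on the numpy result.
scorer = get_scorer("accuracy")
print(scorer(model, X, y))  # should be 1.0 on this separable toy set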
My vote is definitely a +1 on this, as I would eventually like to see all the scoring & metrics be exposed through cuml. Most of these scores involve a massively parallel operation with a simple reduction at the end, which makes them a perfect fit for the CUDA design. I would also prefer that these were implemented in the C++ layer and exposed through Cython, as we do with all of our algorithms, so that they can be ported easily to other distributed frameworks (e.g. Spark). My vote would be to start with option #2 and evolve to #1 over time. Starting with #2 would enable us to leverage the path of least resistance for finishing the hyper-param tuning feature for now. As the metrics & scores become available within cuml, we can swap them out in our hyper-param tuning framework. I have entries in our algorithms tracker to support these.
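To make the map-plus-reduce shape concrete, here is a minimal CuPy sketch (illustrative only, not the proposed C++/Cython implementation), using accuracy as the simplest such score:

import cupy as cp

def cupy_accuracy_score(y, y_pred):
    # The elementwise comparison is the massively parallel part;
    # .mean() is the single reduction at the end.
    return float((y == y_pred).mean())

# Example:
# cupy_accuracy_score(cp.array([0, 1, 1, 0]), cp.array([0, 1, 0, 0]))  -> 0.75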
I agree with Corey. We can add that at the CUDA level whenever we have time.
A set of initial evaluation metrics / scores is being planned.
Similar to #1522, this could be a starting point for a CuPy version of recall score. It is quite a bit faster than sklearn at low millions of rows, though with many classes it will begin to take a hit due to the kernel calls in a loop.

import cupy as cp

def cupy_recall_score(y, y_pred, average='binary'):
    """
    Assumes class labels are encoded as integers 0..nclasses-1.

    TODO: Handle the following
    - average=micro (slightly more annoying)
    - average=weighted (slightly more annoying)
    """
    nclasses = len(cp.unique(y))
    if average == 'binary' and nclasses > 2:
        raise ValueError("average='binary' requires a binary target")
    if nclasses < 2:
        raise ValueError("Single class recall is not yet supported")
    res = cp.zeros(nclasses)
    for i in range(nclasses):
        pos_pred_ix = cp.where(y_pred == i)[0]
        # Short circuit: nothing was predicted as class i, so recall is 0
        # for this class; move on to the next one.
        if len(pos_pred_ix) == 0:
            res[i] = 0
            continue
        neg_pred_ix = cp.where(y_pred != i)[0]
        # True positives: predicted i where the label really is i.
        tp_sum = (y_pred[pos_pred_ix] == y[pos_pred_ix]).sum()
        # False negatives: label is i but the prediction was something else.
        fn_sum = (y[neg_pred_ix] == i).sum()
        res[i] = (tp_sum / (tp_sum + fn_sum)).item()
    if not average:
        return res.get()
    elif average == 'binary':
        return res[nclasses - 1].item()
    elif average == 'macro':
        return res.mean().item()
    return res.get()
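A quick, illustrative sanity check of the sketch above against sklearn (assuming integer labels 0..n-1, as the function expects):

import cupy as cp
from sklearn.metrics import recall_score

y = cp.array([0, 1, 2, 2, 1, 0])
y_pred = cp.array([0, 2, 2, 2, 0, 0])

print(cupy_recall_score(y, y_pred, average='macro'))         # ~0.6667
print(recall_score(y.get(), y_pred.get(), average='macro'))  # should match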
I believe this issue is complete now. Closing.
All sklearn estimators have a builtin score method. When performing hyperparameter optimization, this builtin method is extremely useful so one doesn't also have to build a custom metric for scoring.
It would be nice if cuml also exposed such a method on all estimators.
cc @dantegd
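For context on what the request implies: sklearn's convention is that classifiers score with mean accuracy and regressors with R^2. A hypothetical mixin-style sketch of the classifier case (ClassifierScoreMixin is an illustrative name, not cuml's actual API) might look like:

import cupy as cp

class ClassifierScoreMixin:
    # Hypothetical sketch: mirrors sklearn's convention of score() returning
    # the mean accuracy of self.predict(X) against y, computed on the GPU.
    def score(self, X, y):
        y_pred = self.predict(X)
        return float((cp.asarray(y_pred) == cp.asarray(y)).mean())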