[FEA] Evaluate/confirm completeness of coverage of built In Scorers #242

Closed
quasiben opened this issue Feb 26, 2019 · 8 comments
Labels
0 - Backlog In queue waiting for assignment CUDA / C++ CUDA issue Cython / Python Cython or Python issue New Algorithm For tracking new algorithms that will be added to our existing collection Tracker For epoch-level tracking of work that encapsulates many stories

Comments

@quasiben
Member

All sklearn estimators have a built-in score method. When performing hyperparameter optimization, this built-in method is extremely useful because one doesn't also have to build a custom metric for scoring.

It would be nice if cuml also exposed such a method on all estimators.
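
To make the request concrete, here is a minimal sketch of the pattern, assuming a mixin that supplies a default `score` built on `predict`. The names `ScoreMixin` and `ThresholdClassifier` are hypothetical and are not cuml or sklearn APIs; the default metric is mean accuracy, which is what sklearn classifiers use.

```python
class ScoreMixin:
    """Hypothetical mixin: gives any estimator with predict() a default score()."""

    def score(self, X, y):
        # Mean accuracy: the fraction of predictions that match the labels.
        predictions = self.predict(X)
        correct = sum(1 for p, t in zip(predictions, y) if p == t)
        return correct / len(y)


class ThresholdClassifier(ScoreMixin):
    """Toy estimator (illustrative only): predicts 1 when x >= threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, X, y):
        # Nothing to learn in this toy example.
        return self

    def predict(self, X):
        return [1 if x >= self.threshold else 0 for x in X]


X = [0.1, 0.4, 0.6, 0.9]
y = [0, 1, 1, 1]

# A tiny hyperparameter search that relies only on the built-in score().
best = max(
    (0.3, 0.5, 0.7),
    key=lambda t: ThresholdClassifier(t).fit(X, y).score(X, y),
)
```

The search loop never names a metric; it only calls `score`, which is why a built-in scorer removes the need for a custom one.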

cc @dantegd

@quasiben quasiben added ? - Needs Triage Need team to review and classify feature request New feature or request labels Feb 26, 2019
@dantegd dantegd added 0 - Backlog In queue waiting for assignment and removed ? - Needs Triage Need team to review and classify feature request New feature or request labels Mar 5, 2019
@dantegd dantegd added the Cython / Python Cython or Python issue label Mar 8, 2019
@quasiben
Member Author

This is also necessary for building sklearn pipelines.

@quasiben
Member Author

I wanted to also note that there are two options for tackling this issue:

  1. build scoring functions with cuda/numba
  2. leverage existing scoring functions within sklearn and build out the appropriate array_ufunc interfaces.

I believe both options are equally reasonable, but each has drawbacks. With option 1, building out many scoring algorithms with proper testing will be time consuming (though perhaps we start with only a handful, 3-5?). With option 2, cuml would need a fair amount of support from cudf to implement much of the numpy interface (unary and binary ops are on the way). More importantly, cuml would need to build more sklearn-comparable methods like get_precision, which is rather time consuming and perhaps undesirable.
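
As a rough illustration of what option 2 involves, the sketch below implements NumPy's `__array_ufunc__` protocol on a wrapper class so that NumPy ufuncs dispatch to it. `DeviceArrayStandIn` is a hypothetical stand-in that keeps its data in a host NumPy array; a real GPU-backed type would run the operation on the device instead. It is not a cudf or cuml class.

```python
import numpy as np


class DeviceArrayStandIn:
    """Hypothetical stand-in for a GPU-backed array type (data lives on the host here)."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap any of our own objects so the ufunc sees plain arrays.
        unwrapped = [
            x._values if isinstance(x, DeviceArrayStandIn) else x for x in inputs
        ]
        result = getattr(ufunc, method)(*unwrapped, **kwargs)
        # Re-wrap array results; scalars (e.g. from a full reduction) pass through.
        if isinstance(result, np.ndarray):
            return DeviceArrayStandIn(result)
        return result

    def to_host(self):
        return self._values


y_true = DeviceArrayStandIn([1.0, 2.0, 3.0])
y_pred = DeviceArrayStandIn([1.0, 2.5, 2.0])

# NumPy routes these calls through DeviceArrayStandIn.__array_ufunc__.
abs_err = np.absolute(np.subtract(y_true, y_pred))
mean_abs_err = np.add.reduce(abs_err) / 3
```

This covers only explicit ufunc calls. Scoring functions in sklearn also rely on non-ufunc NumPy functions and input validation, which is part of why option 2 needs broader interface support.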

@cjnolet
Member

cjnolet commented Mar 14, 2019

My vote is definitely a +1 on this, as I would eventually like to see all the scoring & metrics exposed through cuml. Most of these scores involve a massively parallel operation with a simple reduction at the end, which makes them a good fit for CUDA's design.
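
The "parallel operation plus reduction" shape can be sketched on the CPU with mean squared error as an example. This is only an illustration of the structure in plain Python, not a CUDA implementation: the per-element step is independent for every index (the part a GPU would parallelize), and the final step folds the results into one number.

```python
from functools import reduce
import operator


def mean_squared_error(y_true, y_pred):
    # Map step: each squared error depends on one pair only, so every
    # element could be computed by a separate GPU thread.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # Reduce step: a single sum over the per-element results.
    total = reduce(operator.add, squared_errors, 0.0)
    return total / len(y_true)


mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```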

I would also prefer that these were implemented in the C++ layer and exposed through Cython, as we do with all of our algorithms, so that they can be ported easily to other distributed frameworks (e.g. Spark).

My vote would be to start with option #2 and evolve to #1 over time. Starting with #2 would let us take the path of least resistance to finishing the hyper-param tuning feature for now.

As the metrics & scores become available within cuml, we can swap them out in our hyper-param tuning framework. I have entries in our algorithms tracker to support these.

@cjnolet cjnolet added CUDA / C++ CUDA issue New Algorithm For tracking new algorithms that will be added to our existing collection labels Mar 14, 2019
@oyilmaz-nvidia
Contributor

oyilmaz-nvidia commented Mar 14, 2019 via email

@quasiben quasiben changed the title [FEA] Bultin Scorers [FEA] Built In Scorers Mar 27, 2019
@cjnolet
Member

cjnolet commented Apr 16, 2019

Set of initial evaluation metrics / scores that are being planned:

@cjnolet cjnolet added the Tracker For epoch-level tracking of work that encapsulates many stories label May 30, 2019
@cjnolet
Member

cjnolet commented Aug 21, 2019

#608

Salonijain27 pushed a commit to Salonijain27/cuml that referenced this issue Jan 22, 2020
[gpuCI] Auto-merge branch-0.10 to branch-0.11 [skip ci]
@beckernick
Member

Similar to #1522, this could be a starting point for a CuPy version of recall score. It is quite a bit faster than sklearn with low millions of rows, though with many classes it will begin to take a hit due to kernel calls in a loop.

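import cupy as cp


def cupy_recall_score(y, y_pred, average='binary'):
    """
    Recall score on CuPy arrays.

    Assumes class labels are the integers 0..nclasses-1.

    TODO: Handle the following
        - average=micro (slightly more annoying)
        - average=weighted (slightly more annoying)
    """
    nclasses = len(cp.unique(y))

    if average == 'binary' and nclasses > 2:
        raise ValueError("average='binary' requires at most two classes")

    if nclasses < 2:
        raise ValueError("Single class recall is not yet supported")

    # Per-class recall, kept on the device until the final averaging step
    res = cp.zeros(nclasses)

    for i in range(nclasses):
        pos_pred_ix = cp.where(y_pred == i)[0]

        # short circuit: class i was never predicted, so its recall is 0;
        # continue (not break) so the remaining classes are still scored
        if len(pos_pred_ix) == 0:
            res[i] = 0
            continue

        neg_pred_ix = cp.where(y_pred != i)[0]
        # true positives: predicted i and actually i
        tp_sum = (y_pred[pos_pred_ix] == y[pos_pred_ix]).sum()
        # false negatives: actually i but predicted something else
        fn_sum = (y[neg_pred_ix] == i).sum()
        res[i] = (tp_sum / (tp_sum + fn_sum)).item()

    if not average:
        return res.get()
    elif average == 'binary':
        return res[nclasses-1].item()
    elif average == 'macro':
        return res.mean().item()
    raise ValueError(f"average={average!r} is not yet supported")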
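
For the averaging modes left as TODO above, a small CPU reference can help pin down the expected values before writing GPU code. The sketch below is plain Python, not CuPy, and `reference_recall` is a hypothetical helper name. It computes per-class true positives and false negatives once and derives the macro, micro, and weighted averages from them, under the same assumption that labels are the integers 0..nclasses-1.

```python
def reference_recall(y, y_pred, average="macro"):
    """CPU reference for recall; labels are assumed to be 0..nclasses-1."""
    nclasses = len(set(y))
    tp = [0] * nclasses  # true positives per class
    fn = [0] * nclasses  # false negatives per class
    for truth, pred in zip(y, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1

    # Every class appears in y, so support (tp + fn) is never zero.
    support = [t + f for t, f in zip(tp, fn)]
    per_class = [t / s for t, s in zip(tp, support)]

    if average is None:
        return per_class
    if average == "macro":
        # Unweighted mean of per-class recall.
        return sum(per_class) / nclasses
    if average == "micro":
        # Pool the counts across classes before dividing.
        return sum(tp) / sum(support)
    if average == "weighted":
        # Mean of per-class recall weighted by class support.
        return sum(r * s for r, s in zip(per_class, support)) / sum(support)
    raise ValueError(f"unsupported average: {average!r}")


y = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
macro = reference_recall(y, y_pred, "macro")
micro = reference_recall(y, y_pred, "micro")
weighted = reference_recall(y, y_pred, "weighted")
```

For single-label multiclass data, micro and weighted recall both reduce to overall accuracy, so on the GPU they can be computed with one elementwise comparison and a sum rather than a per-class loop.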

@dantegd dantegd changed the title [FEA] Built In Scorers [FEA] Evaluate/confirm completeness of coverage of built In Scorers Jun 6, 2021
@cjnolet
Member

cjnolet commented Jul 29, 2021

I believe this issue is complete now. Closing.

@cjnolet cjnolet closed this as completed Jul 29, 2021