Add Hamming, Jensen-Shannon, KL-Divergence, Russell rao and Correlation distance metrics support #306

Merged: 21 commits into rapidsai:branch-21.10 on Aug 25, 2021

Conversation

mdoijade
Contributor

This PR introduces the following distances:

  • Hamming
  • Jensen-Shannon
  • Russell-Rao
  • KL-Divergence
  • Correlation

Each distance comes with unit tests.
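
For readers unfamiliar with the five metrics listed above, here is a minimal reference sketch in plain Python. It follows common SciPy-style conventions, which are an assumption here: the RAFT kernels may normalize or scale differently (for example, whether Jensen-Shannon is returned as a divergence or as its square root). The function names are illustrative and are not RAFT's API.

```python
import math


def hamming(x, y):
    # Fraction of positions at which the two vectors differ.
    return sum(a != b for a, b in zip(x, y)) / len(x)


def russell_rao(x, y):
    # Boolean vectors: fraction of positions that are NOT true in both.
    n = len(x)
    both_true = sum(1 for a, b in zip(x, y) if a and b)
    return (n - both_true) / n


def kl_divergence(p, q):
    # Assumes p and q are probability vectors and q > 0 wherever p > 0.
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p, q):
    # Square root of the Jensen-Shannon divergence (SciPy's convention).
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return math.sqrt(0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m))


def correlation(x, y):
    # One minus the Pearson correlation coefficient.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)
```

KL-divergence and Jensen-Shannon both depend on `log`, which is why the fast-math discussion later in this thread centers on those two metrics.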

@mdoijade mdoijade requested review from a team as code owners July 30, 2021 15:27
@mdoijade
Contributor Author

mdoijade commented Aug 2, 2021

@cjnolet @teju85 for help with review.

@teju85
Member

teju85 commented Aug 3, 2021

@cjnolet Mahesh found that some metrics see a huge perf gain when we use the fast-math intrinsics. So we were discussing introducing "fast" versions of some of these metrics (e.g. jensen-shannon and jensen-shannon-fast) in our list of currently supported metrics. What are your thoughts? BTW, if this sounds good, it'll be addressed in a separate PR.

@teju85 teju85 added the breaking, enhancement, and feature request labels Aug 3, 2021
@dantegd dantegd requested a review from cjnolet August 4, 2021 01:50
…ich only requires log (x), and post processing revert the log(x) back to x
@mdoijade
Contributor Author

mdoijade commented Aug 9, 2021

@cjnolet ping for help with review.

@cjnolet
Member

cjnolet commented Aug 9, 2021

@teju85 @mdoijade,

So the proposal is to have two separate versions, since the faster one is an approximation, and make it optional for the user to select it? That sounds fine to me.

Member

@cjnolet cjnolet left a comment

LGTM

@cjnolet
Member

cjnolet commented Aug 9, 2021

@teju85,

I notice this is marked w/ the breaking label. Are there any updates that need to be made on the cuML side before this can be merged?

@dantegd
Member

dantegd commented Aug 9, 2021

@cjnolet @mdoijade is a corresponding PR in cuML like this one needed?

@mdoijade
Contributor Author

@teju85 @mdoijade,

So the proposal is to have two separate versions, since the faster one is an approximation, and make it optional for the user to select it? That sounds fine to me.

@cjnolet yes, that is the plan: add fast versions of Jensen-Shannon and KL-divergence as additional distance metrics. I see a perf difference of roughly 2-3x when using the fast log() for these distance metrics.
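
The accuracy side of this trade-off can be sketched in plain Python. This is only an illustration under stated assumptions: `Metric`, `fast_log`, and `distance` are hypothetical names, not RAFT's API, and `fast_log` is a simple truncated-series approximation standing in for a GPU fast-math intrinsic. The 2-3x speedup comes from the hardware intrinsic and will not be reproduced by this code.

```python
import math
from enum import Enum


class Metric(Enum):
    # Hypothetical metric names; the real enum values in RAFT may differ.
    KL_DIVERGENCE = "kl_divergence"
    KL_DIVERGENCE_FAST = "kl_divergence_fast"


def fast_log(x):
    """Approximate ln(x) for x > 0 (stand-in for a fast-math log)."""
    m, e = math.frexp(x)        # x = m * 2**e with 0.5 <= m < 1
    t = (m - 1.0) / (m + 1.0)   # ln(m) = 2 * atanh(t)
    t2 = t * t
    # Truncated atanh series: trades a little accuracy for fewer operations.
    ln_m = 2.0 * t * (1.0 + t2 / 3.0 + t2 * t2 / 5.0)
    return ln_m + e * math.log(2.0)


def kl_divergence(p, q, log=math.log):
    # Assumes q > 0 wherever p > 0.
    return sum(a * log(a / b) for a, b in zip(p, q) if a > 0)


def distance(metric, p, q):
    # The caller opts into the approximation by choosing the "fast" metric.
    log = fast_log if metric is Metric.KL_DIVERGENCE_FAST else math.log
    return kl_divergence(p, q, log)


p = [0.1, 0.2, 0.7]
q = [0.3, 0.3, 0.4]
exact = distance(Metric.KL_DIVERGENCE, p, q)
approx = distance(Metric.KL_DIVERGENCE_FAST, p, q)
```

Exposing the approximation as a separate metric name keeps the exact result as the default and makes the loss of precision an explicit choice by the user.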

@mdoijade
Contributor Author

@cjnolet @mdoijade is a corresponding PR in cuML like this one needed?

@dantegd though merging this RAFT PR shouldn't break the cuML build, I am working on exposing these distance metrics through the cuML cpp/python interface. That PR should be ready in a day or two.

@GPUtester
Contributor

Can one of the admins verify this patch?

@dantegd
Member

dantegd commented Aug 23, 2021

add to allowlist

@mdoijade
Contributor Author

@teju85 can you rerun the tests? It seems like some CI error is causing the failure.

@teju85
Member

teju85 commented Aug 25, 2021

rerun tests

@teju85 teju85 added the non-breaking label and removed the breaking label Aug 25, 2021
@teju85
Member

teju85 commented Aug 25, 2021

I notice this is marked w/ the breaking label. Are there any updates that need to be made on the cuML side before this can be merged?

Ah, I missed your question, @cjnolet. This was an oversight on my part. This is not a breaking change. I've rectified it.

@cjnolet
Member

cjnolet commented Aug 25, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit aab9b95 into rapidsai:branch-21.10 Aug 25, 2021
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Sep 14, 2021
…o distance metrics (#4155)

-- This PR depends on RAFT PR - rapidsai/raft#306
-- Adds cpp & python interfaces for these distance metrics with pytest support for each of them.
-- also remove redundant commented code in canberra distance metric

Authors:
  - Mahesh Doijade (https://github.com/mdoijade)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4155
Labels
CMake, cpp, enhancement, feature request, non-breaking
6 participants