[BUG] Silhouette Score Error #3584

Closed
rsshah1993 opened this issue Mar 4, 2021 · 0 comments · Fixed by #3619

Describe the bug
The silhouette score calculation appears to be inconsistent, especially for singleton clusters. See below:

Steps/Code to reproduce bug

>>> from cuml.metrics.cluster import silhouette_samples
>>> import numpy as np
>>> vecs = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [10.0, 10.0, 10.0]])
>>> labels = np.array([0, 0, 1, 3])
>>> silhouette_samples(X=vecs, labels=labels)
array([0.5, 0. , 0. , 1. ])

It looks like vecs[2] (a singleton cluster) gets a score of 0 while vecs[3] (also a singleton cluster) gets a score of 1. Is this the expected behavior? Another example:

>>> vecs = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [10.0, 10.0, 10.0]])
>>> labels = np.array([1, 1, 3])
>>> silhouette_samples(X=vecs, labels=labels)
array([1., 1., 1.])

Expected behavior
I believe example one should return [0.5, 0. , 0. , 0. ] and example two should return [0.9, 0.888888888888889, 0.0]. It looks like sklearn assigns singleton clusters a score of 0, whereas MATLAB assigns them a score of 1, but I'm still not sure that explains either example.
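
For reference, the expected values can be checked by hand with a small pure-Python sketch. It is not cuML's or sklearn's implementation: the function name `silhouette_samples_reference` and the `singleton_score` parameter are hypothetical, Euclidean distance is assumed, and singleton clusters are given a score of 0 by default to match the sklearn convention described above.

```python
import math
from collections import defaultdict


def silhouette_samples_reference(points, labels, singleton_score=0.0):
    """Per-sample silhouette scores using Euclidean distance.

    Hypothetical helper for checking values by hand. Singleton clusters
    receive `singleton_score` (0.0 here, the sklearn convention).
    Requires at least two distinct labels and Python 3.8+ (math.dist).
    """
    # Group sample indices by cluster label.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    scores = []
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(singleton_score)
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


example_one = silhouette_samples_reference(
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [10.0, 10.0, 10.0]],
    [0, 0, 1, 3],
)
example_two = silhouette_samples_reference(
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [10.0, 10.0, 10.0]],
    [1, 1, 3],
)
print(example_one)
print(example_two)
```

Up to floating-point rounding, this gives the expected values listed above for both examples. Passing `singleton_score=1.0` changes only the singleton entries, so that convention alone would not account for the 1s that cuML returns for the two-member cluster in example two.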

Environment details:

  • Environment location: Docker
  • Linux Distro/Architecture: ubuntu18.04
  • GPU Model/Driver: T4/455.32.00
  • CUDA: 11.1
  • Method of cuDF & cuML install: conda install -c rapidsai-nightly -c nvidia -c conda-forge -c defaults rapids=0.19 python=3.7 cudatoolkit=10.2
@rsshah1993 added the "? - Needs Triage" and "bug" labels on Mar 4, 2021
@lowener removed the "? - Needs Triage" label on Mar 10, 2021
rapids-bot bot pushed a commit that referenced this issue Mar 16, 2021
closes #3584

Authors:
  - Divye Gala (@divyegala)

Approvers:
  - Dante Gama Dessavre (@dantegd)

URL: #3619