Improve docstrings for silhouette score metrics. (#4026)
This PR fixes a few typos I found while reading about how to use silhouette scores. Thanks!

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4026
bdice authored Jul 2, 2021
1 parent f7fb363 commit dacfef1
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions python/cuml/metrics/cluster/silhouette_score.pyx
@@ -54,7 +54,7 @@ def _silhouette_coeff(
X, labels, metric='euclidean', sil_scores=None, chunksize=None,
handle=None):
"""Function wrapped by silhouette_score and silhouette_samples to compute
- silhouette coefficients
+ silhouette coefficients.
Parameters
----------
@@ -64,16 +64,16 @@ def _silhouette_coeff(
The assigned cluster labels for each sample.
metric : string
A string representation of the distance metric to use for evaluating
- the silhouette schore. Available options are "cityblock", "cosine",
+ the silhouette score. Available options are "cityblock", "cosine",
"euclidean", "l1", "l2", "manhattan", and "sqeuclidean".
sil_scores : array_like, shape = (1, n_samples), dtype='float64'
An optional array in which to store the silhouette score for each
sample.
chunksize : integer (default = None)
- An integer, 1 <= chunksize <= n_rows to tile the pairwise distance
+ An integer, 1 <= chunksize <= n_samples to tile the pairwise distance
matrix computations, so as to reduce the quadratic memory usage of
having the entire pairwise distance matrix in GPU memory.
- If None, chunksize will automically be set to 40000, which through
+ If None, chunksize will automatically be set to 40000, which through
experiments has proved to be a safe number for the computation
to run on a GPU with 16 GB VRAM.
handle : cuml.Handle
@@ -156,7 +156,7 @@ def cython_silhouette_score(
metric='euclidean',
chunksize=None,
handle=None):
- """Calculate the mean silhouette coefficient for the provided data
+ """Calculate the mean silhouette coefficient for the provided data.
Given a set of cluster labels for every sample in the provided data,
compute the mean intra-cluster distance (a) and the mean nearest-cluster
@@ -171,13 +171,13 @@ def cython_silhouette_score(
The assigned cluster labels for each sample.
metric : string
A string representation of the distance metric to use for evaluating
- the silhouette schore. Available options are "cityblock", "cosine",
+ the silhouette score. Available options are "cityblock", "cosine",
"euclidean", "l1", "l2", "manhattan", and "sqeuclidean".
chunksize : integer (default = None)
- An integer, 1 <= chunksize <= n_rows to tile the pairwise distance
+ An integer, 1 <= chunksize <= n_samples to tile the pairwise distance
matrix computations, so as to reduce the quadratic memory usage of
having the entire pairwise distance matrix in GPU memory.
- If None, chunksize will automically be set to 40000, which through
+ If None, chunksize will automatically be set to 40000, which through
experiments has proved to be a safe number for the computation
to run on a GPU with 16 GB VRAM.
handle : cuml.Handle
@@ -200,7 +200,7 @@ def cython_silhouette_samples(
metric='euclidean',
chunksize=None,
handle=None):
- """Calculate the silhouette coefficient for each sample in the provided data
+ """Calculate the silhouette coefficient for each sample in the provided data.
Given a set of cluster labels for every sample in the provided data,
compute the mean intra-cluster distance (a) and the mean nearest-cluster
@@ -215,13 +215,13 @@ def cython_silhouette_samples(
The assigned cluster labels for each sample.
metric : string
A string representation of the distance metric to use for evaluating
- the silhouette schore. Available options are "cityblock", "cosine",
+ the silhouette score. Available options are "cityblock", "cosine",
"euclidean", "l1", "l2", "manhattan", and "sqeuclidean".
chunksize : integer (default = None)
- An integer, 1 <= chunksize <= n_rows to tile the pairwise distance
+ An integer, 1 <= chunksize <= n_samples to tile the pairwise distance
matrix computations, so as to reduce the quadratic memory usage of
having the entire pairwise distance matrix in GPU memory.
- If None, chunksize will automically be set to 40000, which through
+ If None, chunksize will automatically be set to 40000, which through
experiments has proved to be a safe number for the computation
to run on a GPU with 16 GB VRAM.
handle : cuml.Handle
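
For readers of these docstrings, the role of `chunksize` can be illustrated with a small CPU-only sketch. It is not the cuML implementation: it uses NumPy, Euclidean distance only, and tiles just the rows of the pairwise distance matrix, which is an assumption made for brevity. The function name `silhouette_samples_chunked` is hypothetical. Each sample's score is `(b - a) / max(a, b)`, where `a` is the mean intra-cluster distance and `b` is the mean nearest-cluster distance, as the docstrings describe.

```python
import numpy as np


def silhouette_samples_chunked(X, labels, chunksize=2):
    """Per-sample silhouette coefficients, computed one row tile at a time.

    Hypothetical helper for illustration only (Euclidean metric, NumPy on CPU).
    Only a (chunksize, n_samples) slab of distances is held at once, instead
    of the full (n_samples, n_samples) matrix.
    """
    X = np.asarray(X, dtype=np.float64)
    labels = np.asarray(labels)
    n_samples = X.shape[0]

    # Map arbitrary labels to 0..n_clusters-1 and count cluster sizes.
    uniq, inv = np.unique(labels, return_inverse=True)
    n_clusters = uniq.size
    counts = np.bincount(inv, minlength=n_clusters)
    onehot = np.eye(n_clusters)[inv]  # shape (n_samples, n_clusters)

    scores = np.zeros(n_samples)
    for start in range(0, n_samples, chunksize):
        stop = min(start + chunksize, n_samples)
        rows = np.arange(stop - start)
        own = inv[start:stop]

        # Distances from this tile of samples to every sample.
        diff = X[start:stop, None, :] - X[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))

        # Sum of distances from each tile sample to each cluster.
        sums = dist @ onehot  # shape (tile, n_clusters)

        # a: mean distance to the other members of the sample's own cluster
        # (the zero self-distance is in the sum, so divide by size - 1).
        own_count = counts[own]
        a = sums[rows, own] / np.maximum(own_count - 1, 1)

        # b: smallest mean distance to any other cluster.
        means = sums / counts
        means[rows, own] = np.inf
        b = means.min(axis=1)

        s = (b - a) / np.maximum(a, b)
        s[own_count == 1] = 0.0  # convention for single-sample clusters
        scores[start:stop] = s
    return scores


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(silhouette_samples_chunked(X, labels, chunksize=2).mean())
```

The result should not depend on `chunksize`; only peak memory does. The sketch assumes at least two clusters and does not guard against the degenerate case where `a` and `b` are both zero.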