Sphinx warnings as errors (rapidsai#4585)
Goals:
- Python documentation should be built for PRs 
- Checks should fail upon sphinx warnings
- Existing warnings are fixed in this PR

This PR covers only the python documentation. C++ docs are out of scope.

Authors:
  - Rory Mitchell (https://github.com/RAMitchell)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4585
RAMitchell authored Mar 28, 2022
1 parent 8bc9fb1 commit 8f30123
Showing 16 changed files with 50 additions and 28 deletions.
5 changes: 4 additions & 1 deletion ci/docs/build.sh
@@ -1,5 +1,5 @@
#!/bin/bash
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
# Copyright (c) 2020-2022, NVIDIA CORPORATION.
#################################
# cuML Docs build script for CI #
#################################
@@ -51,6 +51,7 @@ gpuci_logger "Build Doxygen docs"
gpuci_logger "Build Sphinx docs"
cd "$PROJECT_WORKSPACE/docs"
make html
RETVAL=$?

#Commit to Website
cd "$DOCS_WORKSPACE"
@@ -65,3 +66,5 @@ done

mv "$PROJECT_WORKSPACE/cpp/build/html/"* "$DOCS_WORKSPACE/api/libcuml/$BRANCH_VERSION"
mv "$PROJECT_WORKSPACE/docs/build/html/"* "$DOCS_WORKSPACE/api/cuml/$BRANCH_VERSION"

exit $RETVAL
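
The `RETVAL=$?` and `exit $RETVAL` lines let the script finish publishing the docs while still reporting the Sphinx build's status to CI. A minimal Python sketch of the same pattern follows; `run_steps` and its arguments are hypothetical names used only for illustration and are not part of cuML's CI scripts.

```python
import subprocess
import sys


def run_steps(build_cmd, later_steps):
    """Run the build, run every later step regardless, return the build's exit code."""
    # Counterpart of `make html; RETVAL=$?`: remember the status, do not stop yet.
    build_status = subprocess.run(build_cmd).returncode
    # Counterpart of the `mv` commands, which still run after a failed build.
    for step in later_steps:
        step()
    # Counterpart of `exit $RETVAL`: hand the build status back to the caller.
    return build_status
```

A caller would typically pass the result to `sys.exit`, so a failed build fails the CI job only after the remaining steps have run.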
5 changes: 5 additions & 0 deletions ci/gpu/build.sh
@@ -284,6 +284,11 @@ else
unset LIBCUML_BUILD_DIR
$WORKSPACE/build.sh cppdocs -v

if [ "$CUDA_REL" != "11.0" ]; then
gpuci_logger "Building python docs"
$WORKSPACE/build.sh pydocs
fi

fi

if [ -n "${CODECOV_TOKEN}" ]; then
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -2,7 +2,7 @@
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXOPTS = "-W"
SPHINXBUILD = sphinx-build
SPHINXPROJ = cuML
SOURCEDIR = source
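
Setting `SPHINXOPTS` to `-W` passes Sphinx's warnings-as-errors flag to `sphinx-build`, so `make html` exits non-zero when a warning is emitted. The sketch below shows the kind of command the Makefile variables would expand to. It assumes the Makefile follows the stock sphinx-quickstart template (make mode, `-M`) with a `build` output directory; neither is visible in this hunk, and `sphinx_command` is a hypothetical helper name.

```python
import shlex


def sphinx_command(builder="html", source_dir="source", build_dir="build",
                   warnings_as_errors=True):
    """Build the argv list for a sphinx-build 'make mode' invocation."""
    cmd = ["sphinx-build", "-M", builder, source_dir, build_dir]
    if warnings_as_errors:
        # -W tells Sphinx to turn warnings into errors.
        cmd.append("-W")
    return cmd


# Show the command as it would be typed in a shell.
print(shlex.join(sphinx_command()))
```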
25 changes: 14 additions & 11 deletions python/cuml/cluster/agglomerative.pyx
@@ -106,21 +106,24 @@ class AgglomerativeClustering(Base, ClusterMixin, CMajorInputTagMixin):
Which linkage criterion to use. The linkage criterion determines
which distance to use between sets of observations. The algorithm
will merge the pairs of clusters that minimize this criterion.
- 'single' uses the minimum of the distances between all
observations of the two sets.
* 'single' uses the minimum of the distances between all
observations of the two sets.
n_neighbors : int (default = 15)
The number of neighbors to compute when connectivity = "knn"
connectivity : {"pairwise", "knn"}, (default = "knn")
The type of connectivity matrix to compute.
- 'pairwise' will compute the entire fully-connected graph of
pairwise distances between each set of points. This is the
fastest to compute and can be very fast for smaller datasets
but requires O(n^2) space.
- 'knn' will sparsify the fully-connected connectivity matrix to
save memory and enable much larger inputs. "n_neighbors" will
control the amount of memory used and the graph will be connected
automatically in the event "n_neighbors" was not large enough
to connect it.
* 'pairwise' will compute the entire fully-connected graph of
pairwise distances between each set of points. This is the
fastest to compute and can be very fast for smaller datasets
but requires O(n^2) space.
* 'knn' will sparsify the fully-connected connectivity matrix to
save memory and enable much larger inputs. "n_neighbors" will
control the amount of memory used and the graph will be connected
automatically in the event "n_neighbors" was not large enough
to connect it.
output_type : {'input', 'cudf', 'cupy', 'numpy', 'numba'}, default=None
Variable to control output type of the results and attributes of
the estimator. If None, it'll inherit the output type set at the
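
This hunk changes how the option lists are laid out inside the docstring. reStructuredText generally expects a blank line before a bullet list and continuation lines aligned with the item text. The original indentation is not visible in this view, so the following is only a sketch of that layout on a toy function (`cluster` is a hypothetical name, and the wording is abbreviated from the docstring above).

```python
import inspect


def cluster(connectivity="knn"):
    """Toy function showing a bullet list inside a numpydoc parameter.

    Parameters
    ----------
    connectivity : {"pairwise", "knn"}, (default = "knn")
        The type of connectivity matrix to compute.

        * 'pairwise' will compute the entire fully-connected graph of
          pairwise distances between each set of points.
        * 'knn' will sparsify the fully-connected connectivity matrix to
          save memory and enable much larger inputs.
    """


# cleandoc strips the common indentation, much as documentation tools do.
lines = inspect.cleandoc(cluster.__doc__).splitlines()
first_bullet = next(i for i, ln in enumerate(lines) if ln.lstrip().startswith("* "))
bullet = lines[first_bullet]
continuation = lines[first_bullet + 1]
```

Here the line before the first bullet is blank, and each continuation line starts in the same column as the item text after `* `.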
11 changes: 5 additions & 6 deletions python/cuml/cluster/hdbscan.pyx
@@ -1,4 +1,3 @@
#
# Copyright (c) 2021-2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -293,7 +292,6 @@ class HDBSCAN(Base, ClusterMixin, CMajorInputTagMixin):
alpha : float, optional (default=1.0)
A distance scaling parameter as used in robust single linkage.
See [2]_ for more information.
verbose : int or boolean, default=False
Sets logging level. It must be one of `cuml.common.logger.level_*`.
@@ -311,7 +309,7 @@ class HDBSCAN(Base, ClusterMixin, CMajorInputTagMixin):
cluster_selection_epsilon : float, optional (default=0.0)
A distance threshold. Clusters below this value will be merged.
See [3]_ for more information. Note that this should not be used
Note that this should not be used
if we want to predict the cluster labels for new points in future
(e.g. using approximate_predict), as the approximate_predict function
is not aware of this argument.
@@ -342,6 +340,7 @@ class HDBSCAN(Base, ClusterMixin, CMajorInputTagMixin):
to find the most persistent clusters. Alternatively you can instead
select the clusters at the leaves of the tree -- this provides the
most fine grained and homogeneous clusters. Options are:
* ``eom``
* ``leaf``
@@ -351,17 +350,17 @@ class HDBSCAN(Base, ClusterMixin, CMajorInputTagMixin):
the case that you feel this is a valid result for your dataset.
gen_min_span_tree : bool, optional (default=False)
Whether to populate the minimum_spanning_tree_ member for
Whether to populate the `minimum_spanning_tree_` member for
utilizing plotting tools. This requires the `hdbscan` CPU Python
package to be installed.
gen_condensed_tree : bool, optional (default=False)
Whether to populate the condensed_tree_ member for
Whether to populate the `condensed_tree_` member for
utilizing plotting tools. This requires the `hdbscan` CPU
Python package to be installed.
gen_single_linkage_tree_ : bool, optinal (default=False)
Whether to populate the single_linkage_tree_ member for
Whether to populate the `single_linkage_tree_` member for
utilizing plotting tools. This requires the `hdbscan` CPU
Python package t be installed.
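
Wrapping names such as `minimum_spanning_tree_` in backticks matters because, in reStructuredText, a bare word ending in an underscore is read as a hyperlink reference, and a reference with no matching target is reported as a problem during the build. A small, hypothetical lint helper (not part of cuML or Sphinx) that flags such bare names might look like this:

```python
import re

# A word ending in "_" that is neither preceded nor followed by a backtick.
BARE_TRAILING_UNDERSCORE = re.compile(r"(?<![`\w])([A-Za-z]\w*_)(?![`\w])")


def bare_trailing_underscore_names(text):
    """Return names ending in '_' that are not wrapped in backticks."""
    return BARE_TRAILING_UNDERSCORE.findall(text)


old_line = "Whether to populate the minimum_spanning_tree_ member for"
new_line = "Whether to populate the `minimum_spanning_tree_` member for"

print(bare_trailing_underscore_names(old_line))
print(bare_trailing_underscore_names(new_line))
```

The helper flags the name in the old wording and reports nothing for the backticked version. Ordinary identifiers with inner underscores, such as `n_neighbors`, are not flagged.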
6 changes: 4 additions & 2 deletions python/cuml/dask/cluster/kmeans.py
@@ -141,12 +141,14 @@ def fit(self, X, sample_weight=None):
X : Dask cuDF DataFrame or CuPy backed Dask Array
Training data to cluster.
sample_weight : Dask cuDF DataFrame or CuPy backed Dask Array
shape = (n_samples,), default=None # noqa
sample_weight : Dask cuDF DataFrame or CuPy backed Dask Array \
shape = (n_samples,), default=None # noqa
The weights for each observation in X. If None, all observations
are assigned equal weight.
Acceptable formats: cuDF DataFrame, NumPy ndarray, Numba device
ndarray, cuda array interface compliant array like CuPy
"""

sample_weight = self._check_normalize_sample_weight(sample_weight)
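
The added trailing backslash appears intended to keep the parameter's type description on one logical line: in a regular (non-raw) Python string literal, a backslash followed by a newline is dropped, so the documentation tool sees `name : type` as a single line even though the source is wrapped. A sketch with a toy `fit` function (a stand-in, not the cuML method):

```python
def fit(X, sample_weight=None):
    """Toy stand-in for an estimator's fit method.

    Parameters
    ----------
    sample_weight : Dask cuDF DataFrame or CuPy backed Dask Array \
            shape = (n_samples,), default=None
        The weights for each observation in X.
    """


# The wrapped source lines arrive in __doc__ as one line.
spec_lines = [ln for ln in fit.__doc__.splitlines() if "sample_weight :" in ln]
print(len(spec_lines))
```

The single matching line contains both the array types and the `shape = ...` text, and no backslash survives in `fit.__doc__`.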
4 changes: 4 additions & 0 deletions python/cuml/dask/ensemble/randomforestclassifier.py
@@ -82,6 +82,7 @@ class RandomForestClassifier(BaseRandomForestModel, DelayedPredictionMixin,
* ``4`` or ``'poisson'`` for poisson half deviance
* ``5`` or ``'gamma'`` for gamma half deviance
* ``6`` or ``'inverse_gaussian'`` for inverse gaussian deviance
``2``, ``'mse'``, ``4``, ``'poisson'``, ``5``, ``'gamma'``, ``6``,
``'inverse_gaussian'`` not valid for classification
bootstrap : boolean (default = True)
@@ -105,6 +106,7 @@ class RandomForestClassifier(BaseRandomForestModel, DelayedPredictionMixin,
* If ``'sqrt'`` then ``max_features=1/sqrt(n_features)``.
* If ``'log2'`` then ``max_features=log2(n_features)/n_features``.
* If ``None``, then ``max_features = 1.0``.
n_bins : int (default = 128)
Maximum number of bins used by the split algorithm per feature.
min_samples_leaf : int or float (default = 1)
@@ -114,6 +116,7 @@ class RandomForestClassifier(BaseRandomForestModel, DelayedPredictionMixin,
* If ``float``, then ``min_samples_leaf`` represents a fraction
and ``ceil(min_samples_leaf * n_rows)`` is the minimum number of
samples for each leaf node.
min_samples_split : int or float (default = 2)
The minimum number of samples required to split an internal
node.\n
@@ -122,6 +125,7 @@ class RandomForestClassifier(BaseRandomForestModel, DelayedPredictionMixin,
* If type ``float``, then ``min_samples_split`` represents a fraction
and ``ceil(min_samples_split * n_rows)`` is the minimum number of
samples for each split.
n_streams : int (default = 4 )
Number of parallel streams used for forest building
workers : optional, list of strings
1 change: 1 addition & 0 deletions python/cuml/dask/ensemble/randomforestregressor.py
@@ -75,6 +75,7 @@ class RandomForestRegressor(BaseRandomForestModel, DelayedPredictionMixin,
* ``4`` or ``'poisson'`` for poisson half deviance
* ``5`` or ``'gamma'`` for gamma half deviance
* ``6`` or ``'inverse_gaussian'`` for inverse gaussian deviance
``0``, ``'gini'``, ``1``, ``'entropy'`` not valid for regression
bootstrap : boolean (default = True)
Control bootstrapping.\n
1 change: 1 addition & 0 deletions python/cuml/ensemble/randomforestclassifier.pyx
@@ -159,6 +159,7 @@ class RandomForestClassifier(BaseRandomForestModel,
* ``4`` or ``'poisson'`` for poisson half deviance
* ``5`` or ``'gamma'`` for gamma half deviance
* ``6`` or ``'inverse_gaussian'`` for inverse gaussian deviance
only ``0``/``'gini'`` and ``1``/``'entropy'`` valid for classification
bootstrap : boolean (default = True)
Control bootstrapping.\n
1 change: 1 addition & 0 deletions python/cuml/ensemble/randomforestregressor.pyx
@@ -158,6 +158,7 @@ class RandomForestRegressor(BaseRandomForestModel,
* ``4`` or ``'poisson'`` for poisson half deviance
* ``5`` or ``'gamma'`` for gamma half deviance
* ``6`` or ``'inverse_gaussian'`` for inverse gaussian deviance
``0``, ``'gini'``, ``1`` and ``'entropy'`` not valid for regression.
bootstrap : boolean (default = True)
Control bootstrapping.\n
3 changes: 2 additions & 1 deletion python/cuml/feature_extraction/_tfidf_vectorizer.py
@@ -1,4 +1,4 @@
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
# Copyright (c) 2020-2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -260,6 +260,7 @@ def transform(self, raw_documents):
def get_feature_names(self):
"""
Array mapping from feature integer indices to feature name.
Returns
-------
feature_names : Series
2 changes: 0 additions & 2 deletions python/cuml/fil/fil.pyx
@@ -578,7 +578,6 @@ class ForestInference(Base,

Parameters
----------
{}
preds : gpuarray or cudf.Series, shape = (n_samples,)
Optional 'out' location to store inference results

@@ -607,7 +606,6 @@ class ForestInference(Base,

Parameters
----------
{}
preds : gpuarray or cudf.Series, shape = (n_samples,2)
Binary probability output
Optional 'out' location to store inference results
1 change: 1 addition & 0 deletions python/cuml/metrics/pairwise_distances.pyx
@@ -341,6 +341,7 @@ def sparse_pairwise_distances(X, Y=None, metric="euclidean", handle=None,
See the documentation for scipy.spatial.distance for details on these
metrics.
- ['inner_product', 'hellinger']
Parameters
----------
X : array-like (device or host) of shape (n_samples_x, n_features)
5 changes: 2 additions & 3 deletions python/cuml/metrics/pairwise_kernels.py
@@ -202,9 +202,8 @@ def pairwise_kernels(X, Y=None, metric="linear", *,
array.
If Y is given (default is None), then the returned matrix is the pairwise
kernel between the arrays from both X and Y.
Valid values for metric are:
['additive_chi2', 'chi2', 'linear', 'poly', 'polynomial', 'rbf',
'laplacian', 'sigmoid', 'cosine']
Valid values for metric are: ['additive_chi2', 'chi2', 'linear', 'poly',
'polynomial', 'rbf', 'laplacian', 'sigmoid', 'cosine']
Parameters
----------
4 changes: 4 additions & 0 deletions python/cuml/naive_bayes/naive_bayes.py
@@ -1524,6 +1524,7 @@ def _check_X(self, X):

def fit(self, X, y, sample_weight=None) -> "CategoricalNB":
"""Fit Naive Bayes classifier according to X, y
Parameters
----------
X : array-like of shape (n_samples, n_features)
@@ -1539,6 +1540,7 @@ def fit(self, X, y, sample_weight=None) -> "CategoricalNB":
sample_weight : array-like of shape (n_samples), default=None
Weights applied to individual samples (1. for unweighted).
Currently sample weight is ignored.
Returns
-------
self : object
@@ -1556,6 +1558,7 @@ def partial_fit(self, X, y, classes=None,
This method has some performance overhead hence it is better to call
partial_fit on chunks of data that are as large as possible
(as long as fitting in the memory budget) to hide the overhead.
Parameters
----------
X : array-like of shape (n_samples, n_features)
@@ -1575,6 +1578,7 @@ def partial_fit(self, X, y, classes=None,
sample_weight : array-like of shape (n_samples), default=None
Weights applied to individual samples (1. for unweighted).
Currently sample weight is ignored.
Returns
-------
self : object
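
The four additions in this file are blank lines (not visible in this view) that separate the preceding text from the `Parameters` and `Returns` headings. A hypothetical checker that reports section headings not preceded by a blank line could be sketched as follows; the function name and the sample docstrings are illustrative and are not a Sphinx or numpydoc API.

```python
import inspect
import re

# A numpydoc section underline: a line made only of dashes.
UNDERLINE = re.compile(r"^-{3,}$")


def headings_missing_blank_line(docstring):
    """Return section titles whose preceding line is not blank."""
    lines = inspect.cleandoc(docstring).splitlines()
    missing = []
    for i in range(2, len(lines)):
        # lines[i] is the underline, lines[i - 1] the title,
        # lines[i - 2] the line that should be blank.
        if UNDERLINE.match(lines[i].strip()) and lines[i - 2].strip():
            missing.append(lines[i - 1].strip())
    return missing


before = """Fit Naive Bayes classifier according to X, y
Parameters
----------
X : array-like of shape (n_samples, n_features)
Returns
-------
self : object
"""

after = """Fit Naive Bayes classifier according to X, y

Parameters
----------
X : array-like of shape (n_samples, n_features)

Returns
-------
self : object
"""

print(headings_missing_blank_line(before))
print(headings_missing_blank_line(after))
```

The first call reports both headings and the second reports none, which mirrors the before and after states of the docstrings in this file.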
2 changes: 1 addition & 1 deletion python/cuml/preprocessing/TargetEncoder.py
@@ -49,7 +49,7 @@ class TargetEncoder:
'continuous': consecutive samples are grouped into one folds.
'interleaved': samples are assign to each fold in a round robin way.
'customize': customize splitting by providing a `fold_ids` array
in `fit()` or `fit_transform()` functions.
in `fit()` or `fit_transform()` functions.
output_type: {'cupy', 'numpy', 'auto'}, default = 'auto'
The data type of output. If 'auto', it matches input data.
stat: {'mean','var'}, default = 'mean'