Improve Python Docs with Default Role (#3445)
This PR sets the default role for interpreted text (anything in single backticks) in Sphinx to `:py:obj:`. This is very useful for us since we frequently use a single backtick in our Python documentation to refer to another Python object.
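
For reference, the mechanism is a single line of Sphinx configuration (added to `docs/source/conf.py` in the diff below):
```
# docs/source/conf.py (from the diff below)
# Interpreted text in `single backticks` is now treated as :py:obj:`...`,
# i.e. a cross-reference to a Python object.
default_role = "py:obj"
```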

Currently, the docstring:
```
`cuml.datasets.make_blobs`
```
Would render as (italicized, variable-width):
![image](https://user-images.githubusercontent.com/42954918/106529509-dd4c1900-64a7-11eb-9977-49a2594c7c3e.png)

This PR changes it to (bold, monospaced):
![image](https://user-images.githubusercontent.com/42954918/106529282-77f82800-64a7-11eb-86e2-a38e37ccea0d.png)

An added benefit is that if the interpreted text matches an entry in the index, it becomes a link to that entry. So in the above example, clicking on `cuml.datasets.make_blobs` takes you to the function's documentation.
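
Concretely, a bare single-backtick reference now behaves like an explicit `:py:obj:` cross-reference. A minimal sketch (hypothetical docstring, not part of this diff):
```
def load_demo_data():
    """Generate demo data.

    See `cuml.datasets.make_blobs` for details. With the new default role
    this is interpreted as :py:obj:`cuml.datasets.make_blobs`, and Sphinx
    renders it as a link whenever the target appears in the index.
    """
```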

Finally, this PR adds a new interpreted text role, `:py:`, intended for inline Python code. For example, the following markup:
```
 * `cuml.datasets.make_blobs` for references to objects (functions, classes, modules, etc.)
 * :py:`import cupy as cp` for inline python code
 * ``import cupy as cp`` for literal code
```
will generate:
![image](https://user-images.githubusercontent.com/42954918/106530276-3f594e00-64a9-11eb-8edf-569fc9dd829e.png)
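
To make the intended style concrete, here is a hypothetical docstring (not part of this diff) that combines the three forms; the `:py:` role is the one declared in `docs/source/api.rst` in the diff below:
```
def demo_blobs():
    """Toy wrapper used only to illustrate the new markup.

    Uses `cuml.datasets.make_blobs` (an object reference that becomes a
    link) to create the data. The returned arrays are CuPy arrays, so a
    typical session starts with :py:`import cupy as cp` (highlighted
    inline Python), while ``import cupy as cp`` (literal text) stays
    plain monospace with no highlighting or linking.
    """
```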

I also replaced a handful of existing examples to seed usage of these new options. Updating every location would be very time-consuming and is best done incrementally.

Authors:
  - Michael Demoret (@mdemoret-nv)

Approvers:
  - Dante Gama Dessavre (@dantegd)

URL: #3445
mdemoret-nv authored Feb 2, 2021
1 parent fa9edbc commit 28953ab
Showing 8 changed files with 67 additions and 54 deletions.
5 changes: 5 additions & 0 deletions docs/source/api.rst
@@ -2,6 +2,11 @@
cuML API Reference
~~~~~~~~~~~~~~~~~~~

.. role:: py(code)
   :language: python
   :class: highlight


Module Configuration
====================

12 changes: 10 additions & 2 deletions docs/source/conf.py
@@ -1,7 +1,7 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Copyright (c) 2018, NVIDIA CORPORATION.
# Copyright (c) 2018-2021, NVIDIA CORPORATION.
#
# This file is execfile()d with the current directory set to its
# containing dir.
@@ -182,7 +182,10 @@
]

# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'https://docs.python.org/': None}
intersphinx_mapping = {
"python": ('https://docs.python.org/', None),
"scipy": ('https://docs.scipy.org/doc/scipy/reference', None)
}

# Config numpydoc
numpydoc_show_inherited_class_members = False
@@ -201,3 +204,8 @@ def setup(app):
'cuml', 'https://github.com/rapidsai/'
'cuml/blob/{revision}/python/'
'{package}/{path}#L{lineno}')

# Set the default role for interpreted code (anything surrounded in `single
# backticks`) to be a python object. See
# https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-default_role
default_role = "py:obj"
6 changes: 3 additions & 3 deletions python/cuml/common/input_utils.py
@@ -1,5 +1,5 @@
#
# Copyright (c) 2019-2020, NVIDIA CORPORATION.
# Copyright (c) 2019-2021, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -85,8 +85,8 @@ def get_supported_input_type(X):
-----
To closely match the functionality of
:func:`~cuml.common.input_utils.input_to_cuml_array`, this method will
return ``cupy.ndarray`` for any object supporting
`__cuda_array_interface__` and ``numpy.ndarray`` for any object supporting
return `cupy.ndarray` for any object supporting
`__cuda_array_interface__` and `numpy.ndarray` for any object supporting
`__array_interface__`.
Returns
16 changes: 8 additions & 8 deletions python/cuml/dask/datasets/classification.py
@@ -1,4 +1,4 @@
# Copyright (c) 2020, NVIDIA CORPORATION.
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -46,17 +46,17 @@ def make_classification(n_samples=100, n_features=20, n_informative=2,
This initially creates clusters of points normally distributed (std=1)
about vertices of an `n_informative`-dimensional hypercube with sides of
length ``2 * class_sep`` and assigns an equal number of clusters to each
length :py:`2 * class_sep` and assigns an equal number of clusters to each
class. It introduces interdependence between these features and adds
various types of further noise to the data.
Without shuffling, ``X`` horizontally stacks features in the following
Without shuffling, `X` horizontally stacks features in the following
order: the primary `n_informative` features, followed by `n_redundant`
linear combinations of the informative features, followed by `n_repeated`
duplicates, drawn randomly with replacement from the informative and
redundant features. The remaining features are filled with random noise.
Thus, without shuffling, all useful features are contained in the columns
``X[:, :n_informative + n_redundant + n_repeated]``.
:py:`X[:, :n_informative + n_redundant + n_repeated]`.
Examples
--------
@@ -104,7 +104,7 @@ def make_classification(n_samples=100, n_features=20, n_informative=2,
The total number of features. These comprise `n_informative`
informative features, `n_redundant` redundant features,
`n_repeated` duplicated features and
``n_features-n_informative-n_redundant-n_repeated`` useless features
:py:`n_features-n_informative-n_redundant-n_repeated` useless features
drawn at random.
n_informative : int, optional (default=2)
The number of informative features. Each class is composed of a number
@@ -124,10 +124,10 @@
The number of classes (or labels) of the classification problem.
n_clusters_per_class : int, optional (default=2)
The number of clusters per class.
weights : array-like of shape ``(n_classes,)`` or ``(n_classes - 1,)``, \
(default=None)
weights : array-like of shape :py:`(n_classes,)` or :py:`(n_classes - 1,)`\
, (default=None)
The proportions of samples assigned to each class. If None, then
classes are balanced. Note that if ``len(weights) == n_classes - 1``,
classes are balanced. Note that if :py:`len(weights) == n_classes - 1`,
then the last class weight is automatically inferred.
More than `n_samples` samples may be returned if the sum of
`weights` exceeds 1.
10 changes: 5 additions & 5 deletions python/cuml/datasets/blobs.py
@@ -1,5 +1,5 @@
#
# Copyright (c) 2020, NVIDIA CORPORATION.
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -80,12 +80,12 @@ def make_blobs(n_samples=100, n_features=2, centers=None, cluster_std=1.0,
the number of samples per cluster.
n_features : int, optional (default=2)
The number of features for each sample.
centers : int or array of shape [n_centers, n_features], optional
centers : int or array of shape [`n_centers`, `n_features`], optional
(default=None)
The number of centers to generate, or the fixed center locations.
If n_samples is an int and centers is None, 3 centers are generated.
If n_samples is array-like, centers must be
either None or an array of length equal to the length of n_samples.
If `n_samples` is an int and centers is None, 3 centers are generated.
If `n_samples` is array-like, centers must be
either None or an array of length equal to the length of `n_samples`.
cluster_std : float or sequence of floats, optional (default=1.0)
The standard deviation of the clusters.
center_box : pair of floats (min, max), optional (default=(-10.0, 10.0))
34 changes: 17 additions & 17 deletions python/cuml/datasets/classification.py
@@ -54,17 +54,17 @@ def make_classification(n_samples=100, n_features=20, n_informative=2,
"""
Generate a random n-class classification problem.
This initially creates clusters of points normally distributed (std=1)
about vertices of an ``n_informative``-dimensional hypercube with sides of
length ``2*class_sep`` and assigns an equal number of clusters to each
about vertices of an `n_informative`-dimensional hypercube with sides of
length :py:`2*class_sep` and assigns an equal number of clusters to each
class. It introduces interdependence between these features and adds
various types of further noise to the data.
Without shuffling, ``X`` horizontally stacks features in the following
order: the primary ``n_informative`` features, followed by ``n_redundant``
linear combinations of the informative features, followed by ``n_repeated``
Without shuffling, `X` horizontally stacks features in the following
order: the primary `n_informative` features, followed by `n_redundant`
linear combinations of the informative features, followed by `n_repeated`
duplicates, drawn randomly with replacement from the informative and
redundant features. The remaining features are filled with random noise.
Thus, without shuffling, all useful features are contained in the columns
``X[:, :n_informative + n_redundant + n_repeated]``.
:py:`X[:, :n_informative + n_redundant + n_repeated]`.
Examples
--------
@@ -106,15 +106,15 @@ def make_classification(n_samples=100, n_features=20, n_informative=2,
n_samples : int, optional (default=100)
The number of samples.
n_features : int, optional (default=20)
The total number of features. These comprise ``n_informative``
informative features, ``n_redundant`` redundant features,
``n_repeated`` duplicated features and
``n_features-n_informative-n_redundant-n_repeated`` useless features
The total number of features. These comprise `n_informative`
informative features, `n_redundant` redundant features,
`n_repeated` duplicated features and
:py:`n_features-n_informative-n_redundant-n_repeated` useless features
drawn at random.
n_informative : int, optional (default=2)
The number of informative features. Each class is composed of a number
of gaussian clusters each located around the vertices of a hypercube
in a subspace of dimension ``n_informative``. For each cluster,
in a subspace of dimension `n_informative`. For each cluster,
informative features are drawn independently from N(0, 1) and then
randomly linearly combined within each cluster in order to add
covariance. The clusters are then placed on the vertices of the
@@ -132,10 +132,10 @@
weights : array-like of shape (n_classes,) or (n_classes - 1,),\
(default=None)
The proportions of samples assigned to each class. If None, then
classes are balanced. Note that if ``len(weights) == n_classes - 1``,
classes are balanced. Note that if :py:`len(weights) == n_classes - 1`,
then the last class weight is automatically inferred.
More than ``n_samples`` samples may be returned if the sum of
``weights`` exceeds 1.
More than `n_samples` samples may be returned if the sum of
`weights` exceeds 1.
flip_y : float, optional (default=0.01)
The fraction of samples whose class is assigned randomly. Larger
values introduce noise in the labels and make the classification
@@ -188,16 +188,16 @@ def make_classification(n_samples=100, n_features=20, n_informative=2,
time for each feature class (informative, repeated, etc.) while
also providing the added speedup of generating a big matrix
on GPU
2. We generate `order=F` construction. We exploit the
2. We generate :py:`order=F` construction. We exploit the
fact that X is a generated from a univariate normal, and
covariance is introduced with matrix multiplications. Which means,
we can generate X as a 1D array and just reshape it to the
desired order, which only updates the metadata and eliminates
copies
3. Lastly, we also shuffle by construction. Centroid indices are
permuted for each sample, and then we construct the data for
each centroid. This shuffle works for both `order=C` and
`order=F` and eliminates any need for secondary copies
each centroid. This shuffle works for both :py:`order=C` and
:py:`order=F` and eliminates any need for secondary copies
References
----------
36 changes: 18 additions & 18 deletions python/cuml/decomposition/incremental_pca.py
@@ -1,5 +1,5 @@
#
# Copyright (c) 2020, NVIDIA CORPORATION.
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -39,15 +39,15 @@ class IncrementalPCA(PCA):
Depending on the size of the input data, this algorithm can be much
more memory efficient than a PCA, and allows sparse input.
This algorithm has constant memory complexity, on the order of
``batch_size * n_features``, enabling use of np.memmap files without
:py:`batch_size * n_features`, enabling use of np.memmap files without
loading the entire file into memory. For sparse matrices, the input
is converted to dense in batches (in order to be able to subtract the
mean) which avoids storing the entire dense matrix at any one time.
The computational overhead of each SVD is
``O(batch_size * n_features ** 2)``, but only 2 * batch_size samples
remain in memory at a time. There will be ``n_samples / batch_size``
:py:`O(batch_size * n_features ** 2)`, but only 2 * batch_size samples
remain in memory at a time. There will be :py:`n_samples / batch_size`
SVD computations to get the principal components, versus 1 large SVD
of complexity ``O(n_samples * n_features ** 2)`` for PCA.
of complexity :py:`O(n_samples * n_features ** 2)` for PCA.
Parameters
----------
@@ -60,21 +60,21 @@ class IncrementalPCA(PCA):
handles in several streams.
If it is None, a new one is created.
n_components : int or None, (default=None)
Number of components to keep. If ``n_components`` is ``None``,
then ``n_components`` is set to ``min(n_samples, n_features)``.
Number of components to keep. If `n_components` is ``None``,
then `n_components` is set to :py:`min(n_samples, n_features)`.
whiten : bool, optional
If True, de-correlates the components. This is done by dividing them by
the corresponding singular values then multiplying by sqrt(n_samples).
Whitening allows each component to have unit variance and removes
multi-collinearity. It might be beneficial for downstream
tasks like LinearRegression where correlated features cause problems.
copy : bool, (default=True)
If False, X will be overwritten. ``copy=False`` can be used to
If False, X will be overwritten. :py:`copy=False` can be used to
save memory but is unsafe for general use.
batch_size : int or None, (default=None)
The number of samples to use for each batch. Only used when calling
``fit``. If ``batch_size`` is ``None``, then ``batch_size``
is inferred from the data and set to ``5 * n_features``, to provide a
`fit`. If `batch_size` is ``None``, then `batch_size`
is inferred from the data and set to :py:`5 * n_features`, to provide a
balance between approximation accuracy and memory consumption.
verbose : int or boolean, default=False
Sets logging level. It must be one of `cuml.common.logger.level_*`.
@@ -98,24 +98,24 @@ class IncrementalPCA(PCA):
to 1.0.
singular_values_ : array, shape (n_components,)
The singular values corresponding to each of the selected components.
The singular values are equal to the 2-norms of the ``n_components``
The singular values are equal to the 2-norms of the `n_components`
variables in the lower-dimensional space.
mean_ : array, shape (n_features,)
Per-feature empirical mean, aggregate over calls to ``partial_fit``.
Per-feature empirical mean, aggregate over calls to `partial_fit`.
var_ : array, shape (n_features,)
Per-feature empirical variance, aggregate over calls to
``partial_fit``.
`partial_fit`.
noise_variance_ : float
The estimated noise covariance following the Probabilistic PCA model
from [4]_.
n_components_ : int
The estimated number of components. Relevant when
``n_components=None``.
`n_components=None`.
n_samples_seen_ : int
The number of samples processed by the estimator. Will be reset on
new calls to fit, but increments across ``partial_fit`` calls.
new calls to fit, but increments across `partial_fit` calls.
batch_size_ : int
Inferred batch size from ``batch_size``.
Inferred batch size from `batch_size`.
Notes
-----
@@ -126,8 +126,8 @@ class IncrementalPCA(PCA):
decomposition used in specific situations to reduce the algorithmic
complexity of the SVD. The source for this technique is [3]_. This
technique has been omitted because it is advantageous only when decomposing
a matrix with ``n_samples >= 5/3 * n_features`` where ``n_samples`` and
``n_features`` are the matrix rows and columns, respectively. In addition,
a matrix with :py:`n_samples >= 5/3 * n_features` where `n_samples` and
`n_features` are the matrix rows and columns, respectively. In addition,
it hurts the readability of the implemented algorithm. This would be a good
opportunity for future optimization, if it is deemed necessary.
2 changes: 1 addition & 1 deletion python/cuml/svm/svc.pyx
@@ -216,7 +216,7 @@ class SVC(SVMBase, ClassifierMixin):
coef_ : float, shape (1, n_cols)
Only available for linear kernels. It is the normal of the
hyperplane.
``coef_ = sum_k=1..n_support dual_coef_[k] * support_vectors[k,:]``
coef_ = sum_k=1..n_support dual_coef_[k] * support_vectors[k,:]
classes_: shape (n_classes_,)
Array of class labels
n_classes_ : int
