[DOC] Convert all remaining Python docstrings to pydoc and examples to doctest #2415
Comments
This seems like something I can handle. If nobody else is working on it, can I claim this issue? |
@yuqli I think some of these might have been fixed by #2649, and others will soon be handled automatically by a decorator in #2635 (which generates docstrings for fit/predict/transform/etc. methods). That decorator doesn't touch the cuml.dask docstrings, though, so checking the overall status using #2635 as a base would be a very good idea and very much welcomed! |
It would be great to update some of the Python docstrings to use the doctest format in this issue. Right now most of our examples look like:
Examples
--------
.. code-block:: python
import cupy as cp
from cuml.metrics import pairwise_distances
X = cp.array([[2.0, 3.0], [3.0, 5.0], [5.0, 8.0]])
Y = cp.array([[1.0, 0.0], [2.0, 1.0]])
# Euclidean Pairwise Distance, Single Input:
pairwise_distances(X, metric='euclidean')
# Cosine Pairwise Distance, Multi-Input:
pairwise_distances(X, Y, metric='cosine')
# Manhattan Pairwise Distance, Multi-Input:
pairwise_distances(X, Y, metric='manhattan')
Output:
.. code-block:: python
array([[0. , 2.23606798, 5.83095189],
[2.23606798, 0. , 3.60555128],
[5.83095189, 3.60555128, 0. ]])
array([[0.4452998 , 0.13175686],
[0.48550424, 0.15633851],
[0.47000106, 0.14671817]])
array([[ 4., 2.],
[ 7., 5.],
[12., 10.]])
Instead, the doctest format would be (generated by literally copying the example section into a Python interactive session):
Examples
--------
>>> import cupy as cp
>>> from cuml.metrics import pairwise_distances
>>>
>>> X = cp.array([[2.0, 3.0], [3.0, 5.0], [5.0, 8.0]])
>>> Y = cp.array([[1.0, 0.0], [2.0, 1.0]])
>>>
>>> # Euclidean Pairwise Distance, Single Input:
>>> pairwise_distances(X, metric='euclidean')
array([[0. , 2.23606798, 5.83095189],
[2.23606798, 0. , 3.60555128],
[5.83095189, 3.60555128, 0. ]])
>>>
>>> # Cosine Pairwise Distance, Multi-Input:
>>> pairwise_distances(X, Y, metric='cosine')
array([[0.4452998 , 0.13175686],
[0.48550424, 0.15633851],
[0.47000106, 0.14671817]])
>>>
>>> # Manhattan Pairwise Distance, Multi-Input:
>>> pairwise_distances(X, Y, metric='manhattan')
array([[ 4., 2.],
[ 7., 5.],
[12., 10.]])
Which doesn't look better in GitHub Markdown, but is a big improvement in Sphinx. This allows the user to see the output inline with the code that generated it and the |
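To illustrate why the doctest format helps, here is a minimal sketch of a numpydoc-style docstring whose Examples section is checked with the standard-library `doctest` module. It uses a pure-Python stand-in so it runs without a GPU; `manhattan` and `run_examples` are hypothetical names for illustration, not part of cuML.

```python
import doctest


def manhattan(a, b):
    """Return the Manhattan (L1) distance between two sequences.

    Parameters
    ----------
    a, b : sequence of float
        Points of equal length.

    Returns
    -------
    float
        Sum of absolute coordinate differences.

    Examples
    --------
    >>> manhattan([2.0, 3.0], [1.0, 0.0])
    4.0
    >>> manhattan([5.0, 8.0], [2.0, 1.0])
    10.0
    """
    return sum(abs(x - y) for x, y in zip(a, b))


def run_examples(func):
    """Run the doctest examples in ``func.__doc__``; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # Make the function visible to its own examples via ``globs``.
    for test in finder.find(func, globs={func.__name__: func}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


print(run_examples(manhattan))  # (0, 2): both examples match their output
```

Because the expected output sits directly under each `>>>` line, the same text serves as documentation and as a test, which is what the old `.. code-block:: python` plus separate "Output:" layout cannot do.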
Thanks for the reply and the instruction. Sure I will take care of them. Thanks. |
Sorry for the delay. I have converted the "Examples" section to doctest for some modules. How big should a pull request be? Should I submit one PR after finishing all the changes, or a PR for, say, every ~200 lines of code changed? Thanks! |
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. |
This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d. |
We should revive this issue. cuGraph has an open issue for this, and cuDF just did this and found quite a few issues. It will fix broken documentation examples like the following, which will definitely fail as written (
from cuml import SVR
from cuml import make_regression
from cuml import train_test_split
from cuml.explainer import KernelExplainer
X, y = make_regression(
n_samples=102,
n_features=10,
noise=0.1,
random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=2,
random_state=42)
model = SVR().fit(X_train, y_train)
cu_explainer = KernelExplainer(
model=model.predict,
data=X_train,
gpu_model=True)
cu_shap_values = cu_explainer.shap_values(X_test)
cu_shap_values |
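As a rough sketch of how doctest would surface this kind of breakage, the snippet below runs a docstring whose example imports a name that does not exist. It is an assumption-laden stand-in: `stale_example` and `check_docstring` are hypothetical, and `math` is used instead of cuML so the snippet runs anywhere.

```python
import doctest


def stale_example():
    """Hypothetical docstring whose example imports a name that does not exist.

    Examples
    --------
    >>> from math import no_such_function
    >>> no_such_function(2.0)
    4.0
    """


def check_docstring(func):
    """Run the examples in ``func.__doc__``; return (failed, attempted, report)."""
    parser = doctest.DocTestParser()
    # Empty globals: the example must import everything it needs itself.
    test = parser.get_doctest(func.__doc__, {}, func.__name__, None, 0)
    report = []
    runner = doctest.DocTestRunner(verbose=False)
    # Capture the failure report instead of printing it to stdout.
    results = runner.run(test, out=report.append)
    return results.failed, results.attempted, "".join(report)


failed, attempted, report = check_docstring(stale_example)
print(f"{failed} of {attempted} examples failed")
```

The captured report contains the `ImportError` traceback, so an example with a bad import path is flagged as soon as the docstrings are run, rather than when a user copies it.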
Linking PR #4618 |
There are currently several places in the codebase that do not use the proper docstring format [1].
It would be worth combing through the codebase and updating these.
[1] https://numpydoc.readthedocs.io/en/latest/format.html