Run doctests. #9815

bdice · 2021-12-01T22:05:09Z

This PR adds doctests and resolves #9513. Several issues were found by running doctests that have now been resolved:

codecov · 2022-01-13T21:39:36Z

Codecov Report

Merging #9815 (e1a19bc) into branch-22.02 (967a333) will decrease coverage by 0.07%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.02    #9815      +/-   ##
================================================
- Coverage         10.49%   10.41%   -0.08%     
================================================
  Files               119      119              
  Lines             20305    20541     +236     
================================================
+ Hits               2130     2139       +9     
- Misses            18175    18402     +227

Impacted Files	Coverage Δ
python/custreamz/custreamz/kafka.py	`29.16% <0.00%> (-0.63%)`	⬇️
python/dask_cudf/dask_cudf/sorting.py	`92.66% <0.00%> (-0.25%)`	⬇️
python/dask_cudf/dask_cudf/core.py	`70.85% <0.00%> (-0.17%)`	⬇️
python/cudf/cudf/__init__.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/frame.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/index.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/parquet.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/series.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/utils/utils.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/api/__init__.py	`0.00% <0.00%> (ø)`
... and 24 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c07fdab...e1a19bc.

vyasr

Looks pretty good to me! I just have a few small questions.

python/cudf/cudf/api/extensions/accessor.py

vyasr · 2022-01-13T21:43:27Z

python/cudf/cudf/utils/ioutils.py

->>> df = df.set_index([3, 2, 1, 0])
+...                      'y': [1.0, 3.3, 2.2, 4.4],
+...                      'z': ['a', 'b', 'c', 'd']})
+>>> df = df.set_index(cudf.Series([3, 2, 1, 0]))


Is this necessary?

Suggested change

->>> df = df.set_index(cudf.Series([3, 2, 1, 0]))
+>>> df = df.set_index([3, 2, 1, 0])

Still confused by what caused this to be introduced.

Missed this comment in the previous round. Without cudf.Series(...), this raises:
KeyError: 'None of [3, 2, 1, 0] are in the columns'

cudf/python/cudf/cudf/core/dataframe.py

Lines 2397 to 2398 in b01c846

if col_not_found:
    raise KeyError(f"None of {col_not_found} are in the columns")

Seems like a bug to me.

Nope, pandas does the same.

>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, 3, 4]})
>>> df
   a
0  1
1  2
2  3
3  4
>>> df.set_index([3, 2, 0, 1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File ".../site-packages/pandas/core/frame.py", line 5451, in set_index
    raise KeyError(f"None of {missing} are in the columns")
KeyError: 'None of [3, 2, 0, 1] are in the columns'

Wow, that is a terrible API. Passing a Series makes it become the index, while passing a list is assumed to mean you're passing a list of column labels naming the columns that should become the new index. Well, I see why the change is necessary here.
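
To make the distinction in this thread concrete, here is a minimal sketch using pandas, whose behavior the traceback above shows. The frame and the names `df`, `by_values`, and `list_error` are made up for illustration; the cuDF docstring under review uses `cudf.Series` in the same position.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40]})

# A Series is treated as the values of the new index.
by_values = df.set_index(pd.Series([3, 2, 1, 0]))
print(list(by_values.index))

# A plain list is treated as a list of column labels to look up,
# so labels that are not columns raise KeyError.
list_error = None
try:
    df.set_index([3, 2, 1, 0])
except KeyError as err:
    list_error = str(err)
print(list_error)
```

Wrapping the values in a Series is therefore what lets the docstring example set the index by value rather than by column label.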

python/cudf/cudf/tests/test_doctests.py

shwina

LGTM! Fantastic work, @bdice :)

vyasr

There's one outstanding question from my last review, but nothing worth holding this up over. Address if you can. Nice work!

bdice · 2022-01-14T22:08:04Z

@shwina @vyasr FYI, I pushed two more commits that were needed to fix some issues. Feel free to review:

f9512ad shows the failed doctest outputs in the traceback, making it easier to debug from the failed test output in gpuCI (no need to read the whole console output).
e1a19bc fixes an issue where test_dataframe_to_string was leaking state into pandas.options.display.max_rows and pandas.options.display.max_columns, which made some doctests fail. This commit rewrites that test to fix the leaky state problem, makes it more rigorous (checking the whole output rather than only the last line), fixes an issue where the test wasn't actually doing what it said (the "# Test skipped columns" case was not skipping columns), and breaks the test into three tests with one assert each instead of one test with three asserts.
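
For readers unfamiliar with the leaky-state problem, one common way to keep display options from escaping a test is `pandas.option_context`, which restores the previous values on exit. This is an illustrative sketch only; it is an assumption that a context manager fits here, and it is not necessarily how e1a19bc restructures the test.

```python
import pandas as pd

# Remember the value in effect before the "test" runs.
before = pd.get_option("display.max_rows")

# Options set here apply only inside the with-block and are
# restored afterwards, even if the body raises.
with pd.option_context("display.max_rows", 5, "display.max_columns", 2):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(before, inside, after)
```

By contrast, assigning to `pd.options.display.max_rows` directly changes global state that later tests (including doctests) will observe.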

vyasr · 2022-01-14T22:12:03Z

python/cudf/cudf/tests/test_doctests.py

+
+        # Capture stdout and include failing outputs in the traceback.
+        doctest_stdout = io.StringIO()
+        with contextlib.redirect_stdout(doctest_stdout):


Do we also need to redirect stderr, or does runner.run automatically do a 2>&1 internally?

It probably doesn’t matter, the part I want to capture is always sent to stdout.

The doctest runner just writes directly to stdout when failures occur, rather than raising or providing a helpful object with the failure info. Getting that into a traceback requires the capture.
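
The capture described above can be sketched with the standard library alone. The function `halve` and the helpers `run_and_capture` and `check_docstring` are hypothetical names for illustration, not the code in test_doctests.py.

```python
import contextlib
import doctest
import io


def halve(n):
    """Hypothetical function with a deliberately wrong docstring example.

    >>> halve(4)
    3
    """
    return n // 2


def run_and_capture(obj):
    """Run the doctests found on obj; return (failure_count, report_text)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    captured = io.StringIO()
    failed = 0
    # The runner writes failure reports to sys.stdout, so redirect it.
    with contextlib.redirect_stdout(captured):
        for test in finder.find(obj, name=obj.__name__, globs={obj.__name__: obj}):
            failed += runner.run(test).failed
    return failed, captured.getvalue()


def check_docstring(obj):
    """Fail with the captured report as the assertion message."""
    failed, report = run_and_capture(obj)
    assert not failed, report


failed, report = run_and_capture(halve)
print(failed)
print(report)
```

Passing the captured text as the assertion message is what puts the failing example, the expected output, and the actual output into the test's traceback.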

vyasr · 2022-01-14T22:14:31Z

@bdice the changes look good to me, just one question about output capturing.

bdice · 2022-01-15T00:35:40Z

@gpucibot merge

Similar to rapidsai/cudf#9815, this change uses doctest to test that the pylibraft example docstrings run without issue. This caught several errors in the example docstrings, that are also fixed in this PR:
* a missing ‘device_ndarray’ import in kmeans fit when the centroids weren’t explicitly passed in
* an error in the fused_l2_nn_argmin docstring where output wasn’t defined
* An `AttributeError: module 'pylibraft.neighbors.ivf_pq' has no attribute 'np'` error in ivf_pq

Closes #981

Authors:
- Ben Frederickson (https://github.com/benfred)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1073
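
The class of error listed above, a name used in an example but never defined, is exactly what running docstrings as doctests surfaces. A minimal sketch with the standard library follows; the example strings and the helper `count_failures` are hypothetical and are not the pylibraft docstrings.

```python
import contextlib
import doctest
import io

# Hypothetical docstring example that uses `output` without defining it.
BROKEN_EXAMPLE = """
>>> values = [3, 1, 2]
>>> output.extend(sorted(values))
"""

# The same example with the missing definition added.
FIXED_EXAMPLE = """
>>> values = [3, 1, 2]
>>> output = []
>>> output.extend(sorted(values))
>>> output
[1, 2, 3]
"""


def count_failures(text):
    """Run the examples in text; return (failure_count, report_text)."""
    test = doctest.DocTestParser().get_doctest(
        text, globs={}, name="example", filename=None, lineno=0
    )
    runner = doctest.DocTestRunner(verbose=False)
    report = io.StringIO()
    with contextlib.redirect_stdout(report):
        results = runner.run(test)
    return results.failed, report.getvalue()


broken_failed, broken_report = count_failures(BROKEN_EXAMPLE)
fixed_failed, _ = count_failures(FIXED_EXAMPLE)
print(broken_failed, fixed_failed)
```

The broken example fails with a NameError in the report, while the fixed one runs cleanly.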

bdice added 9 commits October 29, 2021 15:01

Add doctests script.

c11b9a4

Intermediate progress.

5e88c67

Merge remote-tracking branch 'upstream/branch-21.12' into run-doctests

f51ff21

Merge branch 'branch-22.02' into run-doctests

e3ee9db

Merge branch 'branch-22.02' into run-doctests

c16af98

Merge remote-tracking branch 'upstream/branch-22.02' into run-doctests

bab366b

Update __all__ in cudf/__init__.py.

bb37a38

Fix recursion logic for modules and classes.

e4330af

Make test methods private.

1ed143a

bdice self-assigned this Dec 1, 2021

github-actions bot added the Python (Affects Python cuDF API.) label Dec 1, 2021

bdice added the doc (Documentation), tests (Unit testing for project), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels and removed the doc (Documentation) label Dec 1, 2021

bdice added 13 commits December 1, 2021 20:29

Use <NA> instead of null.

7155cf4

Inject globals into doctests.

36c819f

Add cudf to globals.

4f18028

Fix Series.dt.

427a724

Fix Series.memory_usage.

46c6435

Fix Series.hash_encode(..., use_name=True).

38b2fd8

Fix Series.keys.

ac0e174

Fix Series.drop.

b380afe

Fix Series.dropna.

318a0b7

Fix Series.data, Series.as_mask.

1e4f183

Fix Series.cat.

2da598d

Fix Scalar.

e60e909

Fix MultiIndex.

b7443d3

bdice requested a review from a team as a code owner January 13, 2022 19:32

vyasr requested changes Jan 13, 2022

Remove try/finally.

d6553db

shwina reviewed Jan 13, 2022

python/cudf/cudf/tests/test_doctests.py (outdated)

bdice added 4 commits January 13, 2022 14:27

Use assert not...

8eea41a

Use NumPy-style docstring.

95303a3

Improve defaults in doctest finder.

f4254fd

Remove __all__ from accessor.

3110606

bdice requested review from vyasr and shwina January 13, 2022 22:38

shwina approved these changes Jan 13, 2022

vyasr approved these changes Jan 13, 2022

bdice added 3 commits January 14, 2022 13:12

Show doctest failures in the traceback.

f9512ad

Prevent test_dataframe_to_string from leaking state into the pandas options.

64a17c7

Split test_dataframe_to_string into multiple tests.

e1a19bc

vyasr reviewed Jan 14, 2022

rapids-bot bot merged commit e24fa8f into rapidsai:branch-22.02 Jan 15, 2022

bdice mentioned this pull request Jan 15, 2022

Implement DataFrame diff() #9817

Merged

beckernick mentioned this pull request Feb 23, 2022

[DOC] Convert all remaining Python docstrings to pydoc and examples to doctest rapidsai/cuml#2415

Closed

isVoid mentioned this pull request Sep 27, 2022

[FEA] Execute example code in CI rapidsai/cuspatial#697

Open

benfred mentioned this pull request Dec 7, 2022

Use doctest for testing python example docstrings rapidsai/raft#1073

Merged
