Run doctests. #9815

Merged
merged 70 commits into from
Jan 15, 2022


@bdice bdice self-assigned this Dec 1, 2021
@github-actions github-actions bot added the Python label Dec 1, 2021
@bdice bdice added the doc, tests, improvement, and non-breaking labels and removed the doc label Dec 1, 2021
@bdice bdice requested a review from a team as a code owner January 13, 2022 19:32

codecov bot commented Jan 13, 2022

Codecov Report

Merging #9815 (e1a19bc) into branch-22.02 (967a333) will decrease coverage by 0.07%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.02    #9815      +/-   ##
================================================
- Coverage         10.49%   10.41%   -0.08%     
================================================
  Files               119      119              
  Lines             20305    20541     +236     
================================================
+ Hits               2130     2139       +9     
- Misses            18175    18402     +227     
Impacted Files                           Coverage Δ
python/custreamz/custreamz/kafka.py      29.16% <0.00%> (-0.63%) ⬇️
python/dask_cudf/dask_cudf/sorting.py    92.66% <0.00%> (-0.25%) ⬇️
python/dask_cudf/dask_cudf/core.py       70.85% <0.00%> (-0.17%) ⬇️
python/cudf/cudf/__init__.py             0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py           0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py           0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py           0.00% <0.00%> (ø)
python/cudf/cudf/core/series.py          0.00% <0.00%> (ø)
python/cudf/cudf/utils/utils.py          0.00% <0.00%> (ø)
python/cudf/cudf/api/__init__.py         0.00% <0.00%> (ø)
... and 24 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c07fdab...e1a19bc.

Contributor

@vyasr vyasr left a comment

Looks pretty good to me! I just have a few small questions.

... 'y': [1.0, 3.3, 2.2, 4.4],
... 'z': ['a', 'b', 'c', 'd']})
- >>> df = df.set_index([3, 2, 1, 0])
+ >>> df = df.set_index(cudf.Series([3, 2, 1, 0]))
Contributor

Is this necessary?

Suggested change
- >>> df = df.set_index(cudf.Series([3, 2, 1, 0]))
+ >>> df = df.set_index([3, 2, 1, 0])

Contributor

Still confused by what caused this to be introduced.

Contributor Author

@bdice bdice Jan 14, 2022

Missed this comment in the previous round. Without cudf.Series(...), this raises:
KeyError: 'None of [3, 2, 1, 0] are in the columns'

if col_not_found:
    raise KeyError(f"None of {col_not_found} are in the columns")

Seems like a bug to me.

Contributor Author

Nope, pandas does the same.

>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, 3, 4]})
>>> df
   a
0  1
1  2
2  3
3  4
>>> df.set_index([3, 2, 0, 1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File ".../site-packages/pandas/core/frame.py", line 5451, in set_index
    raise KeyError(f"None of {missing} are in the columns")
KeyError: 'None of [3, 2, 0, 1] are in the columns'

Contributor

Wow, that is a terrible API. Passing a Series makes it become the index, while passing an iterable is assumed to mean you're passing an iterable of strings naming the columns that should become the new index. Well, I see why the change is necessary here.
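The distinction described in this thread can be reproduced with pandas alone. The following is a minimal sketch, assuming a recent pandas release; the variable names are illustrative and not taken from the PR. A plain list is interpreted as column labels, while a Series (or an Index or NumPy array) of matching length is used as the index values.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})

# A plain list is read as a list of column labels. None of these
# integers is a column of df, so pandas raises KeyError.
try:
    df.set_index([3, 2, 0, 1])
except KeyError as err:
    list_error = str(err)

# A Series of matching length is treated as the index values themselves.
by_values = df.set_index(pd.Series([3, 2, 0, 1]))

# The list form works when its entries are existing column labels.
by_label = df.set_index(["a"])
```

Wrapping the values in a Series is therefore what tells `set_index` to use them as data rather than as labels, which is why the docstring in this PR needed `cudf.Series(...)`.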

@bdice bdice requested review from vyasr and shwina January 13, 2022 22:38
Contributor

@shwina shwina left a comment

LGTM! Fantastic work, @bdice :)

Contributor

@vyasr vyasr left a comment

There's one outstanding question from my last review, but nothing worth holding this up over. Address if you can. Nice work!

Contributor Author

bdice commented Jan 14, 2022

@shwina @vyasr FYI, I pushed two more commits that were needed to fix some issues. Feel free to review:

  • f9512ad shows the failed doctest output in the traceback, which makes failures easier to debug from the failed-tests output in gpuCI (no need to read the whole console output).
  • e1a19bc fixes an issue where test_dataframe_to_string leaked state into pandas.options.display.max_rows and pandas.options.display.max_columns, which made some doctests fail. This commit rewrites that test to fix the leaky state, makes it more rigorous (checking the whole output rather than only the last line), fixes an issue where the test wasn't doing what it said (the "# Test skipped columns" case was not skipping columns), and breaks it into three tests with one assert each instead of one test with three asserts.
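For readers unfamiliar with this kind of leak, here is a minimal sketch of one common remedy: scope the option change with a context manager so the previous values are always restored. It is not the code from e1a19bc. The `display_options` dict is a hypothetical stand-in for process-wide settings such as pandas.options.display, and `temporary_options` is an illustrative helper; pandas itself offers `pandas.option_context` for the same purpose.

```python
import contextlib

# Hypothetical stand-in for global settings like pandas.options.display.
# The values are illustrative only.
display_options = {"max_rows": 60, "max_columns": 20}


@contextlib.contextmanager
def temporary_options(**overrides):
    """Apply overrides to display_options and restore the old values on exit."""
    saved = {key: display_options[key] for key in overrides}
    display_options.update(overrides)
    try:
        yield
    finally:
        # Runs even if the body raises, so a failing test cannot leak state.
        display_options.update(saved)


def test_with_small_display():
    # The override is visible only inside the with block.
    with temporary_options(max_rows=5, max_columns=2):
        assert display_options["max_rows"] == 5
        assert display_options["max_columns"] == 2


test_with_small_display()
```

After the test returns, `display_options` holds its original values again, so later tests (or doctests) see the defaults they expect.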


# Capture stdout and include failing outputs in the traceback.
doctest_stdout = io.StringIO()
with contextlib.redirect_stdout(doctest_stdout):
Contributor

Do we also need to redirect stderr, or does runner.run automatically do a 2>&1 internally?

Contributor Author

It probably doesn’t matter; the part I want to capture is always sent to stdout.

Contributor Author

@bdice bdice Jan 14, 2022

The doctest runner just writes directly to stdout when failures occur, rather than raising or providing a helpful object with the failure info. Getting that into a traceback requires the capture.
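The capture pattern discussed in this thread can be sketched with the standard library alone. This is a minimal sketch, not the code from the PR: `add` and `run_doctests_or_raise` are hypothetical names, and the wrong expected value in the docstring is deliberate so that the runner has a failure to report.

```python
import contextlib
import doctest
import io


def add(a, b):
    """Add two numbers.

    >>> add(1, 2)
    4
    """
    return a + b


def run_doctests_or_raise(obj):
    """Run obj's docstring examples; raise with the runner's report on failure."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    captured = io.StringIO()
    # DocTestRunner reports failures by writing to sys.stdout rather than
    # raising, so redirect stdout to collect the report.
    with contextlib.redirect_stdout(captured):
        for test in finder.find(obj, globs={obj.__name__: obj}):
            runner.run(test)
    if runner.failures:
        raise AssertionError(
            f"{runner.failures} doctest example(s) failed:\n{captured.getvalue()}"
        )


try:
    run_doctests_or_raise(add)
except AssertionError as err:
    failure_report = str(err)
else:
    failure_report = ""
```

Because the report text is carried in the exception message, a test framework shows the failed example, the expected output, and the actual output directly in the traceback.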

Contributor

vyasr commented Jan 14, 2022

@bdice the changes look good to me, just one question about output capturing.

Contributor Author

bdice commented Jan 15, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit e24fa8f into rapidsai:branch-22.02 Jan 15, 2022
benfred added a commit to benfred/raft that referenced this pull request Dec 7, 2022
Similar to rapidsai/cudf#9815, this change uses doctest
to test that the pylibraft example docstrings run without issue.

This caught several errors in the example docstrings, which are also fixed in this PR:
 *  a missing ‘device_ndarray’ import in kmeans fit when the centroids weren’t explicitly passed in
 *  an error in the fused_l2_nn_argmin docstring where output wasn’t defined
 *  an `AttributeError: module 'pylibraft.neighbors.ivf_pq' has no attribute 'np'` error in ivf_pq

Closes rapidsai#981
rapids-bot bot pushed a commit to rapidsai/raft that referenced this pull request Dec 7, 2022
Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1073
Labels
improvement: Improvement / enhancement to an existing function
non-breaking: Non-breaking change
Python: Affects Python cuDF API.
tests: Unit testing for project
Development

Successfully merging this pull request may close these issues.

[FEA] Execute documentation examples in CI tests (doctests)
4 participants