-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run doctests. #9815
Run doctests. #9815
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #9815 +/- ##
================================================
- Coverage 10.49% 10.41% -0.08%
================================================
Files 119 119
Lines 20305 20541 +236
================================================
+ Hits 2130 2139 +9
- Misses 18175 18402 +227
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good to me! I just have a few small questions.
>>> df = df.set_index([3, 2, 1, 0]) | ||
... 'y': [1.0, 3.3, 2.2, 4.4], | ||
... 'z': ['a', 'b', 'c', 'd']}) | ||
>>> df = df.set_index(cudf.Series([3, 2, 1, 0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary?
>>> df = df.set_index(cudf.Series([3, 2, 1, 0])) | |
>>> df = df.set_index([3, 2, 1, 0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still confused by what caused this to be introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed this comment in the previous round. Without cudf.Series(...)
, this raises:
KeyError: 'None of [3, 2, 1, 0] are in the columns'
cudf/python/cudf/cudf/core/dataframe.py
Lines 2397 to 2398 in b01c846
if col_not_found: | |
raise KeyError(f"None of {col_not_found} are in the columns") |
Seems like a bug to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, pandas does the same.
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, 3, 4]})
>>> df
a
0 1
1 2
2 3
3 4
>>> df.set_index([3, 2, 0, 1])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File ".../site-packages/pandas/core/frame.py", line 5451, in set_index
raise KeyError(f"None of {missing} are in the columns")
KeyError: 'None of [3, 2, 0, 1] are in the columns'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow that is a terrible API. Passing a Series makes it become the index, while passing an iterable is assumed to mean you're passing an iterable of strings indicating the names of the columns that should become the new index. Well I see why the change is necessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Fantastic work, @bdice :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one outstanding question from my last review, but nothing worth holding this up over. Address if you can. Nice work!
@shwina @vyasr FYI, I pushed two more commits that were needed to fix some issues. Feel free to review:
|
|
||
# Capture stdout and include failing outputs in the traceback. | ||
doctest_stdout = io.StringIO() | ||
with contextlib.redirect_stdout(doctest_stdout): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to redirect stderr, or does runner.run
automatically do a 2>&1 internally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably doesn’t matter, the part I want to capture is always sent to stdout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doctest runner just writes directly to stdout when failures occur, rather than raising or providing a helpful object with the failure info. Getting that into a traceback requires the capture.
@bdice the changes look good to me, just one question about output capturing. |
@gpucibot merge |
Similar to rapidsai/cudf#9815, this change uses doctest to test that the pylibraft example docstrings run without issue. This caught several errors in the example docstrings, that are also fixed in this PR: * a missing ‘device_ndarray’ import in kmeans fit when the centroids weren’t explicitly passed in * an error in the fused_l2_nn_argmin docstring where output wasn’t defined * An `AttributeError: module 'pylibraft.neighbors.ivf_pq' has no attribute 'np'` error in ivf_pq Closes rapidsai#981
Similar to rapidsai/cudf#9815, this change uses doctest to test that the pylibraft example docstrings run without issue. This caught several errors in the example docstrings, that are also fixed in this PR: * a missing ‘device_ndarray’ import in kmeans fit when the centroids weren’t explicitly passed in * an error in the fused_l2_nn_argmin docstring where output wasn’t defined * An `AttributeError: module 'pylibraft.neighbors.ivf_pq' has no attribute 'np'` error in ivf_pq Closes #981 Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1073
This PR adds doctests and resolves #9513. Several issues were found by running doctests that have now been resolved:
Series.applymap
#9821top
in categorical column #9825