-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use doctest for testing python example docstrings #1073
Conversation
Similar to rapidsai/cudf#9815, this change uses doctest to test that the pylibraft example docstrings run without issue. This caught several errors in the example docstrings, that are also fixed in this PR: * a missing ‘device_ndarray’ import in kmeans fit when the centroids weren’t explicitly passed in * an error in the fused_l2_nn_argmin docstring where output wasn’t defined * An `AttributeError: module 'pylibraft.neighbors.ivf_pq' has no attribute 'np'` error in ivf_pq Closes rapidsai#981
>>> | ||
... n_probes=20, | ||
... lut_dtype=np.float16, | ||
... internal_distance_dtype=np.float32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be cp.float16
and cp.float32
? The numpy import might not be necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cupy has been purposefully left out of the list of dependencies for RAFT because cupy actually depends on RAFT. This is why we just assume numpy, since it's likely to already be installed in the user's environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, we do use cupy in the usage examples, but we could just as easily use PyTorch or Numba in the examples. We provide a "device_ndarray" object to keep RAFT itself dependency agnostic but interoperable. I think we should build a notebook for this very topic at some point soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cupy is already being used in this example snippet - including for dtypes (on line 621 for instance). I've updated in the last commit to use cupy here to be consistent.
I think putting together a notebook showing interop with other frameworks like pytorch/tensorflow/jax is a great idea (or even interop with just numpy without using cupy) - but is out of scope of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cupy is already being used in this example snippet - including for dtypes (on line 621 for instance). I've updated in the last commit to use cupy here to be consistent.
I missed the detail that this comment was made on a usage example in my initial reply to @lowener. However, it got me thinking more about it and now I'm now confused how these tests are passing without cupy installed. I don't see it being installed in anywhere explicitly in our ci scripts or conda environments but I do see it in the CI logs for the gpu test tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it must already be packaged in the docker container CI is using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me! I'm excited to be testing these.
Once we add some notebooks to the repo, we should also look at how cuml and cugraph (and I believe cudf?) are converting those to python scripts and testing them in CI.
>>> X = cp.random.random_sample((n_samples, n_features), | ||
>>> dtype=cp.float32) | ||
>>> | ||
... dtype=cp.float32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is good to know. So this is how we represent continuation of previous line in the pydocs.
>>> | ||
... n_probes=20, | ||
... lut_dtype=np.float16, | ||
... internal_distance_dtype=np.float32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, we do use cupy in the usage examples, but we could just as easily use PyTorch or Numba in the examples. We provide a "device_ndarray" object to keep RAFT itself dependency agnostic but interoperable. I think we should build a notebook for this very topic at some point soon.
For testing notebooks, I think that cudf uses some bash script's to run notebooks using ipython https://github.com/rapidsai/cudf/blob/branch-23.02/ci/utils/nbtest.sh . Alternatively- the Merlin team is currently using testbook to test out their notebooks - and it seems to work pretty well. |
That looks perfect! |
@gpucibot merge |
Similar to rapidsai/cudf#9815, this change uses doctest to test that the pylibraft example docstrings run without issue.
This caught several errors in the example docstrings, that are also fixed in this PR:
AttributeError: module 'pylibraft.neighbors.ivf_pq' has no attribute 'np'
error in ivf_pqCloses #981