Add codespell as a linter #5265

Merged 14 commits on Mar 14, 2023

7 changes: 7 additions & 0 deletions .pre-commit-config.yaml
@@ -18,6 +18,13 @@ repos:
types_or: [python, cython]
exclude: thirdparty
additional_dependencies: [flake8-force]
- repo: https://github.com/codespell-project/codespell
rev: v2.2.2
hooks:
- id: codespell
additional_dependencies: [tomli]
args: ["--toml", "pyproject.toml"]
exclude: (?x)^(.*stemmer.*|.*stop_words.*|^CHANGELOG.md$)
- repo: local
hooks:
- id: no-deprecationwarning
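
With the hook above in place, the spelling check can also be run on demand rather than only at commit time. A minimal invocation, assuming `pre-commit` is installed and using the `codespell` hook id from the snippet above:

```bash
# Run only the codespell hook against every file in the repository.
# Illustrative command; assumes pre-commit is installed in the current environment.
pre-commit run codespell --all-files
```
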
4 changes: 2 additions & 2 deletions BUILD.md
@@ -61,7 +61,7 @@ $ ./build.sh cuml --singlegpu # build the cuML python package without M
$ ./build.sh --ccache # use ccache to cache compilations, speeding up subsequent builds
```

By default, Ninja is used as the cmake generator. To override this and use (e.g.) `make`, define the `CMAKE_GENERATOR` environment variable accodingly:
By default, Ninja is used as the cmake generator. To override this and use (e.g.) `make`, define the `CMAKE_GENERATOR` environment variable accordingly:
```bash
CMAKE_GENERATOR='Unix Makefiles' ./build.sh
```
@@ -123,7 +123,7 @@ If using a conda environment (recommended), then cmake can be configured appropr
$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
```

Note: The following warning message is dependent upon the version of cmake and the `CMAKE_INSTALL_PREFIX` used. If this warning is displayed, the build should still run succesfully. We are currently working to resolve this open issue. You can silence this warning by adding `-DCMAKE_IGNORE_PATH=$CONDA_PREFIX/lib` to your `cmake` command.
Note: The following warning message is dependent upon the version of cmake and the `CMAKE_INSTALL_PREFIX` used. If this warning is displayed, the build should still run successfully. We are currently working to resolve this open issue. You can silence this warning by adding `-DCMAKE_IGNORE_PATH=$CONDA_PREFIX/lib` to your `cmake` command.
```
Cannot generate a safe runtime search path for target ml_test because files
in some directories may conflict with libraries in implicit directories:
15 changes: 13 additions & 2 deletions CONTRIBUTING.md
@@ -29,9 +29,9 @@ into three categories:
2. Find an issue to work on. The best way is to look for the [good first issue](https://github.com/rapidsai/cuml/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
or [help wanted](https://github.com/rapidsai/cuml/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) labels
3. Comment on the issue saying you are going to work on it.
4. Get familar with the developer guide relevant for you:
4. Get familiar with the developer guide relevant for you:
* For C++ developers it is available here [DEVELOPER_GUIDE.md](wiki/cpp/DEVELOPER_GUIDE.md)
* For Python developers, a [Python DEVELOPER_GUIDE.md](wiki/python/DEVELOPER_GUIDE.md) is availabe as well.
* For Python developers, a [Python DEVELOPER_GUIDE.md](wiki/python/DEVELOPER_GUIDE.md) is available as well.
5. Code! Make sure to update unit tests!
6. When done, [create your pull request](https://github.com/rapidsai/cuml/compare).
7. Verify that CI passes all [status checks](https://help.github.com/articles/about-status-checks/), or fix if needed.
@@ -88,6 +88,16 @@ To skip the checks temporarily, use `git commit --no-verify` or its short form
_Note_: If the auto-formatters' changes affect each other, you may need to go
through multiple iterations of `git commit` and `git add -u`.

cuML also uses [codespell](https://github.com/codespell-project/codespell) to find spelling
mistakes, and this check is run as part of the pre-commit hook. To apply the suggested spelling
fixes, you can run `codespell -i 3 -w .` from the command-line in the cuML root directory.
This will bring up an interactive prompt to select which spelling fixes to apply.

If you want to ignore errors highlighted by codespell, you can:
* Add the word to the `ignore-words-list` in `pyproject.toml` to exclude it for all of cuML (see the sketch after this list)
* Exclude the entire file from spellchecking by adding it to the `exclude` regex in `.pre-commit-config.yaml`
* Ignore only specific lines as shown in https://github.com/codespell-project/codespell/issues/1212#issuecomment-654191881
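
As a rough sketch of the first option: codespell reads its configuration from a `[tool.codespell]` table in `pyproject.toml`, which is what the hook's `--toml pyproject.toml` argument points at. The entries below are placeholders, not cuML's actual list:

```toml
# Hypothetical codespell section in pyproject.toml; the ignored words are
# examples only -- see the repository's pyproject.toml for the real list.
[tool.codespell]
ignore-words-list = "nd,te"
```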

### Summary of pre-commit hooks

The pre-commit hooks configured for this repository consist of a number of
@@ -102,6 +112,7 @@ please see the `.pre-commit-config.yaml` file.
- _`#include` syntax checker_: Ensures consistent syntax for C++ `#include` statements.
- _Copyright header checker and auto-formatter_: Ensures the copyright headers
of files are up-to-date and in the correct format.
- `codespell`: Checks for spelling mistakes

### Managing PR labels

4 changes: 2 additions & 2 deletions ci/checks/black_lists.sh
@@ -1,10 +1,10 @@
#!/bin/bash
# Copyright (c) 2019, NVIDIA CORPORATION.
# Copyright (c) 2019-2023, NVIDIA CORPORATION.
##########################################
# cuML black listed function call Tester #
##########################################

# PR_TARGET_BRANCH is set by the CI enviroment
# PR_TARGET_BRANCH is set by the CI environment

git checkout --quiet $PR_TARGET_BRANCH

4 changes: 2 additions & 2 deletions cpp/CMakeLists.txt
@@ -1,5 +1,5 @@
#=============================================================================
# Copyright (c) 2018-2022, NVIDIA CORPORATION.
# Copyright (c) 2018-2023, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -99,7 +99,7 @@ message(VERBOSE "CUML_CPP: Build and statically link FAISS library: ${CUML_USE_F
message(VERBOSE "CUML_CPP: Build and statically link Treelite library: ${CUML_USE_TREELITE_STATIC}")

set(CUML_ALGORITHMS "ALL" CACHE STRING "Experimental: Choose which algorithms are built into libcuml++.so. Can specify individual algorithms or groups in a semicolon-separated list.")
message(VERBOSE "CUML_CPP: Building libcuml++ with algoriths: '${CUML_ALGORITHMS}'.")
message(VERBOSE "CUML_CPP: Building libcuml++ with algorithms: '${CUML_ALGORITHMS}'.")

# Set RMM logging level
set(RMM_LOGGING_LEVEL "INFO" CACHE STRING "Choose the logging level.")
4 changes: 2 additions & 2 deletions cpp/bench/sg/dataset.cuh
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2022, NVIDIA CORPORATION.
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -37,7 +37,7 @@ namespace Bench {
* by every Benchmark's Params structure.
*/
struct DatasetParams {
/** number of rows in the datset */
/** number of rows in the dataset */
int nrows;
/** number of cols in the dataset */
int ncols;
4 changes: 2 additions & 2 deletions cpp/examples/symreg/symreg_example.cpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
* Copyright (c) 2021-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -32,7 +32,7 @@
#include <rmm/device_scalar.hpp>
#include <rmm/device_uvector.hpp>

// Namspace alias
// Namespace alias
namespace cg = cuml::genetic;

#ifndef CUDA_RT_CALL
4 changes: 2 additions & 2 deletions cpp/include/cuml/cluster/hdbscan.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
* Copyright (c) 2021-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -497,7 +497,7 @@ void compute_core_dists(const raft::handle_t& handle,
* @brief Compute the map from final, normalize labels to the labels in the CondensedHierarchy
*
* @param[in] handle raft handle for resource reuse
* @param[in] condensed_tree the Condensed Hiearchy object
* @param[in] condensed_tree the Condensed Hierarchy object
* @param[in] n_leaves number of leaves in the input data
* @param[in] cluster_selection_method cluster selection method
* @param[out] inverse_label_map rmm::device_uvector of size 0. It will be resized during the
4 changes: 2 additions & 2 deletions cpp/include/cuml/ensemble/randomforest.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2022, NVIDIA CORPORATION.
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -71,7 +71,7 @@ struct RF_params {
* round(max_samples * n_samples) number of samples with replacement. More on
* bootstrapping:
* https://en.wikipedia.org/wiki/Bootstrap_aggregating
* If boostrapping is set to false, whole dataset is used to build each
* If bootstrapping is set to false, whole dataset is used to build each
* tree.
*/
bool bootstrap;
4 changes: 2 additions & 2 deletions cpp/include/cuml/fil/multi_sum.cuh
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
* Copyright (c) 2020-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -20,7 +20,7 @@
template parameters: data [T]ype, reduction [R]adix
function parameters:
@data[] holds one value per thread in shared memory
@n_groups is the number of indendent reductions
@n_groups is the number of independent reductions
@n_values is the size of each individual reduction,
that is the number of values to be reduced to a single value
function returns: one sum per thread, for @n_groups first threads.
6 changes: 3 additions & 3 deletions cpp/include/cuml/genetic/genetic.h
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
* Copyright (c) 2020-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -87,7 +87,7 @@ void symRegPredict(const raft::handle_t& handle,
* @param handle cuML handle
* @param input device pointer to feature matrix
* @param n_rows number of rows of the feature matrix
* @param params host struct containg training hyperparameters
* @param params host struct containing training hyperparameters
* @param best_prog The best program obtained during training. Inferences are made using this
* @param output device pointer to output probability(in col major format)
*/
@@ -104,7 +104,7 @@ void symClfPredictProbs(const raft::handle_t& handle,
* @param handle cuML handle
* @param input device pointer to feature matrix
* @param n_rows number of rows of the feature matrix
* @param params host struct containg training hyperparameters
* @param params host struct containing training hyperparameters
* @param best_prog Best program obtained after training
* @param output Device pointer to output predictions
*/
4 changes: 2 additions & 2 deletions cpp/include/cuml/genetic/program.h
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
* Copyright (c) 2021-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -36,7 +36,7 @@ struct program {
* Now take the resulting 1D array and reverse it.
*
* @note The pointed memory buffer is NOT owned by this class and further it
* is assumed to be a zero-copy (aka pinned memory) buffer, atleast in
* is assumed to be a zero-copy (aka pinned memory) buffer, at least in
* this initial version
*/

10 changes: 5 additions & 5 deletions cpp/include/cuml/manifold/tsne.h
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2022, NVIDIA CORPORATION.
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -121,8 +121,8 @@ struct TSNEParams {
* @param[out] Y The column-major final embedding in device memory
* @param[in] n Number of rows in data X.
* @param[in] p Number of columns in data X.
* @param[in] knn_indices Array containing nearest neighors indices.
* @param[in] knn_dists Array containing nearest neighors distances.
* @param[in] knn_indices Array containing nearest neighbors indices.
* @param[in] knn_dists Array containing nearest neighbors distances.
* @param[in] params Parameters for TSNE model
* @param[out] kl_div (optional) KL divergence output
*
@@ -155,8 +155,8 @@ void TSNE_fit(const raft::handle_t& handle,
* @param[in] nnz The number of non-zero entries in the CSR.
* @param[in] n Number of rows in data X.
* @param[in] p Number of columns in data X.
* @param[in] knn_indices Array containing nearest neighors indices.
* @param[in] knn_dists Array containing nearest neighors distances.
* @param[in] knn_indices Array containing nearest neighbors indices.
* @param[in] knn_dists Array containing nearest neighbors distances.
* @param[in] params Parameters for TSNE model
* @param[out] kl_div (optional) KL divergence output
*
16 changes: 8 additions & 8 deletions cpp/include/cuml/manifold/umap.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2022, NVIDIA CORPORATION.
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -143,9 +143,9 @@ void fit_sparse(const raft::handle_t& handle,
* Dense transform
*
* @param[in] handle: raft::handle_t
* @param[in] X: pointer to input array to be infered
* @param[in] n: n_samples of input array to be infered
* @param[in] d: n_features of input array to be infered
* @param[in] X: pointer to input array to be inferred
* @param[in] n: n_samples of input array to be inferred
* @param[in] d: n_features of input array to be inferred
* @param[in] orig_X: pointer to original training array
* @param[in] orig_n: number of rows in original training array
* @param[in] embedding: pointer to embedding created during training
@@ -168,10 +168,10 @@ void transform(const raft::handle_t& handle,
* Sparse transform
*
* @param[in] handle: raft::handle_t
* @param[in] indptr: pointer to index pointer array of input array to be infered
* @param[in] indices: pointer to index array of input array to be infered
* @param[in] data: pointer to data array of input array to be infered
* @param[in] nnz: number of stored values of input array to be infered
* @param[in] indptr: pointer to index pointer array of input array to be inferred
* @param[in] indices: pointer to index array of input array to be inferred
* @param[in] data: pointer to data array of input array to be inferred
* @param[in] nnz: number of stored values of input array to be inferred
* @param[in] n: n_samples of input array
* @param[in] d: n_features of input array
* @param[in] orig_x_indptr: pointer to index pointer array of original training array
6 changes: 3 additions & 3 deletions cpp/include/cuml/metrics/metrics.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
* Copyright (c) 2021-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -177,7 +177,7 @@ double adjusted_rand_index(const raft::handle_t& handle,
*
* The KL divergence tells us how well the probability distribution Q
* approximates the probability distribution P
* It is often also used as a 'distance metric' between two probablity ditributions (not symmetric)
* It is often also used as a 'distance metric' between two probability distributions (not symmetric)
*
* @param handle: raft::handle_t
* @param y: Array of probabilities corresponding to distribution P
@@ -192,7 +192,7 @@ double kl_divergence(const raft::handle_t& handle, const double* y, const double
*
* The KL divergence tells us how well the probability distribution Q
* approximates the probability distribution P
* It is often also used as a 'distance metric' between two probablity ditributions (not symmetric)
* It is often also used as a 'distance metric' between two probability distributions (not symmetric)
*
* @param handle: raft::handle_t
* @param y: Array of probabilities corresponding to distribution P
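
For reference, the quantity described in these docstrings is the standard Kullback-Leibler divergence (stated here as a reminder, not as part of the diff):

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i p_i \log \frac{p_i}{q_i}
$$
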
4 changes: 2 additions & 2 deletions cpp/include/cuml/neighbors/knn.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2022, NVIDIA CORPORATION.
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -139,7 +139,7 @@ void knn_classify(raft::handle_t& handle,
/**
* @brief Flat C++ API function to perform a knn regression using
* a given a vector of label arrays. This supports multilabel
* regression by clasifying on multiple label arrays. Note that
* regression by classifying on multiple label arrays. Note that
* each label is classified independently, as is done in scikit-learn.
*
* @param[in] handle RAFT handle
4 changes: 2 additions & 2 deletions cpp/include/cuml/tree/decisiontree.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2022, NVIDIA CORPORATION.
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -56,7 +56,7 @@ struct DecisionTreeParams {
*/
CRITERION split_criterion;
/**
* Minimum impurity decrease required for spliting a node. If the impurity decrease is below this
* Minimum impurity decrease required for splitting a node. If the impurity decrease is below this
* value, node is leafed out. Default is 0.0
*/
float min_impurity_decrease = 0.0f;
4 changes: 2 additions & 2 deletions cpp/include/cuml/tsa/arima_common.h
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
* Copyright (c) 2020-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -315,7 +315,7 @@ struct ARIMAMemory {

if (r <= 5) {
// Note: temp mem for the direct Lyapunov solver grows very quickly!
// This solver is used iff the condition above is satisifed
// This solver is used iff the condition above is satisfied
append_buffer<assign>(I_m_AxA_dense, r * r * r * r * batch_size);
append_buffer<assign>(I_m_AxA_batches, batch_size);
append_buffer<assign>(I_m_AxA_inv_dense, r * r * r * r * batch_size);
8 changes: 4 additions & 4 deletions cpp/scripts/gitutils.py
@@ -1,4 +1,4 @@
# Copyright (c) 2019-2021, NVIDIA CORPORATION.
# Copyright (c) 2019-2023, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -85,7 +85,7 @@ def repo_version_major_minor():
def determine_merge_commit(current_branch="HEAD"):
"""
When running outside of CI, this will estimate the target merge commit hash
of `current_branch` by finding a common ancester with the remote branch
of `current_branch` by finding a common ancestor with the remote branch
'branch-{major}.{minor}' where {major} and {minor} are determined from the
repo version.
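
The common-ancestor lookup described here amounts to something like the following git call (an illustrative sketch; the branch name is hypothetical, since the real one is derived from the repo version):

```bash
# Find the merge base (common ancestor) of the current branch and the
# inferred release branch; "branch-23.04" is a made-up example name.
git merge-base HEAD origin/branch-23.04
```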

@@ -211,8 +211,8 @@ def modifiedFiles(pathFilter=None):
If inside a CI-env (ie. TARGET_BRANCH and COMMIT_HASH are defined, and
current branch is "current-pr-branch"), then lists out all files modified
between these 2 branches. Locally, TARGET_BRANCH will try to be determined
from the current repo version and finding a coresponding branch named
'branch-{major}.{minor}'. If this fails, this functino will list out all
from the current repo version and finding a corresponding branch named
'branch-{major}.{minor}'. If this fails, this function will list out all
the uncommitted files in the current branch.

Such utility function is helpful while putting checker scripts as part of