-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Branch 23.02 merge 22.12 #1051
Branch 23.02 merge 22.12 #1051
Conversation
This change adds a pre-commit hook to run on each commit, that will automatically run code linters and not allow you to commit changes if the linters fail. This is similar to how cudf uses pre-commit, as described in rapidsai/cudf#9309 . The pre-commit checks will also be run in CI as part of the check style script. This change also adds several new linters that weren't being run in CI before: * pydocstyle * mypy * black * isort * cmake-format * cmake-lint Of these new linters - mypy caught an issue where the test_comms.py would always be skipped (`python/raft-dask/raft_dask/test/test_comms.py:23: error: Module "raft_dask" has no attribute "Comms"` ), and Pydocstyle caught some minor formatting inconsistencies in the python docstrings. black/isort ensure consistent code formatting for the python raft code Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) - Ray Douglass (https://github.com/raydouglass) - Bradley Dice (https://github.com/bdice) URL: rapidsai#965
Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#1006
This prevents requiring full specification for all template parameters in downstream `make_mdspan` usage. A more complete discussion of how we want to use `memory_type` in the public API can be deferred to a later PR. Authors: - William Hicks (https://github.com/wphicks) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1005
Similar to rapidsai/cudf#12097, this adds codespell as a linter to raft - and fixes the spelling mistakes it found Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Mark Sadang (https://github.com/msadang) URL: rapidsai#1007
Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#995
Re-enable copy_prs ops-bot plugin
…i#1008) This allows the `pylibraft` APIs to process outputs in place or allocate and return new device memory for outputs in a way which benefits from `__cuda_array_interface__` compliance. Ultimatley, it allows us to stay decoupled from libraries like cupy while still affording the user the benefits of not having to explicitly allocate outputs up front. Closes rapidsai#1002 Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) - Divye Gala (https://github.com/divyegala) URL: rapidsai#1008
This PR adds python wrappers to IVF-PQ. Authors: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#970
Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1020
This PR pins `dask` and `distributed` to `2022.11.0` for `22.12` release. xref: rapidsai/cudf#12165 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#1022
Add an indexing parameter `adaptive_centers` (false by default) to control whether `index.centers()` should be kept up-to-date with the cluster data or remain immutable. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1019
…r higher (rapidsai#939) -- 3xTF32 cutlass based L2 exp/cosine kernel provides 3.5x speedup for fp32 inputs compared to existing pairwise distance kernel for ampere or higher. -- DMMA cutlass based implementation for L2 exp/cosine provides 2.6x speedup instead of existing double precision FMA pipeline based kernel. -- add cutlass as header only dependency to RAFT. Authors: - Mahesh Doijade (https://github.com/mdoijade) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Robert Maynard (https://github.com/robertmaynard) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#939
Reverts rapidsai#1022 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#1023
…rim and improve performance for thin matrices (small `n_cols`) (rapidsai#979) This follows up on a discussion at rapidsai#652 (comment). The main goal of this PR is to make this helper accessible as a raft primitive. I also used the opportunity to look at the performance of this primitive, and have improved it for: - Thin matrices: less than 32 threads per row with shuffle-based reductions. - Thick matrices: cub-based reduction doing one row per block. Here is an overview of the before/after performance on A100: ![2022-11-11_normalize_perf_float_int32](https://user-images.githubusercontent.com/17441062/201403965-bf68d368-b64b-4a1f-92f0-a5de03b9d1a8.png) Authors: - Louis Sugy (https://github.com/Nyrio) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) - Artem M. Chirkin (https://github.com/achirkin) URL: rapidsai#979
A few optimizations to the `ivfpq_compute_similarity_kernel`: - Overhauled the way shmem/L1 carveout is selected - Introduced the block size selection logic based on the shmem/L1 split, occupancy, and the estimated cluster probes co-residency - Ported a new warp-sort module (`warp_sort_distributed`) - Transposed `pq_centers` to make loads coalesced - Changed layout of `pq_dataset` to make loads coalesced and vectorized - Optimized the loops to minimize ALU load Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#926
Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) URL: rapidsai#1027
Similar to rapidsai/cuml#4985, this PR changes the docs theme for `raft` to be in-line with rest of the rapids docs theme. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#1026
PR rapidsai#939 introduced CUTLASS dependency. When compiled in debug mode, this leads to the following error: ``` ptxas error : Stack size for entry function '_ZN12raft_cutlass6KernelINS_...' cannot be statically determined ``` This would be normally just a warning, but we treat warnings as errors. This PR disables the warning in Debug mode. Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1033
…apidsai#1029) Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#1029
Fix some of the easier deprecated headers, leftovers from past refactorings. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1034
Use raft handle's lazy-loading helper `get_device_properties` instead of explicitly calling `cudaGetDeviceProperties` on every kernel launch, which is a costly operation. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#1035
…#1032) This PR removes the dlopen logic for libucp in ucp_helper.hpp in favor of calling the relevant APIs directly. It also adds a new CMake component `raft::distributed` that can be used by dependent libraries to indicate the dependency on parts of raft that require UCX. While it does not change any public APIs, I have marked this PR as breaking since it does mean that any C++ code linking to UCX must now ensure that UCX is available at link time. It is no longer sufficient to make the library available at runtime. Resolves rapidsai#1031. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1032
Add cython bindings for the cluster_cost function, to allow computing inertia from python. Closes rapidsai#972 Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1028
…ion` performance (rapidsai#1011) `dots_along_rows` in `ann_utils.cuh` was in some cases more performant than the corresponding raft primitive `rowNorm`, so I have improved that primitive in order to replace `dots_along_rows` without performance regressions. `rowNorm` for a row-major matrix calls `coalescedReduction`, which I have modified to conditionally select one of the following code paths based on the input dimensions: - Thin: for matrices with many small rows, one block processes multiple rows, with 2 to 32 threads collaborating on each row using a shuffle-based reduction. - Medium: the existing cub-based implementation with one block per row (I have only changed the reduction algorithm to raking which is more performant provided that the workload is big enough) - Thick: two-step implementation. In the first step, multiple blocks per row reducing to an intermediate buffer (`main_op` is applied but not `final_op`). In the second step, reduces the intermediate buffer using the thin kernel (this time `final_op` is applied but not `main_op`). Other changes included in this PR: - In order to properly support shuffle-based reductions, I have added generic shuffle helpers that support arbitrary types by cutting them into chunks (based on size/alignment). This was adapted from similar helpers in CUB. - I have added a helper for "logical" warp reduction, i.e sub-warps of 2, 4, 8, 16 or 32 threads, and added support for arbitrary reduction operations in the warp reduction. - I have consolidated tests with support for arbitrary types and operations and tested some operations that in particular use the index argument of `main_op` such as an argmax, and only for the coalesced reduction I have added test cases with `raft::KeyValuePair` Authors: - Louis Sugy (https://github.com/Nyrio) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#1011
… x and y (rapidsai#1040) Solves rapidsai#1036 Even when computing a sum of squares, the distance from a point to itself can apparently be `-0.0` in which case the square root is `nan` and comparisons are broken. Authors: - Louis Sugy (https://github.com/Nyrio) Approvers: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1040
README tweaks: * Add a resources section with links to the generated HTML documentation * Add a build status badge * Add a section about installing with the new experimental pip packages Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1042
This PR enables building wheels for pylibraft and raft-dask. Authors: - Vyas Ramasubramani (https://github.com/vyasr) - Sevag H (https://github.com/sevagh) - Paul Taylor (https://github.com/trxcllnt) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#1013
This PR implements refinement for approximate nearest neighbor search. Refinement is a post processing step for ANN search, it follows an ANN search that returned `k0` neighbor candidates, and select `k` out of these candidates. The selection by calculating exact distances from the original dataset. Refinement can increase accuracy. It is useful for ANN methods that quantize the dataset and therefore loose accuracy during distance calculation (e.g. IVF-PQ). Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1038
Add an extra check for the alignment of the input matrices to avoid misaligned address errors. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1045
Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#1030
This PR is massive. Have the 22.12->23.02 merges been off for a long time, or were they blocked by the pre-commit linters run causing merge conflicts (I see that's the first unmerged commit)? |
Yeah - there was a merge conflict because of the pre-commit linters. That commit modified the All I did in this PR is follow the steps here https://docs.rapids.ai/maintainers/gpuci/#forward-mergers , and resolve the merge conflict |
Yup I'm sure you did the right thing with the forward merger, I was just a little concerned since the size of the PR makes it almost impossible to verify the correctness of the merge (at least for me, but I'm also no raft expert). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR to unblock forward merge #1018 -