[RELEASE] raft v22.12 #1063
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
…906) Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: #906
…ir` -> `raft::KeyValuePair` (#905) cc @Nyrio Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: #905
Part of #535. Implements the raft::stats API with mdspan, along with the C++ tests. 14 of 22 files are implemented; the remaining files will come in a following PR. Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #802
A few optimizations to the `ivfpq_compute_similarity_kernel`:
- Overhauled the way shmem/L1 carveout is selected
- Introduced the block size selection logic based on the shmem/L1 split, occupancy, and the estimated cluster probes co-residency
- Ported a new warp-sort module (`warp_sort_distributed`)
- Transposed `pq_centers` to make loads coalesced
- Changed layout of `pq_dataset` to make loads coalesced and vectorized
- Optimized the loops to minimize ALU load

Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: #926
Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) URL: #1027
Similar to rapidsai/cuml#4985, this PR changes the docs theme for `raft` to be in-line with rest of the rapids docs theme. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: #1026
PR #939 introduced a CUTLASS dependency. When compiled in debug mode, this leads to the following error:
```
ptxas error : Stack size for entry function '_ZN12raft_cutlass6KernelINS_...' cannot be statically determined
```
This would normally be just a warning, but we treat warnings as errors. This PR disables the warning in Debug mode. Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1033
…1029) Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: #1029
Fix some of the easier deprecated headers, leftovers from past refactorings. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1034
Use raft handle's lazy-loading helper `get_device_properties` instead of explicitly calling `cudaGetDeviceProperties` on every kernel launch, which is a costly operation. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #1035
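The lazy-loading idea in the commit above can be sketched in plain Python. This is only an illustration of the caching pattern, not the raft API: `Handle`, `device_properties`, and `fake_query` are hypothetical names, and the returned dictionary is made-up placeholder data.

```python
from functools import cached_property


class Handle:
    """Hypothetical stand-in for a resource handle; not the raft API."""

    def __init__(self, query_fn):
        # query_fn models an expensive device query.
        self._query_fn = query_fn

    @cached_property
    def device_properties(self):
        # Runs the expensive query on first access only; later accesses
        # reuse the result cached on the instance.
        return self._query_fn()


calls = []


def fake_query():
    # Placeholder for a costly query; records each invocation.
    calls.append(1)
    return {"multiProcessorCount": 80, "sharedMemPerBlock": 49152}


handle = Handle(fake_query)
for _ in range(3):  # three simulated "kernel launches"
    props = handle.device_properties
```

After the loop, `fake_query` has run once even though the properties were read three times, which is the saving the commit is after.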
This PR removes the dlopen logic for libucp in ucp_helper.hpp in favor of calling the relevant APIs directly. It also adds a new CMake component `raft::distributed` that can be used by dependent libraries to indicate the dependency on parts of raft that require UCX. While it does not change any public APIs, I have marked this PR as breaking since it does mean that any C++ code linking to UCX must now ensure that UCX is available at link time. It is no longer sufficient to make the library available at runtime. Resolves #1031. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1032
Add cython bindings for the cluster_cost function, to allow computing inertia from python. Closes #972 Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1028
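For readers unfamiliar with the term, inertia is commonly defined as the sum of squared distances from each point to its nearest centroid. The pure-Python reference below is a sketch of that definition only; `inertia_reference` is a hypothetical name and does not reflect the signature of the new binding.

```python
def inertia_reference(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        # Squared distance to the closest centroid for this point.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
        )
    return total


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centroids = [(0.0, 1.0), (10.0, 0.0)]
inertia = inertia_reference(points, centroids)
```

Here the first two points are each at squared distance 1 from the first centroid and the third point coincides with the second centroid.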
…ion` performance (#1011) `dots_along_rows` in `ann_utils.cuh` was in some cases more performant than the corresponding raft primitive `rowNorm`, so I have improved that primitive in order to replace `dots_along_rows` without performance regressions. `rowNorm` for a row-major matrix calls `coalescedReduction`, which I have modified to conditionally select one of the following code paths based on the input dimensions:
- Thin: for matrices with many small rows, one block processes multiple rows, with 2 to 32 threads collaborating on each row using a shuffle-based reduction.
- Medium: the existing cub-based implementation with one block per row (I have only changed the reduction algorithm to raking, which is more performant provided that the workload is big enough).
- Thick: two-step implementation. In the first step, multiple blocks per row reduce to an intermediate buffer (`main_op` is applied but not `final_op`). In the second step, the intermediate buffer is reduced using the thin kernel (this time `final_op` is applied but not `main_op`).

Other changes included in this PR:
- In order to properly support shuffle-based reductions, I have added generic shuffle helpers that support arbitrary types by cutting them into chunks (based on size/alignment). This was adapted from similar helpers in CUB.
- I have added a helper for "logical" warp reduction, i.e., sub-warps of 2, 4, 8, 16 or 32 threads, and added support for arbitrary reduction operations in the warp reduction.
- I have consolidated tests with support for arbitrary types and operations and tested some operations that in particular use the index argument of `main_op`, such as an argmax. Only for the coalesced reduction, I have added test cases with `raft::KeyValuePair`.

Authors: - Louis Sugy (https://github.com/Nyrio) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #1011
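The split between `main_op` and `final_op` in the two-step ("thick") path can be illustrated with a small sequential sketch. This is not the CUDA implementation: the function names, the `n_chunks` parameter, and the chunking scheme are assumptions made for illustration, and each chunk stands in for one block's partial result.

```python
import math


def reduce_row(row, main_op, reduce_op, final_op, init):
    """Single-pass reduction: main_op per element, reduce_op to combine, final_op once."""
    acc = init
    for i, x in enumerate(row):
        acc = reduce_op(acc, main_op(x, i))
    return final_op(acc)


def reduce_row_two_step(row, main_op, reduce_op, final_op, init, n_chunks):
    """Two-step variant: partial results per chunk, then a final pass over them."""
    chunk = max(1, math.ceil(len(row) / n_chunks))
    # Step 1: each chunk applies main_op but not final_op.
    partial = []
    for start in range(0, len(row), chunk):
        acc = init
        for i in range(start, min(start + chunk, len(row))):
            acc = reduce_op(acc, main_op(row[i], i))
        partial.append(acc)
    # Step 2: reduce the intermediate buffer; final_op is applied, main_op is not.
    identity = lambda x, i: x
    return reduce_row(partial, identity, reduce_op, final_op, init)


# L2 norm of a row: square each element, sum, then take the square root.
row = [3.0, 4.0, 0.0, 0.0, 12.0]
square = lambda x, i: x * x
add = lambda a, b: a + b

one_pass = reduce_row(row, square, add, math.sqrt, 0.0)
two_step = reduce_row_two_step(row, square, add, math.sqrt, 0.0, n_chunks=2)
```

Applying `final_op` only in the second step matters for operations like the square root here: taking it per chunk and then summing would give a different result.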
… x and y (#1040) Solves #1036 Even when computing a sum of squares, the distance from a point to itself can apparently be `-0.0` in which case the square root is `nan` and comparisons are broken. Authors: - Louis Sugy (https://github.com/Nyrio) Approvers: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) URL: #1040
README tweaks:
* Add a resources section with links to the generated HTML documentation
* Add a build status badge
* Add a section about installing with the new experimental pip packages

Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1042
This PR enables building wheels for pylibraft and raft-dask. Authors: - Vyas Ramasubramani (https://github.com/vyasr) - Sevag H (https://github.com/sevagh) - Paul Taylor (https://github.com/trxcllnt) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: #1013
This PR implements refinement for approximate nearest neighbor search. Refinement is a post-processing step for ANN search: it follows an ANN search that returned `k0` neighbor candidates, and selects `k` out of these candidates. The selection is made by calculating exact distances from the original dataset. Refinement can increase accuracy. It is useful for ANN methods that quantize the dataset and therefore lose accuracy during distance calculation (e.g. IVF-PQ). Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) URL: #1038
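The refinement step described above can be sketched on the host in a few lines. This is a minimal illustration under stated assumptions, not the raft implementation: `refine` is a hypothetical function, the metric is assumed to be squared L2, and the candidate list stands in for the output of some earlier ANN search.

```python
import heapq


def refine(dataset, query, candidates, k):
    """Re-rank candidate indices by exact squared L2 distance and keep the best k."""

    def exact_dist(idx):
        # Exact distance computed from the original (unquantized) vectors.
        return sum((a - b) ** 2 for a, b in zip(dataset[idx], query))

    best = heapq.nsmallest(k, candidates, key=exact_dist)
    return [(idx, exact_dist(idx)) for idx in best]


dataset = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2), (3.0, 3.0)]
query = (1.0, 1.0)
# k0 = 4 candidates, in the possibly inaccurate order an ANN search returned them.
candidates = [4, 3, 0, 1]
refined = refine(dataset, query, candidates, k=2)
```

Only the `k0` candidates are touched, so the cost of the exact distance computation stays proportional to `k0` rather than to the dataset size.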
Add an extra check for the alignment of the input matrices to avoid misaligned address errors. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1045
Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) - Tamas Bela Feher (https://github.com/tfeher) URL: #1030
This PR adds a Python wrapper for the ANN refinement method. Refinement can work with either a device dataset or a host dataset. Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1052
Pin `dask` and `distributed` for release
❄️ Code freeze for branch-22.12 and v22.12 release

What does this mean?
Only critical/hotfix level issues should be merged into branch-22.12 until release (merging of this PR).

What is the purpose of this PR?
To merge branch-22.12 into main for the release.