[RELEASE] raft v22.12 #1063

Merged
merged 112 commits into from
Dec 8, 2022
GPUtester
Contributor

❄️ Code freeze for branch-22.12 and v22.12 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-22.12 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-22.12 into main for the release

raydouglass and others added 30 commits September 23, 2022 11:39
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Part of #535.
Implements the `raft::stats` API with mdspan, including the C++ tests.
14 of 22 files are implemented. The remaining files will come in a following PR.

Authors:
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #802
achirkin and others added 19 commits November 17, 2022 18:01
A few optimizations to the `ivfpq_compute_similarity_kernel`:

  - Overhauled the way shmem/L1 carveout is selected
  - Introduced the block size selection logic based on the shmem/L1 split, occupancy, and the estimated cluster probes co-residency
  - Ported a new warp-sort module (`warp_sort_distributed`)
  - Transposed `pq_centers` to make loads coalesced
  - Changed layout of `pq_dataset` to make loads coalesced and vectorized
  - Optimized the loops to minimize ALU load

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #926
Similar to rapidsai/cuml#4985, this PR changes the docs theme for `raft` to be in line with the rest of the RAPIDS docs theme.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1026
PR #939 introduced CUTLASS dependency. When compiled in debug mode, this leads to the following error:

```
ptxas error   : Stack size for entry function '_ZN12raft_cutlass6KernelINS_...' cannot be statically determined
```

This would normally be just a warning, but we treat warnings as errors. This PR disables the warning in Debug mode.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1033
…1029)

Don't use CMake 3.25.0, as it has a show-stopping FindCUDAToolkit bug.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1029
Fix some of the easier deprecated headers, leftovers from past refactorings.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1034
Use raft handle's lazy-loading helper `get_device_properties` instead of explicitly calling `cudaGetDeviceProperties` on every kernel launch, which is a costly operation.
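
A minimal sketch of the lazy-loading pattern described above, written in Python for brevity. The `Handle` class and the `query_fn` callback are hypothetical stand-ins and not raft's actual handle; `query_fn` plays the role of the costly `cudaGetDeviceProperties` call.

```python
class Handle:
    """Hypothetical handle that caches device properties on first use."""

    def __init__(self, query_fn):
        self._query_fn = query_fn  # expensive query, assumed to take no arguments
        self._props = None         # nothing is queried until first requested

    def get_device_properties(self):
        # Run the expensive query only once, then reuse the cached result.
        if self._props is None:
            self._props = self._query_fn()
        return self._props


def make_counting_query():
    """Return (query_fn, calls) so the number of real queries can be observed."""
    calls = []

    def query_fn():
        calls.append(1)
        return {"multiProcessorCount": 80}  # made-up value for illustration

    return query_fn, calls
```

With this shape, every kernel launch can ask the handle for the properties, and only the first request pays for the query.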

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #1035
This PR removes the dlopen logic for libucp in ucp_helper.hpp in favor of calling the relevant APIs directly. It also adds a new CMake component `raft::distributed` that can be used by dependent libraries to indicate the dependency on parts of raft that require UCX.

While it does not change any public APIs, I have marked this PR as breaking since it does mean that any C++ code linking to UCX must now ensure that UCX is available at link time. It is no longer sufficient to make the library available at runtime.

Resolves #1031.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1032
Add Cython bindings for the `cluster_cost` function, to allow computing inertia from Python.
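
For readers unfamiliar with the term, inertia is commonly defined as the sum of squared distances from each point to its nearest centroid. The following pure-Python sketch illustrates that definition only; the function name `inertia` is hypothetical and this is not the pylibraft binding or its signature.

```python
def inertia(points, centroids):
    """Sum of squared L2 distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        # Squared distance from p to every centroid; keep the smallest.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centroids
        )
    return total
```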

Closes #972

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1028
…ion` performance (#1011)

`dots_along_rows` in `ann_utils.cuh` was in some cases more performant than the corresponding raft primitive `rowNorm`, so I have improved that primitive in order to replace `dots_along_rows` without performance regressions. `rowNorm` for a row-major matrix calls `coalescedReduction`, which I have modified to conditionally select one of the following code paths based on the input dimensions:

- Thin: for matrices with many small rows, one block processes multiple rows, with 2 to 32 threads collaborating on each row using a shuffle-based reduction.
- Medium: the existing cub-based implementation with one block per row (I have only changed the reduction algorithm to raking, which is more performant provided that the workload is big enough).
- Thick: a two-step implementation. In the first step, multiple blocks per row reduce into an intermediate buffer (`main_op` is applied but not `final_op`). In the second step, the intermediate buffer is reduced using the thin kernel (this time `final_op` is applied but not `main_op`).
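
The split of `main_op` and `final_op` across the two steps of the thick path can be sketched on the host as follows. This is a sequential Python illustration under stated assumptions, not the CUDA kernels: `reduce_row`, `reduce_row_two_step`, and `n_chunks` are hypothetical names, and each chunk stands in for one of the multiple blocks working on a row.

```python
import math


def reduce_row(row, main_op, reduce_op, final_op, init):
    """Single-pass reference: main_op per element, reduce, then final_op."""
    acc = init
    for i, v in enumerate(row):
        acc = reduce_op(acc, main_op(v, i))  # main_op also receives the index
    return final_op(acc)


def reduce_row_two_step(row, main_op, reduce_op, final_op, init, n_chunks):
    """Two-step variant: partial results first, final reduction second."""
    chunk = max(1, -(-len(row) // n_chunks))  # ceiling division, at least 1
    # Step 1: each chunk applies main_op and reduces into an intermediate buffer.
    partial = []
    for start in range(0, len(row), chunk):
        acc = init
        for i in range(start, min(start + chunk, len(row))):
            acc = reduce_op(acc, main_op(row[i], i))
        partial.append(acc)
    # Step 2: reduce the intermediate buffer; main_op is NOT applied again,
    # and final_op is applied exactly once at the very end.
    acc = init
    for p in partial:
        acc = reduce_op(acc, p)
    return final_op(acc)


# Example: L2 norm of a row, i.e. square, sum, then square root.
l2_norm = reduce_row_two_step(
    [3.0, 4.0],
    main_op=lambda v, i: v * v,
    reduce_op=lambda a, b: a + b,
    final_op=math.sqrt,
    init=0.0,
    n_chunks=2,
)
```

Applying `final_op` in the first step would be wrong for operations such as a norm, since the square root of partial sums cannot be summed to the square root of the total.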

Other changes included in this PR:

- In order to properly support shuffle-based reductions, I have added generic shuffle helpers that support arbitrary types by cutting them into chunks (based on size/alignment). This was adapted from similar helpers in CUB.
- I have added a helper for "logical" warp reduction, i.e. sub-warps of 2, 4, 8, 16 or 32 threads, and added support for arbitrary reduction operations in the warp reduction.
- I have consolidated the tests to support arbitrary types and operations, and tested some operations that use the index argument of `main_op`, such as argmax. For the coalesced reduction only, I have added test cases with `raft::KeyValuePair`.
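
To illustrate what a "logical" warp reduction computes, here is a host-side Python simulation of a shuffle-down style reduction over sub-warps. It is only a sketch: `logical_warp_reduce` is a hypothetical name, the list `lanes` stands in for per-thread registers, and real code would use warp shuffle intrinsics rather than list indexing.

```python
def logical_warp_reduce(lanes, width, op):
    """Reduce each consecutive group of `width` lanes with `op`.

    Returns one value per sub-warp, taken from the first lane of each group.
    """
    if width not in (2, 4, 8, 16, 32):
        raise ValueError("width must be 2, 4, 8, 16 or 32")
    if len(lanes) % width != 0:
        raise ValueError("number of lanes must be a multiple of width")

    vals = list(lanes)
    offset = width // 2
    while offset > 0:
        nxt = list(vals)
        for lane in range(len(vals)):
            # A lane only reads a partner inside its own sub-warp.
            if lane % width + offset < width:
                nxt[lane] = op(vals[lane], vals[lane + offset])
        vals = nxt       # all lanes update "simultaneously"
        offset //= 2     # halve the distance each round
    return [vals[i] for i in range(0, len(vals), width)]
```

Because `op` is passed in, the same loop serves sums, maxima, or any other associative operation, which mirrors the "arbitrary reduction operations" mentioned above.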

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #1011
… x and y (#1040)

Solves #1036 

Even when computing a sum of squares, the distance from a point to itself can apparently be `-0.0` in which case the square root is `nan` and comparisons are broken.
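
The PR text does not show the change itself. As a generic illustration only, and not necessarily what this PR does, a squared distance can be clamped to a non-negative zero before taking the square root, so that a negative zero or a tiny negative rounding residue cannot propagate into later comparisons. The name `safe_sqrt` is hypothetical.

```python
import math


def safe_sqrt(squared_distance):
    """Square root that maps -0.0 and small negative residues to +0.0."""
    # `x > 0.0` is False for -0.0 and for negatives, so both become +0.0.
    clamped = squared_distance if squared_distance > 0.0 else 0.0
    return math.sqrt(clamped)
```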

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Ben Frederickson (https://github.com/benfred)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1040
README tweaks:

* Add a resources section with links to the generated HTML documentation
* Add a build status badge
* Add a section about installing with the new experimental pip packages

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1042
This PR enables building wheels for pylibraft and raft-dask.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Sevag H (https://github.com/sevagh)
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1013
This PR implements refinement for approximate nearest neighbor search.

Refinement is a post-processing step for ANN search. It follows an ANN search that returned `k0` neighbor candidates
and selects `k` out of these candidates. The selection is made by calculating exact distances using the original dataset.

Refinement can increase accuracy. It is useful for ANN methods that quantize the dataset and therefore lose accuracy during distance calculation (e.g. IVF-PQ).
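
The idea can be sketched in a few lines of pure Python. This is an illustration of the technique for a single query, not the raft implementation: the function name `refine`, its signature, and the choice of squared L2 as the exact distance are all assumptions.

```python
import heapq


def refine(dataset, query, candidates, k):
    """Pick the k best of the k0 candidate indices using exact distances.

    dataset:    list of vectors (the original, unquantized data)
    query:      a single query vector
    candidates: the k0 indices returned by an approximate search
    """
    def squared_l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Recompute exact distances only for the candidates, not the whole dataset.
    scored = [(squared_l2(dataset[i], query), i) for i in candidates]
    best = heapq.nsmallest(k, scored)
    indices = [i for _, i in best]
    distances = [d for d, _ in best]
    return indices, distances
```

The cost scales with `k0` rather than with the dataset size, which is why a larger `k0` is an inexpensive way to trade a little time for accuracy.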

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Artem M. Chirkin (https://github.com/achirkin)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1038
Add an extra check for the alignment of the input matrices to avoid misaligned address errors.
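
A check of this kind might look like the sketch below. It is an assumption-laden illustration rather than the code in this PR: the function names, the 16-byte default, and the idea of also checking the row stride are hypothetical.

```python
def is_aligned(address, alignment):
    """True if `address` is a multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return address & (alignment - 1) == 0


def can_use_vectorized_loads(base_address, row_stride_bytes, vector_bytes=16):
    """Vectorized loads are safe only if every row starts on an aligned address."""
    return (
        is_aligned(base_address, vector_bytes)
        and row_stride_bytes % vector_bytes == 0
    )
```

When the check fails, a caller would typically fall back to a code path that uses narrower loads instead of triggering a misaligned address error.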

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1045
This PR adds a Python wrapper for the ANN refinement method. Refinement can work with both a device dataset and a host dataset.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1052
AyodeAwe and others added 2 commits December 1, 2022 12:47
@raydouglass raydouglass merged commit c16fa56 into main Dec 8, 2022