Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNMG ANN #231

Merged
merged 67 commits into from
Oct 3, 2024
Merged

SNMG ANN #231

merged 67 commits into from
Oct 3, 2024

Conversation

viclafargue
Copy link
Contributor

@viclafargue viclafargue commented Jul 18, 2024

This PR implements a distributed (single-node-multiple-GPUs) implementation of ANN indexes. It allows to build, extend and search an index on multiple GPUs.

Before building the index, the user has to choose between two modes :

Sharding mode : The index dataset is split, each GPU trains its own index with its respective share of the dataset. This is intended to both increase the search throughput and the maximal size of the index.
Index duplication mode : The index is built once on a GPU and then copied over to others. Alternatively, the index dataset is sent to each GPU to be built there. This intended to increase the search throughput.

SNMG indexes can be serialized and de-serialized. Local models can also be deserialized and deployed in index duplication mode.

bench

Migrated from rapidsai/raft#1993

@viclafargue viclafargue requested review from a team as code owners July 18, 2024 10:46
Copy link

copy-pr-bot bot commented Jul 18, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@viclafargue viclafargue requested a review from a team as a code owner July 18, 2024 16:08
@viclafargue viclafargue requested a review from jameslamb July 18, 2024 16:08
cpp/CMakeLists.txt Show resolved Hide resolved
cpp/CMakeLists.txt Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Victor for the PR, it looks really great! I have just a few smaller comments.

cpp/src/neighbors/ann_mg/nccl_helpers.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/ann_mg/ann_mg.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/ann_mg/ann_mg.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/ann_mg/ann_mg.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/ann_mg/ann_mg.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/ann_mg/ann_mg.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/ann_mg/ann_mg.cuh Outdated Show resolved Hide resolved
cpp/test/neighbors/ann_mg.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/ann_mg/ann_mg.cuh Outdated Show resolved Hide resolved
@viclafargue viclafargue requested a review from a team as a code owner July 26, 2024 13:17
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few additional comments

cpp/src/neighbors/ann_mg/ann_mg.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/ann_mg/ann_mg.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/ann_mg/ann_mg.cuh Outdated Show resolved Hide resolved
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are almost there, but the matrix extent type has to be fixed: we should distinguish the neighbor index type (IdxT, usually int64), and the mdspan extent type which is always int64. In a previous review I have marked such changes in ann_mp.cuh, but actually the public API in ann_mg.hpp has also a few places where this need to be fixed

cpp/src/neighbors/ann_mg/omp_checks.cu Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
cpp/include/cuvs/neighbors/ann_mg.hpp Outdated Show resolved Hide resolved
@tfeher tfeher added feature request New feature or request non-breaking Introduces a non-breaking change labels Jul 29, 2024
@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/ok to test

@dantegd
Copy link
Member

dantegd commented Oct 3, 2024

CI error here seemed to have been a connection error:

[rapids-conda-retry] Exiting, no retryable mamba errors detected: 'ChecksumMismatchError:', 'ChunkedEncodingError:', 'CondaHTTPError:', 'CondaMultiError:', 'Connection broken:', 'ConnectionError:', 'DependencyNeedsBuildingError:', 'EOFError:', 'JSONDecodeError:', 'Multi-download failed', 'Timeout was reached', segfault exit code 139

@divyegala
Copy link
Member

/ok to test

@tfeher
Copy link
Contributor

tfeher commented Oct 3, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Oct 3, 2024

/merge

@rapids-bot rapids-bot bot merged commit 3383f28 into rapidsai:branch-24.10 Oct 3, 2024
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp feature request New feature or request non-breaking Introduces a non-breaking change Python
Projects
Development

Successfully merging this pull request may close these issues.

9 participants