-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove specializations and split expensive headers #1415
Conversation
Thanks @ahendriksen for this PR! I had only a high level view so far. The savings in compile time and binary size are really great, and having an error message of an accidental instantiation is significant improvement! |
Thanks @tfeher and apologies for the large PR. I have tagged this PR as breaking. Officially, this is right, because of the removal of two symbols from So far, this has surfaced some missing includes. I intend to fix these issues upstream as soon as possible. |
Removing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Allard, started to look into the details, here is my first batch of comments
* | ||
* @return the constructed ivf-flat index | ||
*/ | ||
template <typename T, typename IdxT> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you have changed the the template parameters to be uniformly T
and IdxT
(earlier we had both value_t, idx_t and T, IdxT pairs). Is that the preferred naming convention @cjnolet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly did this for consistency. I am fine either way. The T, IdxT
convention would be consistent with the naming convention used in the distance APIs.
cpp/include/raft/neighbors/detail/ivf_flat_interleaved_scan.cuh
Outdated
Show resolved
Hide resolved
cpp/include/raft/neighbors/detail/ivf_flat_interleaved_scan-inl.cuh
Outdated
Show resolved
Hide resolved
The downstream PRs are building alright now. I have filed two PRs to fix missing includes (one for cuml and one for cugraph). Some of the test runs break due to runtime linking errors. In the case of rapidsai/cugraph#3490: The tests for cuda 11.2 and 11.4 are failing, but this seems to be due to them not picking up the libraft.so build from this PR. The tests for cuda 11.8 are passing, which is expected.
EDIT: I think I found the issue. Due to a typo, libcuml was built twice: once with RAFT from this PR and once with the 23.06 branch. |
Some files use `thrust::nullopt_t` but not do not include `thrust/optional.h` . This PR fixes that. The missing include was surfaced by #3490 This PR is necessary to prevent breakage when rapidsai/raft#1415 is merged. Authors: - Allard Hendriksen (https://github.com/ahendriksen) Approvers: - Chuck Hastings (https://github.com/ChuckHastings) URL: #3493
We define this internally (PRIVATE). In addition, we define RAFT_COMPILED both internally and externally (PUBLIC).
Correlation and Cosine distance both return (1 - similarity) in the pairwise distances apis, meaning that is_min_close is returning the wrong sort order for them. Fix. Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1419
Add public functions for reading and writing into individual ivf-pq lists (clusters), in the input space (reconstructed data) and in flat PQ codes. Partially solves (IVF-PQ) rapidsai#1205 Authors: - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1298
Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1422
bfdf9fb
to
868ada3
Compare
Status update:
|
Compiling test/distance/gram.cu would fail otherwise, as it did not explicitly instantiate the raft::distance::distance.
Some files use `rmm::mr::get_current_device_resource()` but not do not include `rmm/mr/device/per_device_resource.hpp` . This PR fixes that. The missing include was surfaced by #5363. This PR is necessary to prevent breakage when rapidsai/raft#1415 is merged. Authors: - Allard Hendriksen (https://github.com/ahendriksen) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #5369
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Allard, I went through all the changes. On one hand this looks great: the bulk of the changes are just trivial refactoring, and it is great to have everything in one place to see the combined benefit on compilation time, and to test its effects downstream.
On the other hand, the size of the PR makes it tedious even to scroll through the files. That is not an issue in itself, but I fear that might slow down its acceptance. I think the points in the "additional changes" in the PR description would deserve their own PR, to have a focused discussion. Additionally, if we would need to revert any of the changes, then we could do that more targeted. Since I have already went through the PR, I do not insist on separating it, but it could be still beneficial to do so.
epilogue_op distance_epilogue); | ||
|
||
instantiate_raft_neighbors_brute_force_knn( | ||
int64_t, float, uint32_t, raft::row_major, raft::row_major, raft::identity_op); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question as for the distance prims: does it make sense to have instantiations for both uint32_t
and int64_t
as matrix_idx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense. The instantiations reflect the current state of things. We need these to make sure libraft.so compiles and the tests compile. I could create an issue to look into the instantiations of brute_force_knn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to create an issue to track these things. Indeed, we want to keep the current state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed issue #1449
// XXX: the uint32_t instance is not compiled in libraft.so. So we allow | ||
// instantiating the template here. | ||
// | ||
// TODO: consider removing this test or consider adding an instantiation to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I understand the benefit of testing for additional idx type, I do not like the fact that this test does not use precompiled insatnces. This leads to either slower development workflow or forces the user to comment out this test for quicker iteration (e.g. while adding / modifying test).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed issue #1448
For new developers, it might be not obvious when do we use the header.cuh / header-inl.cuh / header-ext.cuh triplet instead of just header.cuh. It would be good to add a section to the developer guide (around https://github.com/rapidsai/raft/blob/branch-23.06/docs/source/developer_guide.md#common-design-considerations), something like the header organization section in #1416. |
Only keep in logger-ext.hpp: I don't know how to document the logger class otherwise.
Agreed. I have added a section there. |
I have split up this PR in smaller pieces:
This should be easier to review. I am leaning towards closing this PR. |
This PR:
touch cpp/include/raft/distance/detail/pairwise_matrix/kernel_sm60.cuh
)This is achieved by following the strategy outlined in issue #1416.
In addition, the following changes have been made:
raft/logger.hpp
so thatspdlog
is not included during the compilation of most translation units.raft::get_pool_memory_resource
in anticipation of Move RMM_LOGGING_ASSERT into separate header rmm#1241 to achieve an additional 10s reduction in compilation time per translation unit.scan_interleaved_kernel
so that it only takes the maximum value for a datatype or 1. This halves the number of instantiations and should not reduce performance for sophisticated users (who already use powers of two for dataset dimensions). [Discussed with @tfeher]ALL_BENCH
andALL_TESTS
CMake targets.Breaking change:
raft::distance::MinAndDistanceReduceOp
andraft::distance::KVPMinReduce
fromfused_l2_nn.cuh
. They were not used externally (by cudf or cuml for instance), but were used in some tests and benchmarks. References in bench and tests have been replaced withraft::distance::detail::*
variants.