Commit
Add Fused L2 Expanded KNN kernel (#339)
-- Adds a fused L2 expanded kNN kernel, which is at least 20-25% faster than the L2 unexpanded version on higher dimensions (D >= 128).
-- On smaller dimensions (D <= 32), L2 expanded is also always faster, by 10-15%.
-- Slightly improves the updateSortedWarpQ device function by reducing redundant instructions.
-- Fixes incorrect output for the NN > 32 case when taking the prod-cons kNN merge path; this was caught by the HDBSCAN pytest.

Authors:
- Mahesh Doijade (https://github.com/mdoijade)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Chuck Hastings (https://github.com/ChuckHastings)
- Corey J. Nolet (https://github.com/cjnolet)

URL: #339
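
For readers unfamiliar with the term, "L2 expanded" usually refers to computing squared distances through the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 (x . y), so that row norms are computed once and the inner loop is a dot product. The following is a minimal host-side C++ sketch of that idea followed by a top-k selection. It is not the CUDA kernel from this commit: the names `row_sq_norms` and `knn_l2_expanded` are hypothetical, the row-major layout is an assumption, and the sketch does not attempt the fusion of distance computation and selection that the kernel performs on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Squared L2 norm of each row of a row-major (n x d) matrix.
// Hypothetical helper, not part of the library touched by this commit.
std::vector<float> row_sq_norms(const std::vector<float>& m,
                                std::size_t n, std::size_t d) {
  std::vector<float> norms(n, 0.0f);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < d; ++j)
      norms[i] += m[i * d + j] * m[i * d + j];
  return norms;
}

// Brute-force kNN using the expanded form of the squared L2 distance.
// For each query row, returns k (squared distance, index row) pairs,
// sorted by ascending distance.
std::vector<std::vector<std::pair<float, int>>> knn_l2_expanded(
    const std::vector<float>& index, std::size_t n_index,
    const std::vector<float>& queries, std::size_t n_queries,
    std::size_t d, std::size_t k) {
  assert(k <= n_index);
  // Norms are computed once per row instead of once per (query, index) pair.
  const std::vector<float> index_norms = row_sq_norms(index, n_index, d);
  const std::vector<float> query_norms = row_sq_norms(queries, n_queries, d);

  std::vector<std::vector<std::pair<float, int>>> result(n_queries);
  for (std::size_t q = 0; q < n_queries; ++q) {
    std::vector<std::pair<float, int>> cand(n_index);
    for (std::size_t i = 0; i < n_index; ++i) {
      float dot = 0.0f;
      for (std::size_t j = 0; j < d; ++j)
        dot += queries[q * d + j] * index[i * d + j];
      // Expanded form: ||x||^2 + ||y||^2 - 2 x.y
      const float dist = query_norms[q] + index_norms[i] - 2.0f * dot;
      // Rounding can push the expanded form slightly below zero; clamp it.
      cand[i] = {std::max(dist, 0.0f), static_cast<int>(i)};
    }
    // Keep only the k smallest distances, in ascending order.
    std::partial_sort(cand.begin(), cand.begin() + k, cand.end());
    cand.resize(k);
    result[q] = std::move(cand);
  }
  return result;
}
```

The clamp to zero reflects a general property of the expanded form rather than anything stated in the commit: because it subtracts two quantities of similar magnitude, it can be less numerically accurate than the unexpanded sum of squared differences.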