Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Sparse semiring cleanup + hash table and batching strategies #207

Closed

Conversation

cjnolet
Copy link
Member

@cjnolet cjnolet commented Apr 19, 2021

This branch includes several new features and optimizations:

  1. Introduces a hash table strategy to sparsify the vector in the coo spmv shared memory
  2. Adds cucollections to build (necessary for item 1)
  3. Adds a batching strategy for rows with nnz too large to fit into shared memory
  4. Removes the need for the cusparse csrgemm
  5. Uses raft handle in distances_config_t rather than accepting each resource explicitly
  6. Removes the naive CSR semiring code

This PR is also required to merge #261, which introduces the remaining distances

@cjnolet cjnolet requested review from divyegala and a team as code owners April 19, 2021 23:00
@cjnolet cjnolet added feature request New feature or request improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 19, 2021
@cjnolet cjnolet removed the feature request New feature or request label Apr 19, 2021
Copy link
Member

@divyegala divyegala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review looks good to me (excepting strategies part which I will clean up soon)! Only one comment and one question - the tests for the new distances remain to be added, right?

cpp/include/raft/sparse/distance/l2_distance.cuh Outdated Show resolved Hide resolved
@cjnolet
Copy link
Member Author

cjnolet commented May 21, 2021

@dantegd, I've updated this PR for branch 21.06 but I'm not familiar with the new package management. I'm getting the following build error, though I do see cuco in the build directory. Is there something we need to add to the gtest build?

In file included from /home/cjnolet/workspace/raft/cpp/include/raft/sparse/distance/coo_spmv.cuh:20,
                 from /home/cjnolet/workspace/raft/cpp/test/sparse/dist_coo_spmv.cu:28:
/home/cjnolet/workspace/raft/cpp/include/raft/sparse/distance/coo_spmv_strategies/hash_strategy.cuh:21:10: fatal error: cuco/static_map.cuh: No such file or directory
   21 | #include <cuco/static_map.cuh>
      |          ^~~~~~~~~~~~~~~~~~~~~

@cjnolet
Copy link
Member Author

cjnolet commented May 21, 2021

Just marking this here so I don't forget- It looks like the current cuco API is slightly different than it was when the hash_table in this PR was written. I see two differences so far that are causing compile errors:

  1. The slot_type was removed from the static_map::device_mutable_view and it's unclear to me which one should be used in its place. I tried using from pair_atomic_type from static_map, which seems to remove this compile error but I'm still not sure if it's the correct one to use
  2. make_from_uninitialized_slots seems to have been removed from static_map altogether and it's not clear to me what should be used in its place.
  3. A seed argument was added to a constructor and I'm assuming this might need to be incorporated into cucollections.

These are the obvious errors I see at the top of the trace. I'm hoping fixing these might fix everything else? I'm not sure yet, but I think maybe we should push this change to 21.08.

@divyegala divyegala changed the base branch from branch-21.06 to branch-21.08 June 2, 2021 17:03
@cjnolet cjnolet changed the title [WIP] Final branch of sparse semiring optimizations [WIP] Sparse semiring hash table strategy Jun 8, 2021
@cjnolet cjnolet changed the title [WIP] Sparse semiring hash table strategy [WIP] Scale sparse semiring w/ hash table and batching strategies Jun 8, 2021
@cjnolet cjnolet changed the title [WIP] Scale sparse semiring w/ hash table and batching strategies [WIP] Sparse semiring cleanup + hash table and batching strategies Jun 8, 2021
@cjnolet
Copy link
Member Author

cjnolet commented Jun 15, 2021

Closing this as it's replaced by #269

@cjnolet cjnolet closed this Jun 15, 2021
rapids-bot bot pushed a commit that referenced this pull request Jul 12, 2021
This PR is intended to be merged after #207 (hash table strategy) has been merged.

This PR introduces the following distances:
- Hamming
- Jensen-Shannon
- Russell-Rao
- KL-Divergence
- Correlation

Most of the changes here are from #207 and will be reviewed in that PR. The only files that need to be reviewed for this PR are `sparse/distance/l2_distance.cuh`, `sparse/distance/bin_distance.cuh`, `sparse/distance/lp_distances.cuh`, and their corresponding gtests: `test/sparse/distance.cuh`

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #261
dantegd pushed a commit to dantegd/raft that referenced this pull request Jul 23, 2024
Some minor fixes that are required to publish our rust bindings to crates.io:

 * using relative paths in the cuvs-sys cmake files didn't work, get around this by symlinking required files instead
 * Need to specify an actual version for cuvs-sys and ndarray-rand packages in the rust/cuvs/Cargo.toml file

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai/cuvs#207
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants