Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add static_map::insert_if. #118

Merged
merged 8 commits into from
Nov 19, 2021
Merged

Conversation

vyasr
Copy link
Collaborator

@vyasr vyasr commented Nov 11, 2021

This PR adds support for a predicated rather than unconditional insertion into a hash map. There is some code duplication introduce here, but for this PR I've prioritized getting a working implementation together to make sure that we can move forward. Once this feature is merged I plan to look into some refactorings, but I'd like to avoid conflicts with #115 while getting a working version of insert_if for rapidsai/cudf#9586.

Resolves #116.

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just some small changes around stream support. No need to worry about #115 right now cause it's mainly me trying things around and not sure how it goes eventually.

include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/static_map.cuh Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
@vyasr
Copy link
Collaborator Author

vyasr commented Nov 12, 2021

@PointKernel I should probably have posted some of this in my initial PR comment. There are a number of tasks that I noticed would be worth doing but I wasn't sure whether I should include in this PR. Some of them are things you commented on, so let me list here and we can scope out this PR before I make any further changes. I'm not sure that all of these are necessarily good ideas, but they are relevant so we may as well discuss them here

  • I thought about adding stream support, but I wasn't sure if it would make more sense to do it in/after [Review] Add stream support for static_map execution #113 because otherwise in the interim we'll have just one API accepting streams. What do you think?
  • I considered using the insert_if_n approach because we need to also look up the offset in the stencil. However, I would actually like to do a broader refactoring where we implement only a single insertion kernel insert_if which we call from static_map::insert using a trivial return true; predicate. Assuming the compiler optimizes that out successfully (something we'd need to benchmark) it would substantially reduce code duplication.
  • I didn't enable a CG API for insert_if, but I think we should. I would like to benchmark the performance of the current non-tiled insert implementation against the tiled implementation with a tile size of 1. Do you know if anyone has tested that? If the performance of those looks the same, I think we should be able to reorder the template arguments to give a default tile_size of 1 and always use that template, which would allow us to get rid of the other version.
  • This change is potentially a bit more questionable, it may work but I haven't looked deeply enough to really evaluate and maybe you'll know offhand: could we unify the kernels for static_map and static_multimap, relying on the __device__ methods of the device_view to handle the differences between the data structures? It seems to me that, in the case of insert for example, if we templated the kernel on the view type we could rely on view.insert to handle the behavior of insertion specific to the different data types, and aside from that the kernels would be identical. Thoughts?

CC @jrhemstad in case you had thoughts too. You pointed out #110 which seems to include some of these as well, so I think it's just a matter of determining what's the appropriate scope for this PR and what I should chunk out for future work.

@jrhemstad
Copy link
Collaborator

I considered using the insert_if_n approach

@PointKernel 's point is that you can't rely on all iterators to have an operator<, so we implement the kernel as a begin & size algorithm where we infer the size from the provided first/last inputs.

I would like to benchmark the performance of the current non-tiled insert implementation against the tiled implementation with a tile size of 1

We have benchmarked that, and using the CG code path with a tile size of 1 is definitely slower than the purely scalar code path. There is a significant amount of extra work in the CG code path that is eliminated in the scalar path.

@PointKernel
Copy link
Member

PointKernel commented Nov 12, 2021

  • [Review] Add stream support for static_map execution #113 focuses on existing APIs and the current PR is introducing a new API. I'm inclined to add stream argument here so it can avoid back and forth effort between those two PRs. The temporary oddness is not a big issue IMO.
  • There are significant duplicates across cuco implementations (e.g. single map, multimap, etc.). I think it makes more sense to remove/optimize them in/after the "grand" refactoring when addressing [FEA] Refactor of open address data structures #110. A lot of things will be changed once we figure out the base open addressing data structure and for now, it's not obvious to me how the eventual APIs look like.
  • As for benchmarking, it's definitely useful when evaluating CG-based algorithms against non-CG ones. I'm actually very curious to see how much improvement we can get by using CG. Based on my experience with multimap, I think the non-CG version might be more proper for single map. My big lesson here is that every parameter matters (CG vs non-CG, CG size, occupancy, data type, key distribution) so it's really easy to make misleading conclusions by overfitting one specific use case.

@vyasr
Copy link
Collaborator Author

vyasr commented Nov 17, 2021

@PointKernel I'm not sure why, but the Github UI is not permitting me to re-request review here (I tried a few different browsers). It's unclear whether that's a result of some restriction on the repo or just something wrong with Github right now. This should be ready for another pass.

@PointKernel
Copy link
Member

@PointKernel I'm not sure why, but the Github UI is not permitting me to re-request review here (I tried a few different browsers). It's unclear whether that's a result of some restriction on the repo or just something wrong with Github right now. This should be ready for another pass.

I guess we need to be in some specific groups cause I can't do it either. @jrhemstad maybe the time to grow cuco "maintainer" group?

include/cuco/detail/static_map_kernels.cuh Outdated Show resolved Hide resolved
include/cuco/detail/static_map_kernels.cuh Outdated Show resolved Hide resolved
include/cuco/detail/static_map_kernels.cuh Show resolved Hide resolved
include/cuco/detail/static_map_kernels.cuh Outdated Show resolved Hide resolved
include/cuco/detail/static_map_kernels.cuh Show resolved Hide resolved
include/cuco/detail/static_map_kernels.cuh Outdated Show resolved Hide resolved
include/cuco/detail/static_map_kernels.cuh Outdated Show resolved Hide resolved
include/cuco/detail/static_map_kernels.cuh Outdated Show resolved Hide resolved
include/cuco/static_map.cuh Show resolved Hide resolved
@vyasr
Copy link
Collaborator Author

vyasr commented Nov 17, 2021

@PointKernel Thanks for those comments on CG and atomics, I haven't used either yet and I learned a few new things. Should be helpful in future refactorings/optimizations.

@jrhemstad jrhemstad merged commit bb8c34d into NVIDIA:dev Nov 19, 2021
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Jan 5, 2022
…nti joins (#9666)

This PR resolves #9586, replacing the hash table used in semi and anti joins with cuco::static_map. It depends on NVIDIA/cuCollections#118. At present the code is slower than the original version, so we'll probably want to make some optimizations in cuco before merging this.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Conor Hoekstra (https://github.com/codereport)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #9666
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] static_map::insert_if
3 participants