
Size computation slows bulk insert significantly #237

Open · esoha-nvidia opened this issue Oct 3, 2022 · 1 comment
Labels: P1: Should have (Necessary but not critical), type: feature request (New feature request)

esoha-nvidia commented Oct 3, 2022

The size computation requires a small memcpy from device to host followed by a synchronization. Each is a source of serious performance degradation.

CUCO_CUDA_TRY(cudaMemcpyAsync(
&h_num_successes, num_successes_, sizeof(atomic_ctr_type), cudaMemcpyDeviceToHost, stream));
CUCO_CUDA_TRY(cudaStreamSynchronize(stream));

The synchronization is bad because it stalls the host, so other, unrelated streams cannot be kept busy.

The memcpy is bad because, on architectures with a limited number of CUDA copy engines, future copies are queued behind this one.

I was able to get a significant performance improvement by deleting these lines.

There ought to be a better way to compute the size, perhaps a lazy method. If that is too difficult, consider using templates to let the user opt out of maintaining size_ entirely: change the type of size_ from an integer to a struct with no members, so it takes up no space, and give that struct no member functions, so size_ cannot be used by accident. It will still use some space on the host, but that seems like no big deal.
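A minimal host-side sketch of that templated opt-out, assuming hypothetical names (KeepSize, size_counter, map_sketch, insert) rather than the actual cuco interface:

#include <cstddef>
#include <type_traits>

struct no_size_counter {};   // empty struct: no members and no accessors

struct size_counter {        // real counter, used only when tracking is enabled
  std::size_t value{0};
};

template <bool KeepSize>
class map_sketch {
  // When KeepSize is false, size_ becomes the empty struct, so any accidental
  // use of it fails to compile and it adds essentially no storage.
  std::conditional_t<KeepSize, size_counter, no_size_counter> size_{};

 public:
  void insert(/* iterators, stream, ... */) {
    // ... launch the insert kernel(s) ...
    if constexpr (KeepSize) {
      // Only this branch would do the device-to-host copy of num_successes_
      // and the stream synchronization (e.g. size_.value += h_num_successes);
      // with KeepSize == false the bulk insert never synchronizes.
    }
  }

  std::size_t get_size() const {
    static_assert(KeepSize, "size tracking was compiled out of this map");
    return size_.value;
  }
};

With KeepSize == false, the device-to-host copy and the stream synchronization are never compiled in, which is exactly the code path the measurements above suggest removing.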


PointKernel added the type: feature request (New feature request) and P1: Should have (Necessary but not critical) labels on Oct 3, 2022
PointKernel (Member) commented

Related to asynchronous size computation #102

@esoha-nvidia Thanks for reporting this. We are aware of this issue, and it will be addressed as part of our refactoring work in #110.
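A rough sketch of what an asynchronous size query in the spirit of #102 could look like; size_async and d_num_successes_ are assumed names for illustration, not the actual cuco API:

#include <cuda_runtime.h>
#include <cstddef>

class map_sketch {
  std::size_t* d_num_successes_{};  // device-side counter of successful inserts

 public:
  // Enqueues the device-to-host copy on `stream` and returns immediately.
  // The value in *h_size is only valid after the caller synchronizes the
  // stream (or an event recorded after this call); no other stream is blocked.
  void size_async(std::size_t* h_size, cudaStream_t stream) const {
    // h_size should point to pinned memory (cudaMallocHost / cudaHostAlloc)
    // so the copy can actually overlap with other work.
    cudaMemcpyAsync(h_size, d_num_successes_, sizeof(std::size_t),
                    cudaMemcpyDeviceToHost, stream);
  }
};

The caller then decides when, and on which stream, to pay for the synchronization, instead of every bulk insert paying for it unconditionally.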
