
Size computation slows bulk insert significantly #237

Open · esoha-nvidia opened this issue Oct 3, 2022 · 1 comment
Labels: P1: Should have (Necessary but not critical), type: feature request (New feature request)

esoha-nvidia commented Oct 3, 2022

The size computation requires a small memcpy from device to host followed by a synchronization. Each is a source of serious performance degradation.

CUCO_CUDA_TRY(cudaMemcpyAsync(
&h_num_successes, num_successes_, sizeof(atomic_ctr_type), cudaMemcpyDeviceToHost, stream));
CUCO_CUDA_TRY(cudaStreamSynchronize(stream));

The synchronization is bad because it stalls the host, so other, unrelated streams cannot be kept busy.

The memcpy is bad because, on architectures with a limited number of CUDA copy engines, future copies are queued behind this one.

I was able to get a significant performance improvement by deleting these lines.

There ought to be a better way to compute the size, perhaps a lazy method. If that is too difficult, consider using templates to let the user opt out of maintaining size_ entirely: change the type of size_ from an integer to a struct with no members, so it takes up no space, and give that struct no member functions, so size_ cannot be used by accident. It will still use some space on the host, but that seems like no big deal.
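A minimal host-side sketch of that templated opt-out, assuming hypothetical names (KeepSize, size_counter, map_sketch, insert) rather than the actual cuco interface:

#include <cstddef>
#include <type_traits>

struct no_size_counter {};   // empty struct: no members and no accessors

struct size_counter {        // real counter, used only when tracking is enabled
  std::size_t value{0};
};

template <bool KeepSize>
class map_sketch {
  // When KeepSize is false, size_ becomes the empty struct, so any accidental
  // use of it fails to compile and it adds essentially no storage.
  std::conditional_t<KeepSize, size_counter, no_size_counter> size_{};

 public:
  void insert(/* iterators, stream, ... */) {
    // ... launch the insert kernel(s) ...
    if constexpr (KeepSize) {
      // Only this branch would do the device-to-host copy of num_successes_
      // and the stream synchronization (e.g. size_.value += h_num_successes);
      // with KeepSize == false the bulk insert never synchronizes.
    }
  }

  std::size_t get_size() const {
    static_assert(KeepSize, "size tracking was compiled out of this map");
    return size_.value;
  }
};

With KeepSize == false, the device-to-host copy and the stream synchronization are never compiled in, which is exactly the code path the measurements above suggest removing.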


PointKernel added the type: feature request (New feature request) and P1: Should have (Necessary but not critical) labels on Oct 3, 2022
PointKernel (Member) commented

Related to asynchronous size computation #102

@esoha-nvidia Thanks for reporting this. We are aware of this issue, and it will be addressed as part of our refactoring work in #110.
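A rough sketch of what an asynchronous size query in the spirit of #102 could look like; size_async and d_num_successes_ are assumed names for illustration, not the actual cuco API:

#include <cuda_runtime.h>
#include <cstddef>

class map_sketch {
  std::size_t* d_num_successes_{};  // device-side counter of successful inserts

 public:
  // Enqueues the device-to-host copy on `stream` and returns immediately.
  // The value in *h_size is only valid after the caller synchronizes the
  // stream (or an event recorded after this call); no other stream is blocked.
  void size_async(std::size_t* h_size, cudaStream_t stream) const {
    // h_size should point to pinned memory (cudaMallocHost / cudaHostAlloc)
    // so the copy can actually overlap with other work.
    cudaMemcpyAsync(h_size, d_num_successes_, sizeof(std::size_t),
                    cudaMemcpyDeviceToHost, stream);
  }
};

The caller then decides when, and on which stream, to pay for the synchronization, instead of every bulk insert paying for it unconditionally.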
