Fix correctness in cuda_mapreduce #2106

Sbozzolo · 2024-12-18T00:57:37Z

cuda_mapreduce was not working correctly with certain spaces.

Why was this happening?

I added a comment to describe the algorithm in the commit.

In a nutshell, the algorithm did not account for the final block being only partially filled with points to process. As a result, the reduction included slots that held no real points, only the value 0.
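
To make the failure mode concrete, here is a minimal CPU-only sketch in plain Julia, not the actual CUDA kernel. It assumes a zero-initialized scratch buffer standing in for shared memory; `blockwise_reduce_padded` and `blocksize` are hypothetical names used only for illustration.

```julia
# CPU-only illustration of a block-wise reduction with a partially filled
# final block. All names here are hypothetical; this is not the kernel code.
function blockwise_reduce_padded(op, data::Vector{T}, blocksize::Int) where {T}
    nblocks = cld(length(data), blocksize)
    partials = Vector{T}(undef, nblocks)
    for b in 1:nblocks
        # Scratch buffer standing in for shared memory, zero-initialized.
        scratch = zeros(T, blocksize)
        lo = (b - 1) * blocksize + 1
        hi = min(b * blocksize, length(data))
        scratch[1:(hi - lo + 1)] .= data[lo:hi]
        # The problem being illustrated: the whole buffer is reduced,
        # including padding slots that hold no real point.
        partials[b] = reduce(op, scratch)
    end
    return reduce(op, partials)
end

# 5 points with blocksize 4: the second block holds 1 real point and 3 zeros.
data = [3.0, 5.0, 2.0, 7.0, 4.0]
padded_min = blockwise_reduce_padded(min, data, 4)  # 0.0, picked up from padding
true_min = minimum(data)                            # 2.0
padded_sum = blockwise_reduce_padded(+, data, 4)    # 21.0, zeros are harmless here
```

With `+` the zero padding is harmless; with `min` on positive data (or `max` on negative data) it changes the answer.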

Closes #2097

test/DataLayouts/unit_mapreduce.jl

sriharshakandala · 2024-12-20T19:11:33Z

One option is to use binary-op appropriate initialization. For example,

function _init_val_for_reduction(f::Function, ::Type{T}) where {T}
    f == min && return typemax(T)
    f == max && return typemin(T)
    return T(0)
end

with

reduction[tidx] = _init_val_for_reduction(op, T)
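
A rough CPU-only sketch of how this initialization would behave, assuming the scratch buffer is pre-filled with the init value before the real points are loaded. `blockwise_reduce_with_init` and `blocksize` are hypothetical names, and the loop is a stand-in for the kernel, not the kernel itself.

```julia
# Helper as proposed above.
function _init_val_for_reduction(f::Function, ::Type{T}) where {T}
    f == min && return typemax(T)
    f == max && return typemin(T)
    return T(0)
end

# Hypothetical CPU stand-in for the block-wise reduction.
function blockwise_reduce_with_init(op, data::Vector{T}, blocksize::Int) where {T}
    nblocks = cld(length(data), blocksize)
    partials = Vector{T}(undef, nblocks)
    for b in 1:nblocks
        # Padding slots start from the op-appropriate init value.
        scratch = fill(_init_val_for_reduction(op, T), blocksize)
        lo = (b - 1) * blocksize + 1
        hi = min(b * blocksize, length(data))
        scratch[1:(hi - lo + 1)] .= data[lo:hi]
        partials[b] = reduce(op, scratch)
    end
    return reduce(op, partials)
end

data = [3.0, 5.0, 2.0, 7.0, 4.0]
min_result = blockwise_reduce_with_init(min, data, 4)   # 2.0
max_result = blockwise_reduce_with_init(max, -data, 4)  # -2.0
sum_result = blockwise_reduce_with_init(+, data, 4)     # 21.0
prod_result = blockwise_reduce_with_init(*, data, 4)    # 0.0, since T(0) is not the identity for *
```

The fallback `T(0)` is only an identity for addition-like operators, so any other `op` would need its own branch.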

Sbozzolo · 2024-12-20T19:18:31Z

> One option is to use binary-op appropriate initialization. For example,
>
> function _init_val_for_reduction(f::Function, ::Type{T}) where {T}
>     f == min && return typemax(T)
>     f == max && return typemin(T)
>     return T(0)
> end
>
> with
>
> reduction[tidx] = _init_val_for_reduction(op, T)

This would require defining the init value for every function, which doesn't seem optimal.

Is there any issue with the fix in this PR?
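
For comparison, a reduction that never touches slots beyond the last real point needs no per-operator init value. The following is a CPU-only sketch under that assumption, and it is not necessarily how the commit implements the fix; `blockwise_reduce_guarded` and `blocksize` are hypothetical names.

```julia
# Hypothetical CPU stand-in: each block reduces only the real points it owns.
# Assumes `data` is non-empty.
function blockwise_reduce_guarded(op, data::Vector{T}, blocksize::Int) where {T}
    nblocks = cld(length(data), blocksize)
    partials = Vector{T}(undef, nblocks)
    for b in 1:nblocks
        lo = (b - 1) * blocksize + 1
        # Clamp the upper bound so the final block stops at the last real point.
        hi = min(b * blocksize, length(data))
        acc = data[lo]
        for i in (lo + 1):hi
            acc = op(acc, data[i])
        end
        partials[b] = acc
    end
    return reduce(op, partials)
end

data = [3.0, 5.0, 2.0, 7.0, 4.0]
guarded_min = blockwise_reduce_guarded(min, data, 4)   # 2.0
guarded_prod = blockwise_reduce_guarded(*, data, 4)    # 840.0
```

Because no padding value ever enters the reduction, the same code path works for any associative `op`.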

sriharshakandala · 2024-12-20T19:21:34Z

ext/cuda/data_layouts_mapreduce.jl

@@ -31,6 +31,34 @@ function mapreduce_cuda(
    weighted_jacobian = OnesArray(parent(data)),
    opargs...,
 )
+    # This function implements the following parallel reduction algorithm:
+    #
+    # Blocks processes multiple data points at the same time (n_ops_on_load)


Each thread loads multiple data points in shmem!

Sbozzolo force-pushed the gb/fix_cuda_reductions branch from 63a7ef8 to cafbbcf December 18, 2024 00:59

charleskawczynski reviewed Dec 18, 2024

charleskawczynski requested a review from sriharshakandala December 18, 2024 20:51

Sbozzolo force-pushed the gb/fix_cuda_reductions branch 5 times, most recently from b0eea6e to 0298139 December 19, 2024 16:02

sriharshakandala reviewed Dec 20, 2024

sriharshakandala approved these changes Dec 20, 2024

Sbozzolo force-pushed the gb/fix_cuda_reductions branch from 0298139 to 8cdf3f3 December 20, 2024 22:29

Sbozzolo enabled auto-merge December 20, 2024 22:29

Sbozzolo merged commit 6539b89 into main Dec 21, 2024
32 of 34 checks passed

Sbozzolo deleted the gb/fix_cuda_reductions branch December 21, 2024 00:20
