Feature/reproducible #1446

maddyscientist · 2024-03-18T23:20:13Z

(This isn't ready to be merged; I'm creating this PR as a placeholder.)

…oduce new CMake-set types: real_t (QUDA_SCALAR_TYPE), the host-side scalar precision; complex_t, the complex version of this (replaces Complex); and device_reduce_t (QUDA_REDUCTION_TYPE). Eventually we will be able to set these to non-double types, but we're not there yet.
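As a rough illustration of how the three types relate, here is a minimal sketch. The aliases are hard-coded to double here, whereas the PR selects them through CMake; `to_host_scalar` is a hypothetical helper invented for this example and is not part of QUDA.

```cpp
#include <complex>

// Hypothetical stand-ins for the CMake-selected types described above.
// In the PR these follow QUDA_SCALAR_TYPE and QUDA_REDUCTION_TYPE;
// here they are fixed to double purely for illustration.
using real_t = double;                  // host-side scalar precision
using complex_t = std::complex<real_t>; // complex counterpart of real_t
using device_reduce_t = double;         // type used inside reductions

// Hypothetical helper: convert a reduction result to the host scalar type.
// With identical types this is a no-op; with a wider reduction type it
// would be an explicit narrowing conversion.
inline real_t to_host_scalar(const device_reduce_t &x)
{
  return static_cast<real_t>(x);
}
```

Keeping the conversion in one place means that changing `device_reduce_t` later only affects code that really depends on the wider type.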

…so updates the coalesced writing to sysmem to work with large reduce_t types, such that sizeof(device_reduce_t) / sizeof(atomic_type<device_reduce_t>) > warp_size, which previously was a restriction: we now use a warp-stride loop to do a coalesced write to sysmem
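The warp-stride pattern can be emulated on the host. The sketch below is an assumption-laden illustration, not the PR's kernel code: `big_reduce_t`, `warp_stride_write` and a warp size of 32 are all hypothetical. The outer lane loop stands in for threads that would run in parallel on a GPU.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t warp_size = 32; // assumed warp size

// Hypothetical large reduction type: 40 64-bit words, more words than lanes
struct big_reduce_t {
  std::array<double, 40> v;
};

// Host-side emulation of a warp-stride write. Each "lane" copies words
// lane, lane + warp_size, lane + 2 * warp_size, ... so that on every pass
// adjacent lanes touch adjacent words, which is the coalesced pattern.
inline void warp_stride_write(std::uint64_t *dst, const big_reduce_t &src)
{
  constexpr std::size_t n_word = sizeof(big_reduce_t) / sizeof(std::uint64_t);
  std::uint64_t words[n_word];
  std::memcpy(words, &src, sizeof(src)); // view the value as 64-bit words

  for (std::size_t lane = 0; lane < warp_size; lane++) {     // parallel lanes on a GPU
    for (std::size_t i = lane; i < n_word; i += warp_size) { // warp-stride loop
      dst[i] = words[i];
    }
  }
}
```

With 40 words and 32 lanes, lanes 0 to 7 each write two words and the remaining lanes write one, so the word count is no longer limited by the warp size.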

maddyscientist added 30 commits July 18, 2023 14:36

Fix compiler warning with dbldble

ccb1c73

Add array copy assignment from one type of array to another

2049be6

Remove use of zero function and fix caxpyxmazMR functor for when the reduction type is changed

81566c8

Make math_helper.cuh safe to include in non CUDA-aware compiler

ce5d396

Add doubledouble support for host, add complex-number support, remove legacy zero helper function

7a4e04f
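For readers unfamiliar with double-double arithmetic, the following is a minimal sketch of the idea using Knuth's TwoSum. The names `dd_t`, `two_sum` and `add` are hypothetical; this is not QUDA's doubledouble class, which provides far more operations.

```cpp
// Hypothetical minimal double-double value: hi + lo represents the number
struct dd_t {
  double hi; // leading part
  double lo; // trailing error term
};

// Knuth's TwoSum: s is the rounded sum and e the exact rounding error,
// so that a + b == s + e holds exactly.
inline dd_t two_sum(double a, double b)
{
  double s = a + b;
  double bb = s - a;
  double e = (a - (s - bb)) + (b - bb);
  return {s, e};
}

// Add a plain double to a double-double accumulator
inline dd_t add(dd_t x, double y)
{
  dd_t s = two_sum(x.hi, y);
  s.lo += x.lo;               // fold in the previous error term
  return two_sum(s.hi, s.lo); // renormalize so hi carries the leading bits
}
```

Adding 1.0, then 1e-20, then -1.0 to a zero accumulator leaves 1e-20 in `hi`, whereas the same sequence in plain double arithmetic returns 0. This sketch assumes default IEEE semantics (no fast-math style reassociation).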

Modify reduction kernels to use device_reduce_t and not double for internal computation

2d67d97

Use same underlying reduction type on host as device

feccf89

Move get_scalar<deviation_t> overload to float_Vector.h

d70303a

Add *= and /= overloads for doubledouble

4a7061a

Fix heavy quark residual norm for non-double reduction type

7e40280

Add various functions to doubledouble needed for generic deployment

2a80b2f

Add isfinite method for doubledouble

a4e8f76

99% of double -> real_t replacement now done (MPI reductions not yet done)

a7cc5f7

Updated ReduceArg::complete function to work when real_t and device_reduce_t are different types, e.g., double vs doubledouble

008c632

Remove some legacy code

dc62b01

Fix some issues

3324b05

Add missing cast operator to deviation_t::operator= when copying from a different type (needed when copying from deviation_t<doubledouble> to deviation_t<double>, for example)

a16ff6c

Add ostream << overload for doubledouble type

2b5bac8

Update CUDA block_reduce_helper.h atomic types to work with doubledouble (need to split into 64-bit words) and small generic cleanup

9d69abd

transform_reduce now respects device_reduce_t and real_t

d5f914d

Add initial support for multi-process doubledouble reductions: only QMP at present and just a simple gather method for now

1a73132

Multi-process reduction now uses device_reduce_t with the conversion to real_t done after the multi-process reduction

d76e57c
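The order of "reduce" and "convert" matters when the two types differ. The sketch below uses double and float as stand-ins for a wide reduction type and a narrower scalar type; `wide_t`, `narrow_t` and both functions are hypothetical names, and the loop stands in for the multi-process reduction.

```cpp
#include <vector>

using wide_t = double;  // stand-in for the wider reduction type
using narrow_t = float; // stand-in for a narrower host scalar type

// Sum per-process partial results in the wide type, convert once at the end
inline narrow_t reduce_then_convert(const std::vector<wide_t> &partials)
{
  wide_t sum = 0;
  for (wide_t p : partials) sum += p;
  return static_cast<narrow_t>(sum);
}

// For contrast: convert each partial first, then sum in the narrow type
inline narrow_t convert_then_reduce(const std::vector<wide_t> &partials)
{
  narrow_t sum = 0;
  for (wide_t p : partials) sum += static_cast<narrow_t>(p);
  return sum;
}
```

For partials such as {1e8, 1.0, -1e8}, reducing in the wide type keeps the 1.0, while summing in the narrow type can lose it because 1.0 is below the float spacing near 1e8.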

Updates for blas_test: use same basis for host and device to allow for direct comparisons, use max error not error sum when multiple norms are used to check correctness, print out the deviation when verbosity >= QUDA_VERBOSE

27ba8de
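A small sketch of the "max error, not error sum" check follows. The relative-deviation formula and the function names are assumptions made for illustration; the actual metric used in blas_test is not shown here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed metric: relative deviation of a device result from a host reference.
// The reference is assumed to be non-zero.
inline double deviation(double device, double host)
{
  return std::abs(device - host) / std::abs(host);
}

// Worst-case deviation over several norms, rather than their sum
inline double max_deviation(const std::vector<double> &device, const std::vector<double> &host)
{
  double max_dev = 0.0;
  for (std::size_t i = 0; i < device.size(); i++) {
    max_dev = std::max(max_dev, deviation(device[i], host[i]));
  }
  return max_dev;
}
```

Using the maximum keeps the tolerance independent of how many norms a given kernel returns, whereas a summed error grows with the number of outputs.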

Minor comment clean up

4b5aa52

Add single gpu support for doubledouble

bcde6ad

Small fix for doubledouble::operator>

2ee73a6

Initial version of reproduction reductions, fully works but a few limitations representing this being WIP (bin bounds LUT repeatedly recomputed on the host, bin bounds LUT presently in explicit constant, CG reduction not supported, warp reductions rather register heavy, etc.)

9789820
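To show why rounding onto fixed bins makes a sum independent of summation order, here is a deliberately simplified fixed-point stand-in. It is NOT the binned algorithm used in this PR: the grid spacing `scale` and the function `reproducible_sum` are hypothetical, and a single grid replaces the bin lookup table.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical grid resolution: values are rounded to multiples of 2^-32
constexpr double scale = 4294967296.0;

// Each value is rounded once onto the fixed grid and accumulated as an
// integer. Integer addition is associative, so the result does not depend
// on the order in which the values are visited.
inline double reproducible_sum(const std::vector<double> &x)
{
  std::int64_t acc = 0;
  for (double v : x) acc += std::llround(v * scale); // rounding is per element
  return static_cast<double>(acc) / scale;
}
```

The price is a fixed range and resolution: in this sketch, contributions smaller than 2^-33 round to zero and the running total must stay below 2^31 in magnitude. A binned scheme avoids that restriction by choosing the grid from the data.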

Merge branch 'feature/gaugefield_unity' of github.com:lattice/quda into feature/reproducible

67514d0

maddyscientist added 10 commits August 15, 2023 16:37

Fix io_test when not all precision compiled

d455000

Fix compiler warning

030836d

Reenable explicit zero support with rfa_t (fixes dilution_test)

08b9776

Fix gauge loop trace when using doubledouble precision reductions

64ed607

Fix doubledouble multi-GPU compilation (missing comm_allreduce_max function)

ba96720

Fix gauge_path_test loop trace test when using doubledouble reduction variables (remove bad cast)

b7687b4

Rework of reproducible reductions to pre-compute the bins when initializing the reduction. All tests passing, but seemingly break when a SANITIZE build is used

bc74e7b

Minor optimization of det_trace kernel

6a60bc3

Fix compiler warning

a8085dc

Merge branch 'feature/gaugefield_unity' of github.com:lattice/quda into feature/reproducible

a413153

maddyscientist requested review from a team as code owners March 18, 2024 23:20
