Gauge fixing, pure gauge and optimized gauge I/O routines #253

Merged
merged 128 commits into from
Jul 29, 2015
128 commits
144d45e
Code from master branch with FloatNOrder modified and including the g…
nmrcardoso Mar 20, 2015
555e0e1
Code from master branch with FloatNOrder modified and including the g…
nmrcardoso Mar 20, 2015
fca8ac6
Code from master branch with FloatNOrder modified and including the g…
nmrcardoso Mar 20, 2015
6b1e744
Modified gauge fixing code using FFTs in order to use less memory
nmrcardoso Mar 23, 2015
ebb7565
Modified gauge fixing code using FFTs in order to use less memory
nmrcardoso Mar 23, 2015
72df100
Modified gauge fixing code using FFTs in order to use less memory
nmrcardoso Mar 23, 2015
b4aa0f5
Fix single-GPU compilation for testing/su3_testing.cpp
maddyscientist May 22, 2015
7b3d0df
Fix link errors when GPU_UNITARIZE is not set.
maddyscientist May 22, 2015
205b88d
Merge branch 'develop' into feature/gauge-fix
maddyscientist May 22, 2015
5db6fe3
Added support for half-precision types in vectorized gauge::FloadNOrd…
maddyscientist May 22, 2015
d4d5426
Added updated register traits for vectorized short copy specializations.
maddyscientist May 22, 2015
71dac73
Gauge fixing and pure gauge algorithms are now enabled with the --ena…
maddyscientist May 22, 2015
91e718e
Added vectorized memory I/O for ghost routines in gauge::FloatNOrder.
maddyscientist May 22, 2015
933e8a7
Update configure.ac
nmrcardoso May 22, 2015
4f6dd35
Update configure
nmrcardoso May 22, 2015
1fc970b
Added gauge fixing routines for MILC interface
nmrcardoso May 22, 2015
1f31080
Update CUFFT_Plans.h
nmrcardoso Jun 1, 2015
b3f95fb
Update gauge_fix_fft.cu
nmrcardoso Jun 1, 2015
966eb34
Update gauge_fix_ovr.cu
nmrcardoso Jun 1, 2015
0caa014
Update gauge_fix_ovr_extra.cu
nmrcardoso Jun 1, 2015
1407308
Update gauge_fix_ovr_hit_devf.cuh
nmrcardoso Jun 1, 2015
a42fc61
Update gauge_tools.h
nmrcardoso Jun 1, 2015
1130905
Update gauge_tools.h
nmrcardoso Jun 1, 2015
6e0f1c5
Update gauge_fix_ovr.cu
nmrcardoso Jun 1, 2015
3cc3056
Update gauge_fix_fft.cu
nmrcardoso Jun 1, 2015
013c667
Update gauge_tools.h
nmrcardoso Jun 1, 2015
596b6a7
Update gauge_tools.h
nmrcardoso Jun 1, 2015
b02b2b2
Update gauge_fix_fft.cu
nmrcardoso Jun 1, 2015
a52edfb
Added doxyegn comments
nmrcardoso Jun 1, 2015
44fa4cf
Added doxyegn comments
nmrcardoso Jun 1, 2015
b90d28e
Added doxyegn comments
nmrcardoso Jun 1, 2015
25f4d5d
changed identation
nmrcardoso Jun 1, 2015
9958ac8
changed identation
nmrcardoso Jun 1, 2015
f4820c7
changed identation
nmrcardoso Jun 1, 2015
9c43cc1
Added doxyegn comments
nmrcardoso Jun 1, 2015
12b7bbd
Added doxyegn comments
nmrcardoso Jun 1, 2015
c5e1042
Added doxygen comments to the gauge fixing
nmrcardoso Jun 1, 2015
6decc61
Added doxygen comments to the gauge fixing
nmrcardoso Jun 1, 2015
1e08ffe
Added doxygen comments to the gauge fixing
nmrcardoso Jun 1, 2015
c72d08d
Added doxygen comments to the gauge fixing
nmrcardoso Jun 1, 2015
c406dd1
Merge branch 'develop' into feature/gauge-fix
Jun 1, 2015
294cf78
Merge branch 'develop' into feature/gauge-fix
maddyscientist Jun 19, 2015
76bac68
Updated gauge::FloatNOrder::load and gauge::FloatNOrder::save functio…
maddyscientist Jun 19, 2015
5646926
Cleanup debugging/unnecessary code
nmrcardoso Jun 19, 2015
5befe17
Added separate temporal-spatial plaquette
AlexVaq Jul 6, 2015
e8f4eef
Cleanup of plaquette computation. Updated flops/bytes for autotuning…
Jul 7, 2015
93d410d
Updated su3_test to compute the plaquette after downloading the gauge…
Jul 7, 2015
2e6b348
su3_test should work even when gauge tools have not been built.
Jul 7, 2015
3cc8b4c
Added Fortran interface for plaquette computation.
Jul 7, 2015
8723b94
Added profiling info for plaqQuda. Make the extended gauge field res…
maddyscientist Jul 7, 2015
34534da
Merge pull request #307 from lattice/feature/plaquette_spatial_temp
Jul 7, 2015
a584b51
add correctness check
nmrcardoso Jul 11, 2015
11f413f
add link determinant and link trace
nmrcardoso Jul 11, 2015
a1e81fa
add link reunitarization at the end of gauge fix
nmrcardoso Jul 11, 2015
ca7ade2
add pgauge_det_trace to makefile
nmrcardoso Jul 11, 2015
8880dd0
To calculate mean link determinant and trace
nmrcardoso Jul 11, 2015
1b8a9bf
Update pgauge_exchange.cu
nmrcardoso Jul 11, 2015
50013ce
Update su3_testing.cpp
nmrcardoso Jul 11, 2015
020ff13
Update gauge_fix_ovr.cu
nmrcardoso Jul 11, 2015
9a0ef6c
Update gauge_fix_fft.cu
nmrcardoso Jul 11, 2015
5e3e2ba
Update pgauge_exchange.cu
nmrcardoso Jul 11, 2015
d2628da
Update gauge_fix_ovr.cu
nmrcardoso Jul 11, 2015
7fdc14e
Update pgauge_init.cu
nmrcardoso Jul 11, 2015
f7b3550
Update pgauge_init.cu
nmrcardoso Jul 11, 2015
6cd992f
Update gauge_fix_ovr.cu
nmrcardoso Jul 11, 2015
aaa167f
Update pgauge_exchange.cu
nmrcardoso Jul 11, 2015
cd6d195
Fused exterior dslash kernels should only be built in multi-GPU mode.
maddyscientist Jul 16, 2015
93836c0
Fixed memory bandwidth computations for 4-d preconditioned 5-d dslash…
maddyscientist Jul 16, 2015
de17e4f
Merge pull request #317 from lattice/hotfix/dslash_cleanup
Jul 16, 2015
db7990a
Added clover_mapper trait for simplifying and reducing compile time f…
maddyscientist Jul 17, 2015
8f8ed54
Moved gauge_mapper accessor to gauge_field_order.h and applied to mul…
maddyscientist Jul 17, 2015
e075437
Merge pull request #319 from lattice/hotfix/compile-time
Jul 20, 2015
6f3b463
Fixed critical memory leak of message handles in persistent MPI commu…
maddyscientist Jul 21, 2015
b3ab600
Added Doxygen comments to MPI MshHandle_s struct.
maddyscientist Jul 21, 2015
cdb5644
Merge pull request #320 from lattice/hotfix/comms_leak
Jul 21, 2015
200011a
fix compilation error about undefined plq if GPU_GAUGE_TOOLS is not d…
Jul 23, 2015
408ad80
Merge pull request #322 from lattice/hotfix/gauge_plaq_compilation
maddyscientist Jul 23, 2015
89ec29e
fix cuda 6.5 compiler issue
nmrcardoso Jul 24, 2015
6b84e02
Update pgauge_det_trace.cu
nmrcardoso Jul 25, 2015
8952cfe
Update pgauge_plaquette.cu
nmrcardoso Jul 25, 2015
718e9e7
Update pgauge_plaquette.cu
nmrcardoso Jul 25, 2015
978613e
Update pgauge_heatbath.cu
nmrcardoso Jul 25, 2015
63b6c56
Update su3_testing.cpp
nmrcardoso Jul 27, 2015
5dd8047
Update pgauge_monte.h
nmrcardoso Jul 27, 2015
c3b62e1
Update Makefile
nmrcardoso Jul 27, 2015
4fcaf61
Delete pgauge_plaquette.cu
nmrcardoso Jul 27, 2015
3eb1186
BLAS functions are not built unless a Dirac operator is being built.
maddyscientist Jul 27, 2015
9b2b27a
Merge branch 'feature/gauge-fix' of https://github.com/lattice/quda i…
maddyscientist Jul 27, 2015
15438b7
Merge branch 'release/0.7.x' into feature/gauge-fix
maddyscientist Jul 27, 2015
9005d89
Update su3_testing.cpp
nmrcardoso Jul 27, 2015
50c5fe4
Update su3_testing.cpp
nmrcardoso Jul 27, 2015
7f14968
FFT and overrelaxation gauge fixing now uses gauge_mapper to reduce c…
maddyscientist Jul 27, 2015
c985904
Added method for setting the precision and corresponding internal fi…
maddyscientist Jul 27, 2015
26b9906
Applied gauge_mapper to pgauge_det_trace.cu pgauge_exchange.cu kernel…
maddyscientist Jul 27, 2015
2641be5
moved test related build decisions to tests/makefile
Jul 27, 2015
ef43219
split the su3_testing test into independent test and call it gauge_al…
Jul 27, 2015
26d4794
More compile-time reduction, this time for link unitarization.
maddyscientist Jul 27, 2015
16a4159
Merge branch 'feature/gauge-fix' of https://github.com/lattice/quda i…
maddyscientist Jul 27, 2015
a571fe3
first version of gauge_alg_test test that uses google test
Jul 27, 2015
aa2dab5
Merge branch 'feature/gauge-fix' of https://github.com/lattice/quda i…
Jul 27, 2015
0bf91c4
removed warning in gauge_alg_test
Jul 27, 2015
1601dae
Fixed plaquette flops counter, minor clean up of Tune label for unita…
maddyscientist Jul 28, 2015
c4916cf
Merge branch 'feature/gauge-fix' of https://github.com/lattice/quda i…
maddyscientist Jul 28, 2015
c19d2cc
acknowledge command line options in gauge_alg_test
Jul 28, 2015
0fa3833
Added loop unroll for computeValue kernel which gives a speedup.
maddyscientist Jul 28, 2015
738e9ce
Added generic library to QUDA, which provides generic support for __l…
maddyscientist Jul 28, 2015
aa7049a
gauge_field::FloatNOrder can now use __ldg loads. Generally improves…
maddyscientist Jul 28, 2015
cf96f2e
Merge branch 'feature/gauge-fix' of github.com:lattice/quda into feat…
maddyscientist Jul 28, 2015
45a0b19
Applid gauge_mapper to heatbath code to reduce compilation time.
maddyscientist Jul 28, 2015
c85dad3
Removed legacy unitarization routine, and generalized replacement to …
maddyscientist Jul 28, 2015
53308ef
potential fix for MPI issues with gauge_alg_test
Jul 28, 2015
bf6d3cb
Merge branch 'feature/gauge-fix' of https://github.com/lattice/quda i…
Jul 28, 2015
88ae420
Merge remote-tracking branch 'origin/develop' into feature/gauge-fix
maddyscientist Jul 29, 2015
64a94d2
Fixed bug in setting dslash_type in staggered_dslash_test.
maddyscientist Jul 29, 2015
d5fc364
Reduced compilation time for field-strength tensor using gauge_mapper.
maddyscientist Jul 29, 2015
dbfcb30
modified gauge_alg_test to use generation before every test
Jul 29, 2015
72b1cb2
Merge branch 'feature/gauge-fix' of https://github.com/lattice/quda i…
Jul 29, 2015
d5db5c1
Moved replicated atomicAdd(double*,double) definition to new header f…
maddyscientist Jul 29, 2015
96e4271
Update gauge_fix_fft.cu
nmrcardoso Jul 29, 2015
27df381
Update gauge_fix_ovr.cu
nmrcardoso Jul 29, 2015
c2a8cdb
Update unitarize_links_quda.cu
nmrcardoso Jul 29, 2015
d67e395
Delete su3_testing.cpp
nmrcardoso Jul 29, 2015
3f8b873
Added tune option to su3_test.
maddyscientist Jul 29, 2015
6b05307
Clean up of replicated indexing functions, moving them into index/ind…
maddyscientist Jul 29, 2015
8202051
Updated Makefile for recent header additions.
maddyscientist Jul 29, 2015
8883381
Update Makefile
nmrcardoso Jul 29, 2015
d6d6e28
Merge remote-tracking branch 'origin/develop' into feature/gauge-fix
maddyscientist Jul 29, 2015
112f05d
Merge branch 'feature/gauge-fix' of https://github.com/lattice/quda i…
maddyscientist Jul 29, 2015
43 changes: 36 additions & 7 deletions LICENSE
@@ -11,13 +11,42 @@ furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

QUDA is supported by NVIDIA, and includes the NVIDIA-licensed
libraries cub and generics.

Copyright (c) 2011-2015, NVIDIA Corporation
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the <organization> nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT
HOLDER> BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


QUDA leverages Google Test for unit testing, contained within tests/gtest.h
31 changes: 30 additions & 1 deletion configure
@@ -618,6 +618,7 @@ BUILD_MULTI_GPU
DYNAMIC_CLOVER
BUILD_CONTRACT
BUILD_SSTEP
BUILD_GAUGE_ALG
BUILD_GAUGE_TOOLS
BUILD_HISQ_FORCE
BUILD_FERMION_FORCE
@@ -723,6 +724,7 @@ enable_gauge_force
enable_staggered_force
enable_hisq_force
enable_gauge_tools
enable_gauge_alg
enable_sstep
enable_contract
enable_dynamic_clover
@@ -1406,6 +1408,8 @@ Optional Features:
--enable-gauge-tools Build auxilary gauge tools: plaquette, gauge
evolver, APE, extended gauge routines (default:
disabled)
--enable-gauge-alg Build gauge fixing and pure gauge algorithms
(default: disabled)
--enable-sstep Build s-step linear solvers (default: disabled)
--enable-contract Build bilinear contraction code (default: disabled)
--enable-dynamic-clover Invert dynamically the clover term for
@@ -2247,6 +2251,15 @@ else
fi


# Check whether --enable-gauge-alg was given.
if test "${enable_gauge_alg+set}" = set; then :
enableval=$enable_gauge_alg; build_gauge_alg=${enableval}
else
build_gauge_alg="no"

fi


# Check whether --enable-sstep was given.
if test "${enable_sstep+set}" = set; then :
enableval=$enable_sstep; build_sstep=${enableval}
@@ -4174,6 +4187,13 @@ yes|no);;
;;
esac

case ${build_gauge_alg} in
yes|no);;
*)
as_fn_error $? " invalid value for --enable-gauge-alg " "$LINENO" 5
;;
esac

case ${build_sstep} in
yes|no);;
*)
@@ -4255,6 +4275,11 @@ $as_echo "$as_me: Enabling Multi-GPU" >&6;}
$as_echo "$as_me: Asqtad fermion force doesn't support multi-GPU yet: disabling " >&6;}
build_staggered_force="no";

if test "X${build_gauge_alg}X" = "XyesX"; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Gauge fixing with FFTs only supported for single-GPU. Use gauge fixing with overrelaxation in multi-GPU mode. " >&5
$as_echo "$as_me: WARNING: Gauge fixing with FFTs only supported for single-GPU. Use gauge fixing with overrelaxation in multi-GPU mode. " >&2;}
fi


if test "X${qmp_home}X" = "XX"; then
# if test "X${mpi_home}X" = "XX"; then
@@ -4404,6 +4429,11 @@ $as_echo "$as_me: Setting BUILD_GAUGE_TOOLS = ${build_gauge_tools} " >&6;}
BUILD_GAUGE_TOOLS=${build_gauge_tools}


{ $as_echo "$as_me:${as_lineno-$LINENO}: Setting BUILD_GAUGE_ALG = ${build_gauge_alg} " >&5
$as_echo "$as_me: Setting BUILD_GAUGE_ALG = ${build_gauge_alg} " >&6;}
BUILD_GAUGE_ALG=${build_gauge_alg}


{ $as_echo "$as_me:${as_lineno-$LINENO}: Setting BUILD_SSTEP = ${build_sstep} " >&5
$as_echo "$as_me: Setting BUILD_SSTEP = ${build_sstep} " >&6;}
BUILD_SSTEP=${build_sstep}
@@ -5720,4 +5750,3 @@ if test -n "$ac_unrecognized_opts" && test "$enable_option_checking" != no; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: unrecognized options: $ac_unrecognized_opts" >&5
$as_echo "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2;}
fi

23 changes: 22 additions & 1 deletion configure.ac
@@ -167,6 +167,12 @@ AC_ARG_ENABLE(gauge-tools,
[ build_gauge_tools="no" ]
)

AC_ARG_ENABLE(gauge-alg,
AC_HELP_STRING([--enable-gauge-alg], [ Build gauge fixing and pure gauge algorithms (default: disabled)]),
[ build_gauge_alg=${enableval} ],
[ build_gauge_alg="no" ]
)

AC_ARG_ENABLE(sstep,
AC_HELP_STRING([--enable-sstep], [ Build s-step linear solvers (default: disabled)]),
[ build_sstep=${enableval} ],
@@ -516,14 +522,22 @@ yes|no);;
;;
esac

dnl Build Hisq force
dnl Build gauge tools
case ${build_gauge_tools} in
yes|no);;
*)
AC_MSG_ERROR([ invalid value for --enable-gauge-tools ])
;;
esac

dnl Build gauge algorithms
case ${build_gauge_alg} in
yes|no);;
*)
AC_MSG_ERROR([ invalid value for --enable-gauge-alg ])
;;
esac

dnl Build sstep
case ${build_sstep} in
yes|no);;
@@ -619,6 +633,10 @@ then

AC_MSG_NOTICE([Asqtad fermion force doesn't support multi-GPU yet: disabling ])
build_staggered_force="no";

if test "X${build_gauge_alg}X" = "XyesX"; then
AC_MSG_WARN([Gauge fixing with FFTs only supported for single-GPU. Use gauge fixing with overrelaxation in multi-GPU mode. ])
fi


if test "X${qmp_home}X" = "XX"; then
@@ -729,6 +747,9 @@ AC_SUBST( BUILD_HISQ_FORCE, [${build_hisq_force}])
AC_MSG_NOTICE([Setting BUILD_GAUGE_TOOLS = ${build_gauge_tools} ] )
AC_SUBST( BUILD_GAUGE_TOOLS, [${build_gauge_tools}])

AC_MSG_NOTICE([Setting BUILD_GAUGE_ALG = ${build_gauge_alg} ] )
AC_SUBST( BUILD_GAUGE_ALG, [${build_gauge_alg}])

AC_MSG_NOTICE([Setting BUILD_SSTEP = ${build_sstep} ] )
AC_SUBST( BUILD_SSTEP, [${build_sstep}])

44 changes: 44 additions & 0 deletions include/atomic.cuh
@@ -0,0 +1,44 @@
#pragma once

/**
@file atomic.cuh

@section Description

Provides definitions of atomic functions that are not native to
CUDA. These are intentionally not declared in the namespace to
avoid confusion when resolving the native atomicAdd functions.
*/

/**
Implementation of double-precision atomic addition using compare
and swap.

@param addr Address that stores the atomic variable to be updated
@param val Value to be added to the atomic
*/
static inline __device__ double atomicAdd(double *addr, double val){
double old = *addr, assumed;
do {
assumed = old;
old = __longlong_as_double( atomicCAS((unsigned long long int*)addr,
__double_as_longlong(assumed),
__double_as_longlong(val + assumed)));
} while ( __double_as_longlong(assumed) != __double_as_longlong(old) );

return old;
}

/**
Implementation of double2 atomic addition using two
double-precision additions.

@param addr Address that stores the atomic variable to be updated
@param val Value to be added to the atomic
*/
static inline __device__ double2 atomicAdd(double2 *addr, double2 val){
double2 old = *addr;
old.x = atomicAdd((double*)addr, val.x);
old.y = atomicAdd((double*)addr + 1, val.y);
return old;
}
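
The compare-and-swap retry loop documented in atomic.cuh can be illustrated on the host. The following is only a sketch of the same pattern using std::atomic, not the CUDA implementation from this PR; the name atomic_add_cas is hypothetical. As in the device code, the comparison inside the exchange is on the stored bit pattern, and the function returns the value seen before the addition.

```cpp
#include <atomic>

// Hypothetical host-side analogue of the device atomicAdd(double*, double):
// keep retrying a compare-and-swap until no other thread has modified the
// target between our read and our write.
inline double atomic_add_cas(std::atomic<double> &target, double val) {
  double assumed = target.load();
  // On failure, compare_exchange_weak refreshes `assumed` with the value
  // currently stored, so the next attempt adds to the up-to-date value.
  while (!target.compare_exchange_weak(assumed, assumed + val)) { }
  return assumed; // value before the addition, matching atomicAdd semantics
}
```

The double2 overload in the header applies the same scalar operation to each component in turn, so the pair as a whole is not updated atomically; that is sufficient for accumulating independent sums.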
6 changes: 6 additions & 0 deletions include/clover_field.h
@@ -61,6 +61,12 @@ namespace quda {
const void* V(bool inverse=false) const { return inverse ? cloverInv : clover; }
const void* Norm(bool inverse=false) const { return inverse ? invNorm : norm; }

/**
This function returns true if the field is stored in an
internal field order for the given precision.
*/
bool isNative() const;

double* TrLog() const { return trlog; }

QudaCloverFieldOrder Order() const { return order; }
13 changes: 13 additions & 0 deletions include/clover_field_order.h
@@ -300,4 +300,17 @@ namespace quda {
};


// Use traits to reduce the template explosion
template<typename Float,int N=72> struct clover_mapper { };

// double precision uses Float2
template<int N> struct clover_mapper<double,N> { typedef FloatNOrder<double, N, 2> type; };

// single precision uses Float4
template<int N> struct clover_mapper<float,N> { typedef FloatNOrder<float, N, 4> type; };

// half precision uses Float4
template<int N> struct clover_mapper<short,N> { typedef FloatNOrder<short, N, 4> type; };


}
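
A minimal sketch of how a trait such as clover_mapper lets calling code be templated on precision alone, with the vector width chosen by the trait. The FloatNOrder below is a stand-in reduced to its template parameters (an assumption; the real accessor in clover_field_order.h carries data members and load/save methods), and vectorLength is a hypothetical helper used only for illustration.

```cpp
#include <type_traits>

// Stand-in for the FloatNOrder accessor: only the template parameters are
// modeled here (assumption for illustration).
template <typename Float, int length, int N> struct FloatNOrder {
  enum { vector_length = N };
};

// Trait as added in this PR: maps a precision type to an accessor type.
template <typename Float, int N = 72> struct clover_mapper { };
template <int N> struct clover_mapper<double, N> { typedef FloatNOrder<double, N, 2> type; };
template <int N> struct clover_mapper<float, N> { typedef FloatNOrder<float, N, 4> type; };
template <int N> struct clover_mapper<short, N> { typedef FloatNOrder<short, N, 4> type; };

// The default length of 72 is picked up when N is not specified.
static_assert(std::is_same<clover_mapper<float>::type, FloatNOrder<float, 72, 4> >::value,
              "single precision maps to a Float4 order");

// Hypothetical caller: one template parameter instead of one instantiation
// per (precision, vector width) combination.
template <typename Float> int vectorLength() {
  typedef typename clover_mapper<Float>::type Order;
  return Order::vector_length;
}
```

Because each precision maps to exactly one accessor type, the number of template instantiations stays small, which is the compile-time saving the code comment refers to.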
34 changes: 34 additions & 0 deletions include/cub_helper.cuh
@@ -0,0 +1,34 @@
#pragma once
#include <cub/cub.cuh>

/**
@file cub_helper.cuh

@section Description

Provides helper functors for custom datatypes for cub algorithms.
*/

namespace quda {

/**
Helper functor for generic addition reduction.
*/
template <typename T>
struct Summ {
__host__ __device__ __forceinline__ T operator() (const T &a, const T &b){
return a + b;
}
};

/**
Helper functor for double2 addition reduction.
*/
template <>
struct Summ<double2>{
__host__ __device__ __forceinline__ double2 operator() (const double2 &a, const double2 &b){
return make_double2(a.x + b.x, a.y + b.y);
}
};

}
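
A host-only sketch of how a functor like Summ plugs into a generic reduction. It assumes nothing from CUDA or cub: double2 and make_double2 are redefined as stand-ins so the example compiles without the toolkit, the __host__ __device__ qualifiers are dropped, and host_reduce is a hypothetical helper built on std::accumulate rather than a cub call.

```cpp
#include <numeric>
#include <vector>

// Stand-ins for the CUDA vector type and its constructor function.
struct double2 { double x, y; };
inline double2 make_double2(double x, double y) { double2 v = {x, y}; return v; }

// Generic addition functor, as in cub_helper.cuh.
template <typename T> struct Summ {
  T operator()(const T &a, const T &b) const { return a + b; }
};

// double2 has no operator+, so the specialization adds component-wise.
template <> struct Summ<double2> {
  double2 operator()(const double2 &a, const double2 &b) const {
    return make_double2(a.x + b.x, a.y + b.y);
  }
};

// Hypothetical host reduction: the same functor type serves scalars and pairs.
template <typename T> T host_reduce(const std::vector<T> &v, T init) {
  return std::accumulate(v.begin(), v.end(), init, Summ<T>());
}
```

The specialization exists because the generic version relies on operator+, which is not defined for double2; supplying the functor keeps the reduction code itself type-agnostic.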
25 changes: 18 additions & 7 deletions include/gauge_field.h
@@ -111,6 +111,18 @@ namespace quda {
else errorQuda("Error: invalid link type(%d)\n", link_type);
for (int d=0; d<nDim; d++) r[d] = 0;
}

/**
Helper function for setting the precision and corresponding
field order for QUDA internal fields.
@param precision The precision to use
*/
void setPrecision(QudaPrecision precision) {
this->precision = precision;
order = (precision == QUDA_DOUBLE_PRECISION || reconstruct == QUDA_RECONSTRUCT_NO) ?
QUDA_FLOAT2_GAUGE_ORDER : QUDA_FLOAT4_GAUGE_ORDER;
}

};

std::ostream& operator<<(std::ostream& output, const GaugeFieldParam& param);
@@ -153,12 +165,6 @@ namespace quda {
/** Whether the staggered phase factor has been applied */
bool staggeredPhaseApplied;

/**
This function returns true if the field is stored in an
internal field order for the given precision.
*/
bool isNative() const;

public:
GaugeField(const GaugeFieldParam &param);
virtual ~GaugeField();
@@ -178,7 +184,6 @@ namespace quda {
const int* R() const { return r; }
QudaGhostExchange GhostExchange() const { return ghostExchange; }
QudaStaggeredPhase StaggeredPhase() const { return staggeredPhaseType; }

/**
Apply the staggered phase factors to the gauge field.
*/
@@ -194,6 +199,12 @@ namespace quda {

void checkField(const GaugeField &);

/**
This function returns true if the field is stored in an
internal field order for the given precision.
*/
bool isNative() const;

size_t Bytes() const { return bytes; }
size_t PhaseBytes() const { return phase_bytes; }
size_t PhaseOffset() const { return phase_offset; }
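
The order selection performed by the setPrecision helper added to GaugeFieldParam in this file can be restated as a standalone function. This is a sketch only: the enums below are local stand-ins that reuse QUDA's names with arbitrary values, and select_order is a hypothetical free function, not part of the library.

```cpp
// Local stand-ins for the QUDA enums (values are arbitrary, for illustration).
enum QudaPrecision { QUDA_HALF_PRECISION, QUDA_SINGLE_PRECISION, QUDA_DOUBLE_PRECISION };
enum QudaReconstructType { QUDA_RECONSTRUCT_NO, QUDA_RECONSTRUCT_12, QUDA_RECONSTRUCT_8 };
enum QudaGaugeFieldOrder { QUDA_FLOAT2_GAUGE_ORDER, QUDA_FLOAT4_GAUGE_ORDER };

// Hypothetical helper mirroring the conditional in setPrecision:
// double precision, or any precision without reconstruction, uses the
// Float2 order; every other combination uses Float4.
inline QudaGaugeFieldOrder select_order(QudaPrecision precision,
                                        QudaReconstructType reconstruct) {
  return (precision == QUDA_DOUBLE_PRECISION || reconstruct == QUDA_RECONSTRUCT_NO)
             ? QUDA_FLOAT2_GAUGE_ORDER
             : QUDA_FLOAT4_GAUGE_ORDER;
}
```

Keeping the rule in one place means callers that change precision cannot forget to update the field order to match.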