Gauge fixing, pure gauge and optimized gauge I/O routines #253

Merged
merged 128 commits into from
Jul 29, 2015

Conversation

maddyscientist
Member

This pull request will add the following features to QUDA

  • Gauge fixing
  • Pure gauge generation (overrelaxation and heatbath algorithms)
  • Added gauge fixing to interface_quda.cpp and milc_interface.cpp
  • Optimized gauge::FloatNOrder::load and gauge::FloatNOrder::save routines (vectorized I/O)
  • Modified unitarize_links_quda.cu in order to support link unitarization for 12/8 parameters
  • Modified copy_gauge_extended.cu in order to support copy from extended to regular gauge
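
For readers unfamiliar with the "12/8 parameters" shorthand: an SU(3) link has 18 real numbers, but it can be stored with fewer and rebuilt on load. The sketch below illustrates only the 12-parameter case, using the standard SU(3) property that the third row is the complex conjugate of the cross product of the first two. It is not QUDA's code; the names `Row` and `reconstructThirdRow` are made up for illustration.

```cpp
#include <array>
#include <complex>

using Complex = std::complex<double>;
using Row = std::array<Complex, 3>;

// Rebuild the third row of an SU(3) matrix from its first two rows:
// c = conj(a x b). Only rows a and b (12 real numbers) need to be stored.
Row reconstructThirdRow(const Row& a, const Row& b) {
  Row c;
  c[0] = std::conj(a[1] * b[2] - a[2] * b[1]);
  c[1] = std::conj(a[2] * b[0] - a[0] * b[2]);
  c[2] = std::conj(a[0] * b[1] - a[1] * b[0]);
  return c;
}
```

The 8-parameter reconstruction is more involved and is not sketched here.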

Gauge fixing files:
lib: gauge_fix_ovr_extra.cu, gauge_fix_fft.cu, gauge_fix_ovr_extra.h, gauge_fix_ovr.cu, gauge_fix_ovr_hit_devf.cuh, CUFFT_Plans.h
For pure gauge config generation:
pgauge_exchange.cu, pgauge_init.cu, pgauge_heatbath.cu, pgauge_plaquette.cu, random.cu

This pull request replaces #252.

@maddyscientist maddyscientist changed the title Feature/gauge fix Add gauge fixing, pure gauge and optimized gauge I/O routines May 22, 2015
@maddyscientist maddyscientist changed the title Add gauge fixing, pure gauge and optimized gauge I/O routines Gauge fixing, pure gauge and optimized gauge I/O routines May 22, 2015
@maddyscientist maddyscientist added this to the QUDA 0.8 milestone May 22, 2015
@maddyscientist
Member Author

Some benchmarks for copyGauge, taken at V=16^4 (old -> new)

  • double -> single
    • 18 -> 18 (139 -> 176 GB/s)
    • 12 -> 12 (118 -> 175 GB/s)
    • 8 -> 8 (86 -> 150 GB/s)
  • single -> half
    • 18 -> 18 (121 -> 247 GB/s)
    • 12 -> 12 (82 -> 213 GB/s)
    • 8 -> 8 (55 -> 113 GB/s)

While some of the half precision numbers look a bit too fast (L2 cache between runs perhaps?), the speedup is clear: roughly 1.3x to 2.6x across the cases above. It looks like all kernels that use the new gauge::FloatNOrder accessor are much faster. This really shows the strength of using generic accessor code: optimizing the accessor gives a speedup across the board.
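
To illustrate the point about generic accessors, here is a minimal host-side sketch. It is not the actual gauge::FloatNOrder code; `ScalarAccessor`, `ChunkedAccessor`, `sumAllLinks` and the flat 18-floats-per-site layout are all assumptions made for the example. The "kernel" is written once against a `load` interface, so swapping in a faster accessor changes every caller without touching the kernel.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// One link stored as 18 real numbers (assumed layout, for illustration only).
constexpr std::size_t kLinkSize = 18;
using Link = std::array<float, kLinkSize>;

// Accessor that copies one real number per step.
struct ScalarAccessor {
  const float* data;
  Link load(std::size_t site) const {
    Link u{};
    for (std::size_t i = 0; i < kLinkSize; ++i) u[i] = data[site * kLinkSize + i];
    return u;
  }
};

// Accessor that copies two reals per step, standing in for vectorized loads.
struct ChunkedAccessor {
  const float* data;
  Link load(std::size_t site) const {
    Link u{};
    const float* src = data + site * kLinkSize;
    for (std::size_t i = 0; i < kLinkSize; i += 2) {
      u[i] = src[i];
      u[i + 1] = src[i + 1];
    }
    return u;
  }
};

// Generic "kernel": it only knows the accessor interface, not the layout.
template <typename Accessor>
float sumAllLinks(const Accessor& acc, std::size_t n_sites) {
  float sum = 0.0f;
  for (std::size_t site = 0; site < n_sites; ++site) {
    const Link u = acc.load(site);
    for (float x : u) sum += x;
  }
  return sum;
}
```

Both accessors return the same data, so `sumAllLinks` gives the same result with either; only the memory access pattern inside `load` differs.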

@maddyscientist
Member Author

I've added a new flag --enable-gauge-alg to enable these new algorithms. This is very much needed as the gauge fixing code takes a long time to compile.

Nuno, having now had a cursory look at the gauge fixing and overrelaxation code, I can see there is a lot of code and very few comments explaining what is happening. Also, the indentation is inconsistent with the rest of QUDA, which mostly uses 2-space indents. It would be nice to get this fixed before it is merged into develop (I know there are many other parts of QUDA that have similar issues as well).

@nmrcardoso
Contributor

Thanks a lot, Mike, for sharing the benchmarks and for this configuration option.

I will try to address all these issues today, since tomorrow I'll be
traveling.

It would also be nice to have a formatter.
Some weeks ago I looked at several formatters,
https://github.com/lattice/quda/wiki/agenda-call-2015-05-07
and the most promising seems to be uncrustify. We only need to set up a
configuration file and then run it on every code file with a simple
script.


@mathiaswagner
Member

Thanks for reminding me of the tools. I have created #254 to remind us to define some formatting guidelines that we can then also use for tools.

Nuno, will you create an issue to remind us of the missing MPI support if we want to use FFT gauge fixing w/ multi GPUs?

It would be good to update the README file to mention this restriction and (optionally, if possible) catch it in the configuration process / at compilation time rather than at runtime (as we do for the asqtad force, which also does not support multi-GPU).
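
As a rough sketch of what a compile-time check could look like on the C++ side (separate from any configure-script check): the macro name `MULTI_GPU` and the function names below are assumptions for illustration, not necessarily what QUDA uses.

```cpp
// Hypothetical build macro: assumed to be defined when the library is
// configured for multi-GPU. The real macro name may differ.
#ifdef MULTI_GPU
constexpr bool kMultiGpuBuild = true;
#else
constexpr bool kMultiGpuBuild = false;
#endif

// FFT gauge fixing is only available in single-GPU builds.
constexpr bool fftGaugeFixSupported(bool multi_gpu) { return !multi_gpu; }

// Hypothetical entry point: instantiating it for a multi-GPU build
// triggers the static_assert, so the error shows up at compile time.
template <bool multi_gpu = kMultiGpuBuild>
int gaugeFixFFT() {
  static_assert(fftGaugeFixSupported(multi_gpu),
                "Gauge fixing with FFTs only supported for single-GPU. "
                "Use gauge fixing with overrelaxation in multi-GPU mode.");
  return 0;  // placeholder for the actual algorithm
}
```

With this shape, any call to `gaugeFixFFT()` in a multi-GPU build fails to compile with the message above, instead of failing when the routine is first called.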

@nmrcardoso
Contributor

I created a reminder issue for the gauge fixing with FFTs.
If you could add this to the configuration file, that would be great,
since I've never written a configure file.

Can you point me to a source file that follows the proper standard style,
so I can apply the same style to the gauge fixing code?


@maddyscientist
Member Author

@nmrcardoso I've created a quick guide on how to update the configure / makefile here: https://github.com/lattice/quda/wiki/Adding-new-QUDA-features

@maddyscientist
Member Author

I've just pushed a change to this branch that enables the vectorization for the ghost I/O routines as well. This brings a performance boost to these too, though not as significant as with the bulk I/O routines: the data volumes are much smaller here, so they are more latency bound. Nevertheless, this should give a nice boost to any extended gauge routines.

Added warning msg when using --enable-gauge-alg and --enable-multi-gpu
"Gauge fixing with FFTs only supported for single-GPU. Use gauge fixing with overrelaxation in multi-GPU mode."
@nmrcardoso
Contributor

Thanks a lot Mike, the guide is very helpful.
I added a warning msg when using --enable-gauge-alg and --enable-multi-gpu
"Gauge fixing with FFTs only supported for single-GPU. Use gauge fixing with overrelaxation in multi-GPU mode."
to configure.ac and updated the configure.

@maddyscientist
Member Author

Glad it helps. Going forward, I'd like to instill a policy that whenever anyone asks a question on QUDA (like how to add a feature, make a change or a question on some parameter or whatever), that instead of someone writing an email response they spend 5 minutes more updating wiki pages and / or doxygen to answer the question. This way we'll get much better documentation and the same questions will stop being asked :)

Mathias Wagner and others added 25 commits July 27, 2015 19:21
…dg and __shfl instructions. Modified the LICENSE to include the NVIDIA license.
…performance across the board, but some regressions at 12/8 reconstruct so left switched off for now (USE_LDG macro in include/gauge_field_order.h).
…deal with separate input/output fields with potentially differing reconstruction types. Renamed hisq_links_quda.h to more appropriate unitarization_links.h.
Added reunitarization flops and bytes
Added reunitarization flop and byte count to the performance results
Added flops and bytes count
deleted su3_testing
maddyscientist added a commit that referenced this pull request Jul 29, 2015
Gauge fixing, pure gauge and optimized gauge I/O routines
@maddyscientist maddyscientist merged commit 7a91e92 into develop Jul 29, 2015
@maddyscientist maddyscientist deleted the feature/gauge-fix branch July 29, 2015 19:22