hisq_paths_force_test --gauge-order milc crashes with Segmentation fault #163

mathiaswagner · 2014-10-16T20:10:21Z

I wanted to see whether the issues in #158 can be reproduced with the QUDA tests, and noticed the following with a single-GPU QUDA build:

[mwagner@cream tests]$ ./hisq_paths_force_test --gauge-order milc --prec double
running the following fermion force computation test:
link_precision           link_reconstruct           space_dim(x/y/z)         T_dimension       Gauge_order
double                       18                         24/24/24                  24                milc
[...]
Using device 0: Tesla K40c
[...]
Segmentation fault

The same thing happens for single precision.

maddyscientist · 2014-10-16T20:33:00Z

Quick test: can you run this with valgrind / gdb to locate where the crash is occurring?


mathiaswagner · 2014-10-16T21:01:33Z

gdb output so far:

0x00002aaaab02eea9 in ?? () from /usr/lib64/libcuda.so
(gdb) bt
#0  0x00002aaaab02eea9 in ?? () from /usr/lib64/libcuda.so
#1  0x00002aaaaaad1c82 in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#2  0x00002aaaaaac4e1e in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#3  0x00002aaaaaab9ee8 in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#4  0x00002aaaaaae354c in cudaMalloc () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#5  0x0000000001684b81 in quda::device_malloc_ (func=0x17528e0 "cudaGaugeField", file=0x17523ef "cuda_gauge_field.cu", line=42, size=99532800)
    at malloc.cpp:153
#6  0x0000000000450349 in quda::cudaGaugeField::cudaGaugeField (this=0x1a2bbc60, param=...) at cuda_gauge_field.cu:42
#7  0x0000000000405953 in hisq_force_init () at hisq_paths_force_test.cpp:362
#8  0x0000000000406055 in hisq_force_test () at hisq_paths_force_test.cpp:553
#9  0x00000000004069e0 in main (argc=3, argv=0x7fffffffe7e8) at hisq_paths_force_test.cpp:764

maddyscientist · 2014-10-16T22:52:21Z

The bug only happens when milc ordering is used; it seems to be fine with qdp gauge field ordering.

maddyscientist · 2014-10-16T22:57:37Z

This is a trivial bug: the cpu field ordering in the test is hard-coded to qdp, so setting a different gauge order from the command line produces a mismatch.
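To illustrate the kind of mismatch described above, here is a minimal, self-contained sketch. It does not use QUDA's actual types or test code: `GaugeOrder`, `qdp_offset`, `milc_offset`, `parse_gauge_order`, and `require_matching_order` are hypothetical names made up for this example. It assumes the usual conventions that qdp order keeps one separate array per direction, while milc order keeps a single contiguous array with the direction index running inside the site index, and 18 reals per link (no reconstruction).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a gauge-order setting; not QUDA's real enum.
enum class GaugeOrder { QDP, MILC };

constexpr std::size_t kDirs = 4;           // directions mu = 0..3
constexpr std::size_t kRealsPerLink = 18;  // 3x3 complex matrix, no reconstruction

// qdp-style (assumed): one array per direction, so the offset
// inside a given direction's array depends only on the site.
std::size_t qdp_offset(std::size_t site) {
    return site * kRealsPerLink;
}

// milc-style (assumed): a single contiguous array indexed as [site][dir][18].
std::size_t milc_offset(std::size_t site, std::size_t dir) {
    return (site * kDirs + dir) * kRealsPerLink;
}

// Map a command-line string to the enum (hypothetical helper).
GaugeOrder parse_gauge_order(const std::string& s) {
    if (s == "qdp") return GaugeOrder::QDP;
    if (s == "milc") return GaugeOrder::MILC;
    throw std::invalid_argument("unknown gauge order: " + s);
}

// Fail early if the host field was built with a different order than
// the one requested, instead of letting the mismatch surface later.
void require_matching_order(GaugeOrder cpu_order, GaugeOrder requested) {
    if (cpu_order != requested) {
        throw std::runtime_error("cpu field order does not match requested gauge order");
    }
}
```

In this sketch, a host field built with a hard-coded `GaugeOrder::QDP` while the command line requested `parse_gauge_order("milc")` would be rejected by `require_matching_order` up front. Deriving the cpu field order from the command-line value, rather than hard-coding it, removes the mismatch altogether.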


mathiaswagner added this to the QUDA 0.7.0 milestone Oct 16, 2014

maddyscientist added a commit that referenced this issue Oct 16, 2014

Fixed hisq_paths_force_test so as to not hardcode the cpu field ordering (closes #163)

f567f66

maddyscientist closed this as completed Oct 16, 2014
