hisq_paths_force_test --gauge-order milc crashes with Segmentation fault #163

Closed
mathiaswagner opened this issue Oct 16, 2014 · 4 comments

@mathiaswagner
Member

I wanted to see whether the issues in #158 can be reproduced with the QUDA tests, and noticed the following for a single-GPU QUDA build:

[mwagner@cream tests]$ ./hisq_paths_force_test --gauge-order milc --prec double
running the following fermion force computation test:
link_precision           link_reconstruct           space_dim(x/y/z)         T_dimension       Gauge_order
double                       18                         24/24/24                  24                milc
[...]
Using device 0: Tesla K40c
[...]
Segmentation fault

The same thing happens for single precision.

@mathiaswagner added this to the QUDA 0.7.0 milestone on Oct 16, 2014
@maddyscientist
Member

Quick test: can you run this with valgrind / gdb to locate where the crash is occurring?

@mathiaswagner
Member Author

gdb output so far

0x00002aaaab02eea9 in ?? () from /usr/lib64/libcuda.so
(gdb) bt
#0  0x00002aaaab02eea9 in ?? () from /usr/lib64/libcuda.so
#1  0x00002aaaaaad1c82 in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#2  0x00002aaaaaac4e1e in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#3  0x00002aaaaaab9ee8 in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#4  0x00002aaaaaae354c in cudaMalloc () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#5  0x0000000001684b81 in quda::device_malloc_ (func=0x17528e0 "cudaGaugeField", file=0x17523ef "cuda_gauge_field.cu", line=42, size=99532800)
    at malloc.cpp:153
#6  0x0000000000450349 in quda::cudaGaugeField::cudaGaugeField (this=0x1a2bbc60, param=...) at cuda_gauge_field.cu:42
#7  0x0000000000405953 in hisq_force_init () at hisq_paths_force_test.cpp:362
#8  0x0000000000406055 in hisq_force_test () at hisq_paths_force_test.cpp:553
#9  0x00000000004069e0 in main (argc=3, argv=0x7fffffffe7e8) at hisq_paths_force_test.cpp:764

@maddyscientist
Member

The bug only happens when milc ordering is used; the test seems to run fine with qdp gauge field ordering.

@maddyscientist
Member

This is a trivial bug: the CPU field ordering is hard-coded to qdp, so when you set a different field order from the command line, the host field and the requested order no longer match.
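
For illustration only, here is a minimal C++ sketch of this kind of mismatch and a guard that would turn it into a clear error. It is not QUDA's code: `GaugeOrder`, `parse_gauge_order`, `link_offset`, and `check_same_order` are hypothetical names, and the two layouts are simplified stand-ins (direction-major vs. site-major in one flat buffer), assumed here only to show that the same (site, direction) pair lands at different offsets depending on which order the reader believes is in use.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT QUDA's real types.
enum class GaugeOrder { QDP, MILC };

constexpr std::size_t kDirs = 4;    // link directions per lattice site
constexpr std::size_t kReals = 18;  // reals per 3x3 complex link (no reconstruction)

// Map the string given to --gauge-order onto the enum.
GaugeOrder parse_gauge_order(const std::string& s) {
  if (s == "qdp") return GaugeOrder::QDP;
  if (s == "milc") return GaugeOrder::MILC;
  throw std::invalid_argument("unknown gauge order: " + s);
}

// Offset of real `r` of link (site, dir) in a flat buffer of volume*4*18 reals.
// Assumed simplified layouts: "QDP" is direction-major, "MILC" is site-major.
std::size_t link_offset(GaugeOrder order, std::size_t volume,
                        std::size_t site, std::size_t dir, std::size_t r) {
  switch (order) {
    case GaugeOrder::QDP:  return (dir * volume + site) * kReals + r;
    case GaugeOrder::MILC: return (site * kDirs + dir) * kReals + r;
  }
  throw std::logic_error("unhandled gauge order");
}

// Guard: refuse to continue if the host field was built with one order
// while the command line asked for another.
void check_same_order(GaugeOrder host_order, GaugeOrder requested) {
  if (host_order != requested) {
    throw std::runtime_error("host gauge field order does not match --gauge-order");
  }
}
```

In a test driver, calling `check_same_order(GaugeOrder::QDP, parse_gauge_order("milc"))` would throw immediately instead of letting the mismatch surface later as a crash. The cleaner fix, of course, is to build the host field from the parsed order rather than a hard-coded one.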
