Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSL integration produces different results if adding a printing statement to the integrand in X-PSI installed in Lisa #270

Closed
thjsal opened this issue Feb 7, 2023 · 6 comments · Fixed by #300
Assignees

Comments

@thjsal
Copy link
Contributor

thjsal commented Feb 7, 2023

This might be hardware specific problem, but reporting it still here (have not managed to re-produce this same problem in other systems than in Lisa cluster so far):

The integration over this function:

cdef double marginal_integrand(double B, void *params) nogil:

using GSLgsl_integration_cquadfunction here:
gsl_integration_cquad(&f, lower, upper,

gives a different result, depending on whether a print statement (just printf("????")) is added somewhere inside or before the for-loop of that function. Without adding the print statement, the result is in contradiction with the result obtained in other systems. After adding the print statement the result agrees with what is obtained elsewhere.

Here are the input values for the gsl_integration_cquad function (same in both cases):
lower,upper,epsabs,epsrel,w=0.00452244, 0.00638843, 0.00000000, 0.00000001, 1227.38282439
And here are the output values without the print statement in Lisa:
result, abserr, nevals = 0.00000000, 0.00000000, 33
And here are the output values with the print statement in Lisa, and always in other systems:
result, abserr, nevals = 0.00023387, 0.00000000, 299

This difference causes a big effect on the final result since in the former case, likelihood evaluation is not performed to the end, but a negative number close to minus infinity is returned instead (there is an if-condition requiring that the result variable must be larger than zero).

This problem happens when running examples/examples_modeling_tutorial/TestRun_BB.py in Lisa with X-PSI version 2.0.0-rc, GSL version 2.7, and Cython version 0.29.32.

Would need to still separate the function from X-PSI to test if the problem exists also without X-PSI.

@sguillot
Copy link
Contributor

If you have a suggestion to test the gsl_integration_cquad (or marginal_integrand) function separate from X-PSI, I'll try it

@ychatonnier
Copy link

I found this:
https://stackoverflow.com/questions/28085423/the-notorious-printf-fix
it seems this magic printf fix is known, there are some answer track in this post.

@thjsal
Copy link
Contributor Author

thjsal commented Feb 27, 2023

@sguillot : I was thinking of making a simple Cython program that runs this same GSL integration (over the same marginal_integrand function) with the same input parameters I shared above. And see if that integration still gives a different result depending on whether or not having that print statement inside the integrand. Before the integration, we need probably also call the gsl_integration_cquad_workspace_alloc() to get it working.

@ychatonnier: Thanks for sharing this discussion! I think have also before encountered/heard about issues like this. Unlike there, however, our code goes back to "non-working" when taking the print statement away.

@thjsal thjsal self-assigned this Mar 18, 2023
@thjsal
Copy link
Contributor Author

thjsal commented Mar 18, 2023

Looks like in Lisa this problem only occurs when installing X-PSI using Intel-compiler. If installing using GCC, I get the correct answer without additional printings. I am still working on a simple test program. I noticed I still need to find out the values saved in a.data, a.n, a.SCALE, a.A, and a.star before calling this function and see if they are always the same.

EDIT: Values of these input parameters seem to be always the same, and the following:

 a.SCALE = 30759.61456563
 a.n =  32
 a.A = 39027.92894509

For a.data and a.star .... see the DATA and STAR arrays from the test program (in the zipped file) I posted in the next message.

@thjsal
Copy link
Contributor Author

thjsal commented Mar 19, 2023

test_gsl.zip
Attached some simplified test program. Can be compiled by either by:
LDSHARED="icc -shared" CC=icc python setup.py install
or
CC=gcc python setup.py install
And then tested by running:
python test_gsl_py.py

Based on running this test in Lisa, it seems that the problem is indeed something related to GSL, or how it works together with Intel-compilers. If compiling the test program with gcc, the output is always the same. If compiling the test program with icc, I get the (presumably) correct output only with the "printf trick" (uncommenting the line 66 in test_gsl.pyx).

I also tried different versions of GSL in Lisa, one already existing in the cluster (version called GSL/2.7-GCC-11.3.0), some installed myself using icc, but the problem seems to exist regardless of GSL version and the way of installing it.

On the other hand, in Snellius cluster Intel compiler seems to still work fine with GSL (i.e. no printf tricks needed), at least with the GSL version 2.5 that already natively exists there, and also with the latest GSL version (2.7.1) installed from the source (same that did not work for Lisa). When installing that in Snellius, however, I had to add some additional "-fPIC" and "-host" flags ../configure FC=ifort CC=icc CFLAGS='-O3 -xAVX -axCORE-AVX2 -mieee-fp -fPIC -funroll-loops' --prefix=$HOME/gsl -host=x86_64, to make the test program compilation to work (following otherwise instructions here: https://github.com/xpsi-group/xpsi/blob/v0.7.12/docs/source/SURFsara_systems.rst).

Intel compiler version should be the same both in Lisa and Snellius (icc (ICC) 2021.6.0 20220226).

@thjsal thjsal linked a pull request Mar 30, 2023 that will close this issue
@thjsal
Copy link
Contributor Author

thjsal commented Mar 30, 2023

So after a lot of testing, it appears the problem is caused by over-aggressive optimization, caused by numerical issues in the integrand function of X-PSI, when using Intel-compilers on Intel CPUs (which are used in Lisa but not in Snellius). The suggested solution to make integration and likelihood calculation work on all systems is provided in the linked pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants