Uninitialized local variables in land code #1717
Comments
Yea, I should have noted this sooner, but I also found 2
I see this on edison and cori-knl (and likely cori-haswell) with GNU. |
@ndkeen Thanks for the information. Did you run with threading on cori-knl? I know it's the default PE layout on edison. If so, it's actually a separate issue: there are race conditions in BGC-related code. |
Yes, these tests used threading (as is the default with acme_developer). I also just ran the test on KNL with GNU using 1 thread and they both passed.
|
I also just ran the 2 tests on cori-haswell with GNU just to verify. They pass. But this machine does not request this test to use threading. |
@ndkeen Thanks for confirming. I'm not aware of any supported test machine other than Melvin that doesn't zero uninitialized local variables by default. |
@ndkeen which version of gnu? |
The version of GNU on edison/cori is 6.3. This same thing has happened to us on other projects -- Intel compilers happily init vars to zero, while GNU will catch it (i.e., cause problems). Testing with both (or more) compilers is great, of course. I am unaware of flags that will do what we want, but there are always things we can improve upon. |
For some reason, gnu 5.3.0 on melvin is the only compiler doing this. That's suspicious. @jgfouca does melvin have some flags other gnu machines don't? |
Oh and I guess gnu 6.3 on edison/cori with threading. |
If it helps, we can easily try the following versions on Cori: Yes, it looks like this is only something that happens with threading. |
@jqyin do the machines where you couldn't reproduce this with gnu use threading? |
I tried building these 2 tests with DEBUG=TRUE on KNL with GNU and they both failed right away here:
Which is in: May not be related to the error above. casedir: Which does bring up something. A while ago I suggested that our tests should be a set independent of the compiler flags. Then we should run all of these tests with DEBUG=TRUE, DEBUG=FALSE, and any other way we can think of. |
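The suggestion above (a fixed set of tests, each run under every build variation) can be sketched as a simple cross product. This is only an illustration: the test names, compiler list, and thread counts below are hypothetical placeholders, and `build_matrix` is not part of any existing test tooling.

```python
from itertools import product

# Hypothetical placeholders: the thread does not give the full test names.
TESTS = ["clm_eca_test_1", "clm_eca_test_2"]
COMPILERS = ["gnu", "intel"]   # assumed compiler list
DEBUG_MODES = [True, False]    # DEBUG=TRUE and DEBUG=FALSE
THREAD_COUNTS = [1, 2]         # unthreaded and threaded runs


def build_matrix(tests, compilers, debug_modes, thread_counts):
    """Return one configuration dict per combination of the four axes."""
    return [
        {"test": t, "compiler": c, "debug": d, "threads": n}
        for t, c, d, n in product(tests, compilers, debug_modes, thread_counts)
    ]


matrix = build_matrix(TESTS, COMPILERS, DEBUG_MODES, THREAD_COUNTS)
print(len(matrix))  # 2 tests x 2 compilers x 2 debug modes x 2 thread counts
```

Keeping the test list separate from the build axes means a new variation (another compiler, another thread count) applies to every test without editing the tests themselves.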
According to the ifort man page:
(so -nozero is the default) but this applies only to saved variables. |
However, ifort does have:
|
@rljacob No, all my tests were without threading and with DEBUG=FALSE. That is the setting on Melvin and the same setting I used to test on other machines. I haven't seen it fail except on Melvin, and I confirmed that it passes when the flag -finit-local-zero is added. There are currently 3 different issues: 1) uninitialized local variables, 2) data races (PR #1468 fixed a bug, but my tests show there are still race conditions), 3) the DEBUG flag (maybe related to issue #1686). All 3 are legacy issues and are in current master. I agree there should be tests for threading, DEBUG mode, etc. in the first place. Currently, I'm working on 1). |
@worleyph Thanks for the tips about the Intel compiler. GNU has a "-Wuninitialized" flag as well. I haven't found a way to disable the zeroing so I can test it on my local cluster, though. |
@rljacob I don't believe melvin is using any special flags. You can see the flags in config_compilers.xml |
Does it use threading in the land model? |
I think we should change the default PE layout on Melvin (and all testing machines) so that it uses more than 1 thread by default. As I understand it, Melvin uses just 1 thread by default, which defeats the purpose of all tests that vary threads. For example, an ERP test initiates two simulations, one with the default number of threads and the other with half that number. If the default simulation uses just one thread, the second simulation will also use just one thread (which essentially converts the ERP test into an ERS test). |
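The thread-count arithmetic described above can be sketched in a few lines. This is a simplified model based only on the description in this comment, not the actual test implementation; the function names are hypothetical, and the assumption is that the second run uses half the default thread count and never fewer than one thread.

```python
def erp_thread_counts(default_threads):
    """Return (first_run_threads, second_run_threads) for an ERP-style test.

    Assumption: the second run halves the default thread count,
    with a floor of one thread.
    """
    if default_threads < 1:
        raise ValueError("thread count must be at least 1")
    return default_threads, max(1, default_threads // 2)


def varies_threads(default_threads):
    """True if the two runs actually use different thread counts."""
    first, second = erp_thread_counts(default_threads)
    return first != second


for n in (1, 2, 4):
    print(n, erp_thread_counts(n), varies_threads(n))
```

With a default of 1 thread both runs use 1 thread, so the test no longer exercises a thread-count change; any default of 2 or more restores the variation.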
That's a great idea. We should revisit land tests to add/modify tests to have all these variations.
NAG compiler is another option we can use to track it down. NAG is better than any other compiler out there to catch uninitialized variables. Lahey compiler is another good one but it doesn't support advanced Fortran features. |
Initialize unset local variables in CNAllocationMod Fixes #1717 [non-BFB]
It has been confirmed that the DIFFs in the 2 clm-eca tests on Melvin when merging PRs #1649 and #1468 were caused by uninitialized local variables in existing land code (most likely in BGC-related code). Any BGC-related code changes are likely to give non-BFB results.