Test failing with "ERROR: Requested field initialCarbon not available" #5204

Closed
xylar opened this issue Sep 25, 2022 · 24 comments

@xylar
Contributor

xylar commented Sep 25, 2022

I tried running the ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel test to check #5202. The command I used was:

./create_test --wait --walltime 01:00:00 -g -b master_20220925 --baseline-root /lcrc/group/e3sm/ac.xylar/e3sm_baselines/ ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel

I am seeing error files:

/lcrc/group/e3sm/ac.xylar/scratch/chrys/ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel.G.20220925_121407_uoiz9s/run/log.ocean.0*

that show:

ERROR: Requested field initialCarbon not available
CRITICAL ERROR: xml_stream_parser failed.
Logging complete.  Closing file at 2022/09/25 12:37:02

It does seem that the analysis member is not enabled:

&am_conservationcheck
 config_am_conservationcheck_compute_interval = 'dt'
 config_am_conservationcheck_compute_on_startup = .false.
 config_am_conservationcheck_enable = .false.
 config_am_conservationcheck_output_stream = 'conservationCheckOutput'
 config_am_conservationcheck_restart_stream = 'conservationCheckRestart'
 config_am_conservationcheck_write_on_startup = .false.
 config_am_conservationcheck_write_to_logfile = .true.
/

The field initialCarbon is, indeed, in the streams file:

<stream name="conservationCheckOutput"
        type="output"
        io_type="pnetcdf"
        filename_template="ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel.G.20220925_121407_uoiz9s.mpaso.hist.am.conservationCheck.$Y-$M-$D.nc"
        filename_interval="00-01-00_00:00:00"
        reference_time="01-01-01_00:00:00"
        output_interval="00-01-00_00:00:00"
        clobber_mode="append"
        packages="conservationCheckAMPKG">

...
<var name="initialCarbon"/>
<var name="finalCarbon"/>
<var name="carbonChange"/>
<var name="netCarbonFlux"/>
<var name="absoluteCarbonError"/>
<var name="relativeCarbonError"/>
<var name="accumulatedAbsoluteCarbonError"/>
<var name="accumulatedRelativeCarbonError"/>
</stream>

but as I understand it, this should be fine. If the package is disabled, this field simply shouldn't be allocated or written out.
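To double-check which variables a stream requests and which package gates it, the streams file can be parsed directly. Below is a minimal Python sketch, assuming the streams file has a single `<streams>` root element. The sample XML is a trimmed, hypothetical stand-in for the real file, and `stream_vars` is an illustrative helper, not part of MPAS or E3SM.

```python
import xml.etree.ElementTree as ET


def stream_vars(streams_xml, stream_name):
    """Return (packages, [var names]) for the named stream, or None if absent."""
    root = ET.fromstring(streams_xml)
    for stream in root.iter("stream"):
        if stream.get("name") == stream_name:
            names = [var.get("name") for var in stream.findall("var")]
            return stream.get("packages"), names
    return None


# Hypothetical, trimmed-down streams file (assumption: a <streams> root element).
SAMPLE = """<streams>
<stream name="conservationCheckOutput"
        type="output"
        packages="conservationCheckAMPKG">
    <var name="initialCarbon"/>
    <var name="finalCarbon"/>
</stream>
</streams>"""

packages, names = stream_vars(SAMPLE, "conservationCheckOutput")
print("gated by package:", packages)
print("requests initialCarbon:", "initialCarbon" in names)
```

This only confirms what the stream asks for. Whether the field is actually registered at run time depends on the generated pool code, which this check does not look at.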

@xylar
Contributor Author

xylar commented Sep 25, 2022

@mark-petersen and @maltrud, I'm hoping you can help me figure this out, as this is one of my go-to tests.

@mark-petersen
Contributor

Thanks @xylar for reporting this. I reproduced this error. This is puzzling to me because it looks as if the carbon variables are in the package and streams exactly like the other conservation variables for energy, mass, and salt. I'll see if I can figure it out.

@mark-petersen
Contributor

mark-petersen commented Sep 26, 2022

Wow, that is really weird. There is an include line here:

components/mpas-ocean/src/driver/mpas_ocn_core_interface.F
712 #include "../inc/structs_and_variables.inc"

and in the build directory for this run all the new carbon variables are in the .inc file but not in the .f90 file. That just doesn't seem possible.

cd /lcrc/group/e3sm/ac.mpetersen/scratch/chrys/ERS.ne11_oQU240.WCYCL1850NS.chrysalis_gnu.20220926_135114_4zr02v/bld/cmake-bld/core_ocean

grep -in 'ocn_generate_pool_conservationCheckEnergy' */*
driver/mpas_ocn_core_interface.f90:99187:   subroutine ocn_generate_pool_conservationCheckEnergyAM(block, structPool, dimensionPool, packagePool)
driver/mpas_ocn_core_interface.f90:100076:   end subroutine ocn_generate_pool_conservationCheckEnergyAM
driver/mpas_ocn_core_interface.f90:102234:      call ocn_generate_pool_conservationCheckEnergyAM(block, structPool, dimensionPool, packagePool)
inc/structs_and_variables.inc:98952:   subroutine ocn_generate_pool_conservationCheckEnergyAM(block, structPool, dimensionPool, packagePool)
inc/structs_and_variables.inc:99861:   end subroutine ocn_generate_pool_conservationCheckEnergyAM
inc/structs_and_variables.inc:101924:      call ocn_generate_pool_conservationCheckEnergyAM(block, structPool, dimensionPool, packagePool)

grep -in 'ocn_generate_pool_conservationCheckCarbon' */*
inc/structs_and_variables.inc:101142:   subroutine ocn_generate_pool_conservationCheckCarbonAM(block, structPool, dimensionPool, packagePool)
inc/structs_and_variables.inc:101814:   end subroutine ocn_generate_pool_conservationCheckCarbonAM
inc/structs_and_variables.inc:101930:      call ocn_generate_pool_conservationCheckCarbonAM(block, structPool, dimensionPool, packagePool)

These files are written anew every time. I also tested with a different order, putting the Carbon variables first, but it is always initialCarbon etc. that is missing. The cause of this error is that the new pool for conservationCheckCarbonAM is never actually generated.
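The grep comparison above can be scripted so it is easy to repeat across build directories. This is a rough Python sketch, not an E3SM tool: `find_symbol` and `missing_from_expanded` are hypothetical helper names, and the two paths are whatever `.inc` and preprocessed `.f90` files you want to compare.

```python
import re


def find_symbol(path, symbol):
    """Return 1-based line numbers in `path` containing `symbol` (case-insensitive)."""
    pattern = re.compile(re.escape(symbol), re.IGNORECASE)
    hits = []
    with open(path, errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            if pattern.search(line):
                hits.append(lineno)
    return hits


def missing_from_expanded(inc_path, f90_path, symbol):
    """True when `symbol` is in the include file but not in the preprocessed source."""
    return bool(find_symbol(inc_path, symbol)) and not find_symbol(f90_path, symbol)
```

Called with `inc/structs_and_variables.inc`, `driver/mpas_ocn_core_interface.f90` and `ocn_generate_pool_conservationCheckCarbon` from the `core_ocean` build directory, `missing_from_expanded` would return `True` for the failing build shown above. That result suggests the preprocessor picked up a different copy of the include file than the one being inspected.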

@mark-petersen
Contributor

The identical thing happens on cori, so this is not a strange i/o filesystem delay:

cori07:core_ocean$ pwd
/global/cscratch1/sd/mpeterse/e3sm_scratch/cori-haswell/ERS.ne11_oQU240.WCYCL1850NS.cori-haswell_gnu.20220926_153456_86snuu/bld/cmake-bld/core_ocean


grep -in 'ocn_generate_pool_conservationCheckEnergy' */*
driver/mpas_ocn_core_interface.f90:96396:   subroutine ocn_generate_pool_conservationCheckEnergyAM(block, structPool, dimensionPool, packagePool)
driver/mpas_ocn_core_interface.f90:97259:   end subroutine ocn_generate_pool_conservationCheckEnergyAM
driver/mpas_ocn_core_interface.f90:99375:      call ocn_generate_pool_conservationCheckEnergyAM(block, structPool, dimensionPool, packagePool)
inc/structs_and_variables.inc:98952:   subroutine ocn_generate_pool_conservationCheckEnergyAM(block, structPool, dimensionPool, packagePool)
inc/structs_and_variables.inc:99861:   end subroutine ocn_generate_pool_conservationCheckEnergyAM
inc/structs_and_variables.inc:101924:      call ocn_generate_pool_conservationCheckEnergyAM(block, structPool, dimensionPool, packagePool)

grep -in 'ocn_generate_pool_conservationCheckCarbon' */*
inc/structs_and_variables.inc:101142:   subroutine ocn_generate_pool_conservationCheckCarbonAM(block, structPool, dimensionPool, packagePool)
inc/structs_and_variables.inc:101814:   end subroutine ocn_generate_pool_conservationCheckCarbonAM
inc/structs_and_variables.inc:101930:      call ocn_generate_pool_conservationCheckCarbonAM(block, structPool, dimensionPool, packagePool)

@mark-petersen
Contributor

mark-petersen commented Sep 26, 2022

This does not happen with stand-alone, which uses make. E3SM uses cmake (though I can't imagine how that changes a simple include statement).

pwd
/usr/projects/climate/mpeterse/repos/E3SM/master/components/mpas-ocean/src

grep -in 'ocn_generate_pool_conservationCheckCarbon' */*
driver/mpas_ocn_core_interface.f90:103584:   subroutine ocn_generate_pool_conservationCheckCarbonAM(block, structPool, dimensionPool, packagePool)
driver/mpas_ocn_core_interface.f90:104260:   end subroutine ocn_generate_pool_conservationCheckCarbonAM
driver/mpas_ocn_core_interface.f90:105172:      call ocn_generate_pool_conservationCheckCarbonAM(block, structPool, dimensionPool, packagePool)
inc/structs_and_variables.inc:101676:   subroutine ocn_generate_pool_conservationCheckCarbonAM(block, structPool, dimensionPool, packagePool)
inc/structs_and_variables.inc:102352:   end subroutine ocn_generate_pool_conservationCheckCarbonAM
inc/structs_and_variables.inc:103264:      call ocn_generate_pool_conservationCheckCarbonAM(block, structPool, dimensionPool, packagePool)

@xylar
Contributor Author

xylar commented Sep 27, 2022

@philipwjones, any insight on the make vs. cmake difference in behavior?

@maltrud
Contributor

maltrud commented Sep 27, 2022

thanks for looking into this, @mark-petersen. I'm on board for blaming cmake.

@jonbob
Contributor

jonbob commented Sep 29, 2022

@xylar - could you please try this with "--project e3sm" in the create_test line?

@xylar
Contributor Author

xylar commented Sep 30, 2022

@jonbob, yes, that worked. Should I always use --project e3sm? Just on Chrysalis?

Also, why did that fix the problem, any idea?

@xylar
Contributor Author

xylar commented Sep 30, 2022

@jonbob, sorry, I spoke too soon. I accidentally ran the test on #5202 and it worked fine but running on master, I still get the same error with --project e3sm.

@jonbob
Contributor

jonbob commented Sep 30, 2022

@xylar -- I think the project is required on chrysalis, but not on anvil. But that is so strange that it fails on master but not on #5202 -- the failures just seem a bit random? This may take a bit of work -- I was hoping there was just something odd happening when the project wasn't specified....

@xylar
Contributor Author

xylar commented Sep 30, 2022

@jonbob, I believe #5202 just doesn't include the commits that introduced initialCarbon. I can do a test merge and try it. That was my plan anyway, I just wanted a baseline. It doesn't seem random to me.

@jonbob
Contributor

jonbob commented Sep 30, 2022

@xylar -- thanks. I tried the same test on a merge of next with PR #5172 and it worked fine, which is why I thought it might be random. But I haven't looked at it as closely as you have. If your test merge shows the same failure, we may have to dive a bit deeper on Monday?

@xylar
Contributor Author

xylar commented Sep 30, 2022

I'm seeing the same with master on Anvil, so it seems reproducible to me.

@xylar
Contributor Author

xylar commented Sep 30, 2022

As an aside, building on Anvil is much faster than on Chrysalis. Any idea why?

@jonbob
Contributor

jonbob commented Sep 30, 2022

I don't understand why that's true, but pretty much everyone sees how slow the build is on chrysalis. And I'll check out master and try the test again. Are you only seeing this with gnu?

@xylar
Contributor Author

xylar commented Sep 30, 2022

I'm testing with intel, not gnu.

@xylar
Contributor Author

xylar commented Sep 30, 2022

I tested a test merge of #5202 and it went fine. I'm trying to check out master again and re-test.

@xylar
Contributor Author

xylar commented Sep 30, 2022

The fact that @mark-petersen was able to reproduce this on both Chrysalis and Cori-Haswell makes me think it's not random but the fact that a test merge of #5202 with master worked fine really is leaving me confused. Is it consistent for a given branch but random for different branches?!

@xylar
Contributor Author

xylar commented Sep 30, 2022

A fresh check out of master worked fine. So I think what I was getting (and maybe @mark-petersen too) is a remnant from updating a previous master to the latest, rather than a completely clean check-out. I'm going to close this. Thanks for your patience.

@mark-petersen, maybe take this as a lesson to start clean rather than trying to update master in place.

@xylar xylar closed this as completed Sep 30, 2022
@maltrud
Contributor

maltrud commented Sep 30, 2022

I've also checked out master this morning and it seems to be fine on anvil, chrysalis and compy.

@mark-petersen
Contributor

I was able to confirm that this error does not occur with a fresh clone of master. What an odd problem, since the include files are created in the case build directory, which is recreated with each test. Well, that is something to remember. These all pass:

./create_test SMS_D.T62_oQU120_ais20.MPAS_LISIO_TEST.chrysalis_gnu --project e3sm --walltime 30:00
./create_test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_gnu --project e3sm --walltime 30:00
./create_test SMS_D_Ln9.T62_oQU120_ais20.MPAS_LISIO_TEST.cori-haswell_gnu -q debug --walltime 00:30:00
./create_test ERS.ne11_oQU240.WCYCL1850NS.cori-haswell_gnu -q debug --walltime 00:30:00

@mark-petersen
Contributor

mark-petersen commented Sep 30, 2022

Ha! I found it. There is a file stashed here that does not get updated, but appears to be used in the build:

components/mpas-ocean/src/inc/structs_and_variables.inc

In my directory that produces this error, the old Energy variables are there but the new Carbon variables are not:

grep -in 'ocn_generate_pool_conservationCheckEnergy' ./components/mpas-ocean/src/inc/structs_and_variables.inc
97288:   subroutine ocn_generate_pool_conservationCheckEnergyAM(block, structPool, dimensionPool, packagePool)
98177:   end subroutine ocn_generate_pool_conservationCheckEnergyAM
100335:      call ocn_generate_pool_conservationCheckEnergyAM(block, structPool, dimensionPool, packagePool)

grep -in 'ocn_generate_pool_conservationCheckCarbon' ./components/mpas-ocean/src/inc/structs_and_variables.inc

@xylar if you still have a local repo that produced this error, you should see the same thing. A fresh checkout does not have that file at all.

I think this is caused by building MPAS-Ocean stand-alone and then running E3SM jobs without a make clean in the stand-alone directory. E3SM must just copy the include directories wholesale and not recreate the files if they already exist. The solution is to run stand-alone from a different local directory than the E3SM simulations, or to run make clean on stand-alone before E3SM runs if variables have been added. The better solution is for cmake to not copy those files, but I'm not sure where that is done right now.
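Since a fresh checkout does not contain that file, its mere presence in the source tree is a usable warning sign before launching an E3SM test. Here is a small Python sketch of such a pre-flight check. The directory path is the one quoted above; the `GENERATED` list and the helper name `leftover_includes` are assumptions for illustration, and other generated files may need to be added.

```python
from pathlib import Path

# Source-tree include directory named in this thread.
SRC_INC = Path("components/mpas-ocean/src/inc")
# Assumption: only the file identified above is checked; extend as needed.
GENERATED = ("structs_and_variables.inc",)


def leftover_includes(repo_root):
    """Return generated include files left in the source tree of `repo_root`."""
    inc_dir = Path(repo_root) / SRC_INC
    return [inc_dir / name for name in GENERATED if (inc_dir / name).is_file()]


if __name__ == "__main__":
    # Run from the top of an E3SM checkout.
    for path in leftover_includes("."):
        print(f"leftover generated file (stand-alone build?): {path}")
```

An empty result does not prove the build is clean. It only rules out the specific leftover file found here.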

On a separate note, I noticed that some include files were inadvertently added to the repo in these *_inc directories:
https://github.com/E3SM-Project/E3SM/tree/master/components/mpas-ocean/src/analysis_members

@xylar
Contributor Author

xylar commented Sep 30, 2022

Thanks @mark-petersen, that explains a lot! I was, indeed, using the same directory for standalone and E3SM testing.

> On a separate note, I noticed that some include files were inadvertently added to the repo in these *_inc directories:
> https://github.com/E3SM-Project/E3SM/tree/master/components/mpas-ocean/src/analysis_members

I don't think that is correct. I think these are not auto-generated include files and that they are there intentionally to simplify code redundancy in the respective analysis members.
