Aerosol-related PET log error message #1888

Open
DeniseWorthen opened this issue Sep 7, 2023 · 21 comments
Labels
bug Something isn't working EPIC Support Requested

Comments

@DeniseWorthen
Collaborator

DeniseWorthen commented Sep 7, 2023

Description

I compiled and ran both cpld_control_nowave_noaero_p8 and cpld_bmark_p8 in debug mode using @ulmononian's feature/add_c5 branch. Both tests run, but the bmark test (which has aerosols) produces a PET log error message on PEs 000 through 0766. There is no error message in the 0767 log. The aerosol component runs on PEs 0:767. I don't know why the last aerosol PE does not produce the error.

The error message is:

20230907 124949.789 INFO             PET0766 UFS Aerosols: Advancing from 2013-04-01T00:00:00 to 2013-04-01T00:05:00
20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:668 Info::erase() Not found  - [json.exception.out_of_range.403] key 'GridCornerLons:' not found
20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:688 Info::erase() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMC_InfoCDef.C:243 ESMC_InfoErase() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMF_Info.F90:2656 ESMF_InfoRemove() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 src/Superstructure/AttributeAPI/interface/ESMF_Attribute.F90:46022 ESMF_AttributeRemoveAttPackGrid( Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:668 Info::erase() Not found  - [json.exception.out_of_range.403] key 'GridCornerLats:' not found
20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:688 Info::erase() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMC_InfoCDef.C:243 ESMC_InfoErase() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMF_Info.F90:2656 ESMF_InfoRemove() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 src/Superstructure/AttributeAPI/interface/ESMF_Attribute.F90:46022 ESMF_AttributeRemoveAttPackGrid( Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:668 Info::erase() Not found  - [json.exception.out_of_range.403] key 'GridCornerLons:' not found
20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:688 Info::erase() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMC_InfoCDef.C:243 ESMC_InfoErase() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMF_Info.F90:2656 ESMF_InfoRemove() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 src/Superstructure/AttributeAPI/interface/ESMF_Attribute.F90:46022 ESMF_AttributeRemoveAttPackGrid( Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:668 Info::erase() Not found  - [json.exception.out_of_range.403] key 'GridCornerLats:' not found
20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:688 Info::erase() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMC_InfoCDef.C:243 ESMC_InfoErase() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 ESMF_Info.F90:2656 ESMF_InfoRemove() Not found  - Internal subroutine call returned Error
20230907 125011.127 ERROR            PET0766 src/Superstructure/AttributeAPI/interface/ESMF_Attribute.F90:46022 ESMF_AttributeRemoveAttPackGrid( Not found  - Internal subroutine call returned Error
20230907 125021.303 INFO             PET0766 Model Advance: before wrtcomp run
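
For triage, the excerpt above can be reduced to the distinct missing keys per PET. The following is a minimal sketch, assuming the line layout shown in the excerpt (date, time, level, `PETnnnn`, message). The function name `missing_keys_by_pet` and both regular expressions are illustrative and are not part of ESMF, MAPL, or UFS.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpt above:
#   YYYYMMDD HHMMSS.mmm LEVEL   PETnnnn message
LINE_RE = re.compile(r"^(\d{8}) (\d{6}\.\d{3}) (\w+)\s+PET(\d+) (.*)$")
# The JSON exception text names the key that could not be erased.
KEY_RE = re.compile(r"key '([^']+)' not found")


def missing_keys_by_pet(lines):
    """Map PET number -> {missing key: count}, using ERROR lines only."""
    result = defaultdict(lambda: defaultdict(int))
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group(3) != "ERROR":
            continue
        key = KEY_RE.search(m.group(5))
        if key:
            result[int(m.group(4))][key.group(1)] += 1
    return {pet: dict(keys) for pet, keys in result.items()}


# A few lines copied from the excerpt above.
sample = [
    "20230907 124949.789 INFO             PET0766 UFS Aerosols: Advancing from 2013-04-01T00:00:00 to 2013-04-01T00:05:00",
    "20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:668 Info::erase() Not found  - [json.exception.out_of_range.403] key 'GridCornerLons:' not found",
    "20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:688 Info::erase() Not found  - Internal subroutine call returned Error",
    "20230907 125011.127 ERROR            PET0766 ESMCI_Info.C:668 Info::erase() Not found  - [json.exception.out_of_range.403] key 'GridCornerLats:' not found",
]
print(missing_keys_by_pet(sample))
```

Only the lines carrying the JSON exception text are counted; the "Internal subroutine call returned Error" lines are the same failure propagating up the call chain and are skipped.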

To Reproduce:

Compile and run the feature/add_c5 branch in DEBUG mode for the bmark P8 test (cpld_bmark_p8).

Additional context

Output

@DeniseWorthen DeniseWorthen added the bug Something isn't working label Sep 7, 2023
@DeniseWorthen DeniseWorthen changed the title from "Aeorsol-related PET log error message on Gaea C5" to "Aerosol-related PET log error message on Gaea C5" Sep 7, 2023
@ulmononian
Collaborator

i had 100% of tests pass about a week ago or so on c5. so the debug test runs to completion here but you are seeing these errors anyway?

@DeniseWorthen
Collaborator Author

Yes, the test runs, so it is not fatal. I don't know what it means actually. It must be in MAPL?

@natalie-perlin
Collaborator

@DeniseWorthen @ulmononian -
does spack-stack need to have an esmf-debug module for running cpld_control_nowave_noaero_p8 and cpld_bmark_p8 in debug mode?

@DeniseWorthen
Collaborator Author

No, we no longer use esmf built w/ debug.

@zach1221
Collaborator

Hi, @DeniseWorthen. Below are my experiment directories for cpld_control_nowave_noaero_p8_intel & cpld_bmark_p8_intel on Gaea C5, if you want to take a look at the PET logs. If you don't have access to view them, let me know.

/lustre/f2/scratch/Zachary.Shrader/FV3_RT/rt_76751/cpld_control_nowave_noaero_p8_intel
/lustre/f2/scratch/Zachary.Shrader/FV3_RT/rt_76751/cpld_bmark_p8_intel

@DeniseWorthen
Collaborator Author

@zach1221 The same behaviour appears to be present. No PET error messages in the cpld_control_nowave_noaero_p8_intel test, which does not include aerosols. In the cpld_bmark_p8_intel test, the only PET log which does not contain the error message is PET0767 (the last atm PET).
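
A check like this across several hundred PET logs can be scripted. The following is a minimal sketch that splits the logs in a run directory into those with and without ERROR lines. The default glob pattern `PET*.ESMF_LogFile` is an assumption about how the log files are named in the run directory, and the function name `pets_without_errors` is illustrative; adjust both to match the actual files.

```python
from pathlib import Path


def pets_without_errors(run_dir, pattern="PET*.ESMF_LogFile"):
    """Return (clean, failing) lists of log file names found under run_dir.

    The glob pattern is an assumed naming convention, not confirmed by
    this thread; pass a different pattern if the logs are named otherwise.
    """
    clean, failing = [], []
    for log in sorted(Path(run_dir).glob(pattern)):
        with log.open(errors="replace") as fh:
            # The level field is surrounded by spaces in the log layout.
            has_error = any(" ERROR " in line for line in fh)
        (failing if has_error else clean).append(log.name)
    return clean, failing
```

Calling `pets_without_errors(run_dir)` on one of the run directories listed above would then show directly whether the last ATM PET is the only clean log.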

@natalie-perlin
Collaborator

@DeniseWorthen @zach1221

@DeniseWorthen
Collaborator Author

@zach1221 @jkbk2004 The behavior for these two cases is the same on Hercules. That is, there is no ERROR in the test w/o aerosols, i.e., cpld_control_noaero_p8_intel. There is a PET log error on all but the last ATM PE for the cpld_bmark_p8 test. I also checked the low-resolution case, cpld_control_p8_intel, and it shows the same thing: ERROR on all but the last ATM PE. Note that aerosols run on the same PEs as the ATM. I believe this is most likely a MAPL issue and not ESMF.

@natalie-perlin
Collaborator

@DeniseWorthen -
A working directory on Gaea-c5 with the cpld_bmark_p8 test is /lustre/f2/scratch/ncep/Natalie.Perlin/FV3_RT/rt_221558/cpld_bmark_p8_intel
In case you may take a look to see whether this is the same behavior.

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Nov 15, 2023

@natalie-perlin I had checked @zach1221's run directories earlier and confirmed that the same error message is present as in the original issue. I also later confirmed that the same message is present in a Hercules run, so it isn't specific to C5. I believe it is most likely a MAPL issue.

@jkbk2004
Collaborator

I think GSFC people might be able to access Hercules. It would be a great starting point if we could set up a MAPL debugging installation and experiment. @mathomp4 Is Jiang still available?

@mathomp4

> I think GSFC people might be able to access Hercules. It would be a great starting point if we could set up a MAPL debugging installation and experiment. @mathomp4 Is Jiang still available?

@jkbk2004 Yes. @weiyuan-jiang should be able to help. That said, I'm not sure he (or any of us) has access to Hercules. We do have access to Orion.

@weiyuan-jiang
Collaborator

I think I have access to Hercules. Judging from the logged error, it looks more like a problem in ESMF (due to the build?). But I will take a look at it.

@DeniseWorthen
Collaborator Author

@weiyuan-jiang Thanks for checking. The reason I suspect MAPL is that we do not see the error in cases w/o the aerosol component.

@weiyuan-jiang
Collaborator

What version of gocart and MAPL are you using? @DeniseWorthen

@DeniseWorthen
Collaborator Author

I'm not sure how to tell which version of gocart we are using, but for MAPL we are using 2.35.2-esmf-8.4.2

@jkbk2004
Collaborator

@zach1221
Collaborator

Looks like 2.1.1 for gocart

@weiyuan-jiang
Collaborator

I have talked to Ben. He thought there might be a small chance that an attribute is not there in the MAPL_GetHorzIJIndex call. (It is confusing, though, because it should not produce an error.) Anyway, I have replaced that call in the branch. Would you please try this new MAPL?

@DeniseWorthen
Collaborator Author

Since this error is not specific to C5 (it also shows up on Hercules), I've edited the issue title.

@DeniseWorthen DeniseWorthen changed the title from "Aerosol-related PET log error message on Gaea C5" to "Aerosol-related PET log error message" Dec 1, 2023
@NickSzapiro-NOAA
Collaborator

@DeniseWorthen I see this error in GEFS RT intel debug mode


8 participants