Add esmf managed threading test and fix inline post issues in P8 #1305

Merged
merged 43 commits into from
Aug 9, 2022

Conversation

junwang-noaa
Collaborator

@junwang-noaa junwang-noaa commented Jun 30, 2022

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.

  • New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

This PR adds two ESMF-managed threading tests to the UFS WM coupled tests. An ESMF profile setting is also added to the job_card to help track computational performance.
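
As a rough illustration of the kind of ESMF profile setting mentioned in the description, a job card could export ESMF's runtime profiling environment variables before launching the model. This is only a sketch: the exact lines added to job_card by this PR are not shown in this conversation, and the values chosen here are assumptions.

```shell
# Hypothetical job_card fragment (assumed for illustration; not copied from this PR).
# ESMF_RUNTIME_PROFILE switches on ESMF's built-in timing of component phases.
export ESMF_RUNTIME_PROFILE=ON
# SUMMARY asks ESMF for one aggregated timing report instead of per-PET text files.
export ESMF_RUNTIME_PROFILE_OUTPUT=SUMMARY
```

Both variables are read by ESMF at run time, so a setting like this can be switched on or off per test without rebuilding the model.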

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository.
(Remember, issues must always be created before starting work on a PR branch!)

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • opnReqTest for newly added/changed feature
  • CI

Dependencies

If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).

Do PRs in upstream repositories need to be merged first?
If so add the "waiting for other repos" label and list the upstream PRs

Review thread on tests/run_test.sh (outdated, resolved)
Collaborator

@theurich theurich left a comment

Looks good. I left a comment about tests/run_test.sh.

Collaborator

@theurich theurich left a comment

All looks good to me now.

@theurich
Collaborator

theurich commented Jul 5, 2022

@junwang-noaa I see that the PR indicates "1 unresolved conversation". However, I cannot figure out how to mark it resolved. I just want to make sure this PR isn't hanging there for an odd technicality. Thanks.

@junwang-noaa
Collaborator Author

The code is updated with the comments. Now it is resolved.

@theurich
Collaborator

theurich commented Jul 5, 2022

Why is the merge still blocked? Dusan and I have approved, and it looks like 2 approving reviews are required.

@junwang-noaa
Collaborator Author

junwang-noaa commented Jul 5, 2022

We need two code managers to approve it. Also, there are several commits ahead of this PR, so I will keep updating the branch.

@theurich
Collaborator

theurich commented Jul 5, 2022

Got it. Thank you.

@junwang-noaa
Collaborator Author

junwang-noaa commented Aug 5, 2022 via email

Sorry the inline post change was not kept during the merge this morning. I just updated the code.

On Fri, Aug 5, 2022 at 1:53 PM Brian Curtis wrote:
Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: BL
[BL] Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/984370435/20220805140012/ufs-weather-model
[BL] Baseline creation and move successful
[RT] Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/984370435/20220805163507/ufs-weather-model
[RT] Error: Test cpld_restart_bmark_p8 011 failed in check_result failed
[RT] Error: Test cpld_restart_bmark_p8 011 failed in run_test failed
Please make changes and add the following label back: hera-intel-BL

@jkbk2004
Collaborator

jkbk2004 commented Aug 5, 2022

Sure! I will catch up with tests.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: gaea
Compiler: intel
Job: BL
[BL] Repo location: /lustre/f2/pdata/ncep/emc.nemspara/autort/pr/984370435/20220805170007/ufs-weather-model
[BL] Error: Test compile_011 failed in run_compile failed
Please make changes and add the following label back: gaea-intel-BL

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: BL
[BL] Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/984370435/20220805170012/ufs-weather-model
[BL] Error: Test control_c384_progsigma 019 failed in run_test failed
[BL] Error: Test control_p8_lndp 025 failed in run_test failed
[BL] Error: Test rap_lndp_debug 061 failed in run_test failed
[BL] Error: Test hafs_regional_atm 077 failed in run_test failed
[BL] Error: Test hafs_regional_atm_ocn_wav 081 failed in run_test failed
[BL] Error: Test datm_cdeps_control_cfsr 085 failed in run_test failed
Please make changes and add the following label back: jet-intel-BL

@jkbk2004
Collaborator

jkbk2004 commented Aug 8, 2022

cpld_bmark_p8 and cpld_bmark_esmfthreads_p8 crash on cheyenne/intel with "MPT ERROR: Rank 570(g:570) received signal SIGSEGV(11)" and "MPT: shepherd terminated". For cpld_bmark_p8, I double-checked against the develop branch, and the develop branch ran OK. I tried esmf-8.4.0b08, but it seems there is still a PIO issue. I used pio-2.5.7/esmf-8.4.0b08. The build doesn't go through on cheyenne/intel/mpt. Do you think it might be worth trying another version of ESMF, such as 8.3.1?

@jkbk2004
Collaborator

jkbk2004 commented Aug 8, 2022

cpld_bmark_p8 crashes on cheyenne/gnu as well.

@junwang-noaa
Collaborator Author

@jkbk2004 There is no change to the PIO/ESMF libraries in this PR. Since we turned on inline post, can you increase the write tasks per group to see whether that is the issue? I don't think we can run cpld_bmark_p8 (S2SWA) with gnu yet.

@jkbk2004
Collaborator

jkbk2004 commented Aug 9, 2022

All tests passed. We can move on to merging the PR. @DeniseWorthen @ChunxiZhang-NOAA, will you leave final comments and approve? @junwang-noaa, I am moving on to fv3atm PR #569.

Review thread on .gitmodules (outdated, resolved)
Labels
Baseline Updates Current baselines will be updated.
Development

Successfully merging this pull request may close these issues.

  • Add inline post in the cpld_bmark_p8 related tests
  • Add ESMF managed threading tests
6 participants