Resource adjustments and updated obsproc/prepobs packages for tcvitals bug fix #862

Merged
merged 6 commits into from
Jun 16, 2022

KateFriedman-NOAA (Member) commented Jun 15, 2022

Description

This PR includes memory updates and new obsproc/prepobs package versions:

  • Memory increases for some non-exclusive jobs that were getting the Cgroup mem limit exceeded warning message on WCOSS2.
  • Also translated a few more high res WCOSS2 resource settings from config.resources.nco.static to config.resources.emc.dyn and from config.fv3.nco.static to config.fv3.emc.dyn.
  • New obsproc R&D v1.0.2, includes WCOSS2 ops updates and tcvitals bug fix. Updated obsproc_run_ver in versions/*.ver files for WCOSS2, Hera, Orion.
  • New prepobs R&D v1.0.1, includes WCOSS2 ops updates and tcvitals bug fix. Updated prepobs_run_ver in versions/*.ver files for WCOSS2, Hera, Orion.

New obsproc/prepobs packages have been installed on WCOSS2, Hera, and Orion.
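
For illustration, the version bumps described above might look like the following in one of the versions/*.ver files. The variable names and version strings come from this PR's commit messages. The `export` syntax is an assumption about how those files are written.

```shell
# Sketch of the relevant lines in a versions/*.ver file.
# Assumption: the file uses plain shell exports; only these two values
# are stated in this PR.
export obsproc_run_ver=1.0.2-rd
export prepobs_run_ver=1.0.1-rd
```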

Will merge these changes into the release/gfs.v16.3.0 branch next.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • Clone and build tests on WCOSS2, Hera, Orion (obsproc/prepobs packages)
  • Cycled test on WCOSS2

Refs #665, #744

- Increase memory request values for some non-exclusive jobs that
have been getting Cgroup mem warning messages on WCOSS2.
- Translate additional high res resource values from
config.resources.nco.static into config.resources.emc.dyn.

Refs: NOAA-EMC#665, NOAA-EMC#744

- Adjust/increase memory requests for some non-exclusive jobs that were
getting the Cgroup mem warning messages on WCOSS2.
- Some additional memory adjustments to wave jobs in resource configs.

Refs: NOAA-EMC#665, NOAA-EMC#744

- Update obsproc_run_ver to 1.0.2-rd
- Update prepobs_run_ver to 1.0.1-rd

New obsproc/prepobs versions include tcvitals bug fix.

Refs NOAA-EMC#665
KateFriedman-NOAA added the maintenance label (Regular updates and maintenance work) Jun 15, 2022
KateFriedman-NOAA added this to the WCOSS2 - GFSv16.2.0 milestone Jun 15, 2022
KateFriedman-NOAA self-assigned this Jun 15, 2022

aerorahul (Contributor) left a comment

Nothing major, but a few questions.

parm/config/config.resources.emc.dyn
parm/config/config.resources.emc.dyn (outdated)
export npe_node_fcst=32
export npe_node_fcst_gfs=42
export npe_node_fcst_gfs=24
Contributor

How do the values on L200 and L199 differ from the calculations on L195 and L196?

Member Author

L195 = 128/3 = 42.67, which rounds up to 43 in the xml
L196 = 128/5 = 25.6, which rounds up to 26 in the xml

Neither value lays out nicely across the WCOSS2 nodes.
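
The arithmetic in this comment can be reproduced with a short shell sketch. The helper name `ceil_div` is hypothetical. The 128 and the divisors 3 and 5 are taken from the comment, and what those divisors represent is not stated in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ceiling of $1 / $2 using integer arithmetic only.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

cores=128   # numerator used in the comment (128/3 and 128/5)

for divisor in 3 5; do
  rounded=$(ceil_div "$cores" "$divisor")
  leftover=$(( cores % divisor ))
  echo "${cores}/${divisor}: rounds up to ${rounded}, remainder ${leftover}"
done
```

Both divisions leave a nonzero remainder, so the rounded results (43 and 26) are not exact quotients of 128.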

George V and I did a lot of testing to get the values on L199 and L200. We want users to use these values for C768 on WCOSS2. We may also want them for lower resolutions, but we haven't fully vetted those resolutions on WCOSS2 yet.

This reminded me I forgot to adjust the C768 block in config.fv3.emc.dyn with the tested values for layout_x_gfs and WRTTASK_PER_GROUP_GFS from config.fv3.nco.static. Please see that change in this PR now too.

Contributor

Ok. So these values are specific to C768.
Maybe then please add a note to that effect.

Member Author

Done. I added a check for CASE=768 to make sure those values aren't used outside of high res on WCOSS2.
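
A minimal sketch of what such a guard could look like, not the actual change in config.resources.emc.dyn. The predicate name `is_wcoss2_c768`, the `machine` variable, and the `C768` spelling of the resolution are assumptions. The values 32 and 24 are copied from the diff lines quoted earlier in this thread, where two values for npe_node_fcst_gfs appear (42 and 24), so treating 24 as the final one is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical predicate: succeeds only for the vetted machine/resolution pair.
is_wcoss2_c768() {
  [ "$1" = "WCOSS2" ] && [ "$2" = "C768" ]
}

# Illustrative inputs; in a real config these would already be set.
machine="WCOSS2"
CASE="C768"

if is_wcoss2_c768 "$machine" "$CASE"; then
  # Values quoted in the review diff; see the caveat in the text above.
  export npe_node_fcst=32
  export npe_node_fcst_gfs=24
fi

echo "npe_node_fcst=${npe_node_fcst:-unset} npe_node_fcst_gfs=${npe_node_fcst_gfs:-unset}"
```

With any other machine or resolution the block is skipped and whatever values were set earlier remain in effect.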

- Update the C768 block in config.fv3.emc.dyn with the tested values
from config.fv3.nco.static.
- Will be used on WCOSS2 for C768 but not on R&D machines because of
npe_node_max checks.

Refs: NOAA-EMC#665

aerorahul (Contributor) left a comment

Looks good. Just add a note about C768 for those calculations and that they will be revised for other resolutions when we get to it.

- Add a check that CASE=768 in the fcst block of
config.resources.emc.dyn where specific values of npe_node_fcst are set
for WCOSS2.

Refs: NOAA-EMC#665