Scalability of the Coupled Model #1367

Open
DeniseWorthen opened this issue Aug 14, 2022 · 12 comments
@DeniseWorthen
Collaborator

This Epic tracks scalability issues and their solutions in the coupled P8 runs. It includes the following tasks:

  1. Create a scalability profile, in standalone mode, for each component used in coupled P8 runs, and identify scalability issues.

  2. Identify issues in coupling mode.

  3. Identify scalability issues in high resolution coupled runs (e.g. C768mx025).
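A scalability profile of the kind described in task 1 can be summarized as speedup and parallel efficiency relative to the smallest PE count tested. The following is only a minimal sketch: the function name and the timings are hypothetical placeholders, not measurements from any P8 run.

```python
def scaling_profile(timings):
    """Summarize strong scaling.

    timings: dict mapping PE count -> wall-clock seconds for the same workload.
    Returns a list of (pes, speedup, efficiency), relative to the smallest PE count.
    """
    base_pes = min(timings)
    base_time = timings[base_pes]
    profile = []
    for pes in sorted(timings):
        speedup = base_time / timings[pes]
        # Ideal speedup equals the ratio of PE counts; efficiency is the fraction achieved.
        efficiency = speedup / (pes / base_pes)
        profile.append((pes, speedup, efficiency))
    return profile


# Illustrative, made-up timings (NOT results from the coupled model).
example = {100: 400.0, 200: 220.0, 400: 140.0}
for pes, speedup, eff in scaling_profile(example):
    print(f"{pes:5d} PEs  speedup {speedup:5.2f}  efficiency {eff:5.2f}")
```

An efficiency that drops quickly as PEs are added flags a component worth profiling further.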

DeniseWorthen added the enhancement and Epic labels and removed the enhancement label on Aug 14, 2022
@DeniseWorthen
Collaborator Author

DeniseWorthen commented Aug 29, 2022

MOM6: @jiandewang

  1. Changing the IO layout from (1,1) to (4,2) in the test case gave a ~4% speedup (for both history and restart files).
  2. MOM6 can read the additional restart files, but this will require downstream changes. A combining utility is available for history and restart files. Combining history files would need to be implemented in the post workflow.
  3. Land block elimination has been tested in MOM6 standalone mode but not in the coupled model. A small fix file needs to be generated to specify the land domain.

CICE6: @DeniseWorthen No scalability analysis has been performed. However, CICE6 is cheap to run and is unlikely to impact coupled model performance significantly.

WW3: George Vandenberg and Matt Masarik have used gprof to identify bottlenecks in WW3. The subroutine init_get_jsea_isproc has been identified as an issue. Solutions include removing a call which uses the _isproc routine (and which scales as num sea points*(num PE)^2) and inlining it at other locations. The impact of inlining was not large; removing the w3nmin call, or OMP threading it, is now being tested.
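To see why a term that scales as num sea points*(num PE)^2 matters at high PE counts, here is a rough sketch of that cost model. It only encodes the scaling stated above; the function names are hypothetical and nothing here is taken from the WW3 source.

```python
def isproc_term(num_sea_points, num_pes):
    """Relative cost of a term scaling as num_sea_points * num_pes**2 (arbitrary units)."""
    return num_sea_points * num_pes ** 2


def growth_factor(num_sea_points, pes_before, pes_after):
    """How much the term grows when the PE count changes on a fixed grid."""
    return isproc_term(num_sea_points, pes_after) / isproc_term(num_sea_points, pes_before)


# Doubling the PE count on a fixed grid quadruples this term,
# whereas ideally scaling work would be expected to shrink per PE.
print(growth_factor(1000, 100, 200))
```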

GOCART: @bbakernoaa reports that GOCART currently has no threading capability; adding OMP calls to the NUOPC cap is being examined. There are also downstream issues in UPP (problems computing all the diag fields).

GOCART and FV3 share nodes because of shared memory concerns and more communication. ESMF-managed threading can run different threading levels on the same nodes (DE-sharing). NASA reports that MPI scaling is good for GOCART, so threading options have not been examined. DE-sharing may be less invasive and would leverage the existing MPI scalability of GOCART.

ATM: George will look at ATM after finishing w/ WW3.

@jiandewang
Collaborator

jiandewang commented Aug 30, 2022

Testing of the land block elimination approach in UFS doesn't work. I suspect the mesh in the cap requires all subdomain information. Error information can be found in the ocean PET files, for example in PET503.ESMF_LogFile:
MeshCap::meshcreateredistelems() Internal error

The run directory can be found at /scratch1/NCEPDEV/climate/Jiande.Wang/working/MOM6-scalability/UFS-land-mask/T1

My testing is based on the latest UFS (hash # 5477338), using cpld_bmark_p8 as a template. I modified the ocean PE numbers in nems.configure and added "mask_table.8.10x12" inside the INPUT directory. MOM_override is set up as
LAYOUT=10,12
MASKTABLE=mask_table.8.10x12 !120-8=112
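The arithmetic in the comment above (a 10x12 layout with 8 masked blocks leaves 112 active PEs) and the idea of finding all-land blocks can be sketched as follows. This is only an illustration under stated assumptions: the mask input format, the 0-based block indices, the even division of the grid by the layout, and both function names are hypothetical, and this is not the logic of generate-mask-table.sh or of MOM6 itself.

```python
def land_only_blocks(ocean_mask, layout):
    """Return (i, j) indices of layout blocks that contain no ocean points.

    ocean_mask: list of rows, 1 = ocean, 0 = land (hypothetical input format).
    layout: (blocks_in_x, blocks_in_y); grid size is assumed divisible by the layout.
    """
    ny, nx = len(ocean_mask), len(ocean_mask[0])
    bx, by = nx // layout[0], ny // layout[1]
    masked = []
    for j in range(layout[1]):
        rows = ocean_mask[j * by:(j + 1) * by]
        for i in range(layout[0]):
            # A block can be eliminated only if every point in it is land.
            if not any(any(row[i * bx:(i + 1) * bx]) for row in rows):
                masked.append((i, j))
    return masked


def active_pe_count(layout, num_masked):
    """PEs still needed after eliminating land-only blocks."""
    return layout[0] * layout[1] - num_masked


# Matches the comment in the override above: 120 - 8 = 112.
print(active_pe_count((10, 12), 8))
```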

@DeniseWorthen
Collaborator Author

@jiandewang The CICE cap already has the capability for land block elimination, but the MOM6 cap does not. That will need to be added. How do you set up the mask_table? Can you make one for easy testing w/ the mx100 ocean?

@jiandewang
Collaborator

@DeniseWorthen My previous test was based on mx025. I will prepare an mx100 case for you, along with a README file covering setup etc.

@DeniseWorthen
Collaborator Author

@jiandewang
Collaborator

@DeniseWorthen mx1x1 only uses 20 PEs for the ocean, so every PE contains ocean points for whatever X-Y layout I tried. Instead, I have a sample mx05 case for you at /scratch1/NCEPDEV/climate/Jiande.Wang/working/MOM6-scalability/mask-PE/05
Inside check_mask there is a generate-mask-table.sh, which is used to generate the PE mask table; you can see the usage in the comment lines there. cpld_control_c192_p8 is the run dir I tried.

@DeniseWorthen
Collaborator Author

@jiandewang reports not much gain for c384 with land block elimination. Someone will need to add the feature to the NUOPC cap if it is required.

NetCDF compression is not supported in the current FMS2io code. GFDL says this option can be turned on but may require a new release.

Matt will have a PR to remove the bottleneck identified via gprof. The 3 remaining routines have the bulk of the impact. A first attempt at getting performance gains w/ OMP gave mixed results. Quick fixes are most likely exhausted.

@DeniseWorthen
Collaborator Author

Denise can talk to Tony about presenting CICE6 results at Scalability meeting.

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Sep 12, 2022

Gerhard reminds us that MOM6 output is synchronous (on forecast tasks), so when MOM6 writes, it holds the system up. MOM6 forecast w/in time inner loop takes (already ready when next update is ready).

The suggestion is that the valid metric is not the overall run time impact, but how much IO costs when it happens, relative to a normal cycle (one w/o IO).
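One way to compute the metric suggested above, sketched under the assumption that per-cycle wall-clock times are available and that the cycles which write output are known. The function name and inputs are hypothetical; this is not tied to any existing timer output format.

```python
from statistics import mean


def io_cost_per_write(step_times, io_steps):
    """Compare cycles that write output against normal cycles.

    step_times: wall-clock seconds per cycle, in order.
    io_steps: set of cycle indices (0-based) on which output was written.
    Returns (extra_seconds, extra_relative_to_normal_cycle).
    """
    io = [t for n, t in enumerate(step_times) if n in io_steps]
    normal = [t for n, t in enumerate(step_times) if n not in io_steps]
    baseline = mean(normal)          # cost of a cycle w/o IO
    extra = mean(io) - baseline      # added cost when IO happens
    return extra, extra / baseline


# Made-up timings for illustration only: cycle 2 writes output.
print(io_cost_per_write([1.0, 1.0, 3.0, 1.0], {2}))
```

Reporting the relative value keeps the metric independent of how often output is written over the whole run.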

@DeniseWorthen
Collaborator Author

Ali provided a 10min and 15min unstructured mesh to test for scalability of the coupled model using the unstructured mesh.

@junwang-noaa
Collaborator

This is an ongoing task. Progress can be tracked in the GFSv17 highres and GFSv1 Google Sheets at:

https://docs.google.com/spreadsheets/d/1-plAZ7h7iLoCzOH9rkjklKmeN42dE-2-1mdCLugk4xI/edit#gid=1272699869

GFSv17S2S HR1 and HR2 scalability analysis has been conducted. We will look into HR3 when it becomes available.

@DeniseWorthen
Collaborator Author

I've created a feature branch to implement the same timer logging feature in WW3 as was done for CICE. It includes a feature to overwrite the timesteps in the mod_def file via configuration variables, which allows the same mod_def file to be used for either inner-loop or outer-loop coupling. https://github.com/DeniseWorthen/WW3/tree/feature/logtimer-nosync
