Switch Step attributes from cores to ntasks and cpus_per_task #413

Merged: 25 commits from switch_steps_to_slurm_terminology into MPAS-Dev:master on Aug 16, 2022

Conversation

xylar (Collaborator) commented Jun 17, 2022

This merge is a vital step in @altheaden's work on parallel tasks. Up to now, we have been able to get by in compass without differentiating between "tasks" and "CPUs per task". However, there is an important distinction between these concepts in Slurm and other job managers. For task parallelism to function properly, we need to specify both --cpus-per-task (-c for short) and --ntasks (-n for short) for each Slurm job step we launch with srun.

In anticipation of this need, this merge replaces the cores attribute of the Step class with ntasks and cpus_per_task, and the min_cores attribute with min_tasks and min_cpus_per_task.
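
For illustration, the new attributes map onto the srun flags roughly like this (a minimal sketch; the StepResources class, the helper function and the example executable are made up for illustration, and only the ntasks, min_tasks, cpus_per_task and min_cpus_per_task attribute names come from this PR):

# Minimal sketch of how the new Step attributes translate into an srun
# command for a job step.
from dataclasses import dataclass


@dataclass
class StepResources:
    ntasks: int = 1              # number of MPI tasks (srun -n)
    min_tasks: int = 1           # smallest task count the step can run with
    cpus_per_task: int = 1       # CPUs per MPI task (srun -c)
    min_cpus_per_task: int = 1   # smallest per-task CPU count


def srun_command(res, executable):
    # Build the srun command for one job step
    return ['srun', '-n', str(res.ntasks),
            '-c', str(res.cpus_per_task), executable]


# e.g. a 4-task MPI step with 2 OpenMP threads per task:
print(srun_command(StepResources(ntasks=4, cpus_per_task=2), './ocean_model'))
# ['srun', '-n', '4', '-c', '2', './ocean_model']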

Nearly every test case is affected (hence the 96 changed files).

I made a separate commit for each land-ice test group but ran out of steam for the ocean side and, based on the experience gained from landice, switched to a bulk search-and-replace across the ocean test groups.


pep8speaks commented Jun 17, 2022

Hello @xylar! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-07-29 09:41:03 UTC

xylar force-pushed the switch_steps_to_slurm_terminology branch from 0ee98a2 to a6fe7b3 on June 17, 2022 18:36
Comment on lines +311 to +313
raise ValueError('Unsupported step configuration in which '
'ntasks > 1 and '
'cpus_per_task != min_cpus_per_task')
xylar (Collaborator, Author):

To preserve general sanity, I'm enforcing a rule that cpus_per_task == min_cpus_per_task if ntasks > 1. If you're doing MPI jobs, you will always know that cpus_per_task = min_cpus_per_task = openmp_thread, since you're using cpus_per_task for OpenMP threading. (This will be enforced in a future update.) If you are running a multithreaded or multiprocessing job on a single node, you will always have ntasks = 1 and the number of threads or processes will be controlled by cpus_per_task and min_cpus_per_task.
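
To illustrate the two supported patterns (a sketch with made-up step names and numbers; only the attribute names come from this PR):

# 1) An MPI step: the task count may vary between min_tasks and ntasks,
#    but the per-task CPU count is pinned to the OpenMP thread count.
forward_step = dict(ntasks=64, min_tasks=16,
                    cpus_per_task=2, min_cpus_per_task=2)   # 2 OpenMP threads

# 2) A single-node threaded or multiprocessing step: ntasks is always 1
#    and the thread/process count is what varies.
remap_step = dict(ntasks=1, min_tasks=1,
                  cpus_per_task=32, min_cpus_per_task=4)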

Member:

Should cpus_per_task = min_cpus_per_task = openmp_thread be cpus_per_task = min_cpus_per_task * openmp_thread?

xylar (Collaborator, Author):


No, if you're running with threads, the number of OpenMP threads is the same thing as the number of CPUs per MPI task. This is just by definition.

In addition to that, what I'm also saying here is that things get too confusing if cpus_per_task is different from min_cpus_per_task. That is, it gets very confusing if a developer allows for some number of MPI tasks in a range between min_tasks and ntasks, and at the same time also allows for a range of threads between min_cpus_per_task and cpus_per_task. I don't want to deal with that complicated (maybe even impossible) optimization problem. So I suggest we also require that the number of OpenMP threads is always fixed for steps with multiple MPI tasks, so that cpus_per_task = openmp_threads and min_cpus_per_task = cpus_per_task.
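
As a rough sketch of the combined rule (a hypothetical function, not the actual compass code):

def check_resources(ntasks, cpus_per_task, min_cpus_per_task,
                    openmp_threads=1):
    # Hypothetical sketch of the constraints proposed above.
    if ntasks > 1:
        # MPI steps: the per-task CPU count is fixed and tied to the
        # OpenMP thread count, so only the task count may vary
        # (between min_tasks and ntasks).
        if cpus_per_task != min_cpus_per_task:
            raise ValueError('Unsupported step configuration in which '
                             'ntasks > 1 and '
                             'cpus_per_task != min_cpus_per_task')
        if cpus_per_task != openmp_threads:
            raise ValueError('cpus_per_task should equal openmp_threads '
                             'when ntasks > 1')
    # Otherwise (ntasks == 1): a threaded or multiprocessing step on a
    # single node, where the thread/process count may vary between
    # min_cpus_per_task and cpus_per_task.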

Does that make things any clearer?

Member:


Yes, that makes sense, thanks.


xylar commented Jun 17, 2022

Testing

I have run the ocean pr and land-ice full_integration test suites on Anvil with Intel/Intel-MPI and Gnu/MVAPICH, respectively. All results are bit-for-bit and runtimes are similar. Results are here:

/lcrc/group/e3sm/ac.xylar/compass_1.1/anvil/test_20220617/pr_slurm_terms4

and here:

/lcrc/group/e3sm/ac.xylar/compass_1.1/anvil/test_20220617/integration_slurm_terms

I also set up (but did not try to run) all the tests in compass, to make sure that, at least through the setup stage, there were no references to the removed cores, min_cores or threads attributes.


xylar commented Jun 17, 2022

I'm still working on updating the documentation.

xylar force-pushed the switch_steps_to_slurm_terminology branch 2 times, most recently from 7c214f1 to a892190 on June 17, 2022 20:45
xylar requested a review from trhille on June 17, 2022 20:50

xylar commented Jun 17, 2022

@altheaden, it would be great if you could run a few tests, maybe even the pr test suite on Chrysalis, since I ran on Anvil.

@matthewhoffman and @trhille, it would be great if you are willing to do some testing on your favorite machines to make sure I haven't broken tests on the landice side. I only tested the test cases that are in the full_integration test suite, so running some tests outside of that suite would be the most helpful.

@mark-petersen, similarly on the ocean side. I ran the pr test suite but I can't feasibly run all of the ocean tests. If you want to test a few that are outside the pr suite, that might be the most helpful.

Obviously, I'm also keen for feedback from all of you.


xylar commented Jun 17, 2022

I reran tests after finding a few more cores --> ntasks replacements that I missed. Tests still pass, and I updated the output directories above with the latest results. I'm signing off. Have a nice weekend!

altheaden (Collaborator) commented:

@xylar In the cosine bell QU90 test, we're seeing that it's using 128 threads when it should just be using one. It looks like the --exclusive flag isn't working as expected.


xylar commented Jun 21, 2022

@mark-petersen and @matthewhoffman, please hold off on reviewing until @altheaden and I figure out the issue above.

xylar added the "in progress" label (This PR is not ready for review or merging) on Jun 21, 2022
xylar marked this pull request as draft on June 21, 2022 17:25
xylar force-pushed the switch_steps_to_slurm_terminology branch from 517d32b to 267e8d5 on June 21, 2022 18:42
xylar removed the "in progress" label on Jun 21, 2022
xylar marked this pull request as ready for review on June 21, 2022 19:14

xylar commented Jun 21, 2022

Okay, issue figured out. I had forgotten to actually use the environment variable (OMP_NUM_THREADS) that I had set. That leads to some mighty strange behavior!
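
For reference, the fix amounts to making sure that value actually reaches the environment of the launched command, along these lines (a sketch, not the exact compass code; the function and executable names are placeholders):

import os
import subprocess


def run_step(ntasks, cpus_per_task, executable):
    # The OMP_NUM_THREADS value computed for the step has to be passed to
    # the environment of the srun call; otherwise the OpenMP runtime falls
    # back to using every CPU it can see (e.g. the 128 threads seen above).
    env = dict(os.environ)
    env['OMP_NUM_THREADS'] = str(cpus_per_task)
    subprocess.run(['srun', '-n', str(ntasks), '-c', str(cpus_per_task),
                    executable], env=env, check=True)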

xylar force-pushed the switch_steps_to_slurm_terminology branch 3 times, most recently from 6a2e411 to 5fa2549 on June 24, 2022 14:31
xylar force-pushed the switch_steps_to_slurm_terminology branch from abaec21 to bf8ca45 on July 29, 2022 08:57

xylar commented Jul 29, 2022

@trhille, I was able to run the full_integration test suite successfully on Anvil with Gnu and MVAPICH, and all tests were bit-for-bit. I wasn't able to reproduce the issues with any of the following on Anvil:

  • landice/eismint2/restart_test
  • landice/eismint2/enthalpy_restart_test
  • landice/dome/2000m/fo_decomposition_test
  • landice/greenland/fo_decomposition_test
  • landice/greenland/fo_restart_test

It could be that a recent rebase took care of these or it could be that they're specific to Cori. I will test full_integration there next.

I haven't investigated any of the calving tests yet.


xylar commented Jul 29, 2022

@trhille,
On Cori, I'm seeing the following failures during test validation (not execution) with the current master:

  • landice_greenland_fo_restart_test
  • landice_thwaites_restart_test
  • landice_humboldt_mesh-3km_restart_test_velo-fo_calving-von_mises_stress_damage-threshold_faceMelting

I see the same failures with this branch but no other failures (i.e. not the others you mentioned in #413 (comment)), and the baseline comparison passes for all tests (even these).

Baseline:

/global/cscratch1/sd/xylar/compass_1.2/test_20220729/landice_full_baseline

This branch:

/global/cscratch1/sd/xylar/compass_1.2/test_20220729/landice_full_slurm_terms

I believe permissions should be such that you can take a look.

I did need to copy the components/mpas-framework/Makefile over from the latest E3SM-Project submodule to get the build to work out.


trhille commented Jul 29, 2022

@xylar, I'm wondering if the issue is with my spack packages or MPAS-Tools, because the test execution failures seem to arise from the MPAS Cell Culler in the eismint2 cases and from an incorrect layerThicknessFractions in the mesh for the dome/2000m/fo_decomposition_test case. I'll test again and update you.


trhille commented Jul 29, 2022

Okay, I am no longer getting the test execution errors in full_integration, and all tests are passing baseline comparison, although the following tests are failing validation, as you also saw:
  • landice_greenland_fo_restart_test
  • landice_thwaites_restart_test
  • landice_humboldt_mesh-3km_restart_test_velo-fo_calving-von_mises_stress_damage-threshold_faceMelting

It's possible I accidentally had an old branch of MPAS-Tools checked out when I ran the test before, but I'm not sure.


xylar commented Jul 30, 2022

@trhille, I think the more likely explanation is just that your conda environment wasn't up-to-date in one or the other branch. Since the 1.1.0 release and the change to 1.2.0-alpha.1 on master, some dependencies have changed that might make a difference.


xylar commented Aug 10, 2022

@matthewhoffman and @mark-petersen, could you make this PR a priority (relative to other reviews -- I know these are very busy times)? This is holding up other work that @altheaden and I would like to do before summer is over and she has to resume classes.

mark-petersen (Collaborator) left a comment:

Thanks @xylar and @altheaden for your work on this. I can see how specifying both ntasks and cpus_per_task would be needed to match the srun format.

I tested this compass PR on Cori with Intel and Badger with Gnu, running today's E3SM master. I ran the nightly test suite on both and everything was successful.

I also tried some extra test cases on both:

cori10:pr$ compass setup -p /global/homes/m/mpeterse/repos/E3SM/master/components/mpas-ocean/ -w $n/ocean_model_220816_c4b7eacb_co_intel-nersc_openmp_debug_master -n 196 200 168 183
Setting up test cases:
  ocean/soma/32km/default
  ocean/soma/32km/three_layer
  ocean/internal_wave/default
  ocean/planar_convergence/horizontal_advection
target cores: 272
minimum cores: 27

These appear to run correctly except that ocean/soma/32km/three_layer fails in forward mode both before and after this PR, so that is unrelated.

I do get a warning on cori that appears to be related to this PR:

 pwd
/global/cscratch1/sd/mpeterse/runs/n/ocean_model_220816_c4b7eacb_co_intel-nersc_openmp_debug_master/ocean/soma/32km/three_layer
(dev_compass_1.2.0-alpha.1) nid00961:three_layer$ compass run
ocean/soma/32km/three_layer
/global/u2/m/mpeterse/repos/compass/pr/compass/parallel.py:93: UserWarning: Slurm found 64 cpus per node but config from mache was 32
  warnings.warn(f'Slurm found {cpus_per_node} cpus per node but '
compass calling: compass.ocean.tests.soma.soma_test_case.SomaTestCase.run()
  inherited from: compass.testcase.TestCase.run()
  in /global/u2/m/mpeterse/repos/compass/pr/compass/testcase.py

Running steps: initial_state, forward
  * step: initial_state

It looks like the warning is just telling us that we are not taking advantage of all the cores available on the node, so it is to be expected.


xylar commented Aug 16, 2022

Thanks a bunch @mark-petersen!

It's worth figuring out what's wrong with 3-layer SOMA, but not here.

The warning:

Slurm found 64 cpus per node but config from mache was 32

should be fixed in mache and isn't related to this PR. This happened because NERSC turned on hyperthreading on Cori-Haswell and it now looks like there are 64 cores per node rather than 32. compass trusts what the SLURM variable says (i.e. 64) rather than what mache thinks the number should be, so it's fine.
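
The check behind that warning is roughly the following (a sketch; the real logic lives in compass/parallel.py and may differ in detail -- SLURM_CPUS_ON_NODE is a standard Slurm environment variable, but whether compass reads exactly that variable is an assumption here):

import os
import warnings


def discover_cpus_per_node(config_cpus_per_node):
    # Prefer what Slurm reports for the allocated node over the value
    # that came from the mache config, warning when they disagree.
    slurm_value = os.environ.get('SLURM_CPUS_ON_NODE')
    if slurm_value is None:
        return config_cpus_per_node
    cpus_per_node = int(slurm_value)
    if cpus_per_node != config_cpus_per_node:
        warnings.warn(f'Slurm found {cpus_per_node} cpus per node but '
                      f'config from mache was {config_cpus_per_node}')
    return cpus_per_node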

matthewhoffman (Member) left a comment:

I tested the landice integration suite on Badger including baseline comparison and all tests pass. I think that is sufficient for MALI interests.


xylar commented Aug 16, 2022

Thank you very much @trhille, @matthewhoffman and @mark-petersen!!! This will be a big deal for allowing @altheaden's work to move forward. Sorry it's so disruptive.

xylar merged commit 9b29ac3 into MPAS-Dev:master on Aug 16, 2022
xylar deleted the switch_steps_to_slurm_terminology branch on August 16, 2022 20:07