Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compiler modifier for MAX_TASKS_PER_NODE, MAX_MPITASKS_PER_NODE, and mpirun arguments #2965

Closed
mrnorman opened this issue Jan 4, 2019 · 14 comments · Fixed by #4088
Closed
Assignees
Labels

Comments

@mrnorman
Copy link
Contributor

mrnorman commented Jan 4, 2019

On Summit, we have an issue where the PE layout will be different for GPU and CPU runs. CPU runs should use 84 tasks per node while GPU runs should use 36 or fewer. I'd like to use the compiler as the modifier for this, e.g.,

<MAX_TASKS_PER_NODE compiler="pgigpu">36</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE compiler="pgigpu">36</MAX_MPITASKS_PER_NODE>
<MAX_TASKS_PER_NODE compiler="!pgigpu">84</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE compiler="!pgigpu">84</MAX_MPITASKS_PER_NODE>

<mpirun mpilib="spectrum-mpi">
<!-- Use a helper script to tweak jsrun options -->
<executable>/gpfs/alpinetds/world-shared/csc190/e3sm/mpirun.summit</executable>
<!-- <executable>jsrun</executable> -->
<arguments compiler="pgigpu">
  <arg name="num_tasks"> -n {{ total_tasks }} -N $MAX_MPITASKS_PER_NODE -gpu </arg>
</arguments>
<arguments compiler="!pgigpu">
  <arg name="num_tasks"> -n {{ total_tasks }} -N $MAX_MPITASKS_PER_NODE </arg>
</arguments>
</mpirun>

It seems that CIME is not currently able to use a compiler modifier for these fields when parsing the XML file, but I think this will be necessary for CIME to support Summit's configurations.

@jedwards4b
Copy link
Contributor

Have you considered defining two different machines? summit and summit-gpu? I think that this might be a cleaner solution.

@mrnorman
Copy link
Contributor Author

mrnorman commented Jan 4, 2019

That's what I'm having to do now. But I don't want the user to have to specify -mach "summit" when they're already on summit. It would seem better to allow the compiler to control these things and leave the machine defaulting to "summit". It would avoid duplicating entries in the XML files as well.

@jedwards4b
Copy link
Contributor

That specification of machine only needs to happen at create_newcase time. In your proposed change you would have to specify a compiler option in the same location - so what would you gain? In the case of two machines I would presume summit would be the default and the machine only needs to be specified for summit-gpu.

@mrnorman
Copy link
Contributor Author

mrnorman commented Jan 4, 2019

You avoid large duplications in the XML files...

There's no need to litter things further with duplicate batch, compiler options, modules, and environment variables when most of that will be identical.

@jedwards4b
Copy link
Contributor

How much duplication is there? The compiler and compiler options are different, many of the modules will be different and I suspect many of the environment variables will be too.

@mrnorman
Copy link
Contributor Author

mrnorman commented Jan 4, 2019

I've already mentioned what's duplicated, and no the modules and environment variables are not that different. I should also mention a compiler modifier is normal for many other XML entries in the machine file. I don't think I'm requesting something foreign to current practice in the CIME infrastructure.

@jedwards4b
Copy link
Contributor

We have several machines with similar issues - cori and stampede, pleiades are three I can think of. The current practice is to have multiple machine definitions.

@mrnorman
Copy link
Contributor Author

mrnorman commented Jan 4, 2019

I think that's a poor practice. If it's simply too much work to add this functionality to CIME, I suppose I can understand that. But that should be the basis of discussion, not duplicating entire machine entries because of a different compiler choice that changes only a few entries in the machine file.

@jedwards4b
Copy link
Contributor

Not too much work, I am willing to yield but feel a need to argue for the current practice.

@jedwards4b
Copy link
Contributor

I think that it may work to put the compiler argument in the mpilib statement - have you tried that?

<mpirun mpilib="spectrum-mpi" compiler="pgigpu">

@mrnorman
Copy link
Contributor Author

mrnorman commented Jan 4, 2019

The code above should be the only things different between the GPU and CPU runs. For modules, we just need to add cuda for GPU runs. Otherwise, they're identical. I feel it will be easier to support and cleaner in the XML files not to duplicate the machine files. That situation may be different for other machines.

I'll give the mpirun modification a try rather than the arguments. Thanks for that suggestion. I suppose the max task entries are the only things that would need a change then.

@jgfouca jgfouca self-assigned this Jan 7, 2019
@jgfouca
Copy link
Contributor

jgfouca commented Jan 7, 2019

We also need to consider how to make it more clear which attribute "selectors" are available for which XML entries.

@mrnorman
Copy link
Contributor Author

Thanks Jim. Yeah, that would be helpful as well.

@jgfouca jgfouca assigned jhkennedy and unassigned jgfouca Apr 12, 2019
@rljacob rljacob assigned jgfouca and unassigned jhkennedy Oct 2, 2019
@rljacob
Copy link
Member

rljacob commented Oct 2, 2019

@jgfouca the ECP project needs this.

jgfouca added a commit that referenced this issue Sep 10, 2021
Change E3SM to use cmake macro file system

This is a big change for E3SM but should have no impact on other models and did not require big changes to CIME code.

This PR lays the groundwork for E3SM leaving the config_compilers.xml/BuildTools.configure behind for good, replacing that system with a Cmake-based cache file system that can, if needed, be used to generate a Makefile macro when needed (for some sharedlibs).

I will lay out detail documentation for this change when I create the corresponding PR in E3SM that activates these new macros.

Test suite: scripts_regression_tests (both with and without new macros in the host repo)
Test baseline:
Test namelist changes:
Test status: [bit for bit, roundoff, climate changing]

Fixes #3287
Fixes #3446
Fixes #3341
Fixes #2965

User interface changes?:

Update gh-pages html (Y/N)?:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants