Add compiler modifier for MAX_TASKS_PER_NODE, MAX_MPITASKS_PER_NODE, and mpirun arguments #2965
Comments
Have you considered defining two different machines, summit and summit-gpu? I think that this might be a cleaner solution.
That's what I'm having to do now. But I don't want the user to have to specify -mach "summit" when they're already on summit. It would seem better to allow the compiler to control these things and leave the machine defaulting to "summit". It would avoid duplicating entries in the XML files as well.
That specification of machine only needs to happen at create_newcase time. In your proposed change you would have to specify a compiler option in the same location, so what would you gain? In the case of two machines I would presume summit would be the default and the machine only needs to be specified for summit-gpu.
You avoid large duplications in the XML files... There's no need to litter things further with duplicate batch, compiler options, modules, and environment variables when most of that will be identical.
How much duplication is there? The compiler and compiler options are different, many of the modules will be different and I suspect many of the environment variables will be too.
I've already mentioned what's duplicated, and no, the modules and environment variables are not that different. I should also mention that a compiler modifier is normal for many other XML entries in the machine file. I don't think I'm requesting something foreign to current practice in the CIME infrastructure.
We have several machines with similar issues: cori, stampede, and pleiades are three I can think of. The current practice is to have multiple machine definitions.
I think that's a poor practice. If it's simply too much work to add this functionality to CIME, I suppose I can understand that. But that should be the basis of discussion, not duplicating entire machine entries because of a different compiler choice that changes only a few entries in the machine file.
It's not too much work. I am willing to yield, but I feel a need to argue for the current practice.
I think that it may work to put the compiler argument in the mpilib statement - have you tried that?
The code above should be the only thing that differs between the GPU and CPU runs. For modules, we just need to add cuda for GPU runs. Otherwise, they're identical. I feel it will be easier to support and cleaner in the XML files not to duplicate the machine files. That situation may be different for other machines. I'll give the mpirun modification a try rather than the arguments. Thanks for that suggestion. I suppose the max task entries are the only things that would need a change then.
We also need to consider how to make it clearer which attribute "selectors" are available for which XML entries.
Thanks Jim. Yeah, that would be helpful as well.
@jgfouca the ECP project needs this. |
Change E3SM to use cmake macro file system
This is a big change for E3SM but should have no impact on other models and did not require big changes to CIME code. This PR lays the groundwork for E3SM leaving the config_compilers.xml/BuildTools.configure behind for good, replacing that system with a CMake-based cache file system that can, if needed, be used to generate a Makefile macro (for some sharedlibs). I will lay out detailed documentation for this change when I create the corresponding PR in E3SM that activates these new macros.
Test suite: scripts_regression_tests (both with and without new macros in the host repo)
Test baseline:
Test namelist changes:
Test status: [bit for bit, roundoff, climate changing]
Fixes #3287
Fixes #3446
Fixes #3341
Fixes #2965
User interface changes?:
Update gh-pages html (Y/N)?:
On Summit, we have an issue where the PE layout will be different for GPU and CPU runs. CPU runs should use 84 tasks per node while GPU runs should use 36 or fewer. I'd like to use the compiler as the modifier for this, e.g.,
It seems that CIME is not currently able to use a compiler modifier for these fields when parsing the XML file, but I think this will be necessary for CIME to support Summit's configurations.
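To make the request concrete, here is a minimal Python sketch of how a compiler selector on the task-count entries could be resolved. It is not CIME's parser. The `compiler` attribute on `MAX_TASKS_PER_NODE` and `MAX_MPITASKS_PER_NODE` is the behaviour being asked for in this issue, `pgigpu` is a made-up compiler name, and `resolve` is a hypothetical helper. Only the values 84 (CPU) and 36 (GPU) come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical machine fragment. The compiler attribute on these entries
# is the requested behaviour, not something CIME is confirmed to parse,
# and "pgigpu" is an invented compiler name used only for illustration.
MACHINE_XML = """
<machine MACH="summit">
  <MAX_TASKS_PER_NODE>84</MAX_TASKS_PER_NODE>
  <MAX_TASKS_PER_NODE compiler="pgigpu">36</MAX_TASKS_PER_NODE>
  <MAX_MPITASKS_PER_NODE>84</MAX_MPITASKS_PER_NODE>
  <MAX_MPITASKS_PER_NODE compiler="pgigpu">36</MAX_MPITASKS_PER_NODE>
</machine>
"""


def resolve(machine, tag, compiler):
    """Return the integer value of `tag` for `compiler`.

    An entry whose compiler attribute matches wins; otherwise the entry
    with no compiler attribute is used as the default.
    """
    default = None
    for node in machine.findall(tag):
        selector = node.get("compiler")
        if selector == compiler:
            return int(node.text)
        if selector is None:
            default = int(node.text)
    if default is None:
        raise KeyError(f"no usable {tag} entry for compiler {compiler!r}")
    return default


machine = ET.fromstring(MACHINE_XML)
gpu_tasks = resolve(machine, "MAX_TASKS_PER_NODE", "pgigpu")
cpu_tasks = resolve(machine, "MAX_TASKS_PER_NODE", "gnu")
```

With this layout the machine stays a single "summit" entry, and only the two task-count entries carry a compiler-specific override.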