Add multi-coupler option for multi-instance simulations #1704

Closed
gold2718 opened this issue Jun 23, 2017 · 11 comments

@gold2718

Currently, multi-instance simulations share a single coupler instance which must handle coupling for each instance in a serial fashion. In order to alleviate this bottleneck, @rmontuoro has implemented a modification allowing one coupler instance per model instance. In order to manage this feature from the CIME case control system, we need to specify the user interface (arguments to create_newcase and create_test along with any relevant XML variable changes) and also specify the supported semantics to allow for proper error checking.

@gold2718
Author

gold2718 commented Jun 23, 2017

I see the following supported use cases for the multi-coupler:

  1. For single-instance simulations, any multi-coupler optional argument would have no effect and may therefore be ignored, generate a warning, or generate an error (I have no preference here).
  2. For multi-instance simulations with N instances and one coupler, every non-stub component must either be run with N instances, or with 1 instance. Currently, prognostic components must be run with N instances. The exception is any ESP component which is currently always only one instance.
  3. For multi-instance simulations with N instances and N couplers, every non-stub component must be run with N instances. The exception is any ESP component which is currently always only one instance.
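
To make the error-checking semantics concrete, the three cases above could be sketched roughly as follows. This is only an illustration, not CIME code: the `Component` record, its fields, and `check_instance_counts` are hypothetical names, and the sketch assumes ESP components are simply exempt from the check.

```python
from dataclasses import dataclass


@dataclass
class Component:
    # Hypothetical record; CIME does not define this class.
    name: str
    ninst: int
    is_stub: bool = False
    is_prognostic: bool = False
    is_esp: bool = False


def check_instance_counts(components, ninst, multi_coupler):
    """Return a list of error strings; an empty list means the layout is allowed."""
    errors = []
    for comp in components:
        # Stub components are not constrained; ESP is assumed exempt here.
        if comp.is_stub or comp.is_esp:
            continue
        if multi_coupler or comp.is_prognostic:
            # N couplers, or a prognostic component: exactly N instances.
            allowed = {ninst}
        else:
            # One coupler, non-prognostic component: N instances or 1.
            allowed = {1, ninst}
        if comp.ninst not in allowed:
            errors.append(
                f"{comp.name}: ninst={comp.ninst}, expected one of {sorted(allowed)}"
            )
    return errors
```

With `ninst == 1` the allowed set collapses to `{1}` in both branches, so the multi-coupler flag has no effect, which matches case 1.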

Are there any other use cases? @billsacks? @mvertens? @jedwards4b?

@jedwards4b
Contributor

With respect to your item 3, ESP will currently have three instances. We would need to add code in the driver or the ESP component to ignore the call if instance > 1.

@gold2718
Author

For a multi-coupler case, if there is a non-stub component which does not support multiple instances (e.g., Wave Watch?), an error MUST be generated (with the usual exception for ESP components which do not interact with the coupler). To me, this means that the case-control system must know whether the selected case components support multi-instance.
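
A rough sketch of what such a check might look like, assuming the case-control system had a per-component flag recording multi-instance support. The `CompInfo` record, its `supports_multi_instance` field, and the function name are all hypothetical; nothing like this is confirmed to exist in CIME.

```python
from typing import NamedTuple


class CompInfo(NamedTuple):
    # Hypothetical record; the support flag is an assumed piece of metadata.
    name: str
    supports_multi_instance: bool
    is_stub: bool = False
    is_esp: bool = False


def check_multi_coupler_support(components, ninst, multi_coupler):
    """Raise ValueError if a multi-coupler case includes a single-instance-only component."""
    if not multi_coupler or ninst <= 1:
        return  # nothing to enforce outside the multi-coupler case
    # Stub and ESP components are exempt; everything else must support N instances.
    unsupported = [
        c.name
        for c in components
        if not (c.is_stub or c.is_esp or c.supports_multi_instance)
    ]
    if unsupported:
        raise ValueError(
            "Multi-coupler case with {} instances, but no multi-instance support in: {}".format(
                ninst, ", ".join(unsupported)
            )
        )
```

Raising at case-creation time would surface the problem before any component buildnml script runs.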

@jedwards4b
Contributor

Doesn't this indicate a bug in the component interface? All components should work with multiple instances.

@gold2718
Author

Yes, it can be considered a bug for Wave Watch (ESP is specifically for models which work outside the coupled system), but I do not think we have the power to reject components which only support a single instance. So I suppose the options are to add a check or to fix Wave Watch before bringing in the multi-coupler functionality.

@jedwards4b
Contributor

Both ERS_C3.T62_g16.CECO.cheyenne_intel.pop-ecosys and ERS_N3.T62_g16.CECO.cheyenne_intel.pop-ecosys fail with the following error message from the ww3 buildnml script:

ERROR: WW3 does not have multi instance functionality yet

@billsacks
Member

@gold2718 I don't know of any use cases other than the ones you mention. But, unless things have changed recently, I think this one actually has more stringent requirements:

  1. For multi-instance simulations with N instances and one coupler, every non-stub component must either be run with N instances, or with 1 instance. The exception is any ESP component which is currently always only one instance.

If I remember correctly, all prognostic components must be run with N instances; all non-prognostic components must either be run with N instances or with 1 instance. (I'm not sure what happens if you try to run a prognostic component with 1 or some other number of instances right now.)

Regarding WW3: I recently had to introduce a B1850GWs compset simply for the sake of being able to test multi-instance capability in B1850G (@fischer-ncar is the one who pointed out this was needed). I guess we need to do something like this for other compsets that want to test NCK, _N3, _C3, etc.

@gold2718
Author

That is better than nothing but it would be better if handled in the case-control system.

@billsacks
Member

I agree - I was actually just about to edit my comment to say that.

@gold2718
Author

WW3 may even end up getting fixed, but as we move forward with more collaboration (read: accepting more models into CESM), this issue is going to keep popping up.

@gold2718
Author

@billsacks, yes, there is a special check for prognostic components (starting at line 1611 in cesm_comp_mod.F90). I have updated my list above to make that clear.

gold2718 pushed a commit that referenced this issue Sep 6, 2017
Implementation of multiple MCT drivers as an option for multi-instance simulations. If multi-instance is enabled, N drivers are run, each with one instance.

Also (changes not directly related to multi-driver):

    Changed interface of check_lockedfiles (check_lockedfiles.py) to take a case instead of a caseroot.
    Use case.get_env instead of EnvBuild in check_lockedfiles.py
    Changed check_case (case_submit.py) to not take a caseroot input.
    Cleaned up memleak testing in _check_for_memleak (system_tests_common.py)
    Fixed bad format in build_xcpl_nml (buildnml.py)

Test suite: scripts_regression_tests.py
Test baseline: NA
Test namelist changes: NA
Test status: bit for bit
Fixes: #1704
Fixes: #1714

User interface changes?: new --multi-driver option to create_newcase and _C# modifier to tests

Update gh-pages html (Y/N)?: Y

Code review: @gold2718
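
For illustration, the `_N#` and `_C#` test modifiers mentioned in this thread could be read from a test name along these lines. This is a minimal sketch, not the CIME parser: `parse_instance_modifier` is a hypothetical helper, and it assumes that `_N#` means # instances sharing one coupler while `_C#` means # instances with one driver each, as the commit message suggests.

```python
import re

# Matches an instance-count modifier such as "N3" or "C3".
_INSTANCE_MODIFIER = re.compile(r"^([NC])(\d+)$")


def parse_instance_modifier(test_name):
    """Return (ninst, multi_driver) for a test name like 'ERS_C3.T62_g16...'."""
    testcase = test_name.split(".")[0]    # e.g. "ERS_C3"
    modifiers = testcase.split("_")[1:]   # drop the test type, e.g. "ERS"
    for modifier in modifiers:
        match = _INSTANCE_MODIFIER.match(modifier)
        if match:
            return int(match.group(2)), match.group(1) == "C"
    return 1, False  # no modifier: single instance, single driver
```

For the two tests named earlier in the thread, this would give 3 instances with multi-driver for the `ERS_C3` name and 3 instances without it for the `ERS_N3` name.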
@ghost ghost removed the in progress label Sep 6, 2017