CIME V6 Planning! #3886

Open
jgfouca opened this issue Mar 17, 2021 · 25 comments

@jgfouca
Contributor

jgfouca commented Mar 17, 2021

CIME V6 Planning

Let's use this issue to discuss CIME 6 plans. Once we have solidified a feature list, I can look at creating and organizing related issues. Feel free to assign anyone or edit this document. None of the content of this document is official until the team signs off; this is just me organizing my thoughts and getting the discussion started.

Background

In the last couple of major CIME versions, the system transitioned from a loose collection of Perl and csh scripts, to a somewhat more centralized Perl system, and finally to a highly centralized Python system. We have made excellent progress in the areas of system cohesion, software design/engineering, performance, robustness, and testing while expanding capabilities. CIME also went from being closely coupled to CESM to being a more-independent infrastructure package supporting several climate models.

On the less-positive side, some of this progress came at the expense of added complexity and less transparency for users. I've had numerous interactions with users over the years where it was clear they were not happy with their CIME experience. This could be selection bias since happy users are usually quieter than unhappy ones, so it's hard to say if users are unhappier overall than they were before these centralization efforts.

Current Situation

CIME is in a decent state but I think with some renewed energy and the freedom to break backwards compatibility, we can make it even better.

On the E3SM side of things, we have new developer resources in @WesCoomber (0.4 FTE) and @jasonb5 (1.0 FTE, full time!) which I want to take full advantage of. I want Wes and Jason to be excited to do CIME development, enthusiastic about (and contributing to) the vision for the project, and eventually to step up to fill my role as E3SM's main CIME resource. Between me, Rob, Wes, and Jason, I think this is the most that E3SM has ever had invested in CIME, and this is one of the main reasons I created this document.

In general, transformative changes to CIME have slowed down a bit in the last couple of years. The reasons for this are many: the system is naturally maturing, the resources for core CIME development have been limited (especially on the E3SM side, where only @rljacob and I have had long-term commitments to CIME), limitations in testing have made us afraid to break each other (more on this later), and changing behavior / breaking backwards compatibility is very painful in a production system.

The meta goals of V6 will be to accelerate the next rounds of transformative CIME changes while getting our new developers invested in CIME and improving the user experience.

Technical goals

Solidify role of namelists in the system and how model components should interact with them

We've had some discussion of this in the past here: #1278

Namelists are input files for the model execution, so in theory we should be able to do namelist generation once at the start of the RUN phase and be done. In practice, namelists are generated during the SETUP, BUILD, SUBMIT, and RUN phases. Back when generating namelists was computationally expensive, this was a significant performance bottleneck for the case control system. Now that it's pretty fast, the problem is more one of technical debt and unnecessary complexity.

During my work on the build system, it quickly became clear that most of our major components (like eam (our atm) and elm (our lnd)) were using buildnml as a sort of pre-build configuration step. See this diagram:

[Diagram: before_cmake]

The components know that CIME promises to call buildnml before the BUILD phase, so they took the opportunity to have buildnml set up the key Filepath and CCSM_cppdefs files that the build system depends on, even though these files don't have anything to do with namelists. In a sense, buildnml became a catch-all for all pre-build component-specific setup. Adding to the problem is that, for our bigger components, the buildnml and configure (the script that generates Filepath and CCSM_cppdefs) scripts have become multi-thousand-line piles of Perl, so it's difficult to see exactly what they are doing. At a glance, our configure scripts do not seem to be accessing or modifying namelists directly, so they appear to be decouple-able from buildnml.

Proposal

At the very least, configure should be decoupled from buildnml and integrated into the BUILD phase since that's what it's for. On the E3SM side, we could even integrate it into our CMake system since that's what CMake is for, configuring your build. This should allow us to remove the buildnml calls from both the SETUP and BUILD phases. If any component is using namelists to store/maintain general case data, that is a violation of their contract with CIME and those instances should be immediately changed to use the env XML system instead. If any components need custom setup actions, we should provide an extension point for that in CIME that is not buildnml; something like $component/cime_config/setup.
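The proposed extension point could be wired up roughly as follows. This is a minimal sketch, not existing CIME code: it assumes (hypothetically) that the hook is a Python script at `$component/cime_config/setup` that takes the case root as its only argument, and the names `find_setup_hook` and `run_component_setup` are illustrative.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def find_setup_hook(component_root: Path) -> Optional[Path]:
    """Return the component's setup hook if it provides one (hypothetical location)."""
    hook = Path(component_root) / "cime_config" / "setup"
    return hook if hook.is_file() else None


def run_component_setup(component_root: Path, caseroot: Path) -> bool:
    """Run a component's pre-build setup hook, separate from buildnml.

    Returns True if a hook was found and ran successfully, False if the
    component provides no hook. Raises CalledProcessError if the hook fails.
    """
    hook = find_setup_hook(component_root)
    if hook is None:
        return False
    # Assumption: hooks are Python scripts taking the case root as their only argument.
    subprocess.run([sys.executable, str(hook), str(caseroot)], check=True)
    return True
```

Keeping the hook optional means components that have no pre-build work need no changes, while the ones currently abusing buildnml get a dedicated, clearly named place to move that logic.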

Additional investigation is needed to see what can be done about the buildnml calls in SUBMIT and RUN. I think it's possible the one in SUBMIT can be removed without too much difficulty as well.

Build system

Some of this effort will likely be specific to E3SM only since we already diverged significantly from the classic CIME build system when we went to a CMake-based system two years ago.

Related issues:
#3446
#3341
#3287

I'd like to continue to reorganize and unify the build system around CMake. The first part of this will be refactoring how we handle sharedlibs, which currently require lots of special handling in CIME. The SHAREDLIB_BUILD phase of the case-control system iterates over a list of sharedlibs that it thinks the case needs and calls cime/src/build_scripts/buildlib.$libname. What happens then varies greatly between sharedlibs. Some leverage the classic CIME scripts/Tools/Makefile, some have CMake, some have their own Makefiles. It would be nice if we could have all the shared lib builds use CMake under the hood. That way, we could begin to unwind some of the complexity of our various systems for managing compilation settings and just put all that stuff in CMake directly. There's a significant amount of complexity in the code that generates Macro files because it needs to support multiple build system languages (Make and CMake) and a language-neutral config_compilers.xml. If the build system were fully CMake-ified, we could potentially just write all the compiler/flag stuff in hierarchical CMake cache files. Ideally, I think all the info in the Depends files, config_compilers.xml, and hardcoded stuff in CMakeLists.txt could be nicely encapsulated in a CMake cache file system. This would remove layers of CIME magic between users and their compilation settings.

Another very important topic is thinking about how to reduce the amount of building that goes on when doing test suites. The default behavior is to do a full build for every case. E3SM test suites offer the ability to mark a suite as shared build but this feature is not yet widely used and is use-at-your-own-risk. It would be interesting to see if the sharedlib system could be expanded to work for components.

Proposal

  1. A deep-dive into CMake-ifying the sharedlibs similar to the CMake deep-dive that was done for the components two years ago
  2. A consolidation of flag/compiler settings into a single system, preferably a hierarchical CMake cache system.
  3. Sharedlibs are not currently shared across cases for E3SM. That needs to change.
  4. Investigate how to further reduce amount of time spent building cases when running test suites

XML env database

Relevant issues:
#2161
#3338
#2965

We've made good progress on CIME's env XML "database", especially in the areas of robustness, performance/caching, and encapsulating python's XML ElementTree. I think more progress can be made in formalizing the guarantees/invariants of the system, further standardization of syntax across env xml files, expanding and standardizing the attribute selector concept, and all-around simplification.

Even as co-author of the XML system, I often get a bit lost in our XML code because the execution path, even for fairly simple actions like get_value, is so complex. I'd like for a developer to do a deep dive into CIME/XML/*.py and try to find sources of complexity and potential remedies.

Proposal

  1. Env XML database system guarantees, invariants, restrictions, etc are well-documented
  2. Users should be able to use attribute selectors on any field using any previously defined field as a selector
  3. Deep-dive for complexity reduction in implementation of the system
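To make item 2 concrete, here is a minimal sketch of attribute-selector resolution using the standard library's `xml.etree.ElementTree`. The `<entry>`/`<value>` layout is a simplified, hypothetical stand-in for CIME's env XML, and the "most attributes matched wins" rule is an assumption for illustration. `settings` holds the values of previously resolved fields, keyed by the lowercase attribute name.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Optional


def select_value(entry: ET.Element, settings: Mapping[str, str]) -> Optional[str]:
    """Return the text of the most specific <value> whose attributes all match.

    A <value> with no attributes acts as the default. On a tie in
    specificity, the first matching <value> in document order wins.
    """
    best, best_score = None, -1
    for value in entry.iter("value"):
        attrs = value.attrib
        # Every selector attribute on this candidate must equal an already-resolved field.
        if all(settings.get(key) == wanted for key, wanted in attrs.items()):
            if len(attrs) > best_score:
                best, best_score = value.text, len(attrs)
    return best
```

For example, with values `36` (no selector), `48` (`compiler="intel"`), and `64` (`compiler="intel" mpilib="openmpi"`), an Intel case using MPICH resolves to `48`. Because `settings` may contain any field resolved earlier, the same function supports selectors on arbitrary previously defined fields.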

Testing

Relevant issues:
#2521

The problem is nicely described in that issue, so I won't repeat too much here. My hunch is that the best thing to do would be to expand upon the GitHub CI system to cover more system, model, and compiler combinations. We've had great results doing this for my other project SCREAM using Jenkins and autotester.

Test scheduling

We currently have create_test/test_scheduler.py, which works fine if you are on the machine where you want to run. Can things be improved by using CWL (Common Workflow Language)?

Code / repo layout

Relevant issues:
#3432
#3393

Drop support for python2, let's move to requiring a newish version of python3. I'd like to be able to use python's latest string formatting syntax, pathlib, and modern concurrency libraries.
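As an illustration of what the python3-only features buy us, here is a small hypothetical helper (not existing CIME code) that counts `env_*.xml` files across several case directories. It uses `pathlib` for path handling, `concurrent.futures` for a thread pool, and f-strings for formatting; f-strings need Python 3.6 or later, while `pathlib` and `concurrent.futures` ship with the Python 3 standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List


def env_files(caseroot: Path) -> List[str]:
    # pathlib replaces os.path.join/glob string juggling.
    return sorted(p.name for p in Path(caseroot).glob("env_*.xml"))


def summarize(caseroots: Iterable[Path]) -> List[str]:
    roots = [Path(r) for r in caseroots]
    # concurrent.futures replaces hand-rolled threading/multiprocessing pools;
    # map() returns results in the same order as its inputs.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(env_files, roots))
    # f-strings replace "%s" % ... and "{}".format(...).
    return [f"{root.name}: {len(files)} env files" for root, files in zip(roots, results)]
```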

Complete the separation of fortran science packages (cpl/drv data models, etc) by model. I think we have an issue for this but I couldn't find it.

It would be nice to be able to do model-specific extensions to CIME without touching the CIME repo. Things like model-specific provenance should be modifiable from the host repo.

Finally, I got beat up pretty badly in an E3SM all-hands a few years ago for the non-standard, non-"pythonic" organization of CIME's python code tree under CIME/scripts. This was causing problems for developers who use python IDEs, and confusion with importing, PYTHONPATH, etc. This should be pretty easy to clean up, so I think it's worth addressing even if most of us are using text editors to develop CIME. We could potentially even look into integrating CIME within the Python ecosystem (pip, anaconda, etc.).

CIME development process

As I look through our open issues, I see lots of old issues falling through the cracks, including bug reports and other items that look high priority. We occasionally go through open issues during our Wed meetings but that is time consuming and not much fun. I don't have any concrete proposal to deal with this, but it seems like we need some additional mechanisms for organizing, prioritizing, and shepherding tickets. It would be ideal if we could achieve this without additional meeting overhead.

@billsacks
Member

Thanks for opening this @jgfouca ! I didn't get a chance to read it before our meeting yesterday, but just did. These are great ideas, so thanks for thinking this through and articulating them so well!

@jedwards4b
Contributor

Regarding the non-"pythonic" comments: we really need to tell people to submit actionable issues on this. This morning I tried an IDE, PyCharm, and it took me a few minutes of reading documentation and configuring the environment to get it working properly. This required no modifications to the cime source code.

@jgfouca
Contributor Author

jgfouca commented Apr 28, 2021

@jedwards4b , I looked all over for a GitHub issue related to the code structure and was not able to find one. I am reaching out to @jhkennedy to see if I can get some concrete information.

@billsacks
Member

> @jedwards4b , I looked all over for a github issue related to the code structure and was not able to find out. I am reaching out to @jhkennedy to see if I can get some concrete information.

And I just searched the meeting notes and couldn't find any reference to this discussion.

@jhkennedy
Contributor

Importantly, it's been ~18 months since I've looked at CIME in any detail, so anything I have to contribute right now might be rather outdated.


I do remember the conversation, and here "pythonic" is not so much about the code itself but how someone who's familiar with the scientific python ecosystem would expect CIME to be structured. For example

  • "CIME", in terms of the user interface is really everything contained in cime/scripts/ and the core python package is buried in cime/scripts/lib/CIME.
    • I would expect that CIME directory to be a top level (at the repository root) package
  • "Installation" requires adding a bunch of directories to the PYTHONPATH and what those directories are isn't (wasn't) terribly clear
    • I'd expect a setup.py file at the repository root and pip (or conda) would handle all of that for me
  • the user interface is a collection of scripts that get called directly
    > cd cime/scripts
    > ./create_newcase --case mycase --compset X --res f19_g16
    > cd mycase
    > ./case.setup
    > ./case.build
    > ./case.submit
    
    • I would expect these to be provided as entrypoints inside a setup.py and the code itself contained inside CIME
    • Additionally, these could be all coalesced into a CLI that would look like:
    cime create ...
    cime setup ... 
    cime build ...
    cime submit ...
    
    which leads to much better portability (it doesn't matter where you run things from; help is available directly)
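A minimal sketch of such a coalesced CLI using the standard library's `argparse` subcommands. The `cime` program name, the handlers, and their output are hypothetical placeholders; real handlers would wrap what `create_newcase`, `case.setup`, `case.build`, and `case.submit` do today.

```python
import argparse
from typing import List, Optional


def _create(args: argparse.Namespace) -> str:
    # Hypothetical handler; a real one would do what create_newcase does today.
    return f"create case={args.case} compset={args.compset} res={args.res}"


def _phase(args: argparse.Namespace) -> str:
    # Hypothetical handler standing in for case.setup / case.build / case.submit.
    return f"{args.command} in {args.caseroot}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cime", description="Hypothetical unified CIME CLI")
    sub = parser.add_subparsers(dest="command", required=True)  # required= needs Python 3.7+

    create = sub.add_parser("create", help="create a new case")
    create.add_argument("--case", required=True)
    create.add_argument("--compset", required=True)
    create.add_argument("--res", required=True)
    create.set_defaults(func=_create)

    for phase in ("setup", "build", "submit"):
        p = sub.add_parser(phase, help=f"equivalent of ./case.{phase}")
        # Default to the current directory, like running ./case.build inside a case.
        p.add_argument("caseroot", nargs="?", default=".")
        p.set_defaults(func=_phase)
    return parser


def run(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return args.func(args)


def main(argv: Optional[List[str]] = None) -> int:
    # Console-script wrappers call sys.exit(main()), so return an int status.
    print(run(argv))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

In a packaged layout, `setup.py` could register this with `entry_points={"console_scripts": ["cime = cime.cli:main"]}` (the module path is hypothetical), and a `cime/__main__.py` that calls `main()` would make `python -m cime ...` work as well. Subcommand help comes for free via `cime build --help`.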

On the code side, CIME is highly abstracted OOP, but a lot of the abstractions are just pass-throughs and many of the objects/interfaces could be condensed/simplified/refactored (that's not a criticism, just a possible future refactor target).


Overall, much of this is "tech-debt" or design legacy (you can "see" the sh --> perl --> python conversion), but tackling it would have made development a bit easier/faster for me.

@xylar may or may not have additional comments in this regard.

@xylar

xylar commented Apr 29, 2021

I would echo everything that @jhkennedy has said. As a python developer, these are the things that made CIME feel really inaccessible to me when I tried to debug issues I had run into years ago.

If you do decide to move toward a more conventional python package design, I would be more than happy to beta test and provide whatever assistance I can.

@jedwards4b
Contributor

@xylar , @jhkennedy Thank you for taking the time to respond. This is quite helpful.

@billsacks
Member

@jhkennedy and @xylar - thanks a lot for sharing these thoughts. This is very helpful.

I am not well-versed in standard python package organization, so I'm asking the following truly out of curiosity / naivety:

  1. Regarding directory structure: most examples / standards seem based around the assumption that your repository is a python package. Are there different recommendations (if any) when your repository is a mix of a python package and some other stuff? (After the upcoming cime split, cime will be close to being just a python package, though there will still be a bit of other stuff; but up until now, there has been a lot more to cime than just the python. I'm also asking this a bit for the sake of other projects, such as CTSM, where I put the python in its own subdirectory at the top level: https://github.com/ESCOMP/CTSM/tree/master/python/ctsm)

  2. Regarding "installation": It seems like a lot of what has driven cime to be non-standard is that we don't expect users to install cime and then use it. Instead, it is common for a given user to be working with multiple versions of CESM/E3SM at once, each with its own version of cime. It is important that, if you run create_newcase (etc.) from a given CESM/E3SM checkout, that it use the cime version in that checkout. Moreover, once you have created a case, it is important that any cime scripts (case.setup, xmlchange, etc.) run from that case use the cime version contained in the appropriate model checkout on disk. It hasn't been obvious to me if something like that is possible using standard pythonic methods.

  3. Somewhat connected with (2): the distinction between user and developer seems more blurred with cime than with many python packages. I think this was another push towards a model of "use whatever code exists right here on disk" rather than "pip install cime then use the installed version". Again, I don't know how much (if at all) this makes it harder to use standard pythonic methods.

Again, I'm not trying to defend the status quo: I have truly felt lost when thinking of how we could handle these things in a more pythonic way.

@xylar

xylar commented Apr 29, 2021

> Regarding directory structure: most examples / standards seem based around the assumption that your repository is a python package. Are there different recommendations (if any) when your repository is a mix of a python package and some other stuff? (After the upcoming cime split, cime will be close to being just a python package, though there will still be a bit of other stuff; but up until now, there has been a lot more to cime than just the python. I'm also asking this a bit for the sake of other projects, such as CTSM, where I put the python in its own subdirectory at the top level: https://github.com/ESCOMP/CTSM/tree/master/python/ctsm)

In such cases, I have typically seen a subdirectory (often called python) that looks exactly like a typical python package. That directory would contain a setup.py for installing the package and a CIME directory (but python has a strong preference for all lowercase, so cime) that contains the actual package.

> Regarding "installation": It seems like a lot of what has driven cime to be non-standard is that we don't expect users to install cime and then use it. Instead, it is common for a given user to be working with multiple versions of CESM/E3SM at once, each with its own version of cime. It is important that, if you run create_newcase (etc.) from a given CESM/E3SM checkout, that it use the cime version in that checkout. Moreover, once you have created a case, it is important that any cime scripts (case.setup, xmlchange, etc.) run from that case use the cime version contained in the appropriate model checkout on disk. It hasn't been obvious to me if something like that is possible using standard pythonic methods.

The standard way to handle this would be to have cime releases (possibly quite frequently). Each could have different dependencies on other python packages, with constraints on what versions it is compatible with. (I think this is an issue that @jhkennedy was struggling with -- there is not currently a good way for cime to have dependencies.) Ideally, a python (e.g. conda) environment should be created with the appropriate cime version installed in it, and building a case would involve activating the environment with the appropriate cime version.

A less desirable alternative is to continue to have cime as a submodule, in which case cime would not be installed and would instead need to be referenced locally with python -m cime create, etc. This doesn't seem to me to have any clear advantages over the current approach, so I think if cime is to become a proper python package, it really would need to be installed, not treated as a submodule.

> Somewhat connected with (2): the distinction between user and developer seems more blurred with cime than with many python packages. I think this was another push towards a model of "use whatever code exists right here on disk" rather than "pip install cime then use the installed version". Again, I don't know how much (if at all) this makes it harder to use standard pythonic methods.

I think most E3SM developers would not be cime developers so I somewhat disagree with this premise. I work with tons of python packages that depend on other python packages I develop. Let's say that you are simultaneously developing CESM and cime. Rather than installing cime, you can create a symlink to the cime directory that is the python package in the cime branch you are developing within the CESM branch you are developing (wherever you would like to run cime from). Then, you can run python -m cime create .... This will use the local symlink first before looking in the python environment to see if cime exists there.

I realize the transition to making cime a more standard python package is going to have a steep learning curve for the team, but it will mean that the experience folks bring from other python work will actually be applicable to cime. Currently, it is essentially its own foreign language.

@billsacks
Member

Thanks a lot for your added thoughts @xylar . I don't have any substantive immediate replies, but I think a lot boils down to this:

> I realize the transition to making cime a more standard python package is going to have steep learning curve for the team, but it will mean that experience that folks bring from other python work will actually be applicable to cime. Currently, it is essentially its own foreign language.

For better or worse, CESM's (and by extension, E3SM's) way of doing things evolved over many years of experience that included multiple changes in the scripting language. The result is something that I personally feel fits the needs of CESM/E3SM very well, but looks different from what we would have gotten if we had started out by saying, "Let's develop a python(ic) package." So what it may come down to is deciding if we want to more fully embrace the pythonic way even if it means users need to adjust, or if we maintain the status quo from the user perspective even if it means a bigger adjustment for developers coming from other python libraries. I don't know the answer but this is some good food for thought. Thanks.

@jhkennedy
Contributor

jhkennedy commented Apr 29, 2021

> Regarding directory structure: most examples / standards seem based around the assumption that your repository is a python package. Are there different recommendations (if any) when your repository is a mix of a python package and some other stuff? (After the upcoming cime split, cime will be close to being just a python package, though there will still be a bit of other stuff; but up until now, there has been a lot more to cime than just the python. I'm also asking this a bit for the sake of other projects, such as CTSM, where I put the python in its own subdirectory at the top level: https://github.com/ESCOMP/CTSM/tree/master/python/ctsm)

> In such cases, I have typically seen a subdirectory (often called python) that looks exactly like a typical python package. That directory would contain a setup.py for installing the package and a CIME directory (but python has a strong preference for all lowercase, so cime) that contains the actual package.

Yeah, I agree a python directory that contains the python bindings is common and how you have CTSM setup @billsacks is 👍 . Another option that's common is to call the CTSM python bindings pyctsm and keep it at the top level, but that can get cluttered depending on what else is going on in the repo -- for CTSM there's already quite a bit at the root so I'd also likely have put it in a python directory.

Where to put it (right at the root, or in a python directory) depends on what's the "main" thing in the repository. For CIME, the main thing is the CIME python tools (IMO) so having the cime package right at top makes sense. For CTSM, the main thing is the model, and the python bindings are a great add-on, so a python directory makes sense.

> Regarding "installation": It seems like a lot of what has driven cime to be non-standard is that we don't expect users to install cime and then use it. Instead, it is common for a given user to be working with multiple versions of CESM/E3SM at once, each with its own version of cime. It is important that, if you run create_newcase (etc.) from a given CESM/E3SM checkout, that it use the cime version in that checkout. Moreover, once you have created a case, it is important that any cime scripts (case.setup, xmlchange, etc.) run from that case use the cime version contained in the appropriate model checkout on disk. It hasn't been obvious to me if something like that is possible using standard pythonic methods.

> The standard way to handle this would be to have cime releases (possibly quite frequently). Each could have different dependencies on other python packages, with constraints on what versions it is compatible with. (I think this is an issue that @jhkennedy was struggling with -- there is not currently a good way for cime to have dependencies.) Ideally, a python (e.g. conda) environment should be created with the appropriate cime version installed in it, and building a case would involve activating the environment with the appropriate cime version.

> A less desirable alternative is to continue to have cime as a submodule, in which case cime would not be installed and would instead need to be referenced locally with python -m cime create, etc. This doesn't seem to me to have any clear advantages over the current approach, so I think if cime is to become a proper python package, it really would need to be installed, not treated as a submodule.

I agree @billsacks, those are real concerns and hard to deal with. Moving to a more standard python package and installation strategy provides a lot of user and developer benefit (IMO) and "solves" the problem of allowing external dependencies (i.e., #2056, and needing a separate workflow for the climate reproducibility tests). Being truly separated and distributed will require either uncoupling that tight dependency or providing a way for CIME to verify correct model/version pairing [Edit: The models really should do this, not CIME]. Both of those may be hard to do.

An alternative, or interim solution while decoupling, would be to keep it as a submodule but move to the python package structure and have users install an "editable"/"develop" version of the code. So, the first time a user checks out CIME (or any model it's bound to), they'd setup an environment

  • Using conda envs (recommended; pretty common for most users already and handles many kinds of dependencies)
    git clone git@github.com:ESMCI/cime.git
    cd cime
    conda env create -f ./environment.yml
    conda activate [ENV_NAME]  # the environment name set in environment.yml
    python -m pip install -e .
    
  • Using python virtual environments (generally only for pure python dependencies)
    git clone git@github.com:ESMCI/cime.git
    cd cime
    python -m venv env
    source env/bin/activate
    python -m pip install -e .
    

The editable install only links to CIME code in the repo and so stays in sync when you checkout new versions/branches. The only caveat there is that adding/removing dependencies, adding/removing new entrypoints, and/or changing the package structure will require you to update the environment/install. That's easily handled when switching by doing

  • Using conda envs
    git switch [BRANCH]  # or git pull....
    conda env update -f ./environment.yml
    python -m pip install -e .
    
  • Using python virtual environments
    git switch [BRANCH]  # or git pull...
    python -m pip install -e .
    

That gives you:

  • typical python package structure
  • ability to install
    • true CLI or script availability anywhere without having to change PATH/PYTHONPATH
    • dependency management

While still maintaining the submodules.
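With cime installed as an editable package, the concern that a case must use the cime from its own model checkout could be checked explicitly. Below is a minimal, hypothetical sketch (not an existing CIME or model feature) that asks Python where a package would be imported from and confirms that location lies inside an expected checkout. With an editable install the resolved location is typically the repository directory itself.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Directory the named package would be imported from, or None if not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:  # missing, or a namespace package
        return None
    return Path(spec.origin).resolve().parent


def imported_from_checkout(name: str, checkout: Path) -> bool:
    """True if `name` resolves to code inside `checkout` (e.g. a model's cime submodule)."""
    location = package_location(name)
    if location is None:
        return False
    try:
        location.relative_to(Path(checkout).resolve())
        return True
    except ValueError:  # location is outside the checkout
        return False
```

A model's run scripts could, for example, refuse to proceed when `imported_from_checkout("cime", model_checkout / "cime")` is false; the package name and directory layout there are assumptions.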

> Somewhat connected with (2): the distinction between user and developer seems more blurred with cime than with many python packages. I think this was another push towards a model of "use whatever code exists right here on disk" rather than "pip install cime then use the installed version". Again, I don't know how much (if at all) this makes it harder to use standard pythonic methods.

This distinction kinda falls out of coupling it as a submodule -- until you can decouple them users will have to interact with it like developers (the steps I propose above are how a developer will typically interact with a python package).

Generally, that's the same assumption for the models it's bound to; very few users (as far as I am aware) are just using pre-built versions of the code that could be distributed and then leverage similar dependency management (e.g., specifying a CIME version).

Really, the models it's bound to could provide the environment.yml/requirements.txt file that specifies the correct version of CIME to use (and it can be distributed), and users would just have to create the environment and update it...

@jgfouca
Contributor Author

jgfouca commented Apr 29, 2021

@xylar , @jhkennedy , this is excellent stuff, thank you.

@xylar

xylar commented Apr 29, 2021

I guess I would add one more philosophical point, since this has come up in our own work on MPAS-Ocean and MALI. As new developers come onboard, you can take the time to teach them a system that is unique to your project and which won't extend to anything else they might use in their careers, or you can conform your design to broader standards so that new developers will learn broad skills in the process of learning about your system. I have taken about 7 months to rewrite a testing infrastructure (COMPASS) that we use for ocean and land-ice test cases as a proper python package using object-oriented programming. It used to be a collection of impenetrable scripts that used XML files to write other python scripts and XML files. Now, I feel confident that new developers who learn how to extend the compass package will also learn skills that apply to python development more broadly, not just the esoteric system that was the legacy version of COMPASS.

I feel like the same philosophy could apply to CIME. If you are developing for the current developers, you may do what you like. If you want to attract new developers, I think a change is needed.

@billsacks
Member

Thanks a lot for these additional thoughts @jhkennedy and @xylar . The points about allowing easier management of python dependencies (e.g., netcdf, yaml) and Xylar's latest philosophical point seem especially important for us to consider.

@jhkennedy
Contributor

jhkennedy commented Apr 29, 2021

I'd also add, along the lines of @xylar's philosophical point, that there is really only so much context a developer can hold in their brain at a time. So things like simple interfaces, following community best practices (PEP 8), and common structures really reduce the context load for a developer. Everything that's normal is effectively "contextless" -- I interact with it just like everything else and it's routine. When you have to "switch" structure/styles, that's extra context I have to dig up and remember to interact with.

For a core/active developer, that's fine once you learn it, but a casual/infrequent developer will have to reload/relearn all the context each time they come in. So, following the community where it makes sense to is a good way to make a code base more approachable and encourage more contributions (that is, making it easy to do so).

@jasonb5
Collaborator

jasonb5 commented Apr 29, 2021

Coming in as a new developer here are some things I ran into, most have already been mentioned.

  • Expected CIME to be top-level
  • Couldn't pip install ... or python setup.py install ...
  • Locating tests scripts/lib/CIME/tests and scripts/tests
    • Non-standard test naming so tools like pytest or nose don't pick up tests automatically
  • Running code without the main entrypoints requires PYTHONPATH or adjusting sys.path
    • Ran into this a lot when using pdb or breakpoint()
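To illustrate the test-naming point, here is a minimal sketch that approximates pytest's default file-collection rule, i.e. the default `python_files` patterns `test_*.py` and `*_test.py` matched against the file's basename. The helper name `is_collected_by_default` and the example file names are hypothetical, not taken from CIME, and real pytest collection also depends on other settings (for example `testpaths` and files passed explicitly on the command line).

```python
from fnmatch import fnmatch
from pathlib import PurePath

# Default value of pytest's `python_files` option (basename glob patterns).
DEFAULT_PYTHON_FILES = ("test_*.py", "*_test.py")


def is_collected_by_default(path: str) -> bool:
    """Approximate whether pytest's default file patterns would pick up `path`."""
    name = PurePath(path).name  # pytest matches these patterns against the basename
    return any(fnmatch(name, pattern) for pattern in DEFAULT_PYTHON_FILES)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for candidate in (
        "tests/test_case_setup.py",   # matches "test_*.py"
        "tests/xml_reader_test.py",   # matches "*_test.py"
        "tests/regression_tests.py",  # "_tests.py" matches neither pattern
    ):
        print(candidate, is_collected_by_default(candidate))
```

Renaming test files to fit these patterns, or overriding `python_files` in pytest's configuration, would let pytest discover them without extra arguments.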

@sarats
Contributor

sarats commented Apr 30, 2021

Apologies for the interruption.

I would like to add a user's (dare I say power-user's) perspective.

What are the big user-facing capabilities and changes that are planned? I may have missed the roadmap, so pointers are appreciated. If there are none/few, that's okay too as this is a mature product. On a related note, has there been any exercise in gathering user input regarding features?

CIME is a critical piece of software that has a substantial user base. I would advocate for "design for least surprise" as one of the top priorities. A steep learning curve for users because of any new changes should be a last resort.

As the adage goes, happy users don't complain. There is truth to that, and in the case of E3SM, many users hide the complexity of CIME behind a run script. However, when things go wrong, these users still need an overview of how CIME works. Good documentation goes a long way (the existing docs help, but there is room for improvement). So whatever changes ease the development process or attract new developers, we should still aim for a clarity of design that can be explained to end users.

One of the usual pain points is porting CIME to a new architecture or platform. Having done that myself on Summit, including adding new compiler families and batch systems, I learned a bit about CIME. As with the Perl motto, "There's more than one way to do it", there are certainly a few ways to hack together a CIME port, and it might be helpful to incorporate best practices or steer users toward one true way in the docs.

Minor note: As a user, I don't rely on conda/pip environments to work with CIME on large DOE supercomputers because of their custom environments.

CIME internally could be refactored or dramatically changed. I just hope for a little user-facing stability (without hindering progress) while doing so.

On the E3SM front, I would urge a little care with closer integration with CMake. We should certainly replace some of the older Perl with CMake logic, etc. Again, I would advocate for clarity instead of reliance on "magic". The goal is to avoid users having to come back to a CIME developer or a "CMake expert" whenever they run into a build issue.

@kvrigor

kvrigor commented Aug 13, 2021

I hope I can still share some thoughts, even though it's been months since the last activity on this thread.

We have a fork of CTSM (based on the release-clm5.0 branch) called eCLM. It does not use CIME's build and model setup scripts; instead, it uses pure CMake for the build and generates the namelists via a small set of Python scripts. Our fork is still far from production ready, but the build and namelist generator scripts are functional. Feel free to copy the eCLM scripts and adapt them to your purposes.

The complexity of CIME also motivated us to develop a standardized approach to building and model setup. I can strongly relate to @jhkennedy's and @xylar's accounts of CIME's flaws. My rough notion of simplicity is that our users (e.g. grad students, scientists), who don't necessarily have a software engineering background, should be able to study the internals of the system on their own; i.e. it should have a "hackable" quality. With this mindset, we develop the tooling around our CTSM fork.

@jgfouca
Contributor Author

jgfouca commented Jul 6, 2022

Progress update.

Namelists

Minimal progress has been made here.

Build system

config_compilers.xml has been replaced with cmake macros, so

A consolidation of flag/compiler settings into a single system, preferably a hierarchical cmake cache system.

is done.

XML env db system

We've expanded the fields that support selectors, so

Users should be able to use attribute selectors on any field using any previously defined field as a selector

is at least partially done.

Testing

Test organization and the CI GitHub testing have been significantly improved.

Code / repo layout

Layout has been significantly improved and "pythonized". We continue to move model-specific stuff out of CIME so the models can customize their extension points without having to do PRs to CIME.

CIME development process

No progress here.

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label May 14, 2023
@jgfouca jgfouca removed the Stale label May 15, 2023
@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jun 15, 2023
@ekluzek ekluzek removed the Stale label Jun 15, 2023
@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Dec 20, 2023
@rljacob rljacob assigned ekluzek and unassigned agsalin and WesCoomber Dec 20, 2023
@ekluzek
Contributor

ekluzek commented Dec 21, 2023

@rljacob, did you really mean to assign this to me, or someone else? It does look like this should still be dealt with, but I wouldn't be the one working on it. It should stay open with the Low Priority label for now.

@jgfouca
Contributor Author

jgfouca commented Dec 21, 2023

@ekluzek, this is only a discussion ticket.
