CIME V6 Planning! #3886
Comments
Thanks for opening this @jgfouca ! I didn't get a chance to read it before our meeting yesterday, but just did. These are great ideas, so thanks for thinking this through and articulating them so well! |
Regarding the non-"pythonic" comments. We really need to tell people to submit actionable issues on this. I tried an IDE this morning "PyCharm" and it took me a few minutes of reading documentation and configuring the environment to get it working properly. This required no modifications to the cime source code. |
@jedwards4b , I looked all over for a github issue related to the code structure and was not able to find one. I am reaching out to @jhkennedy to see if I can get some concrete information. |
And I just searched the meeting notes and couldn't find any reference to this discussion. |
Importantly, it's been ~18 months since I've looked at CIME in any detail, so anything I have to contribute right now might be rather outdated. I do remember the conversation, and here "pythonic" is not so much about the code itself but how someone who's familiar with the scientific python ecosystem would expect CIME to be structured. For example
On the code side, CIME is highly abstracted OOP but a lot of the abstractions are just pass-throughs and many of the objects/interfaces could be condensed/simplified/refactored (that's not a criticism, just a possible future refactor target). Overall, much of this is "tech-debt" or design legacy (you can "see" the sh --> perl --> python conversion), but tackling it would have made development a bit easier/faster for me. @xylar may or may not have additional comments in this regard. |
I would echo everything that @jhkennedy has said. As a python developer, these are the things that made CIME feel really inaccessible to me when I tried to debug issues I had run into years ago. If you do decide to move toward a more conventional python package design, I would be more than happy to beta test and provide whatever assistance I can. |
@xylar , @jhkennedy Thank you for taking the time to respond. This is quite helpful. |
@jhkennedy and @xylar - thanks a lot for sharing these thoughts. This is very helpful. I am not well-versed in standard python package organization, so I'm asking the following truly out of curiosity / naivety:
Again, I'm not trying to defend the status quo: I have truly felt lost when thinking of how we could handle these things in a more pythonic way. |
In such cases, I have typically seen a subdirectory (often called
The standard way to handle this would be to have A less desirable alternative is to continue to have
I think most E3SM developers would not be I realize the transition to making |
Thanks a lot for your added thoughts @xylar . I don't have any substantive immediate replies, but I think a lot boils down to this:
For better or worse, CESM's (and by extension, E3SM's) way of doing things evolved over many years of experience that included multiple changes in the scripting language. The result is something that I personally feel fits the needs of CESM/E3SM very well, but looks different from what we would have gotten if we had started out by saying, "Let's develop a python(ic) package." So what it may come down to is deciding if we want to more fully embrace the pythonic way even if it means users need to adjust, or if we maintain the status quo from the user perspective even if it means a bigger adjustment for developers coming from other python libraries. I don't know the answer but this is some good food for thought. Thanks. |
Yeah, I agree a
I agree @billsacks those are real concerns and hard to deal with. Moving to a more standard python package and installation strategy provides a lot of user and developer benefit (IMO) and "solves" the problem of allowing external dependencies (i.e., #2056, and needing a separate workflow for the climate reproducibility tests). Truly separated and distributed will require either uncoupling that tight dependency or providing a way for An alternative, or interim solution while decoupling, would be to keep it as a submodule but move to the python package structure and have users install an "editable"/"develop" version of the code. So, the first time a user checks out CIME (or any model it's bound to), they'd set up an environment
The editable install only links to CIME code in the repo and so stays in sync when you checkout new versions/branches. The only caveat there is that adding/removing dependencies, adding/removing new entrypoints, and/or changing the package structure will require you to update the environment/install. That's easily handled when switching by doing
That gives you:
While still maintaining the submodules.
This distinction kinda falls out of coupling it as a submodule -- until you can decouple them users will have to interact with it like developers (the steps I propose above are how a developer will typically interact with a python package). Generally, that's the same assumption for the models it's bound to; very few users (as far as I am aware) are just using pre-built versions of the code that could be distributed and then leverage similar dependency management (e.g., specifying a CIME version). Really, the models it's bound to could provide the |
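To make the "entrypoints" part of the package-structure discussion concrete, here is a minimal Python sketch of what a packaged command could look like. Everything named here is an assumption for illustration: the command name `create-case-demo`, the `--case` and `--compset` flags, and the default value are hypothetical and are not CIME's actual interface. In a conventional package, the packaging metadata (for example the `[project.scripts]` table in `pyproject.toml`) would map a command name to a function such as `main`, which is why adding or removing entry points requires re-running the install even in editable mode.

```python
import argparse


def build_parser():
    """Build the argument parser for a hypothetical packaged command."""
    parser = argparse.ArgumentParser(
        prog="create-case-demo",
        description="Illustrative stand-in for a packaged CIME-style command.",
    )
    # Both flags are hypothetical; they only illustrate the pattern.
    parser.add_argument("--case", required=True, help="name of the case directory")
    parser.add_argument("--compset", default="X", help="component set to use")
    return parser


def main(argv=None):
    """Entry point: packaging metadata would point a console script at this function."""
    args = build_parser().parse_args(argv)
    print(f"Would create case {args.case!r} with compset {args.compset}")
    return 0
```

Because `main` accepts an explicit argument list, the same function can be called from tests or other Python code without going through the shell.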
@xylar , @jhkennedy , this is excellent stuff, thank you. |
I guess I would add one more philosophical point, since this has come up in our own work on MPAS-Ocean and MALI. As new developers come onboard, you can take the time to teach them a system that is unique to your project and which won't extend to anything else they might use in their careers, or you can conform your design to broader standards so that new developers will learn broad skills in the process of learning about your system. I have taken about 7 months to rewrite a testing infrastructure (COMPASS) that we use for ocean and land-ice test cases as a proper python package using object-oriented programming. It used to be a collection of impenetrable scripts that used XML files to write other python scripts and XML files. Now, I feel confident that new developers that learn how to extend the I feel like the same philosophy could apply to CIME. If you are developing for the current developers, you may do what you like. If you want to attract new developers, I think a change is needed. |
Thanks a lot for these additional thoughts @jhkennedy and @xylar . The points about allowing easier management of python dependencies (e.g., netcdf, yaml) and Xylar's latest philosophical point seem especially important for us to consider. |
I'd also add, along with @xylar 's philosophical point, that there is really only so much context a developer can hold in their brain at a time. So things like simple interfaces, following community best practices (pep8), and common structures really reduce the context load for a developer. Everything that's normal is effectively "contextless" -- I interact with it just like everything else and it's routine. When you have to "switch" structure/styles that's extra context I have to dig up and remember to interact with. For a core/active developer, that's fine once you learn it, but a casual/infrequent developer will have to reload/relearn all the context each time they come in. So, following the community where it makes sense to is a good way to make a code base more approachable and encourage more contributions (that is, making it easy to do so). |
Coming in as a new developer here are some things I ran into, most have already been mentioned.
|
Apologies for the interruption. I would like to add a user's (dare I say power-user) perspective. What are the big user-facing capabilities and changes that are planned? I may have missed the roadmap, so pointers are appreciated. If there are none/few, that's okay too as this is a mature product. On a related note, has there been any exercise in gathering user input regarding features?
CIME is a critical piece of software that has a substantial user base. I would advocate for "design for least surprise" as one of the top priorities. A steep learning curve for users because of any new changes should be a last resort. As the adage goes, happy users don't complain. There is truth to that and in the case of E3SM, many hide the complexity of CIME behind a run-script. However, when things go wrong, these users still need to grasp an overview of how CIME works. Good documentation (existing docs help, with room for improvement) goes a long way. So whatever changes would ease the development process or attract new developers, one should still aim for a clarity in design that can be articulated to end-users.
One of the pain points usually comes up whenever one has to port CIME to a new architecture or platform. Having done that myself on Summit, including adding new compiler families and batch systems, I learned a bit about CIME. As with the Perl motto, "There's more than one way to do it", there are certainly a few ways to hack together a CIME port and it might be helpful to incorporate best practices or steer towards one true way in the docs.
Minor note: As a user, I don't rely on conda/pip environments to work with CIME on large DOE supercomputers due to their custom environments. CIME internally could be refactored or dramatically change. I just hope for a little user-facing stability (not hindering progress) while doing so.
On the E3SM front, I would urge a little care in closer integration with CMake. We should certainly replace some of the older Perl with CMake logic etc. Again, I would advocate for clarity in lieu of reliance on "magic", so that users don't need to come back to a CIME developer or a "CMake expert" when they run into a build issue. |
I hope I can still share some thoughts even though it's been months since the last activity on this thread. We have a fork of CTSM (based on branch The complexity of CIME had also motivated us to develop a standardized approach to build and model setup. I can strongly relate to @jhkennedy 's and @xylar 's accounts of CIME's flaws. My vague notion of simplicity is whether our users (e.g. grad students, scientists) who don't necessarily have a software engineering background could study the internals of a system on their own; i.e. it has some kind of "hackable" quality to it. Through this mindset we develop the tooling around our CTSM fork. |
Progress update.
Namelists
Minimal progress has been made here.
Build system
config_compilers.xml has been replaced with cmake macros, so
is done.
XML env db system
We've expanded the fields that support selectors, so
is at least partially done.
Testing
Test organization and the CI GitHub testing have been significantly improved.
Code / repo layout
Layout has been significantly improved and "pythonized". We continue to move model-specific stuff out of CIME so the models can customize their extension points without having to do PRs to CIME.
CIME development process
No progress here. |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
@rljacob did you really mean to assign this to me? Or someone else? It does look like this should still be dealt with, but it wouldn't be me that works on it. It should stay open with the Low Priority label now. |
@ekluzek , this is only a discussion ticket. |
CIME V6 Planning
Let's use this issue to discuss CIME 6 plans. Once we have solidified a feature list, I can look at creating and organizing related issues. Feel free to assign anyone or edit this document. None of the content of this document is official until the team signs off; this is just me organizing my thoughts and getting the discussion started.
Background
In the last couple major CIME versions, the system transitioned from a loose collection of Perl and csh scripts, to a somewhat more centralized Perl system, and finally to a highly centralized Python system. We have made excellent progress in the areas of system cohesion, software design/engineering, performance, robustness, and testing while expanding capabilities. CIME also went from being closely coupled to CESM to being a more-independent infrastructure package supporting several climate models.
On the less-positive side, some of this progress came at the expense of added complexity and less transparency for users. I've had numerous interactions with users over the years where it was clear they were not happy with their CIME experience. This could be selection bias since happy users are usually quieter than unhappy ones, so it's hard to say if users are unhappier overall than they were before these centralization efforts.
Current Situation
CIME is in a decent state but I think with some renewed energy and the freedom to break backwards compatibility, we can make it even better.
On the E3SM side of things, we have new developer resources in @WesCoomber (.4 FTE) and @jasonb5 (1.0 FTE, full time!) which I want to take full advantage of. I want Wes and Jason to be excited to do CIME development, enthusiastic about (and contributing to) the vision for the project, and eventually stepping up to fill my role as E3SM's main CIME resource. Between me, Rob, Wes, and Jason, I think this is the most that E3SM has ever had invested in CIME and this is one of the main reasons for me creating this document.
In general, transformative changes to CIME have slowed down a bit in the last couple of years. The reasons for this are many: the system is naturally maturing, the resources for core CIME development have been limited (especially on the E3SM side, where just me and @rljacob have had a long-term commitment to CIME), limitations in testing have made us afraid to break each other (more on this later), and changing behavior / breaking backwards compatibility is very painful in a production system.
The meta goals of V6 will be to accelerate the next rounds of transformative CIME changes while getting our new developers invested in CIME and improving the user experience.
Technical goals
Solidify role of namelists in the system and how model components should interact with them
We've had some discussion of this in the past here: #1278
Namelists are input files for the model execution so, in theory, we should be able to do nml generation once at the start of the RUN phase and that's it. What happens in practice is that namelists are generated during SETUP, BUILD, SUBMIT, and RUN phases. Back when generating namelists was computationally expensive, this was a significant performance bottleneck for the case control system. Now that it's pretty fast, the problem is more of an issue of technical debt and unnecessary complexity.
During my work on the build system, it quickly became clear that most of our major components (like eam (our atm) and elm (our lnd)) were using buildnml as a sort of pre-build configuration step. See this diagram:
The components know that CIME promises to call buildnml before the BUILD phase, so they took the opportunity to have buildnml set up the key Filepath and CCSM_cppdefs files that the build system depends on, even though these files don't have anything to do with namelists. In a sense, buildnml became a catch-all for all pre-build component-specific setup. Adding to the problem is that, for our bigger components, the buildnml and configure (the script that generates Filepath and CCSM_cppdefs) scripts have become multi-thousand-line piles of Perl and so it's difficult to see exactly what they are doing. At a glance, our configure scripts do not seem to be accessing or modifying namelists directly, so they appear to be decouple-able from buildnml.
Proposal
At the very least, configure should be decoupled from buildnml and integrated into the BUILD phase since that's what it's for. On the E3SM side, we could even integrate it into our CMake system since that's what CMake is for: configuring your build. This should allow us to remove the buildnml calls from both the SETUP and BUILD phases. If any component is using namelists to store/maintain general case data, that is a violation of their contract with CIME and those instances should be immediately changed to use the env XML system instead. If any components need custom setup actions, we should provide an extension point for that in CIME that is not buildnml; something like $component/cime_config/setup. Additional investigation is needed into the buildnml calls in SUBMIT and RUN. I think it's possible the one in SUBMIT can be removed without too much difficulty as well.
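The proposed extension point could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not existing CIME code: it assumes each component may ship an executable script at cime_config/setup under its root, and that the script takes the case root as its only argument. The function names `find_setup_hooks` and `run_setup_hooks` are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def find_setup_hooks(component_roots):
    """Return the (hypothetical) cime_config/setup script of each component that ships one."""
    hooks = []
    for root in component_roots:
        hook = Path(root) / "cime_config" / "setup"
        # Only executable regular files count as hooks; everything else is skipped.
        if hook.is_file() and os.access(hook, os.X_OK):
            hooks.append(hook)
    return hooks


def run_setup_hooks(component_roots, caseroot):
    """Run each component's setup hook once, passing the case root as the only argument."""
    ran = []
    for hook in find_setup_hooks(component_roots):
        # check=True makes a failing hook abort the phase instead of being silently ignored.
        subprocess.run([str(hook), str(caseroot)], check=True)
        ran.append(hook)
    return ran
```

The point of the design is that component-specific pre-build work gets its own named hook, so buildnml only has to be called where namelists are actually needed.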
Build system
Some of this effort will likely be specific to E3SM only since we already diverged significantly from the classic CIME build system when we went to a CMake-based system two years ago.
Related issues:
#3446
#3341
#3287
I'd like to continue to reorganize and unify the build system around CMake. The first part of this will be refactoring how we handle sharedlibs, which currently require lots of special handling in CIME. The SHAREDLIB_BUILD phase of the case-control system iterates over a list of sharedlibs that it thinks the case needs and calls cime/src/build_scripts/buildlib.$libname. What happens then varies greatly between sharedlibs. Some leverage the classic CIME scripts/Tools/Makefile, some have CMake, some have their own Makefiles. It would be nice if we could have all the sharedlib builds use CMake under the hood. That way, we could begin to unwind some of the complexity of our various systems for managing compilation settings and just put all that stuff in CMake directly. There's a significant amount of complexity in the code that generates Macro files because it needs to support multiple build system languages (Make and CMake) and the language-neutral config_compilers.xml. If the build system was fully cmake-ified, we could potentially just write all the compiler/flag stuff in hierarchical cmake cache files. Ideally, I think all the info in the Depends files, config_compilers.xml, and hardcoded stuff in CMakeLists.txt could be nicely encapsulated in a cmake cache file system. This would remove layers of CIME magic between users and their compilation settings.
Another very important topic is thinking about how to reduce the amount of building that goes on when doing test suites. The default behavior is to do a full build for every case. E3SM test suites offer the ability to mark a suite as shared build but this feature is not yet widely used and is use-at-your-own-risk. It would be interesting to see if the sharedlib system could be expanded to work for components.
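As an illustration of what "hierarchical" could mean for compiler settings files, here is a small Python sketch that lists which files would apply to a given compiler/machine pair, ordered from most generic to most specific so that later files can override earlier ones. The file-naming scheme (universal.cmake, then per-compiler, per-machine, and per-compiler-and-machine files) and the function names are assumptions made for this example, not a description of the actual build system.

```python
from pathlib import Path


def macro_search_order(compiler, machine):
    """File names from most generic to most specific (assumed naming scheme)."""
    return [
        "universal.cmake",
        f"{compiler}.cmake",
        f"{machine}.cmake",
        f"{compiler}_{machine}.cmake",
    ]


def existing_macros(macros_dir, compiler, machine):
    """Return the settings files that actually exist, in the order they should be loaded."""
    macros_dir = Path(macros_dir)
    return [
        macros_dir / name
        for name in macro_search_order(compiler, machine)
        if (macros_dir / name).is_file()
    ]
```

With a layout like this, a user who wants to know where a flag came from only has to read at most four short files in a known order, rather than trace generated Macro files back through an XML layer.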
Proposal
XML env database
Relevant issues:
#2161
#3338
#2965
We've made good progress on CIME's env XML "database", especially in the areas of robustness, performance/caching, and encapsulating python's XML ElementTree. I think more progress can be made in formalizing the guarantees/invariants of the system, further standardization of syntax across env xml files, expanding and standardizing the attribute selector concept, and all-around simplification.
Even as co-author of the XML system, I often get a bit lost in our XML code because the execution path, even for fairly simple actions like get_value, is so complex. I'd like for a developer to do a deep dive into CIME/XML/*.py and try to find sources of complexity and potential remedies.
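For readers unfamiliar with the attribute selector concept, the following Python sketch shows one possible resolution rule using the standard library's ElementTree: among the candidate values whose attributes all match the current case, the one with the most attributes wins. The XML layout, the entry id, the attribute names, and the function `select_value` are all invented for this example; the real env XML files and the real get_value code path are more involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical env-style XML: one entry with a default and two more specific values.
SAMPLE = """
<file>
  <entry id="DEMO_SETTING">
    <value>32</value>
    <value compiler="gnu">64</value>
    <value compiler="gnu" mach="mymachine">128</value>
  </entry>
</file>
"""


def select_value(entry, case_attrs):
    """Pick the matching <value> with the most selector attributes (first one wins ties)."""
    best, best_score = None, -1
    for node in entry.findall("value"):
        # A candidate matches only if every selector attribute agrees with the case.
        if all(case_attrs.get(key) == val for key, val in node.attrib.items()):
            score = len(node.attrib)
            if score > best_score:
                best, best_score = node, score
    return None if best is None else (best.text or "").strip()


root = ET.fromstring(SAMPLE.strip())
entry = root.find("entry[@id='DEMO_SETTING']")
print(select_value(entry, {"compiler": "gnu", "mach": "mymachine"}))
```

Writing the matching rule down this explicitly is one way to formalize the guarantees mentioned above: a developer can state in one sentence which value is returned for any combination of attributes.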
Proposal
Testing
Relevant issues:
#2521
The problem is nicely described in that issue, so I won't repeat too much here. My hunch is that the best thing to do would be to expand upon the GitHub CI system to cover more system, model, and compiler combinations. We've had great results doing this for my other project SCREAM using Jenkins and autotester.
Test scheduling
We currently have create_test/test_scheduler.py that works fine if you are on the machine where you want to run. Can things be improved by using CWL?
Code / repo layout
Relevant issues:
#3432
#3393
Drop support for python2, let's move to requiring a newish version of python3. I'd like to be able to use python's latest string formatting syntax, pathlib, and modern concurrency libraries.
Complete the separation of fortran science packages (cpl/drv data models, etc) by model. I think we have an issue for this but I couldn't find it.
It would be nice to be able to do model-specific extensions to CIME without touching the CIME repo. Things like model-specific provenance should be modifiable from the host repo.
Finally, I got beat up pretty badly in an E3SM all-hands a few years ago for the non-standard, non-"pythonic" organization of CIME's python code tree under CIME/scripts. This was causing problems for developers who use python IDEs and confusion with importing, PYTHONPATH, etc. This should be pretty easy to clean up, so I think it's worth addressing even if most of us are using text editors to develop CIME. Potentially even look into integrating CIME with the Python ecosystem (pip, Anaconda, etc.).
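A short Python sketch of the three python3 features named above (f-strings, pathlib, and concurrent.futures) may help show why they are attractive for this kind of scripting. The function names and the directory-naming pattern are made up for the example and do not correspond to existing CIME code.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def case_dir(root, test_name, test_id):
    """Build a case directory path with pathlib and an f-string (hypothetical naming)."""
    return Path(root) / f"{test_name}.{test_id}"


def count_lines(path):
    """Count the lines in one file."""
    with Path(path).open() as handle:
        return sum(1 for _ in handle)


def count_lines_parallel(paths, max_workers=4):
    """Count lines in several files concurrently; returns {path: line_count}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its own result.
        return dict(zip(paths, pool.map(count_lines, paths)))
```

None of this is available in python2 without backports, which is part of the argument for requiring a newish python3.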
CIME development process
As I look through our open issues, I see lots of old issues falling through the cracks, including bug reports and other items that look high priority. We occasionally go through open issues during our Wed meetings but that is time consuming and not much fun. I don't have any concrete proposal to deal with this, but it seems like we need some additional mechanisms for organizing, prioritizing, and shepherding tickets. It would be ideal if we could achieve this without additional meeting overhead.