generate netcdf input parameter files during testing "on-the-fly"? #2935
Comments
We have some discussion on our project page here as well.
@rgknox - why doesn't the existing cime infrastructure for dealing with namelists meet your needs?
@rgknox we should talk about this. I may be missing some subtlety that makes your use case differ from the overall CTSM use case, but my initial reaction is to agree with @jedwards4b as far as the storage of defaults goes. What we're planning for CTSM is to have the version-controlled storage of defaults look the same as what @jedwards4b pointed you to (which is roughly similar to what's currently done in CTSM, except that the definition and defaults files have been merged), and to have the "user interface" for changing a single parameter be the same as it currently is (via a user_nl file). But, rather than having the namelist-generation script dump out a Fortran namelist in the end, it would dump out a netcdf file. The barriers to doing this are:

1. Converting CTSM's build-namelist from perl to python. This is the big one. For your purposes, you could start fresh with a python-based script for the fates-specific parameters.
2. Bringing a netcdf library into cime. We could copy in scipy.io.netcdf for this purpose. One caveat is that this introduces a dependency on numpy, but in various discussions, people felt that's okay.
3. Adding some cime code to the namelist-generation module to support spitting out the namelist values into a netcdf file rather than a Fortran namelist format.
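For illustration, a minimal sketch of item 3 might look like the following, assuming the namelist-generation code has already resolved the parameter values into a plain dictionary (the parameter names, dimension, and output file name are hypothetical, and scipy.io.netcdf only writes classic netCDF-3 files):

```python
# Sketch: dump already-resolved parameter values to a netcdf file via
# scipy.io.netcdf instead of writing a Fortran namelist.
# All parameter/dimension/file names below are hypothetical.
from scipy.io import netcdf_file  # single-file module; only depends on numpy

resolved = {
    "fates_leaf_slatop": [0.012, 0.024],
    "fates_mort_bmort": [0.014, 0.014],
}

nc = netcdf_file("fates_params.nc", "w")  # writes classic netCDF-3 format
nc.createDimension("fates_pft", 2)
for name, values in resolved.items():
    var = nc.createVariable(name, "d", ("fates_pft",))  # double, per-PFT
    var[:] = values
nc.close()
```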
I think in the meantime you could use testmods for this. You can put any shell commands in the testmods, which could include NCO or similar calls to edit the netcdf file.
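For example, the testmod's shell_commands could call NCO directly, or call a small helper along these lines (a rough sketch; the parameter and file names are made up, and it assumes NCO's ncap2 is on the PATH):

```python
# Sketch: overwrite one parameter in a copy of the default file using NCO's
# ncap2, as something a shell_commands testmod could invoke before a test.
# Parameter and file names are hypothetical.
import subprocess

def override_param(default_file, out_file, name, value):
    # "var = var*0.0 + value" keeps the variable's dimensions while
    # replacing every element with the new value.
    subprocess.run(
        ["ncap2", "-O", "-s", f"{name}={name}*0.0+{value}", default_file, out_file],
        check=True,
    )

override_param("fates_params_default.nc", "fates_params_test.nc",
               "fates_mort_bmort", 0.02)
```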
Thanks for the feedback all. @jedwards4b, using namelists to maintain our parameter data (exclusively) is an interesting idea, and perhaps we could migrate to that over the long term. @billsacks, I'd be happy to set aside some time to talk more about this and see what your plans are with CTSM. It seems to me that using scripts to modify text, and then using executables like ncgen and ncdump, is more likely to be readily supported on HPC systems than the scipy library, no? @rljacob, I think adding shell commands in testmods is the quickest way for me to expand my tests while maintaining our existing default parameter file.
The plan we've discussed is to have scipy.io.netcdf copied into cime: it's just a single file, and that's the only file you need from scipy in order to have netcdf support. Then the dependency is just on numpy, which appears to be available on all supported CESM and E3SM systems (or at least, I think it was when we last evaluated this for different purposes a year or two ago). I agree that your approach minimizes dependencies, but we really like the capabilities and user interface provided by the current mechanisms in cime.
That plan sounds effective @billsacks
Hi All,
Okay, we discussed this extensively on the cime call today, with a lot of helpful input from @jhkennedy (see https://github.com/ESMCI/cime/wiki/Meeting-Notes). The basic conclusion was that we WILL allow a dependency on netcdf down the road, though with some caveats. There was some opposition to the idea of copying scipy.io.netcdf into cime, at least as a long-term solution - but if that's necessary as a short-term workaround, that may be acceptable. Instead, we'd like to adopt the more general python convention of providing an environment that has all necessary dependencies (probably through conda; e.g. https://github.com/E3SM-Project/e3sm-unified), then requiring that users load this python environment before running cime scripts. Among other things, this environment could include numpy, scipy and netcdf4-python.

We felt that this should be done in a way that (1) if you're running a case that doesn't have these dependencies, then cime will run happily without them (e.g., by having conditional imports... though we aren't sure exactly what that will take, or if it's definitely feasible), and (2) ideally, we catch import errors and give some helpful error message, e.g., pointing to the software requirements documentation for e3sm/cesm.

On the CESM side, we'd like to avoid CTSM needing non-standard libraries until after the CESM2.2 release (probably in 6-8 months). If you are just using this in some test scripts for now, some of the above requirements may not be relevant for you - e.g., cime/scripts/lib/CIME/SystemTests/pgn.py already has a netcdf4 import, and if you do something like that, where the import isn't invoked for most people using cime, then you're fine to move ahead with this without regard for some of these longer-term thoughts.

Finally, a thought about netcdf4 vs. scipy.io.netcdf: we had originally been thinking about scipy.io.netcdf because it is easier to install and is possible to snapshot into cime (then just having a dependency on numpy). But if we're going with the idea of a conda environment anyway, then netcdf4 could be a better choice for many purposes, since it is more full-featured and supports netcdf4-formatted files. I'll also point out that, for the never-completed python version of cprnc, I developed a wrapper to provide a common API that could be used for scipy.io.netcdf, netcdf4, or a fake, in-memory netcdf4 library that could be used in unit tests. See https://github.com/billsacks/cprnc_python/tree/master/cprnc_py/netcdf (and especially the README there). Something like this could be useful for a couple of purposes: it would let the calling code be agnostic about which netcdf library is installed, and the fake, in-memory implementation is handy for unit tests.

I don't have strong feelings about whether we should use something like that, but I wanted to put it out there for consideration. Note that I have only written wrappers for read functionality; this wrapper would need to be extended to support writing netcdf files.
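As a rough sketch of points (1) and (2) above - a conditional import that leaves netcdf-free cases unaffected and gives a helpful message otherwise - something like the following could work (the helper name and the wording of the message are just for illustration):

```python
# Sketch: guarded netCDF4 import with a friendlier error message.
# Cases that never touch netcdf functionality pay no dependency cost.
try:
    import netCDF4  # e.g., provided by a conda environment such as e3sm-unified
except ImportError:
    netCDF4 = None

def require_netcdf4():
    """Return the netCDF4 module, or fail with a pointer to the docs."""
    if netCDF4 is None:
        raise RuntimeError(
            "This feature requires the netCDF4 python package. Please load an "
            "appropriate python environment; see the CESM/E3SM software "
            "requirements documentation."
        )
    return netCDF4
```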
Thanks @billsacks and all, this is really helpful to know what the long-term plan is! For our FATES purposes, I think we definitely fall into your category of "just using this in some test scripts", so hopefully this may not be too big of a deal to worry about. Longer term, as far as I can tell, any of your options between scipy.io.netcdf, netcdf4, or your abstract netcdf python frontend all seem fine to me.
Agreed, thanks for the feedback @billsacks and co. I share Charlie's sentiments, in that all of those options should serve our FATES needs. In my experience, I have used scipy's netcdf python library as well as netcdf4-python, and both seem to be fine (my personal scripts use scipy's implementation). I have not had need of any features that go beyond reading and writing data and metadata to files.
Sure, if we run into any snags we will let you know. Thanks for the feedback.
Fine with me too.
In some discussions today, I realized a possible compromise solution for what we do with CTSM long-term: rather than having the python build-namelist generate a netCDF file directly (via a python netCDF library), it would write text output in cdl format, then call ncgen. That requires having ncgen in your path, but that can be managed by cime using the same module mechanism that we use elsewhere (I think that, if you have loaded a netcdf module, you should have ncgen in your path). To me, this feels somewhat less robust than writing a netCDF file directly, but not necessarily that much less robust than our current scheme of writing a Fortran namelist-formatted text file. I feel like this could be a reasonable solution in order to avoid the python dependency issues. @ekluzek and @negin513 don't really like this solution, though, and feel like having an appropriate python environment loaded really isn't too much to ask.

To be clear: what I'm suggesting above would differ from what (I think) you're doing in FATES, in that we'd still store the default parameters in xml files (as is done now), and modification could still be done via the user_nl mechanism.

Our tentative plan is still to stick with our long-standing plan of having the python write a netCDF file directly, but the above seems like a reasonable fall-back, particularly if strong objections are raised to introducing python dependencies.
@billsacks EDIT: modified "proposed" to "describe"
For what it's worth, my personal take on this is that writing to netcdf is less error-prone than trying to work on cdl text files, since the dimensionality of the parameter files is pretty complex, and building a text-editing capability that includes all the internal consistency checks would end up being a fairly large task.
Note also that the CDL conversion mechanism is vulnerable to the problem that @rgknox identified if you have an implicit conversion into floating point...
Just to be clear, my suggestion doesn't involve ever reading / modifying an existing cdl file. The data flow is: xml + user_nl ---> determine final parameter values using existing cime infrastructure ---> write cdl file via new cime infrastructure (should be similar to the existing infrastructure for writing a Fortran namelist, but with a different format spat out in the end) ---> generate netCDF file with ncgen. I agree that this is more work and more error-prone than writing netCDF directly, but I wanted to clarify that (in contrast to what I think you are doing / suggesting for FATES) my suggestion doesn't involve any editing of existing cdl files.
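To make the last two steps concrete, a minimal sketch might look like this (variable, dimension, and file names are hypothetical, and it assumes ncgen is on the PATH via the usual netcdf module):

```python
# Sketch: write resolved parameter values as CDL text, then run ncgen to
# produce the netCDF file. All names are hypothetical.
import subprocess

def write_cdl(path, params, npft):
    lines = ["netcdf fates_params {", "dimensions:", f"  fates_pft = {npft} ;",
             "variables:"]
    for name in params:
        lines.append(f"  double {name}(fates_pft) ;")
    lines.append("data:")
    for name, values in params.items():
        lines.append(f"  {name} = {', '.join(str(v) for v in values)} ;")
    lines.append("}")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

params = {"fates_leaf_slatop": [0.012, 0.024]}
write_cdl("fates_params.cdl", params, npft=2)
subprocess.run(["ncgen", "-o", "fates_params.nc", "fates_params.cdl"], check=True)
```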
In the FATES project, we maintain the text (i.e. cdl) version of our "default" input parameter file, used exclusively for FATES parameters. We like this because it allows us to maintain these defaults in version control and thus have some provenance.
Thus far, whenever we change this default parameter file, I build a new netcdf binary file (i.e. the .nc equivalent), give it a new time-stamp in the file name (and change the metadata, at least I'm supposed to), upload it to the SVN server, and then change the name of the file in the CLM defaults list used for testing.
Additionally, our testing system would be much more robust if I could run tests that operate on a number of different parameter files that exercise more aspects of the FATES model, instead of our single default file. But it would be time-consuming to constantly re-build all of these defaults every time we change something.
I'm exploring the idea right now of developing a text-based "diff" file that would contain different sets of changes to our single default parameter file. This "diff" file would be interpreted by a python script, which would generate the different parameter files (see the sketch below). Aside from helping scientists adapt to changes in the structure of the parameter file, by keeping and sharing their own "diff" files that only document their parameters of interest, I think this type of strategy could be useful in testing.
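One very simple form of that "diff" file could be a plain key = value text file applied by a short python script, roughly as sketched here (the file format, helper, and parameter names are all invented for illustration; applying the overrides to the netcdf file would use one of the mechanisms discussed above):

```python
# Sketch: read a plain-text "diff" file of parameter overrides that a python
# script would then apply to a copy of the default parameter file.
# The override-file format and parameter names are invented.
def read_overrides(path):
    overrides = {}
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments and blanks
            if not line:
                continue
            name, value = line.split("=", 1)
            overrides[name.strip()] = [float(v) for v in value.split(",")]
    return overrides

# Example diff-file contents:
#   fates_leaf_slatop = 0.010, 0.030   # per-PFT values
#   fates_mort_bmort  = 0.020, 0.020
overrides = read_overrides("my_experiment.params")
# ... then apply `overrides` to a copy of the default .nc (e.g., with
# scipy.io.netcdf or NCO) to build the variant used by a given test.
```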
My question to this community: would/could/does CIME support a test infrastructure that allows us to call a script from the create_test execution, which modifies our text-based cdl default parameter file into different variants for different tests, and then "ncgen"s a binary "on-the-fly" for each test?
Apologies if I'm in the wrong place, if this has been brought up already, or if it's already done.