Python script for modifying fsurdat files (after they have been generated with the tool-chain) #1448

slevis-lmwg · 2021-07-31T00:16:53Z

Description of changes

Introducing new Python script fsurdat_modifier using subset_data as my initial template. Currently this script just opens and reads an fsurdat file and writes a copy of the same file. In this PR, I will introduce functions that modify the fsurdat before outputting it.

Specific notes

For now fsurdat_modifier.py runs as follows:
./fsurdat_modifier.py --fsurdat_in </path/input_filename.nc> --fsurdat_out </path/output_filename.nc>
or
./fsurdat_modifier.py for default entries of fsurdat_in fsurdat_out.
or
./fsurdat_modifier.py --help to see options (though this still lists options I found in the template code that I started from)

Contributors other than yourself, if any: @ekluzek @billsacks

CTSM Issues Fixed (include github issue #):
Resolves #1558

Testing performed, if any:
I have executed the script in the ways explained a few lines up.

NOTE: Be sure to check your coding style against the standard
(https://github.com/ESCOMP/ctsm/wiki/CTSM-coding-guidelines) and review
the list of common problems to watch out for
(https://github.com/ESCOMP/CTSM/wiki/List-of-common-problems).

slevis-lmwg · 2021-07-31T00:28:05Z

Slides that I presented to the CSSI group to discuss our options. The group decided to go with fsurdat_modifier.py, rather than the paleoToolkit or other method that would modify raw datasets.

While I'm sharing links:
I take additional notes here.

slevis-lmwg · 2021-07-31T00:39:36Z

@billsacks @ekluzek
I requested a review already because I want to ensure that I start with a good template.

@billsacks I saw your emailed comments to @negin513 about her work, and I believe that they apply to mine since I used one of her scripts as my starting point. Feel free to just refer me back to that email if you feel I should start from there before you and @ekluzek spend much time reviewing this.

Also I didn't know whether to add others to this PR, yet. Feel free to add (or tell me to add) anyone you think necessary.

billsacks

Okay, I have a lot of specific line comments in addition to some bigger picture things. I realize that you were following the example of some of @negin513 's scripts, so a lot of this really pertains to some things there that we have discussed but I guess she hasn't had a chance to incorporate yet.

The biggest thing is that almost all of this code should be moved to the python/ctsm directory, with code shared between scripts where possible. Then there should just be a little skeleton wrapper that loads the appropriate modules and calls the appropriate main function. Feelings of @ekluzek and me are that the executable wrapper should not have a ".py" extension. (Side note from last week's ctsm-software meeting: I looked briefly for a reference for this feeling and couldn't find one. However, it follows the general practice of unix utilities and (newer) CIME scripts: the scripts run by a user shouldn't have a language extension.)
We should have a discussion about the use of logging. I made some line comments here. It looks like the way this is set up may be to redirect print statements to the logger. But I feel we should be more explicit about logging, in part because it gives you more control over different logging levels. For example, see https://docs.python.org/3/howto/logging.html#when-to-use-logging.
[OPTIONAL] We may also want to discuss whether it's better to use a config file here rather than command-line arguments. I see pros & cons of each, and don't have any strong feelings on the matter, so we could just proceed with the command line argument-based approach for now, but it may be worth a bit of discussion up-front.

tools/contrib/fsurdat_modifier.py

wwieder · 2021-08-04T12:26:45Z

tools/contrib/fsurdat_modifier.py

+
+This script reads a surface dataset (fsurdat file) and outputs a
+modified copy of the same file.
+


Maybe these notes can clarify how this is different than the modify_singlept_site_neon.py script that @negin513 created and what it should be used for? I have two other high level questions:

Is there code that should be shared between these scripts (to @billsacks' point about putting this in the python directory).

Should we consider renaming the wrapper scripts we're calling so they are descriptive and consistent?

@wwieder thanks. I have added clarifications that will appear in my next commit.

Ideally, yes, and we will continue to keep that in mind during development.

I'm not attached to existing names.

...following yesterday's ctsm python organization meeting with Bill, Erik, and Negin.

...according to mtgs with Bill, Erik, and Negin. Here I moved the ModifyFsurdat class out of fsurdat_modifier.py into its own file and placed both those files in a new directory: /python/ctsm/modify_fsurdat

slevis-lmwg · 2021-09-17T23:35:49Z

@billsacks I think I have brought the code in this PR closer to your recommendations. Pls let me know if I took a wrong turn anywhere and whether there are additional improvements and corrections that you envision. Note that I haven't looked at your review comments above, yet. I probably should go through those next.

@billsacks @ekluzek @negin513 thank you for the recent meetings outlining what needed to happen.

billsacks

Thanks a lot for your work on this, @slevisconsulting ! You have addressed most of my initial comments, and this is moving in the right direction; thank you.

I was thinking a little bit about the object-oriented design you use here, though. At least as of now, it looks like the public interface to the ModifyFsurdat class consists of an __init__ method (which sets flags based on the command-line arguments) and a single main method that doesn't take any arguments (modify). In most cases, I feel like the design is simpler & easier to understand if situations like this are changed to be non-object-oriented – so you would replace this class with a non-object-oriented modify_fsurdat function whose arguments match the current arguments to the __init__ method. Along with this, you could combine your current two files into one, keeping everything together, which feels a little better to me.

However, here I can also see what feels to me like a more intuitive object-oriented design – and one that would likely lend itself better to unit testing:

The __init__ method would accept just one argument, the input file, and would read that file into memory, storing it in a variable in the object
There would then be separate methods for each modification you could do – e.g., overwrite_single_pft, zero_nonveg_lu, etc. The main program would then have logic like: if args.overwrite_single_pft: modify_fsurdat.overwrite_single_pft(...) and similarly for the other modifications. So the main program would end up calling multiple methods, and the in-memory data would get incrementally modified.
There would also be a final method to write out the result to a file, which would be called at the end of the main program

What I like about this proposed design is that it feels easier and more intuitive for testing, either via unit tests or interactively: I can imagine initializing the object with some data (in a unit test context, I would imagine using a different constructor where I set the data directly rather than reading it from file), then calling one or more modifier methods, then checking the resulting data against expectations. It's possible to do the same thing with the current design, but it feels more awkward, because you have to specify up-front what changes you want to make, then the modify method does everything at once.

A possible downside of this proposed design is if certain modifications can only be done in a particular order: it feels easier to ensure that order dependence with your current design.

As with most design things, I don't feel like there is a clearly "right" design, but I thought I'd suggest this alternative to see what you think about it. I don't think it would take too much rework at this stage... and most of the rework is work I'd suggest doing anyway: breaking down the big modify method into separate functions for the separate operations (it's just a question of whether the single big modify method calls a bunch of individual, private functions, or if those functions are all public and called by the main program).

billsacks · 2021-09-20T04:03:06Z

python/ctsm/modify_fsurdat/fsurdat_modifier.py

+    Command-line inputs
+    - fsurdat_in: input file (str)
+    - fsurdat_out: output file (str)
+    - variable: variable to modify (str)
+    - value: value assigned to the variable to be modified (float)


Can delete this part of the comment, which is already out of date and will get more so

Will eventually get to this (sorry for postponing).

billsacks · 2021-09-20T04:08:27Z

python/ctsm/modify_fsurdat/fsurdat_modifier.py

+    modify_fsurdat = ModifyFsurdat(args.fsurdat_in,
+                                   args.fsurdat_out,
+                                   args.overwrite_single_pft,
+                                   args.dom_pft,
+                                   args.zero_nonveg_lu,
+                                   args.uni_snowpack,
+                                   args.no_saturation_excess)


For an argument list this long (and likely to grow longer), using keyword arguments helps prevent bugs (fsurdat_in = args.fsurdat_in, fsurdat_out = args.fsurdat_out, etc.).

Not relevant to this arg. list anymore, but implementing this suggestion when calling update_metadata.

billsacks · 2021-09-20T04:41:14Z

python/ctsm/modify_fsurdat/fsurdat_modifier.py

+                help='Turn on flag to make whole grid 100% single PFT. [default: %(default)s]',
+                action="store",
+                dest="overwrite_single_pft",
+                default=False)


For this and a few other booleans, you could follow the pattern of setting action="store_true" and getting rid of the default=False. (You did that for a few but not all.)

billsacks · 2021-09-20T04:41:44Z

python/ctsm/modify_fsurdat/fsurdat_modifier.py

+                action = "store",
+                dest = "fsurdat_in",
+                type = str,
+                default = "/glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_0.9x1.25_hist_78pfts_CMIP6_simyr2000_c190214.nc")


I think you should remove this default and make this a required argument.

Postponing because during development I find it convenient to not have to enter fsurdat_in and fsurdat_out every time I run.

billsacks · 2021-09-20T04:41:51Z

python/ctsm/modify_fsurdat/fsurdat_modifier.py

+                dest = "fsurdat_out",
+                type = str,
+                default = "/glade/scratch/" + getuser() +
+                          "/surfdata_0.9x1.25_hist_78pfts_CMIP6_simyr2000_c190214_modified.nc")


I think you should remove this default and make this a required argument.

Postponing because during development I find it convenient to not have to enter fsurdat_in and fsurdat_out every time I run.

I removed the default and added required=True for --fsurdat_out (not for --fsurdat_in, yet), but I don't get a helpful "This is required" error when I omit it...

Also replacing default fsurdat_in and fsurdat_out to work with this test: SMS_D_Ld1.f10_f10_mg37.I1850Clm50Sp.cheyenne_intel.clm-collapse_pfts_78_to_16_decStart_f10 although ultimately we plan to remove these defaults.

This allows potentially overriding variables that may get set by the first function, which identifies a user-defined idealized land swath. I also completed the list of fsurdat variables that I believe need updating in that first function (def land_swath). I still need to address various TODOs.

Change ceta from 450 to 358 for CTSM5.1 See Leo VanKamenhout's work on this in ESCOMP#250. Note, this is used in the PPE work. See van Kampenhout et al. (2017). Turn Medlyn on for PHS off for clm51 and clm50. Make max CO2 partial pressure consistent in Photosynthesis. Change surface datasets to only use 78PFT versions Fix GSSUN and GSSHA history variables so now not just writing missing values. Changes FATES to run with use_nitrif_dentrif=T by default. Change dynlakes test to methane/use_nitrif_=T.

Reordered function calls again to what makes sense to me right now

billsacks

Thank you very much for all of your continued work on this. You have addressed all of my previous comments really well. And for the most part I am quite happy with all of the new code you have added (mainly the unit test code). I really appreciate all of your work on that! In particular, your tests of _handle_config_value are great (though see a few small comments inline). And, while I haven't looked at them carefully, at a glance the tests of _get_not_rectangle also seem really good – thank you for adding all of these tests!

In addition to my inline comments on your new test code, I can think of the following things that still need to be done before this PR is finalized:

Need to add a new test to the aux_clm test list leveraging the new system test
Need to commit the files in testinputs (and I'll confirm that they're committed via lfs; can see this when viewing the file on github)
Need to get pylint passing

billsacks · 2021-12-07T22:30:31Z

python/ctsm/test/test_unit_utils.py

+    def test_lonRange0To360_lonNegative(self):
+        """
+        Tests that negative inputs to lon_range_0_to_360 in the range [-180, 0)
+        get 360 added to them
+        """
+        inputs = [-180, -0.001]
+        for input in inputs:
+            result = lon_range_0_to_360(input)
+            self.assertEqual(result, input + 360)
+
+    def test_lonRange0To360_lonNotNegative(self):
+        """
+        Tests that inputs to lon_range_0_to_360 of 0 to 360 remain unchanged
+        """
+        inputs = [0, 360]
+        for input in inputs:
+            result = lon_range_0_to_360(input)
+            self.assertEqual(result, input)


[OPTIONAL] You have added a good set of tests for lon_range_0_to_360 – thanks! I have a fairly strong preference for having each unit test test exactly one scenario. The two main reasons for this are: (1) it keeps each test simpler, serving greater value as test as documentation (because it's easy to see at a glance what the expected output is for a given input) and reducing the likelihood that there are logic errors in the test itself; and (2) if the test fails, it is more obvious what the problem is. Given that you have already written these tests as they are, I don't feel they absolutely need to be changed, but I wanted to at least mention my feelings on this for the future. We can talk more if you have a different view of this.

billsacks · 2021-12-07T22:37:44Z