-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python script for modifying fsurdat files (after they have been generated with the tool-chain) #1448
Conversation
@billsacks @ekluzek @billsacks I saw your emailed comments to @negin513 about her work, and I believe that they apply to mine since I used one of her scripts as my starting point. Feel free to just refer me back to that email if you feel I should start from there before you and @ekluzek spend much time reviewing this. Also I didn't know whether to add others to this PR, yet. Feel free to add (or tell me to add) anyone you think necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I have a lot of specific line comments in addition to some bigger picture things. I realize that you were following the example of some of @negin513 's scripts, so a lot of this really pertains to some things there that we have discussed but I guess she hasn't had a chance to incorporate yet.
-
The biggest thing is that almost all of this code should be moved to the python/ctsm directory, with code shared between scripts where possible. Then there should just be a little skeleton wrapper that loads the appropriate modules and calls the appropriate main function. Feelings of @ekluzek and me are that the executable wrapper should not have a ".py" extension. (Side note from last week's ctsm-software meeting: I looked briefly for a reference for this feeling and couldn't find one. However, it follows the general practice of unix utilities and (newer) CIME scripts: the scripts run by a user shouldn't have a language extension.)
-
We should have a discussion about the use of logging. I made some line comments here. It looks like the way this is set up may be to redirect print statements to the logger. But I feel we should be more explicit about logging, in part because it gives you more control over different logging levels. For example, see https://docs.python.org/3/howto/logging.html#when-to-use-logging.
-
[OPTIONAL] We may also want to discuss whether it's better to use a config file here rather than command-line arguments. I see pros & cons of each, and don't have any strong feelings on the matter, so we could just proceed with the command line argument-based approach for now, but it may be worth a bit of discussion up-front.
tools/contrib/fsurdat_modifier.py
Outdated
|
||
This script reads a surface dataset (fsurdat file) and outputs a | ||
modified copy of the same file. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe these notes can clarify how this is different than the modify_singlept_site_neon.py
script that @negin513 created and what it should be used for? I have two other high level questions:
- Is there code that should be shared between these scripts (to @billsacks' point about putting this in the python directory).
- Should we consider renaming the wrapper scripts we're calling so they are descriptive and consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wwieder thanks. I have added clarifications that will appear in my next commit.
- Ideally, yes, and we will continue to keep that in mind during development.
- I'm not attached to existing names.
...following yesterday's ctsm python organization meeting with Bill, Erik, and Negin.
...according to mtgs with Bill, Erik, and Negin. Here I moved the ModifyFsurdat class out of fsurdat_modifier.py into its own file and placed both those files in a new directory: /python/ctsm/modify_fsurdat
@billsacks I think I have brought the code in this PR closer to your recommendations. Pls let me know if I took a wrong turn anywhere and whether there are additional improvements and corrections that you envision. Note that I haven't looked at your review comments above, yet. I probably should go through those next. @billsacks @ekluzek @negin513 thank you for the recent meetings outlining what needed to happen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your work on this, @slevisconsulting ! You have addressed most of my initial comments, and this is moving in the right direction; thank you.
I was thinking a little bit about the object-oriented design you use here, though. At least as of now, it looks like the public interface to the ModifyFsurdat class consists of an __init__
method (which sets flags based on the command-line arguments) and a single main method that doesn't take any arguments (modify
). In most cases, I feel like the design is simpler & easier to understand if situations like this are changed to be non-object-oriented – so you would replace this class with a non-object-oriented modify_fsurdat
function whose arguments match the current arguments to the __init__
method. Along with this, you could combine your current two files into one, keeping everything together, which feels a little better to me.
However, here I can also see what feels to me like a more intuitive object-oriented design – and one that would likely lend itself better to unit testing:
- The
__init__
method would accept just one argument, the input file, and would read that file into memory, storing it in a variable in the object - There would then be separate methods for each modification you could do – e.g.,
overwrite_single_pft
,zero_nonveg_lu
, etc. The main program would then have logic like:if args.overwrite_single_pft: modify_fsurdat.overwrite_single_pft(...)
and similarly for the other modifications. So the main program would end up calling multiple methods, and the in-memory data would get incrementally modified. - There would also be a final method to write out the result to a file, which would be called at the end of the main program
What I like about this proposed design is that it feels easier and more intuitive for testing, either via unit tests or interactively: I can imagine initializing the object with some data (in a unit test context, I would imagine using a different constructor where I set the data directly rather than reading it from file), then calling one or more modifier methods, then checking the resulting data against expectations. It's possible to do the same thing with the current design, but it feels more awkward, because you have to specify up-front what changes you want to make, then the modify method does everything at once.
A possible downside of this proposed design is if certain modifications can only be done in a particular order: it feels easier to ensure that order dependence with your current design.
As with most design things, I don't feel like there is a clearly "right" design, but I thought I'd suggest this alternative to see what you think about it. I don't think it would take too much rework at this stage... and most of the rework is work I'd suggest doing anyway: breaking down the big modify
method into separate functions for the separate operations (it's just a question of whether the single big modify
method calls a bunch of individual, private functions, or if those functions are all public and called by the main program).
Command-line inputs | ||
- fsurdat_in: input file (str) | ||
- fsurdat_out: output file (str) | ||
- variable: variable to modify (str) | ||
- value: value assigned to the variable to be modified (float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can delete this part of the comment, which is already out of date and will get more so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will eventually get to this (sorry for postponing).
modify_fsurdat = ModifyFsurdat(args.fsurdat_in, | ||
args.fsurdat_out, | ||
args.overwrite_single_pft, | ||
args.dom_pft, | ||
args.zero_nonveg_lu, | ||
args.uni_snowpack, | ||
args.no_saturation_excess) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- For an argument list this long (and likely to grow longer), using keyword arguments helps prevent bugs (
fsurdat_in = args.fsurdat_in, fsurdat_out = args.fsurdat_out
, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not relevant to this arg. list anymore, but implementing this suggestion when calling update_metadata.
help='Turn on flag to make whole grid 100% single PFT. [default: %(default)s]', | ||
action="store", | ||
dest="overwrite_single_pft", | ||
default=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- For this and a few other booleans, you could follow the pattern of setting
action="store_true"
and getting rid of thedefault=False
. (You did that for a few but not all.)
action = "store", | ||
dest = "fsurdat_in", | ||
type = str, | ||
default = "/glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_0.9x1.25_hist_78pfts_CMIP6_simyr2000_c190214.nc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think you should remove this default and make this a required argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postponing because during development I find it convenient to not have to enter fsurdat_in and fsurdat_out every time I run.
dest = "fsurdat_out", | ||
type = str, | ||
default = "/glade/scratch/" + getuser() + | ||
"/surfdata_0.9x1.25_hist_78pfts_CMIP6_simyr2000_c190214_modified.nc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think you should remove this default and make this a required argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postponing because during development I find it convenient to not have to enter fsurdat_in and fsurdat_out every time I run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the default and added required=True
for --fsurdat_out (not for --fsurdat_in, yet), but I don't get a helpful "This is required" error when I omit it...
Also replacing default fsurdat_in and fsurdat_out to work with this test: SMS_D_Ld1.f10_f10_mg37.I1850Clm50Sp.cheyenne_intel.clm-collapse_pfts_78_to_16_decStart_f10 although ultimately we plan to remove these defaults.
This allows potentially overriding variables that may get set by the first function, which identifies a user-defined idealized land swath. I also completed the list of fsurdat variables that I believe need updating in that first function (def land_swath). I still need to address various TODOs.
Change ceta from 450 to 358 for CTSM5.1 See Leo VanKamenhout's work on this in ESCOMP#250. Note, this is used in the PPE work. See van Kampenhout et al. (2017). Turn Medlyn on for PHS off for clm51 and clm50. Make max CO2 partial pressure consistent in Photosynthesis. Change surface datasets to only use 78PFT versions Fix GSSUN and GSSHA history variables so now not just writing missing values. Changes FATES to run with use_nitrif_dentrif=T by default. Change dynlakes test to methane/use_nitrif_=T.
Reordered function calls again to what makes sense to me right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for all of your continued work on this. You have addressed all of my previous comments really well. And for the most part I am quite happy with all of the new code you have added (mainly the unit test code). I really appreciate all of your work on that! In particular, your tests of _handle_config_value
are great (though see a few small comments inline). And, while I haven't looked at them carefully, at a glance the tests of _get_not_rectangle
also seem really good – thank you for adding all of these tests!
In addition to my inline comments on your new test code, I can think of the following things that still need to be done before this PR is finalized:
- Need to add a new test to the aux_clm test list leveraging the new system test
- Need to commit the files in testinputs (and I'll confirm that they're committed via lfs; can see this when viewing the file on github)
- Need to get pylint passing
python/ctsm/test/test_unit_utils.py
Outdated
def test_lonRange0To360_lonNegative(self): | ||
""" | ||
Tests that negative inputs to lon_range_0_to_360 in the range [-180, 0) | ||
get 360 added to them | ||
""" | ||
inputs = [-180, -0.001] | ||
for input in inputs: | ||
result = lon_range_0_to_360(input) | ||
self.assertEqual(result, input + 360) | ||
|
||
def test_lonRange0To360_lonNotNegative(self): | ||
""" | ||
Tests that inputs to lon_range_0_to_360 of 0 to 360 remain unchanged | ||
""" | ||
inputs = [0, 360] | ||
for input in inputs: | ||
result = lon_range_0_to_360(input) | ||
self.assertEqual(result, input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [OPTIONAL] You have added a good set of tests for lon_range_0_to_360 – thanks! I have a fairly strong preference for having each unit test test exactly one scenario. The two main reasons for this are: (1) it keeps each test simpler, serving greater value as test as documentation (because it's easy to see at a glance what the expected output is for a given input) and reducing the likelihood that there are logic errors in the test itself; and (2) if the test fails, it is more obvious what the problem is. Given that you have already written these tests as they are, I don't feel they absolutely need to be changed, but I wanted to at least mention my feelings on this for the future. We can talk more if you have a different view of this.
# Allow test names that pylint doesn't like; otherwise hard to make them | ||
# readable; in pylint can use: disable=invalid-name | ||
|
||
class TestFsurdatModifier(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think the name of this class and the file should use "ModifyFsurdat" rather than "FsurdatModifier", since they are really testing the modify_fsurdat module, not the fsurdat_modifier module.
|
||
class TestFsurdatModifier(unittest.TestCase): | ||
|
||
def test_handleConfigValue_UnsetCantBeUnset(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for adding these tests of _handle_config_value!
-
This and the following tests of _handle_config_value should be in the test module for utils, not here.
-
The one _handle_config_value test I don't see coverage for is the case where is_list = False and you make it through all the way to the end (i.e., a basic case of a scalar value, with or without converting to a type). I feel like there should be a test for this basic case.
default = [True, False, True] | ||
is_list = False # False ok because default is used when val = 'UNSET' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [OPTIONAL] Is allowing this behavior of having default be a list despite
is_list = False
actually something we deliberately want to allow? I'm thinking it's more accidental that this works without an error message, in which case this test could break if we refactor the code, and so we should change this to use a default of a scalar, not a list. But I don't have my head totally in this, so feel free to disagree here.
val = 'False Tree False' # intentionally misspelled True | ||
item = 'varname_in_cfg_file' | ||
default = None | ||
is_list = False # error triggered before is_list becomes relevant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- As with the above comment: it is fragile to have two errors here: if the code were refactored, this would fail for a different reason, leading to a maintenance problem
is_list=is_list, convert_to_type=convert_to_type, | ||
can_be_unset=can_be_unset, allowed_values=allowed_values) | ||
|
||
def test_setvarLev(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good test of the setvar
functions. In contrast to my earlier comment about splitting up a unit test into individual tests, I am marginally okay with having everything done in one test here since they reuse a bunch of setup code. My preference is still generally to have separate tests for things like this, with common setup code extracted into a setup function or simply copy & pasted if it isn't too much (duplication / copy & paste isn't quite as much of a problem for unit tests as it is for production code, in my view); but I am okay with this the way it is.
-
However, the direct setting of attributes of the ModifyFsurdat class (ModifyFsurdat.not_rectangle, ModifyFsurdat.file, and calling functions directly on the ModifyFsurdat class) is a problem: I think this usage can contaminate other unit tests, since you are setting permanent class attributes rather than attributes of a temporary object. You should instead instantiate an object of this class and then do all modifications and references to this object, not to the ModifyFsurdat class itself.
-
Having an alternative init interface to the ModifyFsurdat class, where you pass in data directly rather than setting it from a file, might help with the above, and would also keep this test more maintainable so that the test doesn't need to know the details of which variables within the class it should set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@billsacks I think that I have addressed this based on our meeting yesterday. Let me know if you see issues. Thank you for all the help.
|
||
# test setvar | ||
ModifyFsurdat.setvar_lev0(ModifyFsurdat, 'var_lev0', val_for_rectangle) | ||
self.assertTrue(ModifyFsurdat.file.var_lev0.equals(comp_lev0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [OPTIONAL] Can you use self.assertEqual instead of self.assertTrue here? I'm not sure if it will work, but if it does, then you'll probably get a more helpful error message if there are differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found an alternative, and let me know if you like it less.
# I have chosen the lon/lat ranges to match their corresponding index | ||
# values to keep this simple | ||
compare[lat_1-min_lat:lat_2-min_lat+1, lon_1-min_lon:lon_2-min_lon+1] = 0 | ||
self.assertIsNone(np.testing.assert_array_equal(not_rectangle, compare)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Here and in the following tests: would it work to use self.assertEqual on the two arrays rather than this indirect assertion? If so, I think that would give you a more informative message if the assertion fails (in addition to being more obvious to the reader).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced with an alternative, and LMK if you don't like it.
FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel
Confirmed: Confirmed for a global case: |
...instead of TestFsurdatModifier
...from test_unit_modify_fsurdat.py and add to them is_list = False test. Also break up lonRange0To360 tests to a test per case.
...or np.any(np.not_equal(arr1, arr2))
@billsacks |
Resolved conflicts: cime_config/testdefs/testlist_clm.xml doc/ChangeLog doc/ChangeSum
I don't think it makes a huge difference either way, but I see some advantages to calling inline: - It makes the tests run faster - I think we'll be able to get more useful error messages and easier debugging in some situations - It allows us to suppress output, so that the unit test output remains clean and easy to read through This is consistent with how we do it for test_sys_lilac_build_ctsm; I think it's good to maintain consistency in this respect.
Was missusing np.array_equal and np.not_equal as assertions. Also I was checking for not_equal (at least I thought I was) in a few places when I should have been checking for equal. I think I got them right now.
I noticed that "make lint" wasn't checking the new files. The problem was that there was no |
This is needed for pylint to run on the files in this directory
This test suite will include tests that are potentially impacted by changes in the python directory, and so should be run whenever making a tag that changes the python code, even if no Fortran code is changed.
Description of changes
Introducing new Python script
fsurdat_modifier
usingsubset_data
as my initial template. Currently this script just opens and reads an fsurdat file and writes a copy of the same file. In this PR, I will introduce functions that modify the fsurdat before outputting it.Specific notes
For now fsurdat_modifier.py runs as follows:
./fsurdat_modifier.py --fsurdat_in </path/input_filename.nc> --fsurdat_out </path/output_filename.nc>
or
./fsurdat_modifier.py
for default entries offsurdat_in fsurdat_out
.or
./fsurdat_modifier.py --help
to see options (though this still lists options I found in the template code that I started from)Contributors other than yourself, if any: @ekluzek @billsacks
CTSM Issues Fixed (include github issue #):
Resolves #1558
Testing performed, if any:
I have executed the script in the ways explained a few lines up.
NOTE: Be sure to check your coding style against the standard
(https://github.com/ESCOMP/ctsm/wiki/CTSM-coding-guidelines) and review
the list of common problems to watch out for
(https://github.com/ESCOMP/CTSM/wiki/List-of-common-problems).