Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create ELM module to handle LUH2 data import #21

Open
6 tasks done
glemieux opened this issue Mar 21, 2023 · 57 comments
Open
6 tasks done

Create ELM module to handle LUH2 data import #21

glemieux opened this issue Mar 21, 2023 · 57 comments
Assignees

Comments

@glemieux
Copy link
Owner

glemieux commented Mar 21, 2023

Per @ckoven, use the dynHarvestMod.F90 as a guideline for this module.

Tasks

@glemieux glemieux self-assigned this Mar 21, 2023
@glemieux glemieux moved this to In Progress in FATES LUH2 development Mar 21, 2023
@glemieux
Copy link
Owner Author

glemieux commented Apr 7, 2023

Review of dynHarvestMod.F90, which is used as a template for the dynLandUseMod.F90 development.

Notable differences between elm and clm:

  • CLM: dynHarvest_interp and dynHarvest_interp_resolve_harvesttype are the CN and FATES interpolation procedures, respectively.
  • CLM: dynHarvest_interp_resolve_harvesttype is called directly in clmfates_interfaceMod.F90. The data is subsequently passed to fates later down the dynamics_driver call via a typical bc_in assignment.
  • ELM: dynHarvest_interp_harvest_type is the only interpolate procedure in elm. It is called in dynSubgrid_driver and is called every time step (regardless of FATES use or not).
  • ELM: data is passed to fates through elmfates_interfacemod.F90 via bc_in variables in dynamics_driv in the same manner as clm.
  • harvest_rates dimensionality is flipped between the two models. In elm it's harvest_rates(varnum,bounds%begg:bounds%endg, while in clm, it's the opposite.
  • The year_position flag is flipped between the two models: dynHarvest_file = dyn_file_type(harvest_filename, YEAR_POSITION_START_OF_TIMESTEP) for elm, clm uses END_OF_TIMESTEP
  • get_do_harvest seems really complicated. Why is it setup this way? I can ask Bill Sacks for a breakdown as I think he developed this. That said, I think we don't need to mimic this for use_fates_luh, but it might be good to double check.

@glemieux
Copy link
Owner Author

glemieux commented Apr 7, 2023

@glemieux
Copy link
Owner Author

@ckoven should use of luh2 data be incompatible with using transient data?

@ckoven
Copy link
Collaborator

ckoven commented Apr 12, 2023

@ckoven should use of luh2 data be incompatible with using transient data?

yes, anything FATES-related should be incompatible with the big-leaf transient land use drivers

@glemieux
Copy link
Owner Author

glemieux commented Apr 12, 2023

ELM build development notes:

  • I don't think we want to incorporate an luh2 option for the namelist_defaults_usr_files.xml as I think (for now) we don't want the user to be able to define their own luh2 data set to be used. Maybe in the future?
  • I think I need to rename the module to avoid land use specifically as it might be confused with the existing transient land use capabilities.
  • I think eventually we may want to incorporate specific luh2 use_cases for the different scenarios
  • I think we might want (need?) an update to setup_logic_surface_dataset to make sure a compatible dataset is called to be used with the luh data.
  • Similarly, do we need something in setup_logic_initial_conditions to check the initial date?

Updated:

  • namelist_defaults.xml
  • ELMBuildNamelist.pm
  • namelist_definition.xml

ELMBuildNameList notes:

@glemieux
Copy link
Owner Author

@ckoven did I understand correctly that hlm_use_lu_harvest and hlm_use_luh are independent and thus commit NGEET/fates@4099345 makes sense?

As a followup, iirc in the meeting today, we discussed that using harvest and luh2 at the same time is a future goal, correct? I started to add logic to deny this combo on the elm-side during namelist build check, but I'm wondering if I should let both be set?

@ckoven
Copy link
Collaborator

ckoven commented Apr 12, 2023

@glemieux no these should both be allowed already. The future goal will be to remove the current reading of the logging rates from the surface dataset entirely and replace them with the same information that is now on the luh2 file.

@glemieux
Copy link
Owner Author

elmfates_interfacemod.F90 notes:

  • We don't need to allocate the land use vectors as we are not always passing the data to other fates subroutines

@glemieux
Copy link
Owner Author

Aside: should the module actually be located in the dyn_subgrid directory? It seems conceptually consistent, but I'm not sure if software structure-wise it makes sense.

  • Talk to Erik about the history of dynSubGrid modules

@ckoven
Copy link
Collaborator

ckoven commented Apr 14, 2023

Aside: should the module actually be located in the dyn_subgrid directory? It seems conceptually consistent, but I'm not sure if software structure-wise it makes sense.

I guess another possibility is to put the subroutine not in its own module but in the (ec)lmfates_interfaceMod.F90 module?

@glemieux
Copy link
Owner Author

glemieux commented Apr 14, 2023

LUH2 file IO notes:

  • What is the distinction between controlMod.F90 and dynSubGridControlMod.F90?

To do:

  • Review the namelist groupings setup for passing the luh2 filename

@glemieux
Copy link
Owner Author

I guess another possibility is to put the subroutine not in its own module but in the (ec)lmfates_interfaceMod.F90 module?

Definitely a possibility. Right now I'm looking to see if I can implement something simple for using controlMod.F90 for the file IO, instead of the dynSubGrid module setup. I've also got a short, tentative walk through of dynSubGrid with Bill (and Erik) since he's the one who did most of the development for it, I believe.

@glemieux
Copy link
Owner Author

glemieux commented Apr 14, 2023

Build testing is in progress. I've messed up something with the namelist system:

ERROR: Command: '/home/glemieux/Repos/E3SM-Project/e3sm/components/elm/bld/build-namelist -infile /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elmconf/namelist  -csmdata /data/cesmdataroot/inputdata -inputdata /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elm.input_data_list -ignore_ic_year -namelist " &elm_inparm  start_ymd=00010101  /" -use_case 2000_control  -res bci_0.1x0.1_v4.0i  -clm_usr_name bci_0.1x0.1_v4.0i -clm_start_type default -envxml_dir /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458 -l_ncpl 48 -r_ncpl 48 -lnd_frac /data/sitedata/bci_0.1x0.1_v4.0i/domain_bci_clm5.0.dev009_c180523.nc -glc_nec 0 -co2_ppmv 367.0 -co2_type constant  -ncpl_base_period day  -config /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elmconf/config_cache.xml -bgc fates -no-megan -no-drydep' 
failed with error 'Can't call method "ww" without a package or object reference at /home/glemieux/Repos/E3SM-Project/e3sm/components/elm/bld/ELMBuildNamelist.pm line 2655.' from dir '/home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elmconf'

This is failing in setup_logic_do_harvest in ELMBuildNameList.pm which is interesting since I didn't set this in the namelist or require it during the init.

A sanity check build against the most recent elm-fates api update builds fine.

@glemieux
Copy link
Owner Author

glemieux commented Apr 17, 2023

This is failing in setup_logic_do_harvest in ELMBuildNameList.pm which is interesting since I didn't set this in the namelist or require it during the init.

This ended up being a simple issue of having some random characters (the ww referenced above) in this subroutine that just made no sense. It's getting past the ELM namelist build successfully now and crashing due to fates-side needs. Working through those now. I should note this is building just without actually setting the use_fates_luh flag to true, so its breaking the default run mode.

@ckoven
Copy link
Collaborator

ckoven commented Apr 18, 2023

Just to say that I pushed commits on both the ELM and FATES branches that include some new history dimensions and one new history variable

@glemieux
Copy link
Owner Author

@ckoven can you comment on this change that came in with commit NGEET/fates@8c942e8? It looks like you are adding a new site level variable, but I wasn't sure if that was actually the intent here.

@ckoven
Copy link
Collaborator

ckoven commented Apr 19, 2023

Thanks @glemieux for catching. Yeah, sorry, I didn't follow that thread to completion. The intent is to get rid of the three variables currentSite%disturbance_rates_primary_to_primary(i_disturbance_type), currentSite%disturbance_rates_secondary_to_secondary(i_disturbance_type), and currentSite%disturbance_rates_primary_to_secondary(i_disturbance_type), and replace them with a new site-level variables that tracked disturbance rates with three dimensions: N_DIST_TYPES * n_landuse_cats * n_landuse_cats. The first n_landuse_cats would be for the donor patch label, and the second would be for the receiver patch label.

Then, in the history output, this part of the code https://github.com/NGEET/fates/blob/main/main/FatesHistoryInterfaceMod.F90#L2615-L2635 would need to be reworked to replace all the p2p, s2s, and p2s stuff by a single variable with the new lulu dimension that I added in 893615c.

Does that make sense?

@glemieux
Copy link
Owner Author

glemieux commented Apr 19, 2023

Yep, that makes sense.

Reviewing that loop I got to thinking that we could potentially add some logic to it to avoid accumulating the disturbance rate for primaryland if the donor type is anything other than primaryland. E.g. we will never have a secondaryland donating to primaryland correct? Theoretically the way it is should be fine since said disturbance rate during those loop iterations would be zero anyways.

@ckoven
Copy link
Collaborator

ckoven commented Apr 19, 2023

Right, there could be logic but it shouldn't be needed. It might make sense to put in a failure if the donor loop index is not primary, the receiver loop index is primary, and the area disturbed in that part of the loop sequence is nonzero.

@ckoven
Copy link
Collaborator

ckoven commented Apr 20, 2023

Hi @glemieux just to say that I was looking at the code and saw a bug that needed fixing, so just pushed as 713aa876.

@glemieux
Copy link
Owner Author

We've got a set of code now with e25e4a8 and NGEET/fates@fb7b4eb that will build in fates default mode.

@ckoven
Copy link
Collaborator

ckoven commented Apr 20, 2023

OK, I think a couple things need to be done still before turning it on. First is that the current code will I think apply an annual change flux every day. So we need to either divide by 365 or do it only once a year (or put in a switch to allow the user to decide which of those to do). Another thing is I think we've lost some information in the site%disturbance rate variable that was feeding the history diagnostics, I can rework it slightly to give the right info unless you want to do that. Otherwise, it should be good to try I think?

@glemieux
Copy link
Owner Author

OK, I think a couple things need to be done still before turning it on. First is that the current code will I think apply an annual change flux every day. So we need to either divide by 365 or do it only once a year (or put in a switch to allow the user to decide which of those to do).

This came up as an aside in the ctsm meeting today when we were talking the clm-side BGC code updates Ryan was working on. Erik was explaining the ctsm "dribbler" concept for avoiding step function-like changes in the rates.

Another thing is I think we've lost some information in the site%disturbance rate variable that was feeding the history diagnostics, I can rework it slightly to give the right info unless you want to do that. Otherwise, it should be good to try I think?

I can give this a crack tonight or tomorrow morning, unless you get to it first.

@ckoven
Copy link
Collaborator

ckoven commented Apr 20, 2023

Another thing is I think we've lost some information in the site%disturbance rate variable that was feeding the history diagnostics, I can rework it slightly to give the right info unless you want to do that. Otherwise, it should be good to try I think?

I can give this a crack tonight or tomorrow morning, unless you get to it first.

I just added that change and pushed it to ckoven/fates@6c8f560

@ckoven
Copy link
Collaborator

ckoven commented Apr 21, 2023

OK I made a couple other changes just now to (hopefully) help the readability of the looping logic a bit just now. Will stop messing with the code now until you can start testing it.

@glemieux
Copy link
Owner Author

@ckoven per our conversation, I tried generating some write statements to get the indices of the loop in spawn_patches prior to where the call to TransLitterNewPatch is failing. It looks like the runtime is segfaulting before it can even write the statements. The land log doesn't even have the expected FATES dynamics start output (i.e. the last entry is Beginning timestep : 2000-01-01_00:30:00). This happens regardless of being in hlm debug mode or not.

@ckoven
Copy link
Collaborator

ckoven commented Apr 26, 2023

no that should be the simplest configuration

@ckoven
Copy link
Collaborator

ckoven commented Apr 26, 2023

I wonder if it is because the arrays in aren't being allocated in dynFatesLandUseInit when use_luh isn't true?

@glemieux
Copy link
Owner Author

Good thought. I'll look into this.

@glemieux
Copy link
Owner Author

glemieux commented Apr 26, 2023

@ckoven per our conversation, I tried generating some write statements to get the indices of the loop in spawn_patches prior to where the call to TransLitterNewPatch is failing. It looks like the runtime is segfaulting before it can even write the statements. The land log doesn't even have the expected FATES dynamics start output (i.e. the last entry is Beginning timestep : 2000-01-01_00:30:00). This happens regardless of being in hlm debug mode or not.

This issue I was seeing was a little misleading. I forgot I was to remove the temp code to force disturbance to be zero for a different issue (NaN due to divide by zero). Removing that I ran into a similar issue though, which was easier to identify. Part of the spawn_patches routine was missed in renaming the old new_patch_primiary: NGEET/fates@6a935ad

I'm hitting an FPE issue now in the first fates history write, but I suspect that should be a straightforward fix.

@glemieux
Copy link
Owner Author

glemieux commented Apr 27, 2023

@ckoven Ok, the default mode is working now with commit NGEET/fates@b19cedf. An important note: commenting out the write blocks in the patchloop_areadis loop that print out the indices, labels, and disturbance rate causes a FPE with the line that does the comparison against reasonable math precision:

disturbance_rate > (1.0_r8 + rsnbl_math_prec)

I think I have an idea for why this is and I'm testing it now

@ckoven
Copy link
Collaborator

ckoven commented Apr 27, 2023

ok great, thanks. On the site where you are testing. Does the initial state vector sum to one? I suspect that once you turn on the land use change code it should crash if that is not the case.

@glemieux
Copy link
Owner Author

ok great, thanks. On the site where you are testing. Does the initial state vector sum to one? I suspect that once you turn on the land use change code it should crash if that is not the case.

Thanks for the reminder, I haven't checked that.

@glemieux
Copy link
Owner Author

glemieux commented Apr 27, 2023

I'm running into some issues with the elm pio that I need to look into to get a better understanding of:

 ERROR: The size of the user buffer (           0  elements) is smaller than the size of the variable (                 1166  elements)
 pio_support::pio_die:: myrank=          -1 : ERROR: pionfget_mod.F90:         509 : Insufficient user buffer size

As a minor note, I didn't realize that the pio routines are fussy about variable case.

UPDATE: This is likely due to me not running luh mode for the appropriate resolution.

@glemieux
Copy link
Owner Author

Running on summit is getting farther along, but hitting an issue with the dimensionality of the data:

1: 1:  Opened existing file /ccs/home/glemieux/parameter_files/LUH2_historical_1850_2015_4x5_220308_year.nc 47
1: 1: Assertion failed...
1: 1: PIO: FATAL ERROR: Aborting... Extra outermost dimensions must have lengths of 1 (/autofs/nccs-svm1_home1/glemieux/e3sm/externals/scorpio/src/clib/pio_darray_int.c: 1358)
1: 1: Obtained 10 stack frames.

Note that I need to change the time variable to YEAR due to dynSubGrid file import requirements.

@ckoven
Copy link
Collaborator

ckoven commented Apr 28, 2023

HI @glemieux just curious if there is any update or if it would be helpful to discuss anything today, thanks-

@glemieux
Copy link
Owner Author

@ckoven I think this has more to do with elm's expectation of file "structure" at a low-level, for lack of a better term. I.e. I think we need make the luh2 file more closely resemble whatever requirement are being set. I haven't had much time to dig into it this morning, but will be able to give this time this afternoon.

@ckoven
Copy link
Collaborator

ckoven commented Apr 28, 2023

ok thanks. do you know if the issue is the metadata (variable names and dimensions, etc) or something lower level (netcdf version, variable type, etc)?

@glemieux
Copy link
Owner Author

From looking at the code, i think this is metadata (dimensions specifically).

@glemieux
Copy link
Owner Author

glemieux commented May 1, 2023

From looking at the code, i think this is metadata (dimensions specifically).

Looking at the differences between PIO and SCORPIO, this assertion check is specific to SCORPIO. Looking at the PIO code in ctsm, I don't this we'd have an issue with the file metadata as laid out currently:
SCORPIO pioassertion in pio_read_darray_nc_serial versus the most similar pioassertion in the same procedure in PIO.

The issue is cropping up upon trying to read in the first record variable (primf_to_secdn).

The relevant commit that where this check was added in SCORPIO is E3SM-Project/scorpio@d0163d2, which was part of E3SM-Project/scorpio#113. I'm going to reach out to the author to get some guidance.

@glemieux
Copy link
Owner Author

glemieux commented May 1, 2023

The above issue has been resolved. This was due to me renaming time to YEAR. Reviewing other elm landuse timeseries data I found that what I should have done was simply added YEAR as a copy of time. Erik noted that the YEAR variable requirement is pretty old and likely outdated. But at least its consistent across ELM and CLM.

I'm now running into the state_vector sum to one check warnings and hitting an issue in the EDInitMod during init_patches having area issues, so I think we're past the read-in issues and moving on to passing data issues.

@glemieux
Copy link
Owner Author

glemieux commented May 2, 2023

I'm now running into the state_vector sum to one check warnings and hitting an issue in the EDInitMod during init_patches having area issues, so I think we're past the read-in issues and moving on to passing data issues.

@ckoven the issue here appears to be due in part to missing calls to the interpolation prior to the patch initialization on the fates-side, which I was missing, so it was getting junk data for the states and rates (names were fine though).

The other issue has to do with the loop in the landuse mod in fates here:
https://github.com/ckoven/fates/blob/44696201f9dd9a3add7889f805d39ef82b7205ff/biogeochem/FatesLandUseChangeMod.F90#L241-L249.

The hlm_num_luh2_states value is 12, so this will cause an issue when using it to define the boundaries of the index into luh2_fates_luype_map which is a 5x5 matrix. I have an idea of how to refactor this code with a different mapping scheme that I'm going to try tomorrow, unless you get to it first.

@ckoven
Copy link
Collaborator

ckoven commented May 2, 2023

ok, great. yep that was just a mistake in my logic, sorry. feel free to do the mapping a different way, thanks.

@glemieux
Copy link
Owner Author

glemieux commented May 4, 2023

@ckoven, so I've fixed the initialization order of operations on the elm side to make sure that the init_patches has the state data it needs. That said, writing out the luh2 states by gridcell, I'm seeing no states summing to unity (although many are close) as well as a number of gridcells where each state type in that gridcell is NaN. I did a spot check on the netcdf file that its inputting to make sure the values match.

My assumption here is that our regridding may be messing up the fractions such that they don't sum to one within the new grid cell?

@ckoven
Copy link
Collaborator

ckoven commented May 4, 2023

I think the first thing to do is divide by the regridded land area, and see if that fixes the summing-to-one problem. The NaNs is a tougher problem, I wonder what the lats and lons are of those gridcells? presumably it is because they are for land that ELM is running over but has no LUH2 data because ice or similar.

@ckoven
Copy link
Collaborator

ckoven commented May 4, 2023

re the NaN issue: e.g. I am noticing that luh2 (unsurprisingly) has no data for Antarctica, whereas ELM also runs at the perimeter of the continent. So I think the strategy should probably be to search for NaNs, and when found, set the primf (or maybe primn) state to be one, and all other states and all transitions to be zero.

@glemieux
Copy link
Owner Author

glemieux commented May 4, 2023

re the NaN issue: e.g. I am noticing that luh2 (unsurprisingly) has no data for Antarctica, whereas ELM also runs at the perimeter of the continent. So I think the strategy should probably be to search for NaNs, and when found, set the primf (or maybe primn) state to be one, and all other states and all transitions to be zero.

Roger that. Will do

@glemieux
Copy link
Owner Author

glemieux commented May 4, 2023

I think the first thing to do is divide by the regridded land area, and see if that fixes the summing-to-one problem.

So basically, rerun the regridder but use conservative_normed?

@ckoven
Copy link
Collaborator

ckoven commented May 4, 2023

thanks. but at least as an initial sanity check let's also add a write statement with the lat and lon of each grid cell where we do that, so we can make a map and confirm that that is what is going on.

@ckoven
Copy link
Collaborator

ckoven commented May 4, 2023

I think the first thing to do is divide by the regridded land area, and see if that fixes the summing-to-one problem.

So basically, rerun the regridder but use conservative_normed?

No, I mean to ingest the land fraction dataset and propagate that through the regridding, as we had discussed the other day

@glemieux
Copy link
Owner Author

glemieux commented May 4, 2023

No, I mean to ingest the land fraction dataset and propagate that through the regridding, as we had discussed the other day

So this would be the LANDFRAC_PFT on the surface data set we regridded against? If I have this right, I spot checked this with python and the values are still not coming out to unity.

UPDATE: wait no, that'd be PCT_NATVEG?

@ckoven
Copy link
Collaborator

ckoven commented May 4, 2023

(1 - icwtr) in the filestaticData_quarterdeg.nc

@glemieux
Copy link
Owner Author

glemieux commented May 4, 2023

For future me, see: glemieux/fates#25 (comment)

@glemieux
Copy link
Owner Author

@ckoven do we want to provide a default dataset on the input data server so that users can have something to work with without having to provide it themselves? Currently I have the 1850-2015 f45 historical data listed as the default, but I also have the 0850-2015 on hand that I could upload.

@ckoven
Copy link
Collaborator

ckoven commented Jun 13, 2023

@glemieux yes I think adding the 1850-2015 f45 dataset as a default is right.

glemieux pushed a commit that referenced this issue Sep 23, 2024
…Dev/develop

Previously, when the regional sea-level prediction capability was added
to MALI (#21), the restart config option for the sea-level model was not
added. This led the sea-level model to get initialized to Timestep zero
when coupled MALI-SLM simulations are being restarted, forgetting about
the ice loading changes and associated viscoelastic solid earth deformation
that happened in the timesteps prior to current model time. This PR
fixes the problem by allowing the sea-level model to resume where it
was left off. Note in parallel to this PR, the version of the SLM needs
to incorporate the changes made in the following accompanying
PR (MALI-Dev/1DSeaLevelModel_FWTW#9)

* hollyhan/add_restart_functionality_slm:
  Don't call SLM on init of a restart
  Add addl info on restart about the calculated time since last SLM call
  Allow restarts at any interval when using SLM
  Add missing error flag so model actually dies when error occurs
  Add missing arguments to log write statement
  Update restart check to also use time interval division
  Adjust check if adaptive dt is on or not
  Update checks using interval division
  Improve error handling, correct other usage of config_uplift_method
  Improve synchronization of timesteps between MALI and SLM
  Add restart option when the SLM is coupled to MALI
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants