Extend the CIME workflow to support user-specified initial conditions flexibly #37

Closed
rsdunlapiv opened this issue Dec 19, 2019 · 45 comments

@rsdunlapiv
Collaborator

A clear mechanism is needed in CIME for specifying which initial conditions (ICs) to use (e.g., a forecast start date and a set of input files for chgres) and for ensuring that the IC settings are consistent across the workflow.

This is something we expect the user to change frequently, and potentially at different points in the workflow (e.g., the weather model has been built and run with a particular IC and the user wants to perform another run with a new IC; the model would not need to be rebuilt in this case). The selected IC affects both the chgres inputs and the model start time.

  • CIME already supports a START_DATE parameter, which should be used to determine the ICs. A meaningful START_DATE is needed out of the box. One option is to add a --startdate option to the create_newcase script (@jedwards4b); there may be other options as well.
  • The chgres namelist (config.nml) has a variable, data_dir_input_grid, that points to the location of the time-dependent raw input data.
  • A proposal is to define a root directory on each platform (machine) containing both fixed and input files. It is set in the machine.xml file using the variable $DIN_LOC_ROOT. The raw input data on a platform will be in $DIN_LOC_ROOT/global/ic/gfs.icdate, and the fixed files will be in $DIN_LOC_ROOT/global/fix/fix_xx/.
  • The output from chgres (the interpolated files for the selected IC) is put into $RUNDIR/INPUT.
  • See related issue XX about the need for a separate user_nl_chgres (currently there is only a single user_nl_fv3gfs for chgres, the model, and post).
@uturuncoglu
Collaborator

uturuncoglu commented Dec 24, 2019

@jedwards4b @rsdunlapiv When the workflow is activated, case.run triggers buildnml, and buildnml overwrites the files produced by chgres. I am not sure this was the case for the prototype we had before. I think case.submit, not case.run, should run buildnml; there is no need to run buildnml in every task of the workflow. Of course, we could modify buildnml so that it does not overwrite the files produced by chgres, but I am not sure about the best way to do that. For now, in the current version, I added a check to the section that copies/links the input files so that existing files are not overwritten, but we could change this design.
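A minimal sketch of the no-overwrite check described above, written as a hypothetical Python helper `stage_input_file` (it is not the actual buildnml code): a file is copied or linked into place only when nothing exists at the destination yet, so files already written by chgres survive a later buildnml pass.

```python
import os
import shutil


def stage_input_file(src, dst, link=True):
    """Copy or symlink src to dst unless something already exists at dst.

    Hypothetical helper. Returns True if the file was staged and False if it
    was skipped because dst already exists (e.g. a file written by chgres).
    """
    if os.path.lexists(dst):
        # lexists() also sees broken symlinks, so those are not replaced either.
        return False
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    if link:
        os.symlink(os.path.abspath(src), dst)
    else:
        shutil.copy2(src, dst)
    return True
```

Moving buildnml to case.submit would remove the need for such a guard; the check simply makes repeated buildnml calls harmless in the meantime.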

@uturuncoglu
Collaborator

uturuncoglu commented Dec 24, 2019

@rsdunlapiv I added an IC_DIR environment variable to the machine.xml file that points to the initial condition directory. The directory must follow the convention below:

  • The data directory needs to be named $DIN_LOC_ROOT/IC/XXX.YYYYMMDD/HH, where
    • XXX is a prefix that identifies the source of the data (e.g., gfs); it is not used by CIME and serves only as documentation
    • YYYY is the year
    • MM is the month
    • DD is the day
    • and the subdirectory HH is the hour
  • CIME (buildnml) parses the directory names and compares the initial condition date with the model start time. If they do not match, an error message is issued. We could instead change the model start time to match the initial condition time, but I think it is better to leave that to the user.
  • buildnml also modifies the chgres configuration file based on the model start time, the resolution, and the number of vertical levels.
  • All resolution-dependent fixed files are set automatically in the chgres configuration file. So there is no need to touch the chgres configuration file except to handle a different input format or input source (such as FV3 output or restart files); everything else is modified on the fly to minimize user error.
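The name parsing and start-time check described above could look roughly like the following sketch. The helper names `ic_datetime` and `check_start_time` are hypothetical, and the model start time is assumed to be available as a Python `datetime`; the real buildnml logic may differ.

```python
import os
import re
from datetime import datetime

# Directory convention from the list above: <prefix>.YYYYMMDD/HH, e.g. gfs.20191224/12.
IC_DIR_PATTERN = re.compile(r"^(?P<prefix>[^./]+)\.(?P<ymd>\d{8})/(?P<hh>\d{2})$")


def ic_datetime(ic_subdir):
    """Return the initial condition time encoded in '<prefix>.YYYYMMDD/HH'."""
    match = IC_DIR_PATTERN.match(ic_subdir)
    if match is None:
        raise ValueError(f"{ic_subdir!r} does not follow XXX.YYYYMMDD/HH")
    # The prefix (XXX) only documents the data source, so it is ignored here.
    return datetime.strptime(match.group("ymd") + match.group("hh"), "%Y%m%d%H")


def check_start_time(ic_dir, start_time):
    """Raise ValueError if the IC directory date differs from the model start time."""
    # The last two path components hold XXX.YYYYMMDD and HH.
    parts = os.path.normpath(ic_dir).split(os.sep)
    ic_time = ic_datetime("/".join(parts[-2:]))
    if ic_time != start_time:
        raise ValueError(
            f"initial condition time {ic_time:%Y-%m-%d %HZ} does not match "
            f"model start time {start_time:%Y-%m-%d %HZ}"
        )
    return ic_time
```

For example, `check_start_time("/glade/p/cesmdata/cseg/ufs_inputdata/IC/gfs.20191224/12", datetime(2019, 12, 24, 12))` passes, while a 00Z start time for the same directory raises `ValueError`. Raising instead of silently resetting the start time matches the choice above to leave that decision to the user.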

@uturuncoglu
Collaborator

Post-processing is also implemented. We now have an initial version of the end-to-end workflow. We still need the following:

The ufs-mrweather-app on ESCOMP has been updated if you want to test it on Cheyenne.

A successful run can be found in /glade/scratch/turuncu/ufs-mrweather-app.v16beta/run

@uturuncoglu
Collaborator

I was able to run the whole workflow successfully with a different initial condition after minor modifications to buildnml. I'll update the app after testing different combinations and resolutions.

/glade/p/cesmdata/cseg/ufs_inputdata/IC/gfs.20191224/12

@mvertens
Collaborator

mvertens commented Dec 26, 2019 via email

@rsdunlapiv
Collaborator Author

rsdunlapiv commented Jan 6, 2020

@uturuncoglu needs to remove the IC_DIR environment variable and implement access to the NOMADS server to download raw input files for chgres. This will only support the datasets currently available on NOMADS; the user is responsible for pulling in their own input files for dates not available on NOMADS.

@jedwards4b We need to add a DIN_IC_ROOT to env_run.xml. If DIN_IC_ROOT is not set, it will default to $DIN_LOC_ROOT/IC.

If the user pulls down their own inputs to chgres, then they can set DIN_IC_ROOT to their own local input directory if needed.
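A sketch of the proposed default, assuming the case variables are available as a plain dictionary of strings (a simplification; this is not CIME's own case API). An unset or empty DIN_IC_ROOT falls back to `$DIN_LOC_ROOT/IC`, and `$NAME` references are expanded from the other variables.

```python
import os
from string import Template


def resolve_din_ic_root(case_vars):
    """Return DIN_IC_ROOT, defaulting to $DIN_LOC_ROOT/IC when it is unset or empty."""
    raw = case_vars.get("DIN_IC_ROOT") or "$DIN_LOC_ROOT/IC"
    # Expand $NAME references using the other case variables; unknown names are left as-is.
    return os.path.normpath(Template(raw).safe_substitute(case_vars))
```

A user who pulls down their own chgres inputs would then only need to set DIN_IC_ROOT to that directory; everyone else gets the machine default.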

@uturuncoglu
Collaborator

uturuncoglu commented Jan 7, 2020

@rsdunlapiv The IC_DIR environment variable has been removed, and access to the NOMADS server for downloading raw chgres input files is implemented. The model looks in $DIN_IC_ROOT/prod for the desired input folder, a subdirectory of prod/ named gfs.YYYYMMDD/HH. If it cannot find a folder and files matching the model start time, it looks for the desired files on nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/ instead.
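The lookup order described above (local `$DIN_IC_ROOT/prod/gfs.YYYYMMDD/HH` first, NOMADS second) can be sketched as follows. The helper names are hypothetical, the `https` scheme is an assumption, and the sketch assumes the NOMADS tree below `.../com/gfs/` uses the same `prod/gfs.YYYYMMDD/HH` layout as the local directory. It only checks that the directory exists, not that it holds the right files.

```python
import os
from datetime import datetime

# Remote base from the comment above; the https scheme is an assumption.
NOMADS_GFS_BASE = "https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs"


def ic_cycle_subdir(start):
    """Relative cycle directory for a start time, e.g. prod/gfs.20190909/00."""
    return f"prod/gfs.{start:%Y%m%d}/{start:%H}"


def ic_source(din_ic_root, start):
    """Return ("local", path) if the cycle directory exists, else ("nomads", url)."""
    subdir = ic_cycle_subdir(start)
    local_dir = os.path.join(din_ic_root, *subdir.split("/"))
    if os.path.isdir(local_dir):
        return "local", local_dir
    # Assumed: NOMADS mirrors the same prod/gfs.YYYYMMDD/HH layout.
    return "nomads", f"{NOMADS_GFS_BASE}/{subdir}/"
```

Checking locally first avoids a network round trip whenever the data is already staged on the machine.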

@rsdunlapiv
Collaborator Author

@uturuncoglu to send a request to @KateFriedman-NOAA to place the default raw inputs, which will be used for the default start date, on the FTP server: prod/gfs.20190909/00

We also need additional testing of the NOMADS retrieval.

@uturuncoglu
Collaborator

@KateFriedman-NOAA Do you have access to Cheyenne or Stampede2? The data is in the following directories:

Cheyenne:
/glade/p/cesmdata/cseg/ufs_inputdata/prod/gfs.20190909

Stampede2:
/work/01118/tg803972/stampede2/UFS/ufs_inputdata/IC/gfs.20190909

The data needs to be placed under prod/gfs.20190909/00. @rsdunlapiv could also copy the files to the NOAA Hera machine if you don't have access to either of the above platforms.

@KateFriedman-NOAA
Collaborator

@uturuncoglu I don't have access to Cheyenne or Stampede2 unfortunately. If @rsdunlapiv could put the files on Hera I can grab them from there, thanks!

@rsdunlapiv
Collaborator Author

@KateFriedman-NOAA and @uturuncoglu I started the copy to Hera. It looks like it's going to take a while. I'll update you when it's complete.

@rsdunlapiv
Collaborator Author

@KateFriedman-NOAA the files are on Hera:
/scratch1/NCEPDEV/nems/Rocky.Dunlap/kate/gfs.20190909

There are two files:
00/gfs.t00z.atmanl.nemsio
00/gfs.t00z.sfcanl.nemsio

@KateFriedman-NOAA
Collaborator

@rsdunlapiv Thanks, once Hera is back from today's maintenance I'll copy the files from there to our ftp server. Stay tuned...

@uturuncoglu
Collaborator

@KateFriedman-NOAA I was previously copying the following files from the model source, but the new version of the model no longer has them in the parm/ folder, so we also need to place them on the FTP server:

postxconfig-NT.txt
params_grib2_tbl_new

If possible, could you also put them on the FTP server? Alternatively, I could put them inside the CIME interface of FV3 (as I did for the *_table files), but I am not sure whether those files change over time. What do you think?

@KateFriedman-NOAA
Collaborator

@rsdunlapiv I'm in the process of copying the sample inputs over to WCOSS from Hera and then up to our ftp server. The atmanl file is huge so it's taking a while...

@uturuncoglu Those two files are post parm files:

  1. The postxconfig-NT.txt file is a copy of one of the postxconfig-NT*.txt files in the post/UPP parm folder, set as PostFlatFile in the post scripts. Depending on certain conditions, a different one is used:
[Kate.Friedman@m72a3 gfs_post.fd]$ grep PostFlatFile scripts/exgdas_nceppost.sh.ecf
     export PostFlatFile=${PostFlatFile:-$PARMpost/postxconfig-NT-GFS-ANL.txt}
     export PostFlatFile=$PARMpost/postxconfig-NT-GFS.txt
     export PostFlatFile=$PARMpost/postxconfig-NT-GFS-F00.txt
     export PostFlatFile=$PARMpost/postxconfig-NT-GFS-FLUX-F00.txt
     export PostFlatFile=$PARMpost/postxconfig-NT-GFS-FLUX.txt

If $GRIBVERSION = 'grib2' & anl file: postxconfig-NT-GFS-ANL.txt
If $GRIBVERSION = 'grib2' & fcst file: postxconfig-NT-GFS.txt
If $GRIBVERSION = 'grib2' & fcst file & FH=00: postxconfig-NT-GFS-F00.txt
If $OUTTYP = 4 (flux file): postxconfig-NT-GFS-FLUX.txt
If $OUTTYP = 4 (flux file) & FH=00: postxconfig-NT-GFS-FLUX-F00.txt

[Kate.Friedman@m72a3 gfs_post.fd]$ pwd
/gpfs/dell2/emc/modeling/noscrub/emc.glopara/git/global-workflow/develop/sorc/gfs_post.fd
[Kate.Friedman@m72a3 gfs_post.fd]$ ll parm/postxconfig-NT-*GFS*
-rw-r--r-- 1 emc.glopara emcmodel 14558 Dec  5 17:42 parm/postxconfig-NT-GFS-ANL_GSM.txt
-rw-r--r-- 1 emc.glopara emcmodel 17089 Dec  5 17:42 parm/postxconfig-NT-GFS-ANL.txt
-rw-r--r-- 1 emc.glopara emcmodel 20582 Dec  5 17:42 parm/postxconfig-NT-GFS-F00_GSM.txt
-rw-r--r-- 1 emc.glopara emcmodel 24101 Dec  5 17:42 parm/postxconfig-NT-GFS-F00.txt
-rw-r--r-- 1 emc.glopara emcmodel  6089 Dec  5 17:42 parm/postxconfig-NT-GFS-FLUX-F00.txt
-rw-r--r-- 1 emc.glopara emcmodel 14118 Dec  5 17:42 parm/postxconfig-NT-GFS-FLUX.txt
-rw-r--r-- 1 emc.glopara emcmodel   665 Dec  5 17:42 parm/postxconfig-NT-GFS-GOES.txt
-rw-r--r-- 1 emc.glopara emcmodel 26921 Dec  5 17:42 parm/postxconfig-NT-GFS_GSM.txt
-rw-r--r-- 1 emc.glopara emcmodel  1322 Dec  5 17:42 parm/postxconfig-NT-GFS-GTG.txt
-rw-r--r-- 1 emc.glopara emcmodel 31556 Dec  5 17:42 parm/postxconfig-NT-GFS.txt

Do you have the postxconfig-NT*.txt files in the post associated with the release?

  2. The params_grib2_tbl_new file normally comes from the g2tmpl library as $POSTGRB2TBL. The post/UPP scripts set it as:

export POSTGRB2TBL=${POSTGRB2TBL:-${G2TMPL_SRC}/params_grib2_tbl_new}

The G2TMPL_SRC variable comes from the g2tmpl module.

Is there a copy of the g2tmpl library with the release?
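The selection rules listed above can be written out as a small Python sketch. The function names are hypothetical, and the order in which the conditions are checked (flux, then analysis, then forecast hour) is an assumption; `post_grb2_table` mirrors the shell expansion `${POSTGRB2TBL:-${G2TMPL_SRC}/params_grib2_tbl_new}`, which falls back to the default when the variable is unset or empty.

```python
import os


def post_flat_file(parm_post, fhr, is_anl=False, outtyp=None, gribversion="grib2"):
    """Pick a postxconfig-NT*.txt file following the conditions listed above.

    The precedence between conditions is an assumption; the global-workflow
    scripts may order their checks differently.
    """
    if outtyp == 4:  # flux file
        name = "postxconfig-NT-GFS-FLUX-F00.txt" if fhr == 0 else "postxconfig-NT-GFS-FLUX.txt"
    elif gribversion != "grib2":
        raise ValueError("only the grib2 cases are listed")
    elif is_anl:
        name = "postxconfig-NT-GFS-ANL.txt"
    elif fhr == 0:
        name = "postxconfig-NT-GFS-F00.txt"
    else:
        name = "postxconfig-NT-GFS.txt"
    return os.path.join(parm_post, name)


def post_grb2_table(env):
    """Python equivalent of ${POSTGRB2TBL:-${G2TMPL_SRC}/params_grib2_tbl_new}."""
    value = env.get("POSTGRB2TBL")
    if value:  # ':-' also falls back when the variable is set but empty
        return value
    return f"{env.get('G2TMPL_SRC', '')}/params_grib2_tbl_new"
```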

@KateFriedman-NOAA
Collaborator

@rsdunlapiv The sample input files are now on the ftp server:

https://ftp.emc.ncep.noaa.gov/EIB/UFS/inputdata/prod/gfs.20190909/00/

@uturuncoglu
Collaborator

@KateFriedman-NOAA

  1. I found the postxconfig-* files under ./NCEPLIBS-post/parm/
  2. I also checked the params_grib2_tbl_new file, and it seems to be included in NCEPLIBS too.

So I could copy the files from NCEPLIBS, but the NCEPLIBS installation can differ from platform to platform, so I would prefer to include those files in CIME. I just wonder: do those files change regularly?

@KateFriedman-NOAA
Collaborator

@uturuncoglu I can't speak to the params_grib2_tbl_new file but the post postxconfig-*txt files do change from time to time when new fields are added. If the release is a frozen set it's not a concern but if you're setting up something that can evolve with time I recommend not copying the files into CIME but rather establishing links/references like we do in global-workflow.

For the params_grib2_tbl_new file from the g2tmpl module we have separate module files for each platform that defines and loads the module so at runtime the g2tmpl module gets loaded and the value of $POSTGRB2TBL is known. See the module_base.* files here in global-workflow for examples:

https://github.com/NOAA-EMC/global-workflow/tree/develop/modulefiles

@uturuncoglu
Collaborator

@KateFriedman-NOAA Which postxconfig-NT-* file in the NCEPLIBS-post/parm/ directory corresponds to the one that used to be in the model source param/ directory? At that time I was copying postxconfig-NT.txt. I think I just need to copy that file, because the workflow only post-processes FV3 output.

@KateFriedman-NOAA
Collaborator

@uturuncoglu What do you mean when you say "model source param/ directory"? Which path are you talking about specifically?

The postxconfig-NT-GFS.txt file is what should be used for all forecast hours except for the anl and f000 files. The postxconfig-NT-GFS-ANL.txt file is for processing the anl file and the postxconfig-NT-GFS-F00.txt file is for processing the f000 file.

@uturuncoglu
Collaborator

@KateFriedman-NOAA

Thanks for your help. Sorry, I am a little bit confused. I was using the following files for post:

https://github.com/ufs-community/ufs-weather-model/blob/3bc41ffde579516773549079fa00929f802b218a/parm/postxconfig-NT.txt
https://github.com/ufs-community/ufs-weather-model/blob/3bc41ffde579516773549079fa00929f802b218a/parm/params_grib2_tbl_new

Those files were previously included in the ufs-weather-model/parm directory, but in the current version of the model they are not, so I am trying to find the correct place to get them. As you can see, the file name was postxconfig-NT.txt, and it is hard for me to identify the matching one among the files in NCEPLIBS-post/parm/. If you point me to the correct one, I can try to use it.

My other concern is that we designed the workflow to process FV3 output with NCEP Post at the end of the simulation. At this point, I can process both nemsio and netCDF output to create GRIB files. In this case, do I need to use one postxconfig file for 000 and a different one for the rest of the files, such as 001? We were not given this information when we started designing the workflow, and it might require additional changes on our side.

@uturuncoglu
Collaborator

BTW, there is no ANL file when you run the model.

@KateFriedman-NOAA
Collaborator

@uturuncoglu Ah I see, ok so in the post scripts there are if conditions that determine which postxconfig-NT*.txt file to grab and it ends up with the name "postxconfig-NT.txt" after being copied, which is what you're seeing. So you want the postxconfig-NT-GFS.txt file for f001+ and the postxconfig-NT-GFS-F00.txt file for f000.

I do not know if the new inline post differentiates between f000 and f001+. Let me see if I can find out...

@KateFriedman-NOAA
Collaborator

@uturuncoglu Scratch my comment about inline post, we aren't including inline post in this release.

@arunchawla-NOAA @junwang-noaa Does the post for the UFS release differentiate between f000 and the other forecast hours when setting the postxconfig-NT.txt file? Can someone point me to the copy of the post being included in this release? I can check what the scripts are doing. Thanks!

@uturuncoglu
Collaborator

@KateFriedman-NOAA Thanks. That is really valuable information for people outside of NOAA. I was using a single file before. If you don't mind, could you put those files on the FTP server with some kind of versioning? If I try to get them from the NCEPLIBS-post/parm/ directory, the workflow would break if someone installs the library and then deletes the source, or uses some kind of custom installation. It is better to have them in a public place that everybody can reach. Then I'll modify the post script and CIME to use this updated information. Do you have any other information related to post that could help us fix or improve things?

Yes, the inline post is not part of the release, but we still need those files for post-processing the simulation output.

@junwang-noaa

junwang-noaa commented Jan 17, 2020 via email

@KateFriedman-NOAA
Collaborator

@uturuncoglu I have created a parm folder on the ftp server and placed the two postxconfig-NT-GFS*.txt files there:

https://ftp.emc.ncep.noaa.gov/EIB/UFS/parm/postxconfig-NT-GFS.txt
https://ftp.emc.ncep.noaa.gov/EIB/UFS/parm/postxconfig-NT-GFS-F00.txt
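For the release workflow, which produces no ANL file, the rule reduces to two cases: the F00 file for f000 and the GFS file for f001 and later. A hedged sketch, assuming the files are fetched from the FTP parm folder above and staged under the single name postxconfig-NT.txt that the post scripts end up with after copying; the helper name is hypothetical.

```python
POST_PARM_URL = "https://ftp.emc.ncep.noaa.gov/EIB/UFS/parm"
STAGED_NAME = "postxconfig-NT.txt"


def postxconfig_for_hour(fhr):
    """Return (source URL, staged file name) for one forecast hour."""
    if fhr < 0:
        raise ValueError("forecast hour must be non-negative")
    # f000 uses the F00 variant; f001 and later use the general GFS file.
    name = "postxconfig-NT-GFS-F00.txt" if fhr == 0 else "postxconfig-NT-GFS.txt"
    return f"{POST_PARM_URL}/{name}", STAGED_NAME
```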

@uturuncoglu
Collaborator

@KateFriedman-NOAA Thanks. Is it possible to also put params_grib2_tbl_new in the new folder?

@KateFriedman-NOAA
Collaborator

@uturuncoglu Done:

https://ftp.emc.ncep.noaa.gov/EIB/UFS/parm/params_grib2_tbl_new

That is the file from g2tmpl v1.5.0.

@uturuncoglu
Collaborator

@KateFriedman-NOAA Thank you for your kind help. Do we need to add versioning to those files at this point?

@jedwards4b
Collaborator

jedwards4b commented Jan 17, 2020 via email

@uturuncoglu
Collaborator

uturuncoglu commented Jan 17, 2020

> Current POST allows users to use different post control file by setting the post control file name in the post configuration file: itag.

@junwang-noaa Which parameter can be used to set the post control file name? Was it introduced recently? I defined all namelist options on the CIME side using an XML file, but I could not see any option for it. If you let me know the exact name of the parameter, that would be great!

@KateFriedman-NOAA
Collaborator

KateFriedman-NOAA commented Jan 17, 2020

@uturuncoglu I created a lib folder and mimicked the structure of the g2tmpl library on WCOSS. You can now find that parm file here with a version folder included:

https://ftp.emc.ncep.noaa.gov/EIB/UFS/lib/g2tmpl/v1.5.0/src/params_grib2_tbl_new
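If other library parm files were published the same way, the versioned layout above would generalize to `lib/<library>/<version>/src/<file>`. That generalization and the helper below are assumptions for illustration; only the g2tmpl v1.5.0 path is confirmed here.

```python
FTP_ROOT = "https://ftp.emc.ncep.noaa.gov/EIB/UFS"


def versioned_lib_file_url(library, version, filename):
    """Build <FTP_ROOT>/lib/<library>/v<version>/src/<filename> (assumed layout)."""
    # Accept both "1.5.0" and "v1.5.0".
    tag = version if version.startswith("v") else f"v{version}"
    return f"{FTP_ROOT}/lib/{library}/{tag}/src/{filename}"
```

Pinning the version in the path lets the workflow keep using a known table even after a newer copy is uploaded.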

Is the g2tmpl library not included in this release package somewhere with NCEPLIBS?

@uturuncoglu
Collaborator

It is included as NCEPLIBS-g2tmpl and it has a params_grib2_tbl_new.

@uturuncoglu
Collaborator

@junwang-noaa I think I found it: it seems to be fileNameFlat, which is read from stdin.

@KateFriedman-NOAA
Collaborator

KateFriedman-NOAA commented Jan 17, 2020

> It is included as NCEPLIBS-g2tmpl and it has a params_grib2_tbl_new.

@uturuncoglu Cool, can CIME get that parm file from there then? I'm anxious about maintaining copies outside of the packaged libraries.

@uturuncoglu
Collaborator

@KateFriedman-NOAA I totally agree with your concern about keeping parm files outside of the source (e.g., on FTP), but in this particular case I would like to get them from FTP, because it is not always easy (or even possible) to reach the source directory of a library installation. So FTP is the better solution at this point.

@KateFriedman-NOAA
Collaborator

@uturuncoglu Understood, I'll stand down from my concerns. Thanks!

@WenMeng-NOAA @HuiyaChuang-NOAA What is the version number of the post being included in the UFS release? Thanks!

@WenMeng-NOAA
Collaborator

@KateFriedman-NOAA The UPP branch "ufs_public_release" is for the UFS release.

@KateFriedman-NOAA
Collaborator

> @KateFriedman-NOAA The UPP branch "ufs_public_release" is for the UFS release.

@WenMeng-NOAA Thanks for the branch name but is there a version # I can attach to it (v#.#.#)? Perhaps the associated operational version # that best matches the postxconfig-NT*.txt files in that branch? It has been requested that everything we put on the ftp server for the release include version numbers. I am posting copies of the postxconfig-NT*.txt files. Thanks!

@WenMeng-NOAA
Collaborator

@KateFriedman-NOAA As far as I know, no version number has been created. The DTC UPP team has been working on UPP for the UFS release. I am including @fossell, who might be able to provide further information.

@KateFriedman-NOAA
Collaborator

Thanks @WenMeng-NOAA !

@arunchawla-NOAA
Collaborator

@rsdunlapiv what is the status of this ticket?

@jedwards4b
Collaborator

I think this issue is resolved. A new XML variable, DIN_LOC_IC, was introduced; it is the local location of the input files for the chgres program. The user sets RUN_STARTDATE in the case. CIME checks the DIN_LOC_IC directory for inputs for that date; if they are not found, CIME checks the NOMADS server for that date and downloads the files if they are available.
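The resolution step can also be sketched at the file level: for the start time given by RUN_STARTDATE, each required chgres input is taken from the local DIN_LOC_IC directory when present and otherwise mapped to a remote URL. This is a hypothetical sketch, not CIME's implementation: it assumes RUN_STARTDATE uses the `yyyy-mm-dd` form with the time of day given separately in seconds (CIME's START_TOD is assumed here), uses the two nemsio file names from the sample data above, and only plans the downloads rather than performing them.

```python
import os
from datetime import datetime, timedelta


def start_from_case(run_startdate, start_tod=0):
    """Combine RUN_STARTDATE (yyyy-mm-dd) and a time of day in seconds."""
    return datetime.strptime(run_startdate, "%Y-%m-%d") + timedelta(seconds=start_tod)


def expected_ic_files(start):
    """Raw chgres inputs for one GFS cycle (names from the sample nemsio data)."""
    return [f"gfs.t{start:%H}z.atmanl.nemsio", f"gfs.t{start:%H}z.sfcanl.nemsio"]


def plan_ic_inputs(local_dir, remote_url, start):
    """Map each required file to a local path if present, else to a remote URL.

    Nothing is downloaded here; a real workflow would fetch the URLs
    and stage the files in local_dir.
    """
    plan = {}
    for name in expected_ic_files(start):
        path = os.path.join(local_dir, name)
        plan[name] = path if os.path.isfile(path) else f"{remote_url.rstrip('/')}/{name}"
    return plan
```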

@uturuncoglu
Collaborator

I confirm that setting DIN_LOC_IC for custom initial condition path works without any problem.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants