CCPP physics_init step failed #30

Closed
uturuncoglu opened this issue Dec 16, 2019 · 40 comments

@uturuncoglu
Collaborator

@ligiabernardet I am getting the following error when I try to run the model with CCPP (by default the model builds with the CCPP FV3_GFS_v15 suite):

49:An error occurred in ccpp_physics_init
49:An error occured in GFS_phys_time_vary_init
49:
49:FATAL from PE    49: Call to CCPP physics_init step failed
49:
87:An error occurred in ccpp_physics_init
87:An error occured in GFS_phys_time_vary_init
87:
87:FATAL from PE    87: Call to CCPP physics_init step failed

The full log is in

/glade/scratch/turuncu/ufs-mrweather-app-master/run/ufs.log.9506878.chadmin1.ib0.cheyenne.ucar.edu.191216-144606

The following commands can be used to reproduce the error:

git clone https://github.com/ESCOMP/ufs-mrweather-app.git ufs-mrweather-app.dec16
cd ufs-mrweather-app.dec16
git checkout jpe_fv3_build
./manage_externals/checkout_externals
cd cime/scripts
./create_newcase --compset UFS_Weather --res C96 --case ufs-mrweather-app-test
cd ufs-mrweather-app-test
./case.setup
./case.build
./xmlchange DOUT_S=FALSE
./xmlchange STOP_OPTION=nhours
./xmlchange STOP_N=36
./xmlchange RUN_REFDATE=2016-10-03
./xmlchange RUN_STARTDATE=2016-10-03
./xmlchange JOB_WALLCLOCK_TIME=00:30:00
./xmlchange USER_REQUESTED_WALLTIME=00:30:00
./case.submit 

Is there any flag to debug CCPP and get more information about the problem?

It might be a missing file or configuration option, but I am not sure.

In this case, all the input files come from https://ftp.emc.ncep.noaa.gov/EIB/UFS, and input.nml and model_configure are generated by CIME, so I am not copying them from any static input directory. I also tested with the input.nml provided by @ligiabernardet, which is in /glade/work/turuncu/FV3GFS/fv3_gfs_v15_repro_ccpp (I only set h2o_phys to .false. because the required file does not exist on the FTP site), but I get the same error.
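
Editorial aside: the thread does not name a CCPP debug flag, so the following is only a way to condense a log like the one above. It is a minimal sketch, assuming every log line starts with a "<PE number>:" prefix as in the excerpt; summarize_pe_errors is a hypothetical helper name, not part of CIME or the UFS scripts.

```shell
# Hypothetical helper: summarize_pe_errors LOGFILE
# Assumption: log lines are prefixed with "<PE number>:" as in the excerpt above.
summarize_pe_errors() {
    # Keep only error/FATAL lines, drop the "NN:" prefix, replace the PE number
    # inside the message with "N", then count identical messages across all PEs.
    grep -E 'error occur|FATAL' "$1" \
        | sed -E 's/^[0-9]+://; s/PE +[0-9]+/PE N/' \
        | sort | uniq -c | sort -rn
}

# Usage (path is a placeholder):
#   summarize_pe_errors /path/to/ufs.log
```

If every PE reports the same message, the error probably comes from shared configuration or input data rather than from one rank's portion of the domain.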

@ligiabernardet
Collaborator

Ufuk, I would like to suggest that you repeat the run with the updated code in ufs_public_release ufs-community/ufs-weather-model#15.

This code will support the two physics suites for the release: GFSv15p2 and GFSv16beta. You will find those SDFs in https://github.com/NOAA-EMC/fv3atm/tree/ufs_public_release/ccpp/suites (all other SDFs have been removed since they are not supported for use with the UFS in this release). You will find the namelists for the C96 configuration here:
https://github.com/ufs-community/ufs-weather-model/blob/ufs_public_release/parm/ccpp_v15p2_c96.nml.IN
and
https://github.com/ufs-community/ufs-weather-model/blob/ufs_public_release/parm/ccpp_v16beta_c96.nml.IN

The error you got may have been related to the changes you made in the namelist. Setting a parametrization to false in the namelist while still having it in the SDF often does not work: those two files (SDF and namelist) must be compatible.
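
Editorial aside: the coarsest form of this compatibility check is whether the namelist and the SDF name the same suite. The sketch below is hedged accordingly. It assumes the SDF root element has the form `<suite name="..." ...>` and that the namelist sets `ccpp_suite = '...'` in single quotes, as in the namelist diff later in this thread. check_suite_match is a hypothetical helper, not an existing CIME or CCPP tool.

```shell
# Hypothetical helper: check_suite_match INPUT_NML SDF_XML
# Assumptions: SDF root element looks like <suite name="FV3_GFS_v15p2" ...>,
# and the namelist contains a line such as: ccpp_suite = 'FV3_GFS_v15p2'
check_suite_match() {
    nml_suite=$(sed -n "s/^[[:space:]]*ccpp_suite[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | head -n 1)
    sdf_suite=$(sed -n 's/.*<suite[^>]*[[:space:]]name="\([^"]*\)".*/\1/p' "$2" | head -n 1)
    if [ -z "$nml_suite" ] || [ -z "$sdf_suite" ]; then
        echo "could not read a suite name from one of the files" >&2
        return 2
    fi
    if [ "$nml_suite" = "$sdf_suite" ]; then
        echo "OK: both files name suite $nml_suite"
    else
        echo "MISMATCH: namelist has $nml_suite, SDF has $sdf_suite" >&2
        return 1
    fi
}

# Usage (file names are placeholders):
#   check_suite_match input.nml suite_FV3_GFS_v15p2.xml
```

A matching name is necessary but not sufficient: it does not catch a scheme that is listed in the SDF but switched off by an individual namelist flag.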

Did you say you are missing an input file? Please let me know which one you are missing.

@jedwards4b
Collaborator

@ufuk I am updating the cime build to work with the latest ufs_weather_model release code now, should have something for you later today.

@ligiabernardet
Collaborator

@mzhangw @JulieSchramm @llpcarson Please refer to these SDFs and namelists for documentation.

@uturuncoglu
Collaborator Author

@jedwards4b Okay, that is great. BTW, I have some updates on the CIME side and I'll make a PR for them. There are no major changes; I just updated the input data directory. So, if we merge it into CIME master, then we could also update it and point to the original CIME repository.

@uturuncoglu
Collaborator Author

@ligiabernardet Thanks for your help. @jedwards4b will update the model source, and then I can try again.

@uturuncoglu
Collaborator Author

@ligiabernardet If I compare the two nml files, I see the following differences:

diff ccpp_v15p2_c96.nml.IN ccpp_v16beta_c96.nml.IN 
19c19
<   ccpp_suite = 'FV3_GFS_v15p2'
---
>   ccpp_suite = 'FV3_GFS_v16beta'
31a32,36
> &mpp_io_nml
> shuffle=1
> deflate_level=1
> /
> 
55,56c60,61
<   d2_bg_k1 = 0.15
<   d2_bg_k2 = 0.02
---
>   d2_bg_k1 = 0.20
>   d2_bg_k2 = 0.0
131c136,139
<   iaer         = 111
---
>   iaer         = 5111
>   icliq_sw     = 2
>   iovr_lw      = 3
>   iovr_sw      = 3
143c151,154
<   hybedmf      = .true.
---
>   hybedmf      = .false.
>   satmedmf     = .true.
>   isatmedmf    = 1
>   lheatstrg    = .true.
149c160
<   cdmbgwd      = 3.5,0.25
---
>   cdmbgwd      = 4.0,0.15,1.0,1.0
152a164,177
>   lsoil        = 4
>   lsm          = 1
>   iopt_dveg    = 1
>   iopt_crs     = 1
>   iopt_btr     = 1
>   iopt_run     = 1
>   iopt_sfc     = 1
>   iopt_frz     = 1
>   iopt_inf     = 1
>   iopt_rad     = 1
>   iopt_alb     = 2
>   iopt_snf     = 4
>   iopt_tbot    = 2
>   iopt_stc     = 1
162,165c187,193
<   do_sppt        = .T.
<   do_shum        = .T.
<   do_skeb        = .T.
<   do_sfcperts    = .F.
---
>   ldiag_ugwp   = .false.
>   do_ugwp      = .false.
>   do_tofd      = .true.
>   do_sppt      = .true.
>   do_shum      = .true.
>   do_skeb      = .true.
>   do_sfcperts    = .false.
214a243
>   reiflag = 2
246a276
>   LANDICE  = .true.
247a278,279
>   FAISL    = 99999
>   FAISS    = 99999
248a281
>   FSNOS    = 99999
249a283
>   FSICS    = 99999
251d284
<   FAISL    = 99999
303c336
<   launch_level      = 25
---
>   launch_level      = 27

We are creating the entire namelist automatically using CIME, so I need to decide which parameters are resolution- and CCPP-dependent and which ones are common. Then I can define the CIME XML namelist file based on this information, and we only need to define the modifications for each specific case. Do you have any idea about it?
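
Editorial aside: some entries in the diff above are only cosmetic (for example `do_sppt = .T.` versus `do_sppt = .true.`, or different column alignment). A minimal sketch of a normalized comparison that hides such noise is given here. The helper names normalize_nml and diff_nml are hypothetical, and the normalization only handles the simple one-assignment-per-line layout seen in these files.

```shell
# Hypothetical helpers for comparing two namelist files without cosmetic noise.
normalize_nml() {
    # Trim leading/trailing blanks, use a single " = " separator,
    # spell logicals as .true./.false., and drop empty lines.
    sed -E \
        -e 's/^[[:space:]]+//; s/[[:space:]]+$//' \
        -e 's/[[:space:]]*=[[:space:]]*/ = /' \
        -e 's/= \.[Tt]\.$/= .true./; s/= \.[Ff]\.$/= .false./' \
        -e '/^$/d' "$1"
}

diff_nml() {
    _nml_a=$(mktemp) && _nml_b=$(mktemp) || return 2
    normalize_nml "$1" > "$_nml_a"
    normalize_nml "$2" > "$_nml_b"
    diff "$_nml_a" "$_nml_b"
    _nml_status=$?
    rm -f "$_nml_a" "$_nml_b"
    return $_nml_status
}

# Usage:
#   diff_nml ccpp_v15p2_c96.nml.IN ccpp_v16beta_c96.nml.IN
```

Running the same comparison between two resolutions of one suite (if such namelists are available) would separate the resolution-dependent parameters from the suite-dependent ones.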

@uturuncoglu
Collaborator Author

I might be missing something, but I noticed that there is no option called interp_method in interpolator_nml (under FMS):

&interpolator_nml
  interp_method = 'conserve_great_circle'
/

Are these namelists up to date with the code? Any idea?

@ligiabernardet
Collaborator

ligiabernardet commented Dec 17, 2019 via email

@climbfuji
Collaborator

I can only say that in the regression tests some of the dynamics/physics parameters vary with resolution (which makes sense), but I do not know which values are the "correct" ones to use for which resolution (and for which version of the suite).

@uturuncoglu
Collaborator Author

@ligiabernardet That is fine! We will use those two namelists for the release.

@uturuncoglu
Collaborator Author

@ligiabernardet @climbfuji @KateFriedman-NOAA @yangfanglin We also found some inconsistencies in the input files, especially for global_o3prdlos.f77. The file found on the FTP site,

https://ftp.emc.ncep.noaa.gov/EIB/UFS/RT/fv3_gfdlmprad/global_o3prdlos.f77

is not compatible with CCPP, and the latest version of the model gives the following error when we use the file from the FTP site:

Now getting a runtime error:
22:An error occurred in ccpp_physics_init
22:An error occured in GFS_phys_time_vary_init: Value error in GFS_phys_time_vary_init: oz_coeff from read_o3data does not match value in GFS_typedefs.F90: 4 /= 6
22:
22:FATAL from PE    22: Call to CCPP physics_init step failed

Since we are getting all input files from the FTP site, this creates a problem for us. It would be nice to have input files on the FTP site that are consistent with CCPP.

@ligiabernardet created input directories for us to test CCPP, and those folders contain different global_o3prdlos.f77 files (at least the md5 hashes are different).
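
Editorial aside: the checksum comparison mentioned above can be scripted so it is easy to repeat for every fix file. This is a minimal sketch, assuming GNU coreutils `md5sum` is available; same_md5 is a hypothetical helper name.

```shell
# Hypothetical helper: same_md5 FILE_A FILE_B
# Prints both checksums and returns 0 only if they are identical.
# Assumption: GNU coreutils md5sum is on the PATH.
same_md5() {
    sum_a=$(md5sum "$1" | awk '{print $1}')
    sum_b=$(md5sum "$2" | awk '{print $1}')
    echo "$sum_a  $1"
    echo "$sum_b  $2"
    [ -n "$sum_a" ] && [ "$sum_a" = "$sum_b" ]
}

# Usage (paths are placeholders for the FTP copy and the local test copy):
#   same_md5 ftp_copy/global_o3prdlos.f77 local_copy/global_o3prdlos.f77 \
#       || echo "files differ"
```

A differing checksum only shows that the two files are not the same; it does not say which one matches the oz_coeff value expected by the code.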

@ligiabernardet
Collaborator

ligiabernardet commented Dec 17, 2019 via email

@uturuncoglu
Collaborator Author

@ligiabernardet Okay, thanks for the clarification. It is hard to decide which input file is correct and which one is old without any date stamp on the files. @KateFriedman-NOAA @yangfanglin Is it possible to put those directories on the FTP site? That way, we could keep using the FTP site.

@ligiabernardet
Collaborator

ligiabernardet commented Dec 17, 2019 via email

@yangfanglin

yangfanglin commented Dec 17, 2019 via email

@uturuncoglu
Collaborator Author

@yangfanglin Do we have both of those files on the FTP site? I think we have only one of them, and the file that we need is missing. If we have both files, then I can make the required changes on the CIME side.

@yangfanglin

yangfanglin commented Dec 17, 2019 via email

@KateFriedman-NOAA
Collaborator

Thanks @yangfanglin for making the trimmed down version.

All, I have replaced the fix directory on our ftp server with the trimmed down version:

https://ftp.emc.ncep.noaa.gov/EIB/UFS/global/fix/

fix_am.v20191213
fix_fv3_gmted2010.v20191213

The version (v20191213) corresponds to our latest set on WCOSS from which this set was made.

The two O3FORC files are here:

https://ftp.emc.ncep.noaa.gov/EIB/UFS/global/fix/fix_am.v20191213/global_o3prdlos.f77
https://ftp.emc.ncep.noaa.gov/EIB/UFS/global/fix/fix_am.v20191213/ozprdlos_2015_new_sbuvO3_tclm15_nuchem.f77

@ligiabernardet
Collaborator

ligiabernardet commented Dec 18, 2019 via email

@uturuncoglu
Collaborator Author

@yangfanglin BTW, I could not find the new_o3forc if statement that you mentioned in the model code. I just checked

release/v0/scripts/exglobal_fcst_nemsfv3gfs.sh

The top-level hash for the mrweather-model is 7a4a7f3d in the following repository:
https://github.com/ufs-community/ufs-weather-model/

and this version of the code just has the following copy command:

$NLN $FIX_AM/global_o3prdlos.f77 $DATA/INPUT/global_o3prdlos.f77

@arunchawla-NOAA
Collaborator

@uturuncoglu @KateFriedman-NOAA

Have the new input fix directories solved this?

@yangfanglin

yangfanglin commented Dec 18, 2019 via email

@uturuncoglu
Collaborator Author

@arunchawla-NOAA I am modifying the CIME scripts because the folders on the FTP site changed. Then I'll test with the new file.

@yangfanglin If you don't mind, could you point me to the location of the new script in your workflow? Thanks for your help.

@yangfanglin

yangfanglin commented Dec 18, 2019 via email

@yangfanglin

yangfanglin commented Dec 18, 2019 via email

@uturuncoglu
Collaborator Author

I implemented almost all the logic for cold start and CCPP v15p2 and tried to run the model, but I am getting the following error in the microphysics:

102:MPT: #2  MPI_SGI_stacktraceback (
102:MPT:     header=header@entry=0x7ffe6f97c340 "MPT ERROR: Rank 102(g:102) received signal SIGBUS(7).\n\tProcess ID: 25772, Host: r2i3n20, Program: /glade/scratch/turuncu/ufs-mrweather-app/bld/ufs.exe\n\tMPT Version: HPE MPT 2.19  02/23/19 05:30:09\n") at sig.c:340
102:MPT: #3  0x00002ae603dc3fb2 in first_arriver_handler (signo=signo@entry=7,
102:MPT:     stack_trace_sem=stack_trace_sem@entry=0x2ae6109a0080) at sig.c:489
102:MPT: #4  0x00002ae603dc434b in slave_sig_handler (signo=7, siginfo=<optimized out>,
102:MPT:     extra=<optimized out>) at sig.c:564
102:MPT: #5  <signal handler called>
102:MPT: #6  gfdl_cloud_microphys_mod_mp_wqs2_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/ccpp/physics/physics/module_gfdl_cloud_microphys.F90:3878
102:MPT: #7  0x0000000000cad36f in gfdl_cloud_microphys_mod_mp_subgrid_z_proc_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/ccpp/physics/physics/module_gfdl_cloud_microphys.F90:2131
102:MPT: #8  0x0000000000cac1f2 in gfdl_cloud_microphys_mod_mp_icloud_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/ccpp/physics/physics/module_gfdl_cloud_microphys.F90:2003
102:MPT: #9  0x0000000000c9ea5d in gfdl_cloud_microphys_mod_mp_mpdrv_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/ccpp/physics/physics/module_gfdl_cloud_microphys.F90:971
102:MPT: #10 0x0000000000c98529 in gfdl_cloud_microphys_mod_mp_gfdl_cloud_microphys_mod_driver_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/ccpp/physics/physics/module_gfdl_cloud_microphys.F90:476
102:MPT: #11 0x0000000000b9fceb in gfdl_cloud_microphys_mp_gfdl_cloud_microphys_run_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/ccpp/physics/physics/gfdl_cloud_microphys.F90:231
102:MPT: #12 0x0000000000b743fe in ccpp_fv3_gfs_v15p2_physics_cap_mp_fv3_gfs_v15p2_physics_run_cap_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/ccpp/physics/ccpp_FV3_GFS_v15p2_physics_cap.F90:1548
102:MPT: #13 0x0000000000b0a534 in ccpp_static_api_mp_ccpp_physics_run_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/ccpp/physics/ccpp_static_api.F90:150
102:MPT: #14 0x0000000000b0c0b6 in ccpp_driver_mp_ccpp_step_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/ccpp/driver/CCPP_driver.F90:234
102:MPT: #15 0x00000000004b966e in atmos_model_mod_mp_update_atmos_radiation_physics_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/atmos_model.F90:368
102:MPT: #16 0x00000000004afd63 in module_fcst_grid_comp_mp_fcst_run_phase_1_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/module_fcst_grid_comp.F90:705
102:MPT: #17 0x00002ae5ff5d87de in ESMCI::FTable::callVFuncPtr(char const*, ESMCI::VM*, int*) ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so
102:MPT: #18 0x00002ae5ff5dc39b in ESMCI_FTableCallEntryPointVMHop ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so
102:MPT: #19 0x00002ae5ffa9f2d5 in ESMCI::VM::enter(ESMCI::VMPlan*, void*, void*) ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so
102:MPT: #20 0x00002ae5ff5d9e3a in c_esmc_ftablecallentrypointvm_ ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so
102:MPT: #21 0x00002ae5ffcceb4d in esmf_compmod_mp_esmf_compexecute_ ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so
102:MPT: #22 0x00002ae5ffec7e31 in esmf_gridcompmod_mp_esmf_gridcomprun_ ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so
102:MPT: #23 0x00000000004a2cdc in fv3gfs_cap_mod_mp_modeladvance_ ()
102:MPT:     at /glade/scratch/turuncu/ufs-mrweather-app/bld/atm/obj/FV3/fv3_cap.F90:998
102:MPT: #24 0x00002ae5ff946d59 in ESMCI::MethodElement::execute(void*, int*) ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so
102:MPT: #25 0x00002ae5ff946c8a in ESMCI::MethodTable::execute(std::string, void*, int*, bool*) ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so
102:MPT: #26 0x00002ae5ff946172 in c_esmc_methodtableexecute_ ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so
102:MPT: #27 0x00002ae5ffb7a592 in esmf_attachmethodsmod_mp_esmf_methodgridcompexecute_
102:MPT:     ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so
102:MPT: #28 0x00002ae600397ff9 in nuopc_modelbase_mp_routine_run_ ()
102:MPT:    from /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf.so

The initial conditions come from https://ftp.emc.ncep.noaa.gov/EIB/UFS/RT/fv3_gfdlmprad/INPUT/ and all input namelist files are generated automatically by CIME (they appear to be identical to https://github.com/ufs-community/ufs-weather-model/blob/ufs_public_release/parm/ccpp_v15p2_c96.nml.IN, except that options left at their default values are not included in the namelist).

I'll also test with the input files found in the static directory provided by @ligiabernardet.

PS: My run directory is in /glade/scratch/turuncu/ufs-mrweather-app/run

@ligiabernardet
Collaborator

ligiabernardet commented Dec 19, 2019 via email

@uturuncoglu
Collaborator Author

@ligiabernardet Okay that is great. Comparing the directories will definitely help.

There are also some inconsistencies between the file

https://github.com/ufs-community/ufs-weather-model/blob/ufs_public_release/parm/ccpp_v15p2_c96.nml.IN

and

https://github.com/NOAA-EMC/global-workflow/blob/feature/gfsv16b/scripts/exglobal_fcst_nemsfv3gfs.sh

If you use the script to find the resolution of the static files, it must be t190.384.192, but the namelist file uses t126.384.190, so I am not sure which one is correct. I also found that the ccpp_v15p2_c96.nml.IN file includes namelist variables that are set to the same values as the defaults in the code, which I am not including in input.nml. I am now going through the files to find the differences between ccpp_v15p2_c96.nml.IN and the input.nml produced by CIME.
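
Editorial aside: since the CIME-generated input.nml omits variables that are left at their code defaults, one quick way to see what was omitted is to list the variables that the reference namelist sets but the generated one does not. The sketch below is hedged: nml_keys and missing_keys are hypothetical helpers, they assume one assignment per line, and they ignore namelist group names, so a variable that appears in two groups is treated as one.

```shell
# Hypothetical helpers: list variables set in a reference namelist
# that are absent from a generated one.
nml_keys() {
    # Print the variable names assigned in a namelist file, lowercased
    # (Fortran namelist names are case-insensitive), sorted and unique.
    sed -n -E 's/^[[:space:]]*([A-Za-z_][A-Za-z0-9_%()]*)[[:space:]]*=.*/\1/p' "$1" \
        | tr 'A-Z' 'a-z' | sort -u
}

missing_keys() {
    _keys_ref=$(mktemp) && _keys_gen=$(mktemp) || return 2
    nml_keys "$1" > "$_keys_ref"
    nml_keys "$2" > "$_keys_gen"
    # comm -23 prints lines that occur only in the first (reference) list.
    comm -23 "$_keys_ref" "$_keys_gen"
    rm -f "$_keys_ref" "$_keys_gen"
}

# Usage:
#   missing_keys ccpp_v15p2_c96.nml.IN input.nml
```

Each name printed is a candidate to check against the default in the current model source, because an omitted variable is only safe when the code default equals the reference value.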

@pjpegion
Collaborator

@uturuncoglu, I'm looking in your directory on cheyenne, and I don't see the suite definition file. The crash is occurring in a portion of the microphysics that seems to catch unphysical temperatures.

And to answer @ligiabernardet's question from earlier: the other option that varies with resolution is cdmbgwd. The global workflow's config/config.fv3 shows the resolution-dependent parameters.

@uturuncoglu
Collaborator Author

@pjpegion Do we also need to copy the suite definition file (the XML one) to the run directory?

@pjpegion
Collaborator

I'm not 100% sure, but if it is not there, how are you telling the model where it is? @ligiabernardet would know for sure.

@uturuncoglu
Collaborator Author

Okay, to test it, I copied it to my run directory manually. The job is still in the queue, so I'll let you know how it goes.

@climbfuji
Collaborator

climbfuji commented Dec 19, 2019 via email

@pjpegion
Collaborator

@climbfuji Thanks for clarifying. @uturuncoglu Can you point me to your SDF? I want to see if it is consistent with the namelist.
Thanks, -Phil

@uturuncoglu
Collaborator Author

@climbfuji Thanks for the clarification. I also tested by copying it to the run directory, and it fails in the same way. The only difference seems to be that I am using a different resolution for the static files. So, I'll try again with the lower resolutions.

@yangfanglin There is no file called seaice_newland.grb in global/fix/fix_am.v20191213. It is actually used by the fv3_gfs_v15p2_repro_ccpp/input.nml that @ligiabernardet pointed to. If you don't mind, could you also put it on the FTP site?

@uturuncoglu
Collaborator Author

@pjpegion It is in /glade/scratch/turuncu/ufs-mrweather-app/run/suite_FV3_GFS_v15p2.xml, and I copied it from my source tree: /glade/u/home/turuncu/EMC/ufs-mrweather-app.dec17/src/model/FV3/ccpp/suites/suite_FV3_GFS_v15p2.xml

@uturuncoglu
Collaborator Author

I tested the following configurations:

  • Using exactly the same resolution files as the successful runs that @ligiabernardet pointed to. This test failed with the same error.

  • Using the same input.nml as the successful runs that @ligiabernardet pointed to. This runs without any problem. So, it seems that the issue is related to input.nml. I am not sure why, but I need to check the default values against the current version of the model source.

Thanks to all for your help.

@uturuncoglu
Collaborator Author

@ligiabernardet @climbfuji @yangfanglin @pjpegion When I set h2o_phys = .true. (it was .false.), the model runs without any problem. I had also found a problem with the d_ext option, which was at its default value. So, it seems to be working now. I'll also test with CCPP v16beta. Thanks again for your great help.

@arunchawla-NOAA
Collaborator

@uturuncoglu Is this resolved now? Can this issue be closed?

@uturuncoglu
Collaborator Author

@arunchawla-NOAA Yes, the issue can be closed now. I was able to run the model without any problem using both the CCPP v15p2 and v16beta suites. In this case, all namelist files are automatically generated by CIME.


8 participants