The high-resolution regional simulation with NUOPC is interrupted at the beginning #975
Comments
Hi @niuhanlin, This sounds like an exciting experiment! Have you tried to run it with a default (non-FATES) CLM compset? In relatively complex setups like this, it can be useful to confirm whether it is a FATES-specific error or not. If so, I am not sure we have done a whole lot of testing (someone correct me if I'm wrong) with the accelerated spinup activated. Cheers!
@rosiealice raises a good point. I think that the accelerated spinup option is reserved for bgc cases, and your compset does not include bgc from what I can tell. Regardless, if you haven't done the following, I recommend that you do this first:
Another comment:
@rosiealice With your reminder in mind, I ran some cases for other configurations; the results are as follows.
1. compset: 2000_DATM%QIA_CLM51%SP_SICE_SOCN_SROF_SGLC_SWAV
2. compset: X, res: f19_g16
3. compset: B1850, res: f19_g16
Based on the testing so far, I am guessing this is due to a version update and that I need to add important settings to the config_* files.
Something that I don't see you using in your create_newcase is this:
@slevisconsulting In fact, I did follow this step as you mentioned, except that it was on a local cluster. Sorry, I could not find a way to make it directly viewable for you, so I had to write it out as a file.
Porting to a local cluster or other platform is beyond my expertise. @ekluzek does NCAR offer community support for porting CTSM to other platforms?
@rosiealice @slevisconsulting I ran the experiment with FATES and it worked! The case was created using the compset as follows. The test results so far indicate that the problem lies with the 2000_DATM%QIA_CLM51%FATES_SICE_SOCN_SROF_SGLC_SWAV compset, not with my port to the local machine. They also show that my local setup for this port can be used by others.
That's great news, @niuhanlin. Just to be clear:
Yes, the other settings I use are consistent, but the surface file and the atmospheric forcing file use the defaults instead of the ones I made.
So the compset that doesn't work is a FATES-SP (satellite phenology) case, and the one that does work is a fully dynamic (default) FATES case? I am not expert enough in compset names to figure out the other differences, but just to check: is an SP case actually what you want to run?
Contrary to what you said, FATES-SP works and FATES-fixed_biogeog does not.
@niuhanlin thank you for documenting and testing this so thoroughly. Can you try running your setup with the long name for the compset? I include here the long name for GSWP3, but I see you are using QIA; maybe it is a problem with the alias.
I2000Clm51FatesRs
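If it helps, the long names behind the compset aliases can be listed with CIME's query_config; a minimal sketch, assuming a CTSM checkout with CIME under cime/ (the component argument may differ between CIME versions):
cd cime/scripts
./query_config --compsets clm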
@jkshuman You are right! After following the information you provided, I created a new case with GSWP3 and it ran successfully!
Thanks for checking that, @niuhanlin. Can you open an issue on the CTSM side with the details of the failure with the alias? Tagging @ekluzek to talk about this alias problem. Glad to hear it is functional with the long name for the compset.
OK, it sounds like the issue here is that QIAN forcing doesn't run well with FATES. This isn't a configuration we test, as we only test FATES with GSWP3 and CRUNCEP forcing. In principle you should be able to use any datm forcing with any CTSM configuration, so even though we don't test QIAN with FATES, I'd expect it to work. But we only test QIAN forcing with CTSM-BGC, and with software, anything you don't test may well be broken.
In this case I'd wonder if the problem is too few processors for this specific case, because it's a custom resolution on a custom machine. But QIAN forcing actually has less data than GSWP3, which is the opposite of what I'd expect. I suppose it could still be something different between QIAN and GSWP3 forcing, and it could still be something with the machine or processor setup.
I looked at the compset aliases for FATES and didn't see any problems. The main problem would be a mismatch of the alias name with how the long-compset name is given. @jkshuman am I understanding what's going on here? Is there a specific compset alias you think I should check? Also, do you think it's important for FATES to be able to run with QIAN forcing? Since QIAN is our oldest, lowest-resolution forcing dataset, I wasn't thinking it was that important. But if so, I could check some cases with FATES and QIAN. We could at least provide a warning about using the two together. But it still could be something specific to this resolution and machine.
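If the processor count is a suspect, a minimal sketch of raising it from the case directory, using the standard CIME NTASKS variable (the value here is illustrative only):
./xmlchange NTASKS=16
./case.setup --reset
./case.build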
Thanks for talking this through, @ekluzek, and for looking things over. Glad to hear the aliases look fine. I think I got myself mixed up on this one, but at least @niuhanlin got a successful run. I agree with you on QIAN being low priority based on your comments, @ekluzek. @niuhanlin can you confirm that using the GSWP3 compset 2000_DATM%GSWP3v1_CLM51%FATES_SICE_SOCN_SROF_SGLC_SWAV will work for you? A re-created case along the lines of your earlier command is sketched below.
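A minimal sketch, mirroring the create_newcase call from the original post with only the atmospheric forcing swapped (the case name here is a placeholder; the resolution and machine are from your setup):
./create_newcase --compset 2000_DATM%GSWP3v1_CLM51%FATES_SICE_SOCN_SROF_SGLC_SWAV --res CLM_USRDAT --case TP_5days_test_gswp3 --run-unsupported --machine niuhanlin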
I did some testing and found that both GSWP3v1 and QIAN worked. The problem was with mapalgo.
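For anyone who hits the same thing: with CDEPS, stream settings such as mapalgo can be overridden per stream in user_nl_datm_streams. A minimal sketch, assuming QIAN stream names like the ones below (check datm.streams.xml in the run directory for the exact names on your case; bilinear is only an illustrative value, and the value that works will depend on your mesh):
CLM_QIAN.Solar:mapalgo = bilinear
CLM_QIAN.Precip:mapalgo = bilinear
CLM_QIAN.TPQW:mapalgo = bilinear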
Closing this here; discussion continues on the CTSM side: ESCOMP/CTSM#1937
Hi! I am currently using CTSM-FATES to perform simulations on the Tibetan Plateau, running with NUOPC as recommended.
Unfortunately, the run failed.
The error output does not point to a clear problem, which is what confuses me.
The compiler I use is Intel. Do I need to change to the GNU compiler?
It should be noted that single-point and regional runs are feasible using MCT.
Running a single point is fine, but a regional simulation is forced to stop after three years.
I guess that is caused by too much memory usage.
With NUOPC, a single-point simulation is perfectly fine.
The regional simulation aborts directly with an MPI error and does not show the real problem.
Here are some of the settings I used to create the case.
./create_newcase --compset 2000_DATM%QIA_CLM51%FATES_SICE_SOCN_SROF_SGLC_SWAV --res CLM_USRDAT --case TP_5days_test_nuopc_2 --run-unsupported --machine niuhanlin
./xmlchange DATM_YR_START=1979
./xmlchange DATM_YR_END=1979
./xmlchange RUN_STARTDATE=1979-01-01
./xmlchange CLM_FORCE_COLDSTART=on
./xmlchange CLM_ACCELERATED_SPINUP=on
./xmlchange STOP_OPTION=ndays
./xmlchange STOP_N=5
./xmlchange LND_DOMAIN_MESH=lnd_mesh.nc
./xmlchange ATM_DOMAIN_MESH=lnd_mesh.nc
./xmlchange MASK_MESH=mask_mesh.nc
./case.setup
Add the surface file in user_nl_clm (see the sketch after these steps).
./case.build
sbatch cesm.sh (this file contains the submission settings)
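A minimal sketch of the user_nl_clm addition mentioned above, assuming the standard fsurdat namelist variable (the path is a placeholder for the surface dataset created with subset_data):
fsurdat = '/path/to/surfdata_TP_region.nc'  ! placeholder: point this at your own surface dataset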
The following is what is written in the log files after the abort. By the way, both single-node/single-core and multi-node/multi-core runs failed.
in cesm.log file:
application called MPI_Abort(comm=0x84000000, 1) - process 0
in lnd.log file:
LND: PIO numiotasks= 1
LND: PIO stride= 1
LND: PIO rearranger= 2
LND: PIO root= 1
1 pes participating in computation for CLM
NODE# NAME
( 0) comput20
atm component = datm
rof component = srof
glc component = sglc
atm_prognostic = F
rof_prognostic = F
glc_present = F
flds_scalar_name = cpl_scalars
flds_scalar_num = 4
flds_scalar_index_nx = 1
flds_scalar_index_ny = 2
flds_scalar_index_nextsw_cday = 3
flds_co2a= F
flds_co2b= F
flds_co2c= F
sending co2 to atm = F
receiving co2 from atm = F
(shr_drydep_read) Read in drydep_inparm namelist from: drv_flds_in
(shr_drydep_read) No dry deposition fields will be transfered
(shr_fire_emis_readnl) Read in fire_emis_readnl namelist from: drv_flds_in
(shr_megan_readnl) Read in megan_emis_readnl namelist from: drv_flds_in
(shr_carma_readnl) Read in carma_inparm namelist from: drv_flds_in
shr_carma_readnl: no carma_inparm namelist found in drv_flds_in
(shr_ndep_readnl) Read in ndep_inparm namelist from: drv_flds_in
in atm.log file:
ATM: PIO numiotasks= 1
ATM: PIO stride= 1
ATM: PIO rearranger= 1
ATM: PIO root= 1
((atm_comp_nuopc)) case_name = TP_5days_test_nuopc_2
((atm_comp_nuopc)) datamode = CLMNCEP
((atm_comp_nuopc)) model_meshfile = /public/home/huser053/nhl/CTSM-221203/CTSM-master/tools/site_and_regional/subset_data_regional/lnd_mesh.nc
((atm_comp_nuopc)) model_maskfile = /public/home/huser053/nhl/CTSM-221203/CTSM-master/tools/site_and_regional/subset_data_regional/lnd_mesh.nc
((atm_comp_nuopc)) nx_global = 1
((atm_comp_nuopc)) ny_global = 1
((atm_comp_nuopc)) restfilm = null
((atm_comp_nuopc)) iradsw = 1
((atm_comp_nuopc)) factorFn_data = null
((atm_comp_nuopc)) factorFn_mesh = null
((atm_comp_nuopc)) flds_presaero = T
((atm_comp_nuopc)) flds_presndep = T
((atm_comp_nuopc)) flds_preso3 = T
((atm_comp_nuopc)) flds_co2 = F
((atm_comp_nuopc)) flds_wiso = F
((atm_comp_nuopc)) skip_restart_read = F
datm datamode = CLMNCEP
(dshr_mesh_init) (dshr_mod:dshr_mesh_init) obtained ATM mesh and mask from /public/home/huser053/nhl/CTSM-221203/CTSM-master/tools/site_and_regional/subset_data_regional/lnd_mesh.nc
(shr_stream_getCalendar) opening stream filename = /public/home/huser053/nhl/data/Solar3Hrly/clmforc.Qian.c2006.T62.Solr.1979-01.nc
(shr_stream_getCalendar) closing stream filename = /public/home/huser053/nhl/data/Solar3Hrly/clmforc.Qian.c2006.T62.Solr.1979-01.nc
(shr_stream_getCalendar) opening stream filename = /public/home/huser053/nhl/data/Precip3Hrly/clmforc.Qian.c2006.T62.Prec.1979-01.nc
(shr_stream_getCalendar) closing stream filename = /public/home/huser053/nhl/data/Precip3Hrly/clmforc.Qian.c2006.T62.Prec.1979-01.nc
(shr_stream_getCalendar) opening stream filename = /public/home/huser053/nhl/data/TmpPrsHumWnd3Hrly/clmforc.Qian.c2006.T62.TPQW.1979-01.nc
(shr_stream_getCalendar) closing stream filename = /public/home/huser053/nhl/data/TmpPrsHumWnd3Hrly/clmforc.Qian.c2006.T62.TPQW.1979-01.nc
(shr_stream_getCalendar) opening stream filename = /public/home/huser053/nhl/inputdata/atm/cam/chem/trop_mozart_aero/aero/aerosoldep_WACCM.ensmean_monthly_hist_1849-2015_0.9x1.25_CMIP6_c180926.nc
(shr_stream_getCalendar) closing stream filename = /public/home/huser053/nhl/inputdata/atm/cam/chem/trop_mozart_aero/aero/aerosoldep_WACCM.ensmean_monthly_hist_1849-2015_0.9x1.25_CMIP6_c180926.nc
(shr_stream_getCalendar) opening stream filename = /public/home/huser053/nhl/inputdata/lnd/clm2/ndepdata/fndep_clm_hist_b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.ensmean_1849-2015_monthly_0.9x1.25_c180926.nc
(shr_stream_getCalendar) closing stream filename = /public/home/huser053/nhl/inputdata/lnd/clm2/ndepdata/fndep_clm_hist_b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.ensmean_1849-2015_monthly_0.9x1.25_c180926.nc
(shr_stream_getCalendar) opening stream filename = /public/home/huser053/nhl/inputdata/cdeps/datm/ozone/O3_surface.f09_g17.CMIP6-historical-WACCM.001.monthly.185001-201412.nc
(shr_stream_getCalendar) closing stream filename = /public/home/huser053/nhl/inputdata/cdeps/datm/ozone/O3_surface.f09_g17.CMIP6-historical-WACCM.001.monthly.185001-201412.nc
(shr_stream_getCalendar) opening stream filename = /public/home/huser053/nhl/inputdata/atm/datm7/topo_forcing/topodata_0.9x1.25_USGS_070110_stream_c151201.nc
(shr_stream_getCalendar) closing stream filename = /public/home/huser053/nhl/inputdata/atm/datm7/topo_forcing/topodata_0.9x1.25_USGS_070110_stream_c151201.nc
(shr_strdata_set_stream_domain) stream_nlev = 1
(shr_sdat_init) Creating field bundle array fldbun_data of size 2 for stream 1
adding field Faxa_swdn to fldbun_data for stream 1
in drv.log file:
(esm_time_clockInit):: driver start_ymd: 19790101
(esm_time_clockInit):: driver start_tod: 0
(esm_time_clockInit):: driver curr_ymd: 19790101
(esm_time_clockInit):: driver curr_tod: 0
(esm_time_clockInit):: driver time interval is : 1800
(esm_time_clockInit):: driver stop_ymd: 99990101
(esm_time_clockInit):: driver stop_tod: 0
PIO rearranger options:
comm type = 0 (p2p)
comm fcd = 0 (2denable)
max pend req (comp2io) = -2
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1019.02
(t_initf) Read in prof_inparm namelist from: drv_in
(t_initf) Using profile_disable= F
(t_initf) profile_timer= 4
(t_initf) profile_depth_limit= 4
(t_initf) profile_detail_limit= 2
(t_initf) profile_barrier= F
(t_initf) profile_outpe_num= 1
(t_initf) profile_outpe_stride= 0
(t_initf) profile_single_file= F
(t_initf) profile_global_stats= T
(t_initf) profile_ovhd_measurement= F
(t_initf) profile_add_detail= F
(t_initf) profile_papi_enable= F
Attached is my lnd_in file:
lnd_in.txt