Example cfg for NERSC #235

Closed (1 commit)
128 changes: 128 additions & 0 deletions examples/example_cori_haswell.cfg
@@ -0,0 +1,128 @@
# Do not edit any example cfg except for `example_generic.cfg`!

[default]
case = 20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis
environment_commands = "source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_cori-haswell.sh"
input = /global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis
input_subdir = archive/atm/hist
mapping_file = /global/homes/z/zender/data/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
output = /global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis
partition = haswell
walltime = "02:00:00"
www = /global/cfs/cdirs/e3sm/www/forsyth/zppy_complete_run_nersc_output

[climo]
active = True
years = "51:55:2", "51:55:4",

[[ atm_monthly_180x360_aave ]]
frequency = "monthly"

[[ atm_monthly_diurnal_8xdaily_180x360_aave ]]
frequency = "diurnal_8xdaily"
input_files = "eam.h4"
vars = "PRECT"

[ts]
active = True
frequency = "monthly"
years = "51:55:2",

[[ atm_monthly_180x360_aave ]]
input_files = "eam.h0"

[[ atm_daily_180x360_aave ]]
frequency = "daily"
input_files = "eam.h1"
vars = "PRECT"

[[ atm_monthly_glb ]]
input_files = "eam.h0"
input_subdir = "archive/atm/hist"
mapping_file = "glb"
years = "51:61:5",

[[ land_monthly ]]
input_files = "elm.h0"
input_subdir = "archive/lnd/hist"
vars = "FSH,LAISHA,LAISUN,RH2M"

[[ rof_monthly ]]
extra_vars = 'areatotal2'
input_files = "mosart.h0"
input_subdir = "archive/rof/hist"
mapping_file = ""
vars = "RIVER_DISCHARGE_OVER_LAND_LIQ"

[tc_analysis]
Collaborator Author

@chengzhuzhang It's possible #169 (which added Tropical Cyclone analysis/diagnostics) wasn't tested on Cori. This task worked fine on the Compy example (#234). The Cori example cfg, however, fails with the following. Any thoughts?

Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......:  PMI2 init failed: 1
/var/spool/slurmd/job58523789/slurm_script: line 94: 42021 Aborted                 DetectNodes --verbosity 0 --in_connect ${result_dir}connect_CSne${res}_v2.dat --closedcontourcmd "PSL,300.0,4.0,0;_AVG(T200,T500),-0.6,4,0.30" --mergedist 6.0 --searchbymin PSL --outputcmd "PSL,min,0;_VECMAG(UBOT,VBOT),max,2" --timestride 1 --in_data_list ${result_dir}inputfile_${file_name}.txt --out ${result_dir}out.dat

Contributor

This looks like the kind of error that happens when you're using the wrong MPI (like from conda-forge). I might be on completely the wrong track but wanted to suggest that in case...

Collaborator

I recall that there was an issue when you tested tc_analysis.bash on Cori; I'm not sure if this is related. From what I can tell, on a Cori compute node it tries to initiate an MPI process with the TempestExtremes call (DetectNodes). Maybe you can try the following, e.g.:

srun -n 32 DetectNodes --verbosity 0 --in_connect ${result_dir}connect_CSne${res}_v2.dat --closedcontourcmd "PSL,300.0,4.0,0;_AVG(T200,T500),-0.6,4,0.30" --mergedist 6.0 --searchbymin PSL --outputcmd "PSL,min,0;_VECMAG(UBOT,VBOT),max,2" --timestride 1 --in_data_list ${result_dir}inputfile_${file_name}.txt --out ${result_dir}out.dat

Collaborator Author

Unfortunately, it looks like adding srun -n 32 didn't fix anything.

Collaborator

Hmm, I would suggest running the standalone DetectNodes command on Haswell, with and without srun, to see if the error reproduces.

Contributor

Can you also run which DetectNodes to make sure it's coming from Spack and not from conda-forge?
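
A minimal sketch of that check follows. The helper name tool_origin is hypothetical (it is not part of zppy), and it assumes the environment load script leaves CONDA_PREFIX pointing at the active conda environment. It only separates "inside the conda prefix" from "elsewhere", so on its own it cannot confirm that the binary is the Spack build.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of zppy): report whether an executable on
# PATH lives inside the active conda environment.
tool_origin() {
    local tool="$1" path
    # command -v prints the resolved path, like `which`, and fails if absent.
    if ! path="$(command -v "$tool")"; then
        echo "missing"
        return 1
    fi
    # Assumption: CONDA_PREFIX points at the active conda environment.
    # Stripping "$CONDA_PREFIX/" changes the string only if it is a prefix.
    if [ -n "${CONDA_PREFIX:-}" ] && [ "${path#"$CONDA_PREFIX"/}" != "$path" ]; then
        echo "conda: $path"
    else
        echo "other: $path"
    fi
}

# Report where DetectNodes resolves, without aborting if it is not installed.
tool_origin DetectNodes || true
```

Running this once on a login node and once inside the batch job would show whether the two environments resolve the same binary.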

active = True
scratch = /global/cscratch1/sd/forsyth
years = "51:53:2",

[e3sm_diags]
active = True
grid = '180x360_aave'
obs_ts = /global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/time-series
ref_final_yr = 2014
ref_start_yr = 1985
reference_data_path = /global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/climatology
short_name = '20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis'
ts_num_years = 2
years = "51:55:2", "51:55:4",

[[ atm_monthly_180x360_aave ]]
climo_diurnal_frequency = "diurnal_8xdaily"
climo_diurnal_subsection = "atm_monthly_diurnal_8xdaily_180x360_aave"
dc_obs_climo = "/compyfs/e3sm_diags_data/obs_for_e3sm_diags/climatology"
sets = "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","enso_diags","qbo","diurnal_cycle","annual_cycle_zonal_mean","streamflow", "zonal_mean_2d_stratosphere",
streamflow_obs_ts = /global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/time-series

[[ atm_monthly_180x360_aave_tc_analysis ]]
Collaborator Author

e3sm_diags_atm_monthly_180x360_aave_tc_analysis_model_vs_obs_0051-0052 says "OK" for status but it doesn't produce a viewer. There is a FileNotFoundError: [Errno 2] No such file or directory: b'/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/tc_analysis/IBTrACS.NA.v04r00.nc'.
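
Because the job reports "OK" even though the observation file is missing, a pre-flight check might surface the problem earlier. The sketch below uses a hypothetical helper, require_file, with the tc_obs path and file name taken from the error above. It may also be worth listing the parent directory, since the Compy entry in generate_examples.py spells the directory tc-analysis rather than tc_analysis; whether that is the cause here is not confirmed.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail loudly if an expected obs file is absent.
require_file() {
    if [ ! -f "$1" ]; then
        echo "Missing required file: $1" >&2
        return 1
    fi
}

# Path and file name copied from the FileNotFoundError above.
tc_obs=/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/tc_analysis
require_file "${tc_obs}/IBTrACS.NA.v04r00.nc" || echo "tc_obs check failed"
```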

# Running as its own subtask because tc_analysis requires jobs to run sequentially, which slows down testing
sets = "tc_analysis",
tc_obs = /global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/tc_analysis
years = "51:53:2",

[[ atm_monthly_180x360_aave_mvm ]]
# Test model-vs-model using the same files as the reference
climo_diurnal_frequency = "diurnal_8xdaily"
climo_diurnal_subsection = "atm_monthly_diurnal_8xdaily_180x360_aave"
climo_subsection = "atm_monthly_180x360_aave"
diff_title = "Difference"
gauges_path = /global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/time-series/GSIM/GSIM_catchment_characteristics_all_1km2.csv
ref_final_yr = 52
ref_name = "20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis"
ref_start_yr = 51
ref_years = "51-52",
reference_data_path = /global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/clim
reference_data_path_climo_diurnal = /global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/clim_diurnal_8xdaily
reference_data_path_tc = /global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/tc-analysis_0051_0052
Collaborator Author

Presumably, this change from 51_52 to 0051_0052 will get TC working on e3sm_diags_atm_monthly_180x360_aave_mvm_model_vs_model_0051-0052_vs_0051-0052.
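
To keep the two spellings from drifting apart, the directory name could be built in one place. This is only a sketch: tc_dir_name is a hypothetical helper, and it assumes the tc_analysis task names its output directory with four-digit, zero-padded years, as the path above suggests.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the TC output directory name with 4-digit years.
tc_dir_name() {
    # 10# forces base 10 so inputs such as 0051 are not parsed as octal.
    printf 'tc-analysis_%04d_%04d\n' "$((10#$1))" "$((10#$2))"
}

tc_dir_name 51 52   # prints tc-analysis_0051_0052
```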

reference_data_path_ts = /global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/ts/monthly
reference_data_path_ts_rof = /global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/rof/native/ts/monthly
run_type = "model_vs_model"
short_ref_name = "20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis"
swap_test_ref = False
tag = "model_vs_model"
ts_num_years_ref = 2
ts_subsection = "atm_monthly_180x360_aave"

[mpas_analysis]
active = True
anomalyRefYear = 51
climo_years ="51-55", "56-61",
enso_years = "51-55", "56-61",
mesh = "EC30to60E2r2"
parallelTaskCount = 6
# Requires a longer time limit than permitted by "haswell"
partition = haswell
ts_years = "51-55", "51-61",

[global_time_series]
active = True
climo_years ="51-55", "56-61",
experiment_name = "20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis"
figstr = "20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis"
moc_file=mocTimeSeries_0051-0061.nc
ts_num_years = 5
ts_years = "51-55", "51-61",
years = "51-61",
31 changes: 29 additions & 2 deletions examples/generate_examples.py
@@ -25,15 +25,42 @@
"tc_obs": "/compyfs/e3sm_diags_data/obs_for_e3sm_diags/tc-analysis",
# [e3sm_diags] > [[ atm_monthly_180x360_aave_mvm ]]
"gauges_path": "/compyfs/e3sm_diags_data/obs_for_e3sm_diags/time-series/GSIM/GSIM_catchment_characteristics_all_1km2.csv",
"ref_name": "20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis",
"reference_data_path_mvm": "/qfs/people/fors729/zppy_complete_run_compy_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/clim",
"reference_data_path_climo_diurnal": "/qfs/people/fors729/zppy_complete_run_compy_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/clim_diurnal_8xdaily",
"reference_data_path_tc": "/qfs/people/fors729/zppy_complete_run_compy_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/tc-analysis_51_52",
"reference_data_path_ts": "/qfs/people/fors729/zppy_complete_run_compy_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/ts/monthly",
"reference_data_path_ts_rof": "/qfs/people/fors729/zppy_complete_run_compy_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/rof/native/ts/monthly",
# [mpas_analysis]
"partition_mpas": "slurm",
}
},
"cori_haswell": {
# [default]
"environment_commands": "source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_cori-haswell.sh",
"input": "/global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis",
"mapping_file": "/global/homes/z/zender/data/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc",
"output": "/global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis",
"partition": "haswell",
"www": "/global/cfs/cdirs/e3sm/www/forsyth/zppy_complete_run_nersc_output",
# [tc_analysis]
"scratch": "/global/cscratch1/sd/forsyth",
# [e3sm_diags]
"obs_ts": "/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/time-series",
"reference_data_path": "/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/climatology",
# [e3sm_diags] > [[ atm_monthly_180x360_aave ]]
"dc_obs_climo": "/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/climatology",
"streamflow_obs_ts": "/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/time-series",
# [e3sm_diags] > [[ atm_monthly_180x360_aave_tc_analysis ]]
"tc_obs": "/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/tc_analysis",
# [e3sm_diags] > [[ atm_monthly_180x360_aave_mvm ]]
"gauges_path": "/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/time-series/GSIM/GSIM_catchment_characteristics_all_1km2.csv",
"reference_data_path_mvm": "/global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/clim",
"reference_data_path_climo_diurnal": "/global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/clim_diurnal_8xdaily",
"reference_data_path_tc": "/global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/tc-analysis_51_52",
"reference_data_path_ts": "/global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/ts/monthly",
"reference_data_path_ts_rof": "/global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/rof/native/ts/monthly",
# [mpas_analysis]
"partition_mpas": "haswell",
},
}


18 changes: 15 additions & 3 deletions zppy/templates/tc_analysis.bash
@@ -59,16 +59,21 @@ fi
mkdir -p $result_dir
file_name=${caseid}_${start}_${end}

which DetectNodes
Collaborator Author

Remove this line


# Generate mesh files (.g).
echo "Run GenerateCSMesh"
GenerateCSMesh --res $res --alt --file ${result_dir}outCSne$res.g
out_type="CGLL"
# For v2 production simulation with pg2 grids:
if $pg2; then
echo "Run GenerateVolumetricMesh"
GenerateVolumetricMesh --in ${result_dir}outCSne$res.g --out ${result_dir}outCSne$res.g --np 2 --uniform
out_type="FV"
fi
echo $out_type
# Generate connectivity files (.dat)
echo "Run GenerateConnectivityFile"
GenerateConnectivityFile --in_mesh ${result_dir}outCSne$res.g --out_type $out_type --out_connect ${result_dir}connect_CSne${res}_v2.dat

# Get the list of files
@@ -78,27 +83,34 @@ cd ${drc_in};eval ls ${caseid}.$atm_name.h2.*{${start}..${end}}*.nc >${result_di
cd ${result_dir}
# Detection threshold including:
# The sea-level pressure (SLP) must be a local minimum; SLP must have a sufficient decrease (300 Pa) compared to surrounding nodes within 4 degree radius; The average of the 200 hPa and 500 hPa level temperature decreases by 0.6 K in all directions within a 4 degree radius from the location to fSLP minima
DetectNodes --verbosity 0 --in_connect ${result_dir}connect_CSne${res}_v2.dat --closedcontourcmd "PSL,300.0,4.0,0;_AVG(T200,T500),-0.6,4,0.30" --mergedist 6.0 --searchbymin PSL --outputcmd "PSL,min,0;_VECMAG(UBOT,VBOT),max,2" --timestride 1 --in_data_list ${result_dir}inputfile_${file_name}.txt --out ${result_dir}out.dat
echo "Run DetectNodes"
srun -n 32 DetectNodes --verbosity 0 --in_connect ${result_dir}connect_CSne${res}_v2.dat --closedcontourcmd "PSL,300.0,4.0,0;_AVG(T200,T500),-0.6,4,0.30" --mergedist 6.0 --searchbymin PSL --outputcmd "PSL,min,0;_VECMAG(UBOT,VBOT),max,2" --timestride 1 --in_data_list ${result_dir}inputfile_${file_name}.txt --out ${result_dir}out.dat
Collaborator Author

srun -n 32 is currently applied to VariableProcessor and both calls to DetectNodes. Should check to see which of the 3 are actually required. Should the srun only run on Cori or is it fine to always run? (Currently, tc_analysis_0051-0052 completes on Cori because of at least 1 of these 3 changes.)
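
If srun turns out to be needed only on some machines, one option is to make the launcher configurable instead of hard-coding it. The sketch below assumes a hypothetical variable, TC_LAUNCHER, that the template could set per machine (for example "srun -n 32" on Cori and empty elsewhere); neither the variable nor run_tool exists in zppy.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command under an optional, per-machine launcher.
run_tool() {
    # TC_LAUNCHER is left unquoted on purpose so "srun -n 32" splits into
    # three words; when it is empty it expands to nothing and "$@" runs directly.
    # shellcheck disable=SC2086
    ${TC_LAUNCHER:-} "$@"
}

TC_LAUNCHER=""   # e.g. "srun -n 32" on Cori (assumption)
run_tool echo "DetectNodes would run here"
```

Each of the three calls (VariableProcessor and both DetectNodes runs) could then be switched independently, which would also help determine which of them actually needs srun.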


cat ${result_dir}out.dat0* > ${result_dir}cyclones_${file_name}.txt

# Stitch all candidate nodes in time to form tracks, with a maximum distance between candidates of 6.0, minimum time steps of 6, and with a maximum gap size of one (most consecutive time steps with no associated candidate). And there is threshold for wind speed, lat and lon.
echo "Run StitchNodes"
StitchNodes --in_fmt "lon,lat,slp,wind" --in_connect ${result_dir}connect_CSne${res}_v2.dat --range 6.0 --mintime 6 --maxgap 1 --in ${result_dir}cyclones_${file_name}.txt --out ${result_dir}cyclones_stitch_${file_name}.dat --threshold "wind,>=,17.5,6;lat,<=,40.0,6;lat,>=,-40.0,6"
rm ${result_dir}cyclones_${file_name}.txt

# Generate histogram of detections
echo "Run HistogramNodes"
HistogramNodes --in ${result_dir}cyclones_stitch_${file_name}.dat --iloncol 2 --ilatcol 3 --out ${result_dir}cyclones_hist_${file_name}.nc

# Calculate relative vorticity
sed -i 's/.nc/_vorticity.nc/' ${result_dir}outputfile_${file_name}.txt
VariableProcessor --in_data_list ${result_dir}inputfile_${file_name}.txt --out_data_list ${result_dir}outputfile_${file_name}.txt --var "_CURL{4,0.5}(U850,V850)" --varout "VORT" --in_connect ${result_dir}connect_CSne${res}_v2.dat
echo "Run VariableProcessor"
srun -n 32 VariableProcessor --in_data_list ${result_dir}inputfile_${file_name}.txt --out_data_list ${result_dir}outputfile_${file_name}.txt --var "_CURL{4,0.5}(U850,V850)" --varout "VORT" --in_connect ${result_dir}connect_CSne${res}_v2.dat

DetectNodes --verbosity 0 --in_connect ${result_dir}connect_CSne${res}_v2.dat --closedcontourcmd "VORT,-5.e-6,4,0" --mergedist 2.0 --searchbymax VORT --outputcmd "VORT,max,0" --in_data_list ${result_dir}outputfile_${file_name}.txt --out ${result_dir}aew_out.dat --minlat -35.0 --maxlat 35.0
echo "Run DetectNodes"
srun -n 32 DetectNodes --verbosity 0 --in_connect ${result_dir}connect_CSne${res}_v2.dat --closedcontourcmd "VORT,-5.e-6,4,0" --mergedist 2.0 --searchbymax VORT --outputcmd "VORT,max,0" --in_data_list ${result_dir}outputfile_${file_name}.txt --out ${result_dir}aew_out.dat --minlat -35.0 --maxlat 35.0
cat ${result_dir}aew_out.dat0* > ${result_dir}aew_${file_name}.txt

echo "Run StitchNodes"
StitchNodes --in_fmt "lon,lat,VORT" --in_connect ${result_dir}connect_CSne${res}_v2.dat --range 3.0 --minlength 8 --maxgap 0 --min_endpoint_dist 10.0 --in ${result_dir}aew_${file_name}.txt --out ${result_dir}aew_stitch_5e-6_${file_name}.dat --threshold "lat,<=,25.0,8;lat,>=,0.0,8"
rm ${result_dir}aew_${file_name}.txt

echo "Run HistogramNodes"
HistogramNodes --in ${result_dir}aew_stitch_5e-6_${file_name}.dat --iloncol 2 --ilatcol 3 --nlat 256 --nlon 512 --out ${result_dir}aew_hist_${file_name}.nc
rm ${result_dir}*out.dat00*.dat
rm ${result_dir}${caseid}*.nc