Example cfg for NERSC #235
mapping_file = ""
vars = "RIVER_DISCHARGE_OVER_LAND_LIQ"

[tc_analysis]
@chengzhuzhang It's possible #169 (which added Tropical Cyclone analysis/diagnostics) wasn't tested on Cori. This task worked fine on the Compy example (#234). The Cori example `cfg`, however, fails with the following. Any thoughts?
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......: PMI2 init failed: 1
/var/spool/slurmd/job58523789/slurm_script: line 94: 42021 Aborted DetectNodes --verbosity 0 --in_connect ${result_dir}connect_CSne${res}_v2.dat --closedcontourcmd "PSL,300.0,4.0,0;_AVG(T200,T500),-0.6,4,0.30" --mergedist 6.0 --searchbymin PSL --outputcmd "PSL,min,0;_VECMAG(UBOT,VBOT),max,2" --timestride 1 --in_data_list ${result_dir}inputfile_${file_name}.txt --out ${result_dir}out.dat
This looks like the kind of error that happens when you're using the wrong MPI (like from conda-forge). I might be on completely the wrong track but wanted to suggest that in case...
I recall that there was an issue when you tested `tc_analysis.bash` on Cori; I'm not sure if this is related. From what I can tell, on a Cori compute node the TempestExtremes call (DetectNodes) tries to initiate an MPI process. Maybe you can try the following, e.g.:
srun -n 32 DetectNodes --verbosity 0 --in_connect ${result_dir}connect_CSne${res}_v2.dat --closedcontourcmd "PSL,300.0,4.0,0;_AVG(T200,T500),-0.6,4,0.30" --mergedist 6.0 --searchbymin PSL --outputcmd "PSL,min,0;_VECMAG(UBOT,VBOT),max,2" --timestride 1 --in_data_list ${result_dir}inputfile_${file_name}.txt --out ${result_dir}out.dat
Unfortunately, it looks like adding the `srun -n 32` didn't fix anything.
Hmm, I would suggest running the standalone DetectNodes command on Haswell, with and without `srun`, to see whether the error reproduces.
Can you also do `which DetectNodes` to make sure it's coming from Spack and not from conda-forge?
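A minimal sketch of that check, assuming the install prefix contains the substring "conda" or "spack" (an assumption about how the environments are laid out, not something confirmed in this thread). The helper name `classify_path` is hypothetical.

```shell
# Classify an executable path by the install prefix it appears to come from.
# Assumption: conda-forge builds live under a path containing "conda" and
# Spack builds under a path containing "spack".
classify_path() {
    case "$1" in
        "")      echo "missing" ;;
        *conda*) echo "conda" ;;
        *spack*) echo "spack" ;;
        *)       echo "other" ;;
    esac
}

# Report where DetectNodes resolves from in the current environment.
classify_path "$(command -v DetectNodes 2>/dev/null)"
```

If this prints "conda" inside the batch job, the binary is probably linked against the conda-forge MPI, which would fit the `PMI2 init failed` error above.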
This is still a draft. Making notes for reference.
@@ -59,16 +59,21 @@ fi
mkdir -p $result_dir
file_name=${caseid}_${start}_${end}

which DetectNodes
Remove this line
@@ -78,27 +83,34 @@ cd ${drc_in};eval ls ${caseid}.$atm_name.h2.*{${start}..${end}}*.nc >${result_di
cd ${result_dir}
# Detection threshold including:
# The sea-level pressure (SLP) must be a local minimum; SLP must have a sufficient decrease (300 Pa) compared to surrounding nodes within 4 degree radius; The average of the 200 hPa and 500 hPa level temperature decreases by 0.6 K in all directions within a 4 degree radius from the location to fSLP minima
DetectNodes --verbosity 0 --in_connect ${result_dir}connect_CSne${res}_v2.dat --closedcontourcmd "PSL,300.0,4.0,0;_AVG(T200,T500),-0.6,4,0.30" --mergedist 6.0 --searchbymin PSL --outputcmd "PSL,min,0;_VECMAG(UBOT,VBOT),max,2" --timestride 1 --in_data_list ${result_dir}inputfile_${file_name}.txt --out ${result_dir}out.dat
echo "Run DetectNodes"
srun -n 32 DetectNodes --verbosity 0 --in_connect ${result_dir}connect_CSne${res}_v2.dat --closedcontourcmd "PSL,300.0,4.0,0;_AVG(T200,T500),-0.6,4,0.30" --mergedist 6.0 --searchbymin PSL --outputcmd "PSL,min,0;_VECMAG(UBOT,VBOT),max,2" --timestride 1 --in_data_list ${result_dir}inputfile_${file_name}.txt --out ${result_dir}out.dat
`srun -n 32` is currently applied to `VariableProcessor` and both calls to `DetectNodes`. Should check to see which of the 3 are actually required. Should the `srun` only run on Cori, or is it fine to always run? (Currently, `tc_analysis_0051-0052` completes on Cori because of at least 1 of these 3 changes.)
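One way the "only on Cori" option could look is sketched below. It assumes the machine can be identified from the `NERSC_HOST` environment variable being set to `cori`; that detection method and the helper name `launcher_for` are assumptions for illustration, not part of this PR.

```shell
# Pick a launcher prefix based on a machine name.
# Assumption: NERSC_HOST is "cori" on Cori; other machines get no launcher.
launcher_for() {
    case "$1" in
        cori) echo "srun -n 32" ;;
        *)    echo "" ;;
    esac
}

launcher=$(launcher_for "${NERSC_HOST:-}")

# In the real script the prefix would be used unquoted, i.e.
#   ${launcher} DetectNodes <same arguments as before>
# so that it splits into words on Cori and vanishes when empty.
echo "Launcher prefix: '${launcher}'"
```

This keeps the Compy path unchanged while still leaving open which of the three calls actually need the prefix.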
ref_years = "51-52",
reference_data_path = /global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/clim
reference_data_path_climo_diurnal = /global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/180x360_aave/clim_diurnal_8xdaily
reference_data_path_tc = /global/cscratch1/sd/forsyth/zppy_complete_run_nersc_output/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/post/atm/tc-analysis_0051_0052
Presumably, this change from `51_52` to `0051_0052` will get TC working on `e3sm_diags_atm_monthly_180x360_aave_mvm_model_vs_model_0051-0052_vs_0051-0052`.
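The mismatch is a zero-padding one: the directory name uses four-digit years. A small sketch of building the padded suffix from plain year numbers follows; the helper names `pad_year` and `year_tag` are hypothetical and not taken from zppy.

```shell
# Zero-pad a year to four digits.
# Leading zeros are stripped first so "0051" is not parsed as octal by printf.
pad_year() {
    n=$(printf '%s' "$1" | sed 's/^0*//')
    printf '%04d' "${n:-0}"
}

# Join two padded years with an underscore, matching the directory suffix.
year_tag() {
    printf '%s_%s' "$(pad_year "$1")" "$(pad_year "$2")"
}

echo "tc-analysis_$(year_tag 51 52)"   # prints tc-analysis_0051_0052
```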
sets = "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","enso_diags","qbo","diurnal_cycle","annual_cycle_zonal_mean","streamflow", "zonal_mean_2d_stratosphere",
streamflow_obs_ts = /global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/time-series

[[ atm_monthly_180x360_aave_tc_analysis ]]
`e3sm_diags_atm_monthly_180x360_aave_tc_analysis_model_vs_obs_0051-0052` says "OK" for status but it doesn't produce a viewer. There is a `FileNotFoundError: [Errno 2] No such file or directory: b'/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/tc_analysis/IBTrACS.NA.v04r00.nc'`.
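A pre-flight check along these lines could surface the missing observation file before the task reports "OK". This is only a sketch: the helper name `require_file` is hypothetical, and where such a check would belong in the generated job script is left open.

```shell
# Return non-zero (and say why on stderr) if a required input file is absent.
require_file() {
    if [ ! -f "$1" ]; then
        echo "Missing required file: $1" >&2
        return 1
    fi
}

# Path taken from the FileNotFoundError above.
obs_file=/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/tc_analysis/IBTrACS.NA.v04r00.nc

if require_file "$obs_file"; then
    echo "TC observation data found"
else
    echo "TC observation data not found; the task should not report OK"
fi
```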
Example `cfg` for NERSC. This is the NERSC example for #233.