Make JRA55-do v1.4.0 configuration(s) #155

Closed
aekiss opened this issue Aug 12, 2019 · 17 comments

@aekiss
Contributor

aekiss commented Aug 12, 2019

We will need to have a 1 deg configuration that uses JRA55-do v1.4.0 (rather than 1.3.0 as used at present) if we are to make an OMIP-BGC submission for CMIP6. It would also be good to use this latest version for the other resolutions.

I've requested that NCI download this to /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/.

Unfortunately this is not quite a drop-in replacement for version 1.3.0 and will involve a little work.

From Steve's email,

JRA55-do 1.4 splits runoff into liquid and solid. At GFDL we are inserting the solid runoff into the GFDL iceberg model. I do not think ACCESS-OM2 uses the iceberg model. My recommendations for ACCESS-OM2 are the following.
A/ Do not use an iceberg model at this time since it can be a bit of work to handle the bergs (e.g., they can pile up in corners similar to sea ice).
B/ Test (some work): let the solid bergs melt at their point of entry to the ocean. Prior to icebergs that is what we did in our coupled model. One might be concerned that it will make the waters near freezing and thus increase sea ice. I do not recall how important an issue this was in our older coupled models.
--Test (less work): insert the solid runoff as if it was liquid, so no need to extract the heat of fusion from the ocean. This is effectively what was done with v1.3 runoff, so it might be the least radical approach in moving from v1.3 to v1.4.

also see #120

@aekiss
Contributor Author

aekiss commented Aug 19, 2019

NCI has downloaded updates to JRA55-do here
/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/
including 1.3.2 and 1.4.0

@hakaseh
Collaborator

hakaseh commented Aug 21, 2019

Since ACCESS-OM2 does not have an iceberg model, it sounds fine to go with Steve's second suggestion (insert the solid runoff as if it was liquid). This can be achieved by replacing the runoff data of v1.4.0 with that of v1.3.1.

@abhisheksavita

I think that if we are not planning to include an iceberg model in ACCESS-OM2 for OMIP runs, we can use the v1.3.2 forcing. In @hakaseh's email note (copied below) it is very clear that total runoff is equal in both versions: "Please be careful that from v1.3.2 and v1.4.0, total runoff is separated into liquid water (friver) and solid ice (licalvf). To obtain total runoff, friver and licalvf should be summed. In v1.3.1, friver contains total (liquid + solid) runoff".
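A minimal sketch of the summation described in the quoted note, assuming both fields are already loaded for one time slice on the same grid as flat sequences in the same units (reading them from the friver and licalvf netCDF files is not shown):

```python
def total_runoff(friver, licalvf):
    """Total runoff as the element-wise sum of liquid (friver) and solid
    (licalvf) runoff for one time slice on a common grid.

    Both inputs are assumed to be flat sequences in the same units
    (e.g. kg m-2 s-1); reading them from the netCDF files is not shown.
    """
    if len(friver) != len(licalvf):
        raise ValueError("friver and licalvf must be on the same grid")
    return [liquid + solid for liquid, solid in zip(friver, licalvf)]


# Made-up values at three grid points, purely for illustration:
print(total_runoff([1.0, 0.0, 2.0], [0.5, 3.0, 0.0]))  # [1.5, 3.0, 2.0]
```

For v1.3.2 and v1.4.0 this sum should correspond to what v1.3.1 stores in friver alone.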

@aekiss
Contributor Author

aekiss commented Aug 26, 2019

If we simply combine solid and liquid runoff it would be better to read the files from 1.4.0 and then sum them, since 1.3.1 does not extend to the present date and I'm not sure what yatm would do if the forcing for one field finishes early.
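Since it is unclear what yatm does when one field's forcing ends early, one option is to check up front how far all fields jointly extend. This is a hedged sketch: it only parses the `<start>-<end>.nc` date stamps that appear in the input4MIPs filenames in this thread, and the file lists in the usage example are hypothetical.

```python
import re
from datetime import date

# "<start>-<end>.nc" suffix of input4MIPs forcing filenames, with either
# 8-digit (YYYYMMDD) or 12-digit (YYYYMMDDhhmm) stamps.
DATE_RANGE = re.compile(r"_(\d{8,12})-(\d{8,12})\.nc$")


def end_date(filename):
    """End date (to the day) encoded in a forcing filename."""
    match = DATE_RANGE.search(filename)
    if match is None:
        raise ValueError(f"no date range found in {filename!r}")
    stamp = match.group(2)
    return date(int(stamp[0:4]), int(stamp[4:6]), int(stamp[6:8]))


def common_end_date(files_by_field):
    """Last date covered by every field: the earliest per-field end date."""
    return min(
        max(end_date(name) for name in names) for names in files_by_field.values()
    )


# Hypothetical file lists in which licalvf stops a year before friver:
files = {
    "friver": [
        "friver_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-4-0_gr_20170101-20171231.nc",
        "friver_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-4-0_gr_20180101-20181231.nc",
    ],
    "licalvf": [
        "licalvf_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-4-0_gr_20170101-20171231.nc",
    ],
}
print(common_end_date(files))  # 2017-12-31
```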

@hakaseh
Collaborator

hakaseh commented Aug 27, 2019

I see. Then I agree to use the combined runoff of v1.4.0.

Would the model code need to be modified to convert solid to liquid runoff during the simulation, or should we modify the runoff data? I think the former would be easier, considering that the dataset will likely be updated every year.

@aekiss
Contributor Author

aekiss commented Aug 27, 2019

I agree having the code combine solid and liquid would be more elegant.
I guess it could be toggled by a combine_runoff boolean flag in the runoff_nml namelist in atmosphere/atm.nml.
The technical details can be discussed at COSIMA/libaccessom2#25

@nichannah
Contributor

nichannah commented Jan 24, 2020

Development notes from @aekiss:

JRA55-do 1.4 support

#155

#120

COSIMA/libaccessom2#25

JRA55-do 1.3 calving flux
/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3/landIce/yrC/licalvf/gn/v20180412/licalvf/licalvf_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-3_gn_2007-2008-clim.nc

https://arccss.slack.com/archives/C9Q7Y1400/p1575523699033500
aekiss 4:28 PM 5 Dec 2019
Hey @russ Fiedler, @nic I'm just wondering what to do with the JRA55-do 1.4 solid runoff, which we'd decided (for now) to treat as if it was liquid (as in JRA55-do 1.3 - see #155) but sometime in the future we may want to do something more sophisticated with it, e.g. extract latent heat of fusion. I'm considering 2 options:
1. combine solid and liquid within libaccessom2, then pass it through the libaccessom2 runoff remapper and on to the coupler as a single field (which suits the current use case but not future ones), or
2. keep the solid and liquid separate, with separate remapping/capping, and pass them as separate coupler fields, to be combined in MOM.
Option 1 looks simpler to me, and confines the changes to libaccessom2. Option 2 is better for possible future uses, but does it sound straightforward to you? I noticed there's a calving coupler field we could possibly use. Sticking points are whether we would need lower runoff caps (so the sum is not too large), and whether it is possible to control whether the latent heat of fusion is absorbed in MOM (for physical realism we'd want this, but if we wanted to mimic JRA55-do 1.3 we'd want to turn it off). Grateful for any thoughts.

Russ Fiedler 4:48 PM
I think that the fields licefw and liceht in ACCESS-CM do what you want so sending through libaccessom2 seems to be sensible, at least for the freshwater. The calculation of the heat could be done in MOM. I don't know about doing physics calculations in the coupler.

aekiss 10:22 AM 6 Dec 2019
Ah ok thanks @russ Fiedler. So MOM gets liceht from the coupler in ACCESS-CM, right?
For consistency with that, I'm thinking YATM could pass both liquid and solid runoff to CICE, and then the ACCESS-OM2 CICE driver could calculate liceht from licefw (or just set it to zero), and pass heat, liquid and solid runoff to MOM through the coupler.
How does that sound? It seems cleaner than having physics in YATM or the coupler.


CHANGES REQUIRED

configuration

  • run payu setup before committing so the manifests are updated, and add them to the repo: everything in manifests except restarts
forcing.yaml
namcouple
  • pass solid runoff from yatm to cice
  • pass solid runoff and heat from cice to mom

libaccessom2

  • set solid runoff to zero if it doesn't exist in forcing.yaml (to support eg JRA55-do 1.3)
  • have separate runoff caps for solid and liquid - use a low cap for solid to spread it further
  • set up / update tests
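To illustrate why a lower cap spreads runoff further, here is a toy 1-D sketch. It is explicitly not libaccessom2's remapping/capping scheme: it assumes a hypothetical line of coastal cells, carries any excess above the cap on to the next cell, caps liquid and solid separately with made-up cap values, and treats missing solid runoff (e.g. JRA55-do 1.3) as zero.

```python
def cap_and_spread(runoff, cap):
    """Toy 1-D capping: runoff above `cap` in a cell is carried on to the next
    cell along a (hypothetical) line of coastal cells. The total is conserved,
    and a lower cap spreads the same total over more cells.

    Illustration only; this is not libaccessom2's remapping/capping scheme.
    """
    capped = []
    carry = 0.0
    for value in runoff:
        amount = value + carry
        kept = min(amount, cap)
        capped.append(kept)
        carry = amount - kept
    if carry > 0.0:
        raise ValueError("cap too low to place all runoff along this line of cells")
    return capped


def capped_runoff(liquid, solid=None, liquid_cap=4.0, solid_cap=1.0):
    """Cap liquid and solid runoff separately (caps in arbitrary units).

    Solid runoff defaults to zero when absent, e.g. for JRA55-do 1.3 forcing.
    """
    if solid is None:
        solid = [0.0] * len(liquid)
    return cap_and_spread(liquid, liquid_cap), cap_and_spread(solid, solid_cap)


print(cap_and_spread([5.0, 0.0, 0.0], cap=4.0))  # [4.0, 1.0, 0.0]
print(cap_and_spread([5.0, 0.0, 0.0], cap=2.0))  # [2.0, 2.0, 1.0]
```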

cice

  • calculate heat (liceht) from solid runoff (licefw) -- initially just set this to zero
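The physics behind liceht is the latent heat needed to melt the solid runoff. A minimal sketch, assuming licefw is a mass flux in kg m-2 s-1, an approximate latent heat of fusion of 3.34e5 J/kg, and a sign convention in which positive heat flux is into the ocean (the actual CICE/MOM conventions would need checking):

```python
# Approximate latent heat of fusion of fresh-water ice, J/kg.
LATENT_HEAT_FUSION = 3.34e5


def liceht_from_licefw(licefw, melt_heat=True):
    """Heat flux (W m-2) needed to melt solid runoff licefw (kg m-2 s-1).

    Sign convention assumed here: positive means heat into the ocean, so
    melting gives a negative (cooling) flux. With melt_heat=False the flux
    is zero, i.e. solid runoff behaves like liquid, as with JRA55-do 1.3.
    """
    if not melt_heat:
        return [0.0 for _ in licefw]
    return [-LATENT_HEAT_FUSION * flux for flux in licefw]
```

For example, a solid runoff of 1e-5 kg m-2 s-1 corresponds to roughly -3.34 W m-2. The `melt_heat=False` branch matches the plan to initially set liceht to zero.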

mom


example JRA55-do 1.4 files

/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/landIce/day/licalvf/gr/v20190429/licalvf/licalvf_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-4-0_gr_20180101-20181231.nc

/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/land/day/friver/gr/v20190429/friver/friver_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-4-0_gr_20180101-20181231.nc
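The two example paths share a common layout, so a small helper can build them for other years. This is a sketch that assumes the `v20190429` version directory and the `<year>0101-<year>1231` stamps hold for every year; the final year may well differ.

```python
JRA55DO_140_ROOT = "/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0"

# Realm directory for each runoff variable, taken from the two example paths.
REALM = {"friver": "land", "licalvf": "landIce"}


def jra55do_140_path(variable, year, version="v20190429"):
    """Path to one year of daily JRA55-do v1.4.0 runoff (assumed layout)."""
    realm = REALM[variable]
    filename = (
        f"{variable}_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-4-0_gr_"
        f"{year}0101-{year}1231.nc"
    )
    return f"{JRA55DO_140_ROOT}/{realm}/day/{variable}/gr/{version}/{variable}/{filename}"


print(jra55do_140_path("licalvf", 2018))
```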


test runs

  1. /home/156/aek156/payu/testing/1deg_jra55_iaf_control
    5-yr test run with latest master branches on raijin
    exe: /short/v45/aek156/sources/chuckable/access-om2-raijin/bin/yatm_2cc76e2.exe
    exe: /short/v45/aek156/sources/chuckable/access-om2-raijin/bin/fms_ACCESS-OM_46774ee_libaccessom2_2cc76e2.x
    exe: /short/v45/aek156/sources/chuckable/access-om2-raijin/bin/cice_auscom_360x300_24p_dd02b01_libaccessom2_2cc76e2.exe
    using
    JRA55-do 1.3 from /g/data1/ua8/JRA55-do/latest
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git 1deg_jra55_iaf_control
cd 1deg_jra55_iaf_control/
git checkout -b 1deg_jra55_iaf_control
nano config.yaml
nano accessom2.nml
git commit -am "5-year test using JRA55-do 1.3 from /g/data1/ua8/JRA55-do/latest"
module load payu/1.0
payu run

3999412
ran successfully

  2. /home/156/aek156/payu/testing/1deg_jra55_iaf_test_yearp1
    as for run 1, but using libaccessom2 13315f5, which is on the 25-support-input4MIPs branch
    exe: /short/v45/aek156/sources/chuckable/access-om2-raijin/bin/yatm_13315f5.exe
    exe: /short/v45/aek156/sources/chuckable/access-om2-raijin/bin/fms_ACCESS-OM_46774ee_libaccessom2_13315f5.x
    exe: /short/v45/aek156/sources/chuckable/access-om2-raijin/bin/cice_auscom_360x300_24p_dd02b01_libaccessom2_13315f5.exe
    using
    JRA55-do 1.3 from /g/data1/ua8/JRA55-do/latest
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git 1deg_jra55_iaf_test_yearp1
cd 1deg_jra55_iaf_test_yearp1/
git checkout -b 1deg_jra55_iaf_test_yearp1
nano config.yaml
nano accessom2.nml
git commit -am "5-year test using JRA55-do 1.3 from /g/data1/ua8/JRA55-do/latest but with libaccessom2 13315f5 which handles year+1"
module load payu/1.0
payu run

3999549
ran successfully

  3. /home/156/aek156/payu/testing/1deg_jra55_iaf_test_yearp1_qv56
    https://github.com/COSIMA/1deg_jra55_iaf/tree/1deg_jra55_iaf_test_yearp1_qv56
    As for run 2, but using
    JRA55-do 1.3 from /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git 1deg_jra55_iaf_test_yearp1_qv56
cd 1deg_jra55_iaf_test_yearp1_qv56/
git checkout -b 1deg_jra55_iaf_test_yearp1_qv56
cp ../1deg_jra55_iaf_test_yearp1/config.yaml .
cp ../1deg_jra55_iaf_test_yearp1/accessom2.nml .
nano config.yaml
nano accessom2.nml
nano atmosphere/forcing.json
git commit -am "5-year test using JRA55-do 1.3 from /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3 and with libaccessom2 13315f5 which handles year+1"
module load payu/1.0
payu run

4003192
ran successfully
TODO: check the correct forcing files were loaded

  4. /home/156/aek156/payu/testing/1deg_jra55_iaf_test_yearp1_qv56_2
    as for run 3 but using INPUT defined in config.yaml
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git 1deg_jra55_iaf_test_yearp1_qv56_2
cd 1deg_jra55_iaf_test_yearp1_qv56_2/
git checkout -b 1deg_jra55_iaf_test_yearp1_qv56_2
cp ../1deg_jra55_iaf_test_yearp1_qv56/config.yaml .
cp ../1deg_jra55_iaf_test_yearp1_qv56/accessom2.nml .
cp ../1deg_jra55_iaf_test_yearp1_qv56/atmosphere/forcing.json atmosphere/forcing.json
nano config.yaml
nano atmosphere/forcing.json
module load payu/1.0
payu setup
git add manifests/exe.yaml
git add manifests/input.yaml
git commit -am "5-year test using JRA55-do 1.3 from /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3 and with libaccessom2 13315f5 which handles year+1, using INPUT defined in config.yaml"
payu sweep
payu run

4014601
ran successfully
TODO: check the correct forcing files were loaded

2019-12-09

checking with
https://github.com/aekiss/notebooks/blob/master/input4MIPs_testing.ipynb
Runs 1 and 2 match in global timeseries
Runs 3 and 4 match in global timeseries but runs (3,4) don't match (1,2)

https://github.com/aekiss/notebooks/blob/master/check-MRI-JRA55-do-1-3.ipynb
shows the problem is small differences at random points in 10m temperature and humidity.

So try a run which reverts these fields to the ua8 versions.

  5. /home/156/aek156/payu/testing/1deg_jra55_iaf_test_yearp1_qv56_2_hybrid
    as for run 4, using
    JRA55-do 1.3 from /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3
    except for temperature and humidity, which use JRA55-do 1.3 from /g/data1/ua8/JRA55-do/latest
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git 1deg_jra55_iaf_test_yearp1_qv56_2_hybrid
cd 1deg_jra55_iaf_test_yearp1_qv56_2_hybrid/
git checkout -b 1deg_jra55_iaf_test_yearp1_qv56_2_hybrid
cp ../1deg_jra55_iaf_test_yearp1_qv56_2/config.yaml .
cp ../1deg_jra55_iaf_test_yearp1_qv56_2/accessom2.nml .
cp ../1deg_jra55_iaf_test_yearp1_qv56_2/atmosphere/forcing.json atmosphere/forcing.json
less ../1deg_jra55_iaf_control/atmosphere/forcing.json
nano atmosphere/forcing.json
module load payu/1.0
payu setup
git add manifests/exe.yaml
git add manifests/input.yaml
git commit -am "5-year test using JRA55-do 1.3 from /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3 except for temperature and humidity, which use JRA55-do 1.3 from /g/data1/ua8/JRA55-do/latest, and with libaccessom2 13315f5 which handles year+1, using INPUT defined in config.yaml"
payu sweep
payu run

4051889

see #120 (comment)

comparison of runs 1-5 using these scripts

Runs 1 and 2 are identical, as expected.
Run 3 and 4 are identical, but differ from 1 and 2.
Run 5 is identical to runs 1 and 2.

@aidanheerdegen also noted above that /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3 gives different results from /g/data/ua8/JRA55-do/v1-3.

This is because the near-surface temperature and humidity (tas and huss) are slightly different (as confirmed in run 5). Paola says qv56 is a slightly newer version (source_version = "1.3.1" in .nc attributes) than ua8 (version = "v1.3"), which explains it.

I've also confirmed that the month and time part of the filename is identical for all years (except the final one) in each variable we use in /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3, so the pattern used in forcing.json won't miss any v1.3 files (I haven't checked this for version 1.4 yet).
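The filename check described above can be automated. A sketch, assuming filenames end in `_<start>-<end>.nc` with each stamp being a 4-digit year followed by the month/day(/time) part; the file lists in the tests are constructed for illustration only.

```python
import re

# Start and end stamps split into a 4-digit year and the remaining
# month/day(/time) part, e.g. "_195801010000-195812312230.nc".
STAMPS = re.compile(r"_(\d{4})(\d{4,8})-(\d{4})(\d{4,8})\.nc$")


def month_time_parts(filename):
    """(start, end) month/time parts of a filename, without the years."""
    match = STAMPS.search(filename)
    if match is None:
        raise ValueError(f"unrecognised filename {filename!r}")
    return match.group(2), match.group(4)


def consistent_except_last(filenames):
    """True if all files except the last (in sorted order) share the same
    month/time parts, so only the year changes between them."""
    parts = [month_time_parts(name) for name in sorted(filenames)]
    return len(set(parts[:-1])) <= 1
```

Running this over each variable's file list (e.g. collected with `glob`) gives a quick yes/no before relying on a year-substituted pattern in forcing.json.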

So in summary I'm confident that PR COSIMA/libaccessom2#26 is ready to use.

  6. /home/156/aek156/payu/testing/1deg_jra55_iaf_test_yearp1_qv56_2_gadi
    As for 4 but on gadi ie different executables (with different hashes: MOM and CICE updated)
    exe: /g/data4/ik11/inputs/access-om2/bin/yatm_1bb8904.exe
    exe: /g/data4/ik11/inputs/access-om2/bin/fms_ACCESS-OM_66a3e59_libaccessom2_1bb8904.x
    exe: /g/data4/ik11/inputs/access-om2/bin/cice_auscom_360x300_24p_b37ea14_libaccessom2_1bb8904.exe
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git 1deg_jra55_iaf_test_yearp1_qv56_2_gadi
cd 1deg_jra55_iaf_test_yearp1_qv56_2_gadi/
git checkout gadi-transition
git checkout -b 1deg_jra55_iaf_test_yearp1_qv56_2_gadi
nano accessom2.nml
nano config.yaml
nano sync_output_to_gdata.sh
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
# module load payu/1.0.6
payu setup
git add manifests/exe.yaml
git add manifests/input.yaml
git commit -am "5-year test on gadi using JRA55-do 1.3 from /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3, and with libaccessom2 1bb8904 which handles year+1, using INPUT defined in config.yaml"
payu sweep
payu run

298573

  7. /home/156/aek156/payu/testing/1deg_jra55v1p4p0_iaf_test_gadi
    As for 6 but with jra55-do v1.4.0
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git 1deg_jra55v1p4p0_iaf_test_gadi
cd 1deg_jra55v1p4p0_iaf_test_gadi/
git checkout gadi-transition
git checkout -b JRA55-do-1.4.0
nano atmosphere/forcing.json
nano config.yaml
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
# module load payu/1.0.6
payu setup
git add manifests/exe.yaml
git add manifests/input.yaml
git commit -am "set up for JRA55-do v1.4.0 - but solid runoff not included yet"
git checkout -b 1deg_jra55v1p4p0_iaf_test_gadi
## should have done "git merge JRA55-do-1.4.0" here - see below
nano config.yaml
nano sync_output_to_gdata.sh
nano accessom2.nml
payu sweep
payu run

299057

Two big surprises here:

  • A: run 6 and 7 seem identical (despite going from JRA55-do 1.3.0 to 1.4.0)
  • B: run 6/7 differ from both 1/2/5 and 3/4 (despite 6 having the same forcing as 3/4, although different executable hashes)

re. A: checking what files are being read:
/home/156/aek156/payu/testing/1deg_jra55_iaf_test_yearp1_qv56_2_gadi/archive/output000/atmosphere/log/matmxx.pe00000.log
is the same as
/home/156/aek156/payu/testing/1deg_jra55v1p4p0_iaf_test_gadi/archive/output000/atmosphere/log/matmxx.pe00000.log
and they both get files from JRA55-do v1.3.1 ie
INPUT/rsds_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-3* etc
so I've misconfigured run 7
config.yaml and forcing.json are the same as for run 6
must have forgotten to update branch
rerun 7 with:

cd /home/156/aek156/payu/testing/1deg_jra55v1p4p0_iaf_test_gadi
git merge JRA55-do-1.4.0
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
payu setup
git add manifests/input.yaml
git commit -m "merged JRA55-do-1.4.0 branch"
payu sweep --hard
payu run

303079
This diverges from 6 much more rapidly than 6 diverges from the others, as expected as it's missing solid runoff.
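The misconfiguration of run 7 was caught by reading which files YATM loaded. A sketch of doing that check programmatically, assuming the `"field_update_data-file"` entries have the form shown in the matmxx.pe00000.log excerpt later in this thread (the whole lines are not parsed as JSON, since entries such as the checksums use number formats JSON does not accept):

```python
import re

# Matches entries like: { "field_update_data-file" : "INPUT/vas_....nc" }
FILE_ENTRY = re.compile(r'"field_update_data-file"\s*:\s*"([^"]+)"')


def forcing_files_read(log_lines):
    """Unique forcing files named in a YATM log, in first-seen order."""
    seen = []
    for line in log_lines:
        match = FILE_ENTRY.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Usage (path relative to a payu archive output directory):
# with open("atmosphere/log/matmxx.pe00000.log") as log:
#     files = forcing_files_read(log)
# all("MRI-JRA55-do-1-4-0" in name for name in files) should then be True
# for a run that is meant to use v1.4.0.
```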

re. B above: very different MOM versions were used
run 3/4 used
exe: /short/v45/aek156/sources/chuckable/access-om2-raijin/bin/yatm_13315f5.exe
exe: /short/v45/aek156/sources/chuckable/access-om2-raijin/bin/fms_ACCESS-OM_46774ee_libaccessom2_13315f5.x
exe: /short/v45/aek156/sources/chuckable/access-om2-raijin/bin/cice_auscom_360x300_24p_dd02b01_libaccessom2_13315f5.exe
but run 6 used
exe: /g/data4/ik11/inputs/access-om2/bin/yatm_1bb8904.exe
exe: /g/data4/ik11/inputs/access-om2/bin/fms_ACCESS-OM_66a3e59_libaccessom2_1bb8904.x
exe: /g/data4/ik11/inputs/access-om2/bin/cice_auscom_360x300_24p_b37ea14_libaccessom2_1bb8904.exe
There are only gadi-transition-related differences between yatm 13315f5..1bb8904
There are only gadi-transition-related differences between cice dd02b01..b37ea14
but there are lots of differences between mom 46774ee..66a3e59, e.g. FAFMIP changes in the former:

cd /home/156/aek156/github/COSIMA/access-om2/src/mom
git diff 46774ee..66a3e59

looks like Aidan based his mom gadi-transition branch on an old commit, so the gadi runs 6/7 used (mostly) older MOM than the raijin runs 3/4.
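Comparing runs by the component hashes embedded in the executable names makes this kind of mismatch quicker to spot. A sketch, assuming the naming used above: `yatm_<hash>.exe`, and MOM/CICE names ending in `_<model hash>_libaccessom2_<lib hash>.x` or `.exe`, with MOM identified by its `fms_` prefix.

```python
import os
import re

HASH = r"([0-9a-f]{7})"
YATM_EXE = re.compile(rf"^yatm_{HASH}\.exe$")
# MOM and CICE executables end in "_<model hash>_libaccessom2_<lib hash>.<ext>".
MODEL_EXE = re.compile(rf"_{HASH}_libaccessom2_{HASH}\.(?:x|exe)$")


def component_hashes(exe_paths):
    """Map component name -> 7-character git hash, from executable paths."""
    hashes = {}
    for path in exe_paths:
        name = os.path.basename(path)
        yatm = YATM_EXE.match(name)
        if yatm:
            hashes["yatm"] = yatm.group(1)
            continue
        model = MODEL_EXE.search(name)
        if model is None:
            raise ValueError(f"unrecognised executable name {name!r}")
        component = "mom" if name.startswith("fms_") else "cice"
        hashes[component] = model.group(1)
        # Assumed identical in the MOM and CICE names of a given run.
        hashes["libaccessom2"] = model.group(2)
    return hashes


def changed_components(exes_a, exes_b):
    """Components whose hash differs between two runs: name -> (hash_a, hash_b)."""
    a, b = component_hashes(exes_a), component_hashes(exes_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Each changed pair can then be inspected with `git diff <hash_a>..<hash_b>` in the relevant submodule, as done above for mom.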

I've updated MOM gadi-transition to now use mom 97e3429. Do a run with this:

  8. /home/156/aek156/payu/testing/1deg_jra55v1p4p0_iaf_test_gadi_2
    As for run 7 but with updated mom exe 97e3429
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git 1deg_jra55v1p4p0_iaf_test_gadi_2
cd 1deg_jra55v1p4p0_iaf_test_gadi_2/
git checkout JRA55-do-1.4.0
git checkout gadi-transition
git checkout -b 1deg_jra55v1p4p0_iaf_test_gadi_2
git merge JRA55-do-1.4.0
nano atmosphere/forcing.json
nano config.yaml
nano sync_output_to_gdata.sh
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
# module load payu/1.0.6
payu setup
git commit -am "set up with new mom exe for JRA55-do v1.4.0 - but solid runoff not included yet"
payu sweep
payu run

305920

Run 8 is nearly (but not quite) the same as run 7 - so the mom version makes only a small difference.
Also did a short test run (an extension of run 8) to confirm mom terminates if min_thickness is not specified

Done: repeat run 6 and 4 with the updated mom - expect them to be the same (if no raijin/gadi differences)

  • these are runs 9 and 10, respectively (but they differ - see below).
  9. /home/156/aek156/payu/testing/1deg_jra55_iaf_test_yearp1_qv56_2_gadi_2
    As for run 6 but with updated mom exe 97e3429
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git 1deg_jra55_iaf_test_yearp1_qv56_2_gadi_2
cd 1deg_jra55_iaf_test_yearp1_qv56_2_gadi_2/
git checkout JRA55-do-1.4.0
git checkout gadi-transition
git checkout -b 1deg_jra55_iaf_test_yearp1_qv56_2_gadi_2
git merge JRA55-do-1.4.0
cp ../1deg_jra55_iaf_test_yearp1_qv56_2_gadi/atmosphere/forcing.json ./atmosphere/forcing.json
diff config.yaml ../1deg_jra55_iaf_test_yearp1_qv56_2_gadi/config.yaml
cp ../1deg_jra55_iaf_test_yearp1_qv56_2_gadi/config.yaml .
nano config.yaml  # to set       exe: /g/data4/ik11/inputs/access-om2/bin/fms_ACCESS-OM_97e3429_libaccessom2_1bb8904.x
nano accessom2.nml
nano sync_output_to_gdata.sh
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
# module load payu/1.0.6
payu setup
git commit -am "set up with new mom exe 97e3429 for JRA55-do v1.3.1 from qv56"
payu sweep
payu run

307369 - failed - out of quota on scratch
311140 - worked
on 2019-12-20 did a 3-month run continuing from 5-yr run to test COSIMA/libaccessom2#22 (comment)
576287
this failed (as expected) with

Error in accessom2_deinit: atm and ice models are out of sync.
atm end date: 1963-04-01T00:00:00.000
ice end date: 1968-03-31T00:00:00.000

updating exes on raijin:

git clone --recursive https://github.com/COSIMA/access-om2.git
cd access-om2
git checkout --track origin/gadi-transition
git submodule update --init --recursive
git diff ee555..7f83
git checkout ee555
git checkout -b raijin-reference-for-gadi-transition
cd src/libaccessom2/
git checkout master
git pull
cd ../mom/
git checkout safer-min_thickness
git pull
cd ../cice5/
git checkout master
git pull
cd ../..
./install.sh
./hashexe-public.sh
cd control/1deg_jra55_iaf
git commit -am "reference rajin setup for comparison with gadi-transition"
git checkout -b raijin-reference-for-gadi-transition
git push -u origin raijin-reference-for-gadi-transition
cd ../1deg_jra55_ryf
git commit -am "reference rajin setup for comparison with gadi-transition"
git checkout -b raijin-reference-for-gadi-transition
git push -u origin raijin-reference-for-gadi-transition
cd ../..
git commit -am "reference rajin setup for comparison with gadi-transition"
git push -u origin raijin-reference-for-gadi-transition
  10. /home/156/aek156/payu/testing/1deg_jra55_iaf_test_yearp1_qv56_3
    as for run 4 but with updated exes
    exe: /short/public/access-om2/bin/yatm_9fda758.exe
    exe: /short/public/access-om2/bin/fms_ACCESS-OM_56467f5_libaccessom2_9fda758.x
    exe: /short/public/access-om2/bin/cice_auscom_360x300_24p_dd02b01_libaccessom2_9fda758.exe
    on raijin:
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git 1deg_jra55_iaf_test_yearp1_qv56_3
cd 1deg_jra55_iaf_test_yearp1_qv56_3/
git checkout raijin-reference-for-gadi-transition
git checkout -b 1deg_jra55_iaf_test_yearp1_qv56_3
diff ../1deg_jra55_iaf_test_yearp1_qv56_2/config.yaml config.yaml
diff ../1deg_jra55_iaf_test_yearp1_qv56_2/accessom2.nml accessom2.nml
diff ../1deg_jra55_iaf_test_yearp1_qv56_2/atmosphere/forcing.json atmosphere/forcing.json
nano config.yaml
nano accessom2.nml
nano sync_output_to_gdata.sh
module load payu/1.0
payu setup
git commit -am "5-year test using JRA55-do 1.3 from /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3 and updated executables"
payu sweep
payu run

4107442

  • Expected runs 9 and 10 (1deg_jra55_iaf_test_yearp1_qv56_2_gadi_2, 1deg_jra55_iaf_test_yearp1_qv56_3) to be the same (if there were no raijin/gadi differences). They are similar but not identical, so there are differences between raijin and gadi - perhaps due to a different order of execution with a different OpenMPI?
  • Run 10 (1deg_jra55_iaf_test_yearp1_qv56_3) is identical to 3/4 (1deg_jra55_iaf_test_yearp1_qv56 / 1deg_jra55_iaf_test_yearp1_qv56_2), so the small executable changes had no impact (not surprising).
  • Run 9 (1deg_jra55_iaf_test_yearp1_qv56_2_gadi_2) is different from 6 (1deg_jra55_iaf_test_yearp1_qv56_2_gadi), so the (much more extensive) mom exe changes 46774ee..97e3429 had an effect on the model run - but these differences are only slightly larger (in temp_global_ave) than the differences between raijin and gadi (runs 9/10), i.e. small and presumably due to a different order of execution or something like that.
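The comparisons above distinguish "identical" (bitwise equal) from "similar but not identical". A minimal sketch of that classification, assuming a global time series such as temp_global_ave has already been loaded from each run as a list of floats (loading is not shown):

```python
def compare_series(a, b):
    """Classify two global time series from different runs.

    Returns ("identical", 0.0) if every value is bitwise equal, otherwise
    ("different", largest absolute difference).
    """
    if len(a) != len(b):
        raise ValueError("series have different lengths")
    if all(x == y for x, y in zip(a, b)):
        return "identical", 0.0
    return "different", max(abs(x - y) for x, y in zip(a, b))
```

Ranking the returned maximum differences for several pairs (e.g. runs 9 vs 6 against runs 9 vs 10) gives a rough measure of "only slightly larger".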

2019-12-13

test runs at higher resolution on gadi

  1. /home/156/aek156/payu/testing/gadi_test_025deg_jra55_iaf
    run for 10 days (864000 s)
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/025deg_jra55_iaf.git gadi_test_025deg_jra55_iaf
cd gadi_test_025deg_jra55_iaf/
git checkout gadi-transition
git checkout -b gadi_test_025deg_jra55_iaf
nano accessom2.nml
nano sync_output_to_gdata.sh
nano config.yaml
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
# module load payu/1.0.6
payu setup
git commit -am "10-day test run of gadi config"
payu sweep
payu run

330722
ran ok, so did payu sweep --hard, increased run length to 1 year and ran again
335840

  1. /home/156/aek156/payu/testing/gadi_test_01deg_jra55_iaf
    run for 10 days (864000 s)
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/01deg_jra55_iaf.git gadi_test_01deg_jra55_iaf
cd gadi_test_01deg_jra55_iaf/
git checkout gadi-transition
git checkout -b gadi_test_01deg_jra55_iaf
nano accessom2.nml
nano sync_output_to_gdata.sh
nano config.yaml
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
# module load payu/1.0.6
payu setup
git commit -am "10-day test run of gadi config"
payu sweep
payu run

330708
MPI_INIT failed - could not find yalla
so commented out mpirun: --mca pml yalla -x MXM_LOG_FILE=$PBS_JOBFS/mxm.log in config.yaml
and ran again
335825
died within first 3 hr with
FATAL from PE 2626: Error: temperature out of range with value -2.838495341345E+01 at (i,j,k) = (2033,1446, 31), (lon,lat,dpt) = ( -76.7500, 19.6529, 206.5102 m)
so reduced timestep to 200s and tried again
336715
seemed to work. Took 52min walltime for 10 days
did payu sweep --hard and ran for 1 month
354187
got to 1985-01-26T00:00:00.000 but then
failed with OSError: [Errno 28] No space left on device: '/scratch/x77/aek156/access-om2/archive/gadi_test_01deg_jra55_iaf'
but du -hc /scratch/x77/ says 314Gb used by /scratch/x77
and lquota says
x77 scratch 334.02GB 79.0TB 158.0TB 43475 44782933 89565866
v14 scratch 180.33GB 494.96TB 989.92TB 1666 27360000 54720000
so I don't see what's going on here.
also noticed
JobFS requested: 10.55GB JobFS used: 8.2MB
so saved work for future reference
mv /scratch/x77/aek156/access-om2/work/gadi_test_01deg_jra55_iaf /scratch/x77/aek156/access-om2/work/gadi_test_01deg_jra55_iaf-354187
and tried again with v45 for project and shortpath, and larger JobFS (20gb)

git commit -am "use v45 and -ljobfs=20gb"
payu run

411556
1-month run completed successfully
JobFS requested: 20.0GB JobFS used: 8.2MB

also submitted an NCI help request

NCI Project Code
x77
NCI System
Gadi
Description
Hi
I've had several gadi jobs fail with messages like
OSError: [Errno 28] No space left on device: '/scratch/x77/aek156/access-om2/archive/gadi_test_01deg_jra55_iaf'
but lquota says there's plenty of space:
fs Usage Quota Limit iUsage iQuota iLimit
-----------------------------------------------------------------------
x77 scratch 334.02GB 79.0TB 158.0TB 43475 44782933 89565866
so I don't understand why I'm getting this error. I'd be grateful for any suggestions.
thanks
Andrew

got this response:

a metadata storage device momentarily ran out of inode space hence dir/file creation at that point failed. 
this is since fixed and you should be able to run jobs with no further interruptions.
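The lquota row quoted above shows why the error was puzzling: at the project level both bytes and inodes were far below quota, so the failure could not be explained by the project's own usage. A sketch that computes those fractions, assuming the row layout `project fs usage quota limit iusage iquota ilimit` and decimal unit prefixes (only the ratio matters here):

```python
UNITS = {"KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def to_bytes(size):
    """Convert a size such as "334.02GB" to bytes (decimal prefixes assumed)."""
    return float(size[:-2]) * UNITS[size[-2:]]


def quota_fractions(row):
    """Fraction of the byte quota and inode quota used, from one lquota row."""
    project, fs, usage, quota, _limit, iusage, iquota, _ilimit = row.split()
    return {
        "project": project,
        "fs": fs,
        "bytes": to_bytes(usage) / to_bytes(quota),
        "inodes": int(iusage) / int(iquota),
    }


print(quota_fractions("x77 scratch 334.02GB 79.0TB 158.0TB 43475 44782933 89565866"))
```

Both fractions come out well under 1%, consistent with NCI's explanation that a shared metadata device, not the project quota, ran out of inodes.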
  1. trying out ak-dev branch 1deg iaf
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/1deg_jra55_iaf.git gadi_ak_dev_test_1deg_jra55_iaf
cd gadi_ak_dev_test_1deg_jra55_iaf/
git checkout ak-dev
git checkout -b gadi_ak_dev_test_1deg_jra55_iaf
nano sync_output_to_gdata.sh
nano config.yaml
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
# module load payu/1.0.6
payu setup
git commit -am "5-year test run of gadi config on ak-dev branch"
payu sweep
payu run

442978
seemed to work fine.

  1. trying out ak-dev branch 025deg iaf
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/025deg_jra55_iaf.git gadi_ak_dev_test_025deg_jra55_iaf
cd gadi_ak_dev_test_025deg_jra55_iaf/
git checkout ak-dev
git checkout -b gadi_ak_dev_test_025deg_jra55_iaf
nano sync_output_to_gdata.sh
nano config.yaml
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
# module load payu/1.0.6
payu setup
git commit -am "2-year test run of gadi config on ak-dev branch"
payu sweep
payu run

457474
ran ok but sync_output_to_gdata.sh ran out of memory

------------------------------------------------------------------------
Job 469355 has exceeded memory allocation on node gadi-dm-01.gadi.nci.org.au
Process "bash", pid 36464, rss 2179072, vmem 20774912
Process "469355.gadi-pbs", pid 36510, rss 1789952, vmem 10141696
Process "rsync", pid 36516, rss 1888256, vmem 24981504
Process "rsync", pid 36517, rss 1216512, vmem 24363008
Process "rsync", pid 36522, rss 925696, vmem 24952832
------------------------------------------------------------------------
For more information visit https://opus.nci.org.au/x/SwGRAQ
------------------------------------------------------------------------
-bash: line 1: 36510 Killed                  /local/spool/pbs/mom_priv/jobs/469355.gadi-pbs.SC
Resource Usage on 2019-12-17 08:27:19:
Job Id:             469355.gadi-pbs
Project:            v45
Exit Status:        137 (Linux Signal 9 SIGKILL Kill, unblockable)
Service Units:      0.02
NCPUs Requested:    1                      NCPUs Used: 1
                         CPU Time Used: 00:00:25
Memory Requested:   2.0GB                 Memory Used: 2.0GB
Walltime requested: 01:00:00            Walltime Used: 00:00:30
JobFS requested:    100.0MB                JobFS used: 0B

increased memory request from 2Gb to 4Gb in sync_output_to_gdata.sh
this ran ok but seems to have used all memory allocated to it:
Memory Requested: 4.0GB Memory Used: 4.0GB

  1. trying out ak-dev branch 01deg iaf: gadi_ak_dev_test_01deg_jra55_iaf
cd /home/156/aek156/payu/testing/
git clone https://github.com/COSIMA/01deg_jra55_iaf.git gadi_ak_dev_test_01deg_jra55_iaf
cd gadi_ak_dev_test_01deg_jra55_iaf/
git checkout ak-dev
git checkout -b gadi_ak_dev_test_01deg_jra55_iaf
nano sync_output_to_gdata.sh
nano config.yaml
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
# module load payu/1.0.6
payu setup
git commit -am "1-month test run of gadi config on ak-dev branch with dt=200s"
payu sweep
payu run

457501
this failed with
Job 457501 has exceeded memory allocation on node gadi-cpu-clx-0730.gadi.nci.org.au

Memory Requested:   3.38TB                Memory Used: 2.76TB
Walltime requested: 05:00:00            Walltime Used: 02:12:45
JobFS requested:    10.55GB                JobFS used: 8.19MB

work/atmosphere/log/matmxx.pe00000.log ends with

{ "forcing_update_field-datetime" : "1958-01-31T21:00:00.000" }
{ "field_update_data-file" : "INPUT/vas_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-3_gn_195801010000-195812312230.nc" }
{ "field_update_data-index" :        248 }
{ "checksum-matmxx-vwnd_ai-0002667600": -.3468975037E+005 }
{ "cur_exp-datetime" :  "1958-01-31T21:00:00" }
{ "cur_forcing-datetime" : "1958-01-31T21:00:00" }
cur_runtime_in_seconds    2667600
Run complete, calling deinit

so it got to the end of the run and died while finishing; it did not get to collation or sync.
access-om2.out confirms this.
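A quick consistency check on the log tail above, assuming this 1-month run started at 1958-01-01T00:00 and was meant to stop at 1958-02-01T00:00: the logged cur_runtime_in_seconds converts exactly to the logged cur_exp-datetime, 3 hours before the assumed end date.

```python
from datetime import datetime, timedelta

# Values from the tail of matmxx.pe00000.log quoted above.
CUR_EXP_DATETIME = datetime.fromisoformat("1958-01-31T21:00:00")
CUR_RUNTIME_SECONDS = 2667600

# Assumed start and end of this 1-month run.
RUN_START = datetime(1958, 1, 1)
RUN_END = datetime(1958, 2, 1)

reached = RUN_START + timedelta(seconds=CUR_RUNTIME_SECONDS)
print(reached)            # 1958-01-31 21:00:00
print(RUN_END - reached)  # 3:00:00
```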
access-om2.err says

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
forrtl: error (78): process killed (SIGTERM)

not clear what died exactly.
sweep and try again

module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
payu sweep
payu run

528627

Job 528627 has exceeded memory allocation on node gadi-cpu-clx-1926.gadi.nci.org.au
Process "orted", pid 56499, rss 31617024, vmem 661856256
Process "fms_ACCESS-OM_9", pid 56518, rss 566235136, vmem 1227616256
...
Process "fms_ACCESS-OM_9", pid 56565, rss 567529472, vmem 1231986688
------------------------------------------------------------------------
For more information visit https://opus.nci.org.au/x/SwGRAQ
   Memory Requested:   3.38TB                Memory Used: 2.75TB

so I guess more memory is needed
Note that "memory requested" doesn't detect spikes - see https://opus.nci.org.au/display/Help/What+does+exceeded+memory+allocation+mean
set this in config.yaml via mem.
This is the total memory requirement for the job.
I'm requesting 5180 CPUs, i.e. 5180/48 ≈ 108 nodes, i.e. 3380 GB/108 ≈ 31.3 GB/node,
which matches the default of 31 GB/node (I wasn't setting mem explicitly).
https://payu.readthedocs.io/en/latest/config.html#configuration-settings
But Gadi has 192 GB RAM/node, so I can ask for much more.
So I set mem: 6000GB in config.yaml and tried again.
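The per-node arithmetic above can be written out as a small check. This is a sketch using only the figures quoted in these notes (48 cores and 192 GB RAM per Gadi node, 5180 CPUs requested); `per_node_memory` is a hypothetical helper, not part of payu.

```python
import math

CORES_PER_NODE = 48     # cores per Gadi node, as quoted above
RAM_PER_NODE_GB = 192   # physical RAM per Gadi node, as quoted above


def per_node_memory(ncpus: int, total_mem_gb: float) -> tuple[int, float]:
    """Return (nodes used, GB per node) for a whole-job mem request."""
    nodes = math.ceil(ncpus / CORES_PER_NODE)
    return nodes, total_mem_gb / nodes


for mem_gb in (3380, 6000):
    nodes, gb_per_node = per_node_memory(5180, mem_gb)
    fits = gb_per_node <= RAM_PER_NODE_GB
    print(f"mem={mem_gb}GB: {nodes} nodes, {gb_per_node:.1f} GB/node, fits={fits}")
```

With mem: 6000GB each of the 108 nodes gets about 55.6 GB, still well under the 192 GB physical limit, which is why the larger request is safe.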
609170
ran ok

   Memory Requested:   5.86TB                Memory Used: 2.96TB

2019-12-20

Investigating COSIMA/libaccessom2#22 (comment)
Make executables that revert
COSIMA/cice5@ab47343
on gadi:

cd /home/156/aek156/github/COSIMA/access-om2
git checkout -b iss22
cd src/cice5
git checkout -b iss22

then edit drivers/auscom/CICE_InitMod.F90 to revert COSIMA/cice5@ab473434

git commit -am "revert https://github.com/COSIMA/cice5/commit/ab473434 to investigate https://github.com/COSIMA/libaccessom2/issues/22#issuecomment-567340948"
cd ../..
./install.sh
./hashexe-public.sh

this yielded cice d3e8bdf (NB: based on the gadi-transition branch, which is what I want):
/g/data4/ik11/inputs/access-om2/bin/cice_auscom_360x300_24p_d3e8bdf_libaccessom2_1bb8904.exe

  1. /home/156/aek156/payu/testing/1deg_jra55_iaf_test_yearp1_qv56_2_gadi_3
    As for run 9 (1deg_jra55_iaf_test_yearp1_qv56_2_gadi_2) but with updated cice exe d3e8bdf
cd /home/156/aek156/payu/testing/
git clone 1deg_jra55_iaf_test_yearp1_qv56_2_gadi_2 1deg_jra55_iaf_test_yearp1_qv56_2_gadi_3
cd 1deg_jra55_iaf_test_yearp1_qv56_2_gadi_3
git checkout -b 1deg_jra55_iaf_test_yearp1_qv56_2_gadi_3
cp -r /scratch/x77/aek156/access-om2/archive/1deg_jra55_iaf_test_yearp1_qv56_2_gadi_2 /scratch/x77/aek156/access-om2/archive/1deg_jra55_iaf_test_yearp1_qv56_2_gadi_3
ln -s /scratch/x77/aek156/access-om2/archive/1deg_jra55_iaf_test_yearp1_qv56_2_gadi_3 archive

then edit config.yaml to use /g/data4/ik11/inputs/access-om2/bin/cice_auscom_360x300_24p_d3e8bdf_libaccessom2_1bb8904.exe instead of b37ea14

nano sync_output_to_gdata.sh
module use /g/data3/hh5/public/modules
module load conda/analysis3-unstable
payu setup
git commit -am "use cice https://github.com/COSIMA/cice5/commit/d3e8bdf which reverts https://github.com/COSIMA/cice5/commit/ab473434 to investigate https://github.com/COSIMA/libaccessom2/issues/22#issuecomment-567340948"
payu sweep
payu run

576963
no output after 28 min; assumed it hung and killed it.
terminal was slow and flaky with payu sweep as well.
logged in on a new terminal and did payu run again
578180
ran ok

   Walltime requested: 03:00:00            Walltime Used: 00:04:09

conclusion (compare with run 9): COSIMA/cice5@ab473434 is the cause of the problem


for reference - not needed?

"filename": "/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3/atmos/3hrPt/tas/gn/v20180412/tas/tas_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-3_gn_{{year}}01010000-{{year}}12312230.nc",
float tas(time, lat, lon) ;
tas:standard_name = "air_temperature" ;
tas:long_name = "Near-Surface Air Temperature" ;
tas:comment = "near-surface (usually, 2 meter) air temperature" ;
tas:units = "K" ;
tas:cell_methods = "area: mean time: point" ;
tas:cell_measures = "area: areacella" ;
tas:history = "2018-04-11T23:29:38Z altered by CMOR: Treated scalar dimension: 'height'." ;
tas:coordinates = "height" ;
tas:missing_value = 1.e+20f ;
tas:_FillValue = 1.e+20f ;

"filename": "/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3/atmos/3hrPt/ts/gn/v20180412/ts/ts_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-3_gn_{{year}}01010000-{{year}}12312230.nc",
float ts(time, lat, lon) ;
ts:standard_name = "surface_temperature" ;
ts:long_name = "Surface Temperature" ;
ts:comment = "Temperature of the lower boundary of the atmosphere" ;
ts:units = "K" ;
ts:cell_methods = "area: mean time: point" ;
ts:cell_measures = "area: areacella" ;
ts:missing_value = 1.e+20f ;
ts:_FillValue = 1.e+20f ;

"filename": "/g/data1/ua8/JRA55-do/latest/t_10.{{year}}.nc",
float tas_10m(time, latitude, longitude) ;
tas_10m:units = "K" ;
tas_10m:long_name = "Near-Surface Air Temperature at the 10 meter height" ;
tas_10m:standard_name = "air_temperature" ;
tas_10m:missing_value = -9.99e+33f ;

/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3/ocean/day/friver/gn/v20180412/friver/friver_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-3_gn_20160101-20170101.nc
float friver(time, lat, lon) ;
friver:standard_name = "water_flux_into_sea_water_from_rivers" ;
friver:long_name = "Water Flux into Sea Water From Rivers" ;
friver:comment = "computed as the river flux of water into the ocean divided by the area of the ocean portion of the grid cell" ;
friver:units = "kg m-2 s-1" ;
friver:cell_methods = "area: mean where sea time: mean" ;
friver:cell_measures = "area: areacello" ;
friver:missing_value = 1.e+20f ;
friver:_FillValue = 1.e+20f ;

/g/data1/ua8/JRA55-do/latest/runoff_all.2017.nc
float friver(time, latitude, longitude) ;
friver:units = "kg/m2/sec" ;
friver:long_name = "Water flux into sea water from rivers" ;
friver:standard_name = "water_flux_into_sea_water_from_rivers" ;
friver:missing_value = -9.99e+33f ;
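The two friver headers above differ in metadata: the input4MIPs v1.3 file uses units "kg m-2 s-1" with missing_value = 1.e+20f, while the ua8 runoff_all file uses "kg/m2/sec" with missing_value = -9.99e+33f. A minimal pure-Python sketch of masking each file's own missing value before comparing values; `mask_missing` and `to_float32` are hypothetical helpers, and reading the netCDF files themselves is left out.

```python
import math
import struct

# Unit strings from the two headers above; both mean kg m-2 s-1.
EQUIVALENT_UNITS = {"kg m-2 s-1", "kg/m2/sec"}


def to_float32(x: float) -> float:
    """Round-trip a value through float32, mimicking values read from the files."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mask_missing(values, missing_value, rel_tol=1e-6):
    """Replace entries matching missing_value with None.

    missing_value is stored as float32 (the trailing 'f' in the headers), so
    once read as a Python float it is only approximately 1e20 or -9.99e33;
    compare with a relative tolerance rather than exact equality.
    """
    return [None if math.isclose(v, missing_value, rel_tol=rel_tol) else v
            for v in values]


if __name__ == "__main__":
    # input4MIPs file: missing_value = 1.e+20f
    print(mask_missing([1.2e-5, to_float32(1e20)], 1e20))
    # ua8 file: missing_value = -9.99e+33f
    print(mask_missing([3.4e-6, to_float32(-9.99e33)], -9.99e33))
```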

nichannah added a commit to COSIMA/libaccessom2 that referenced this issue Jan 24, 2020
nichannah added a commit to COSIMA/libaccessom2 that referenced this issue Jan 24, 2020
nichannah added a commit to COSIMA/libaccessom2 that referenced this issue Jan 25, 2020
nichannah added a commit to COSIMA/libaccessom2 that referenced this issue Jan 25, 2020
@aekiss
Contributor Author

aekiss commented Feb 3, 2020

JRA55-do 1.4.0 (and earlier) presumably suffer from the JRA55 cyclone sign problem: #186

nichannah added a commit to COSIMA/1deg_jra55_iaf that referenced this issue Mar 13, 2020
nichannah added a commit to COSIMA/libaccessom2 that referenced this issue Apr 16, 2020
nichannah added a commit to COSIMA/libaccessom2 that referenced this issue Apr 16, 2020
nichannah added a commit to COSIMA/libaccessom2 that referenced this issue Apr 16, 2020
nichannah added a commit to COSIMA/libaccessom2 that referenced this issue Apr 16, 2020
nichannah added a commit to COSIMA/1deg_jra55_iaf that referenced this issue Apr 22, 2020
nichannah added a commit to COSIMA/1deg_jra55_iaf that referenced this issue Apr 22, 2020
@aekiss
Contributor Author

aekiss commented May 5, 2020

Closing - the code and configurations tagged v1.4.0 use JRA55-do v1.4.0 (the RYF variants use 1990-91):
https://github.com/COSIMA/access-om2/releases/tag/v1.4.0
https://github.com/COSIMA/1deg_jra55_iaf/releases/tag/v1.4.0
https://github.com/COSIMA/1deg_jra55_ryf/releases/tag/v1.4.0
https://github.com/COSIMA/025deg_jra55_iaf/releases/tag/v1.4.0
https://github.com/COSIMA/025deg_jra55_ryf/releases/tag/v1.4.0
https://github.com/COSIMA/01deg_jra55_iaf/releases/tag/v1.4.0
https://github.com/COSIMA/01deg_jra55_ryf/releases/tag/v1.4.0
These combine the solid runoff into the liquid runoff within CICE (and also ignore the sensible and latent heat this would require), passing this total liquid runoff to MOM. This approach is the same as what is done in JRA55-do v1.3.1. There are also coupling fields set up to pass solid runoff and heat from CICE into MOM. They are unused at present (passing zeros) but ready for future configurations which take a more realistic approach to solid runoff.
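The runoff treatment described above amounts to adding the two mass fluxes per grid cell and neglecting the heat needed to melt the ice. A minimal sketch, assuming per-cell fluxes in kg m-2 s-1 and a latent heat of fusion of about 3.34e5 J kg-1; the function names are hypothetical, not the actual CICE or libaccessom2 code, and only the latent part of the neglected heat is shown (the sensible heat is also ignored).

```python
LATENT_HEAT_FUSION = 3.34e5  # J kg-1, approximate value for fresh ice


def total_liquid_runoff(river_flux: float, calving_flux: float) -> float:
    """Liquid runoff passed to MOM: solid runoff added to liquid (kg m-2 s-1)."""
    return river_flux + calving_flux


def neglected_melt_heat(calving_flux: float) -> float:
    """Heat flux (W m-2) that melting the solid runoff would take from the
    ocean (negative = ocean heat loss). This term is not applied."""
    return -LATENT_HEAT_FUSION * calving_flux


if __name__ == "__main__":
    print(total_liquid_runoff(2e-5, 1e-5))  # combined mass flux
    print(neglected_melt_heat(1e-5))        # about -3.34 W m-2 not applied
```

The unused solid-runoff and heat coupling fields mentioned above would be where a nonzero version of `neglected_melt_heat` could eventually be passed to MOM.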

As a point of reference, there are also v1.3.1 tags which are nearly the same but use JRA55-do v1.3.1:
https://github.com/COSIMA/access-om2/releases/tag/v1.3.1
https://github.com/COSIMA/1deg_jra55_iaf/releases/tag/v1.3.1
https://github.com/COSIMA/1deg_jra55_ryf/releases/tag/v1.3.1
https://github.com/COSIMA/025deg_jra55_iaf/releases/tag/v1.3.1
https://github.com/COSIMA/025deg_jra55_ryf/releases/tag/v1.3.1
https://github.com/COSIMA/01deg_jra55_iaf/releases/tag/v1.3.1
https://github.com/COSIMA/01deg_jra55_ryf/releases/tag/v1.3.1

@aekiss aekiss closed this as completed May 5, 2020
@aekiss
Contributor Author

aekiss commented May 5, 2020

I should also note here that the ak-dev branches will soon be merged in, and these would be a better place to start for new experiments.

@aekiss
Contributor Author

aekiss commented May 6, 2020

Here's what it looks like diagrammatically

[Image: models-diagram-clip]

@hakaseh
Collaborator

hakaseh commented May 6, 2020 via email

@nichannah
Contributor

nichannah commented May 6, 2020

Hi @hakaseh,

the units for "Land Ice Calving Flux" and "Water Flux into Sea Water from Rivers" are both kg m-2 s-1, so there is no need to worry about density. However, as noted in the diagram above, the heat needed to melt the ice is not accounted for.

@hakaseh
Collaborator

hakaseh commented May 6, 2020 via email

@nichannah
Contributor

nichannah commented May 7, 2020

I'm not sure I completely understand; however, the code is not doing anything with density or volume. The (mass) flux of ice is treated exactly as if it were an equal flux of water:

runoff flux = river runoff flux + land ice flux
(kg m-2 s-1) = (kg m-2 s-1) + (kg m-2 s-1)

i.e. we are working with mass flux not volume flux.
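To illustrate why density doesn't enter: the two fluxes are summed as mass fluxes, and density would only matter when converting to a volume. A sketch assuming a freshwater density of 1000 kg m-3 and an ice density of about 917 kg m-3, both chosen for illustration only (not values taken from the model code); `volume_flux` is a hypothetical helper.

```python
RHO_FRESHWATER = 1000.0  # kg m-3, assumed reference density for illustration
RHO_ICE = 917.0          # kg m-3, approximate density of pure ice


def volume_flux(mass_flux: float, density: float) -> float:
    """Convert a mass flux (kg m-2 s-1) to a volume flux (m s-1)."""
    return mass_flux / density


river, calving = 2.0e-5, 1.0e-5  # kg m-2 s-1, illustrative values
total_mass = river + calving     # what the code adds: mass fluxes

# The ice occupies more volume than the water it melts into...
ice_volume = volume_flux(calving, RHO_ICE)
meltwater_volume = volume_flux(calving, RHO_FRESHWATER)
# ...but the mass delivered to the ocean is identical, so summing mass
# fluxes needs no density at all.
print(total_mass, ice_volume, meltwater_volume)
```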

There's a description of the fields on page 11 or 12 of the JRA55-do v1.4 user manual: https://climate.mri-jma.go.jp/~htsujino/docs/JRA55-do/v1_4-manual/User_manual_jra55_do_v1_4.pdf

Let me know if this doesn't make sense.

@hakaseh
Collaborator

hakaseh commented May 7, 2020 via email

@access-hive-bot

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/what-are-the-inputs-access-om-needs-to-run/458/1


5 participants