
Increase dt and reduce ndtd, if possible #133

Closed
AndyHoggANU opened this issue Jan 29, 2019 · 29 comments
Assignees

Comments

@AndyHoggANU
Contributor

AndyHoggANU commented Jan 29, 2019

The current status is that with the 8485-RYF forcing, Andrew is running stably with ndtd=2 and dt=720 at all times of the year EXCEPT every August. In August there is a crash unless dt is reduced to 450 seconds. The crash occurs because of strong wind stress from a particular storm in that particular year.

The issue here is that we would very much like to spin our simulation up rapidly using 3-month submits and long timesteps. We acknowledge that these runtimes may not be stable for IAF, but it would be good if they could be stabilised for RYF. The crash happens in a corner of the Canadian Archipelago, but it's not in a tiny strait, meaning that it's not easy to remove by bulldozing. Andrew might be able to show some more details.

I can't see a good option here, so any ideas welcome.

aekiss added a commit to aekiss/notebooks that referenced this issue Jan 30, 2019
@aekiss
Contributor

aekiss commented Jan 30, 2019

I've successfully completed Aug only for ice dynamic timestep dt/ndtd=225, either with (dt=675, ndtd=3; walltime 2.85hr/mo) or (dt=450s, ndtd=2; walltime 3.34hr/mo). So both cases require 1 month submissions (whereas with the usual dt=720s, ndtd=2 we get 2 months in 4.2hr). For more details see /g/data/hh5/tmp/cosima/access-om2-run-summaries/run_summary_01deg_jra55v13_ryf8485_spinup6_newtopog.csv.

The crash occurs on 12 Aug. Here's the wind stress on that day, from /g/data3/hh5/tmp/cosima/access-om2-01/01deg_jra55v13_ryf8485_spinup6_newtopog/output429/ice/OUTPUT/iceh.1938-08-12.nc (plotted with this notebook). The change in sign at 100W is just due to the tripole seam: the +y direction is roughly eastward west of 100W and westward east of 100W in this region, and +x is southward west of 100W and northward east of 100W. So the strong wind at the southern coast is blowing roughly towards the southeast.

01deg_jra55v13_ryf8485_spinup6_newtopog_aug_crash_strairx_1938-08-12

01deg_jra55v13_ryf8485_spinup6_newtopog_aug_crash_strairy_1938-08-12

The circular arc at the bottom of the map is where we have excluded very small cells (see issue #126), and the crashes occur right against this new coastline, where the cells are smallest and the wind is strong. The crash is either a MOM error like this (with dt=720s, ndtd=2):

FATAL from PE  1692:  Error: salinity out of range with value    -1.108449089711E+01 at (i,j,k) = (1715,2649, 12),  (lon,lat,dpt) = ( -101.4640,   68.3377,   22.2502 m)

or (with dt=600s, ndtd=2) a CICE CFL error like this:

 remap transport: bad departure points
Warning: Departure points out of bounds in remap
 my_task, i, j =         303          16          12
 dt, uvel, vvel =    300.000000000000       6.738356576947546E-002
  -3.95288435627532
 dpx, dpy =  -20.2150697308426        1185.86530688260
 HTN(i,j), HTN(i+1,j) =   4147.20482738206        4152.87866108505
 HTE(i,j), HTE(i,j+1) =   1162.71320568993        1162.90990637348
 istep1, my_task, iblk =     3407494         303           2
 Global block:        3563
 Global i and j:        1715        2657

So the ice is moving westward pretty fast (vvel = -3.95 m/s), and the magnitude of dpy = vvel*dt/ndtd is 1185.86 m, the displacement in the y direction over one ice dynamic timestep; this exceeds the y cell dimension HTE(i,j) = 1162.7 m, hence the CFL error.
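For concreteness, the failing condition can be reproduced with the numbers from the error message above (note the dt printed there, 300 s, is the dynamic substep dt/ndtd). This is just an illustrative check of the arithmetic, not CICE's actual remapping code:

```python
# Check the CFL-type condition implied by the "bad departure points" error:
# the per-substep displacement must stay within one cell.
dt = 600.0                  # coupling timestep (s), from the dt=600s ndtd=2 case
ndtd = 2                    # ice dynamics substeps per dt
vvel = -3.95288435627532    # ice y-velocity (m/s) from the error output
hte = 1162.71320568993      # y cell dimension HTE(i,j) (m) from the error output

dpy = vvel * dt / ndtd      # displacement over one ice dynamic timestep
cfl_ok = abs(dpy) <= hte    # False here: |dpy| = 1185.87 m > 1162.71 m
print(abs(dpy), cfl_ok)
```

Dropping the substep to dt/ndtd = 225 s (either dt=675, ndtd=3 or dt=450, ndtd=2, as above) brings |dpy| back under HTE for this velocity.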

From 11-12 Aug the polynya on the southern coast is rapidly closed up. The ice in the polynya is sparse and thin, so presumably more readily accelerated by the strong wind.
01deg_jra55v13_ryf8485_spinup6_newtopog_aug_crash_aice_1938-08-11
01deg_jra55v13_ryf8485_spinup6_newtopog_aug_crash_aice_1938-08-12

Increasing ice-ocean drag (dragio) might help slow down the ice, but this is a global parameter and would alter ice behaviour everywhere just for the sake of fixing this problem on one day in one location, so seems a bit drastic.

We could also consider removing more of the fine cells so the strong wind region becomes land. This is a very ad-hoc fix but could work and we'd probably be able to keep Dease Strait open. I'd need to check whether there are other very windy August days that could cause problems though.

@aekiss
Contributor

aekiss commented Jan 30, 2019

I'm not sure how important Dease Strait is for water and ice transport, but it's completely closed in the 0.25deg topography, with Victoria Island joined onto Canada, so maybe something like that would be reasonable...?
screen shot 2018-12-10 at mon 10-12 9 52am

@AndyHoggANU
Contributor Author

OK, so a possibility is definitely to remove this channel. The question is, if we remove it in the RYF, do we also remove it in future IAF cases?
@russfiedler - would appreciate some advice on whether you think we can justify removing this channel here, at least for RYF runs.
Other opinions very welcome.

@aekiss
Contributor

aekiss commented Jan 30, 2019

I think @PaulSpence's idea to modify JRA55 forcing for that one day is the best solution - unlike the other ideas this is a transient fudge to fix a transient problem. We would need to keep careful records of what was done. And also check whether there are other days in Aug with unusually strong wind near the tripole.

@PaulSpence
Collaborator

PaulSpence commented Jan 31, 2019 via email

@russfiedler

It might be nicer to add some metadata to the input file telling YATM to do the clipping of the winds on the fly. Alternatively, have a field with a difference/multiplication factor to be applied. The field would compress really well since it is constant nearly everywhere. If the field doesn't exist, YATM does nothing, so there is no overhead for all other times. You won't need to keep 2 copies of the data around.

@aekiss
Contributor

aekiss commented Jan 31, 2019

@russfiedler - I really like your scaling field idea. This is a general solution to this issue (and any others like it) that makes the modification really clear and visible. If the scaling field is in a separate file we can leave the JRA files unaltered, and apply the same scaling at multiple times if necessary. We could use atm.nml to tell YATM when to apply the scaling field, so this would be made explicit and tracked with git etc.
@nichannah - would this be easy to implement?
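The scaling-field idea above can be sketched as a simple multiplicative correction applied to the forcing at read time. This is an illustrative sketch, not the actual YATM implementation; the function name and toy data are assumptions:

```python
import numpy as np

def apply_scaling(forcing, scaling=None):
    """Multiply a forcing field by a scaling field if one is supplied;
    otherwise return the forcing unchanged (no overhead when absent)."""
    if scaling is None:
        return forcing
    return forcing * scaling

# Toy u_10 wind field (m/s) and a scaling field that is 1 everywhere
# except a small storm region, where the wind is reduced to 40%.
wind = np.full((4, 4), 20.0)
scale = np.ones((4, 4))
scale[1:3, 1:3] = 0.4
print(apply_scaling(wind, scale))
```

Because the field is constant (1.0) nearly everywhere, it compresses very well in NetCDF, as noted above, and leaving it out entirely means no change to the forcing.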

@nichannah
Contributor

@aekiss, I don't think this would be hard to implement. I like the fact that it can be specific to a geographic area. In this case it works well because we probably don't care much about that storm in that particular location.

Another option would be to dynamically modify the timestep for this month (or the submit that contains this month). Since libaccessom2 already programmatically sets the timesteps for all models it could do this. It could also be done by Payu of course.

@aidanheerdegen
Contributor

The scaling field idea is a good one. I wonder whether it could be implemented as setting a maximum value for the wind? That way the scaling not only applies only in the area of interest, it also applies only when the wind exceeds a specified value.

I am not sure how this should be implemented. If it is a simple cutoff would it create discontinuities in the wind stress curl? In which case, would you need two fields, one specifying a maximum, and the other with the weightings to apply when the maximum is exceeded?
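One way to sketch the cap idea, assuming we cap wind *speed* rather than each component separately (so both components are scaled by the same factor and direction is preserved, echoing the point made later about scaling both components). The threshold value here is purely illustrative:

```python
import numpy as np

def cap_wind(u, v, max_speed=30.0):
    """Cap wind speed at max_speed, preserving direction.
    Points below the threshold are returned unchanged."""
    speed = np.hypot(u, v)
    factor = np.where(speed > max_speed,
                      max_speed / np.maximum(speed, 1e-30),  # avoid divide-by-zero
                      1.0)
    return u * factor, v * factor

# Two sample points: speed 14.1 m/s (untouched) and speed 50 m/s (capped to 30).
u, v = np.array([10.0, 30.0]), np.array([10.0, 40.0])
u2, v2 = cap_wind(u, v)
print(np.hypot(u2, v2))
```

A hard cutoff like this is continuous in the wind field itself but has a kink at the threshold, so the concern above about discontinuities in the wind stress curl is a real one; a smooth roll-off near the threshold would be one way to soften it.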

@aekiss
Contributor

aekiss commented Jan 31, 2019

Thanks @nichannah, automated timestep modification is an interesting idea and could be very handy in other cases.

However for this RYF spinup we would most likely exceed the walltime limit, requiring multiple shorter runs, which in turn prevents us from saving output files that are averaged over a uniform duration, unless that duration is set by the shortest runs. e.g. if we usually have 3mo/submit but need 1mo/submit through July-Sept then all our outputs will need to be 1mo averages if we want them to be uniform, requiring greater storage.

So although reducing the timestep is not a kludgey fudge like modifying the wind, I think it would be less practical for our purposes...

@nichannah
Contributor

nichannah commented Jan 31, 2019

Regarding the creation of the scaling file if we go down that path. It only needs to contain data for the times when it is needed. i.e. the time dimension need not be the same size as the forcing.

@aidanheerdegen
Contributor

Good point. I hadn't thought about that.

@aidanheerdegen
Contributor

For JRA55 IAF there would need to be a pre-processing step analysing the forcing data for anomalous events at points of interest to produce the scaling file. Not a big deal, but not nothing.

@nichannah
Contributor

@aidanheerdegen I have a feeling this analysis would end up being done mostly by hand - as Andrew has done here. Mainly because we need to make a decision about whether it is scientifically OK to modify a particular event.

@nichannah
Contributor

I'm starting to think about implementation now... I suggest that we put each event in its own file, then leave it up to YATM to combine these. This will make documentation and management (such as adding or removing events) a lot easier.

@aekiss
Contributor

aekiss commented Feb 1, 2019

Yes, we already know when and where there's problematic forcing. I don't see any reason to change anything else - if it ain't broke, don't fix it...

@aidanheerdegen
Contributor

Not feasible to do it by hand for 60+ years of 3-hourly JRA55 data. In which case the two-field approach might be best: set a limit, and when it is exceeded apply a weighting.

@aekiss
Contributor

aekiss commented Feb 1, 2019

I'm only proposing to do this for RYF at this stage. But yes, IAF is a different kettle of fish. For that I would suggest running with the unmodified JRA55 and introducing scaling only as and when necessary, i.e. when it falls over. Hopefully it wouldn't be needed very often.

@AndyHoggANU
Contributor Author

Hi All -- firstly, thanks to @PaulSpence for proposing this as our solution, and all the rest of you for seeing a way to do it cleanly. Let's not worry about how this might apply with the IAF for now -- I think we can all accept that the IAF will need lower time steps and more babysitting to cope with all this variability, but right now we just want the RYF to run as fast as possible, and as regularly as possible.

I think I am hearing that @nichannah and @aekiss will be able to take care of this. I realise that we have reached our notional deadline of starting these simulations, but at a minimum I think we need to implement and test this solution before we kick off the RYF spinup again.

@aekiss
Contributor

aekiss commented Feb 2, 2019

Yes @nichannah will make the YATM changes and I'll make the scaling file.

This new forcing-scaling capability could also be useful for some types of perturbation experiments.

@aekiss
Contributor

aekiss commented Feb 5, 2019

I've made scaling files - see
/g/data/v45/aek156/notebooks/github/aekiss/notebooks/RYF.u_10.1984_1985_scale.nc and
/g/data/v45/aek156/notebooks/github/aekiss/notebooks/RYF.v_10.1984_1985_scale.nc
(we need to scale both components so we don't change the wind direction).

The scaling is Gaussian in x,y,t and generated by https://github.com/aekiss/notebooks/blob/master/make-jra55-scaling.ipynb
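A minimal sketch of a scaling field that is Gaussian in x, y and t, using the parameter names (xpos, ypos, tpos, xscale, yscale, tscale, amp) and values reported later in this thread. The exact functional form here is an assumption, not the notebook's actual code:

```python
import numpy as np

def gaussian_scaling(lon, lat, t_days, xpos, ypos, tpos_days,
                     xscale, yscale, tscale, amp):
    """Multiplicative scaling field 1 + amp * Gaussian(x, y, t).
    With negative amp this reduces the wind near the event and
    tends to 1 (no change) far away in space and time."""
    g = np.exp(-((lon - xpos) / xscale) ** 2
               - ((lat - ypos) / yscale) ** 2
               - ((t_days - tpos_days) / tscale) ** 2)
    return 1.0 + amp * g

# At the event centre with amp = -0.6 the wind is scaled by 0.4;
# the same field applied to both u_10 and v_10 preserves wind direction.
print(gaussian_scaling(260, 68, 0.0, 260, 68, 0.0, 4, 1, 0.5, -0.6))
```

Because tscale is short (0.5 days in the case below), the modification is effectively confined to the one problematic day.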

nichannah added a commit to COSIMA/libaccessom2 that referenced this issue Feb 5, 2019
@nichannah
Contributor

@aekiss
Contributor

aekiss commented Feb 5, 2019

Great, thanks @nichannah - is there an executable I can test this with?

@ofa001

ofa001 commented Feb 6, 2019

Sorry I am a bit late joining the discussion; I have been unwell. But yes, it is such a large wind anomaly that I would set a maximum wind that could be applied; we did something similar in the coupled model at start-up. Also, as you are only working on repeat-annual forcing at the moment, you don't know how often it will occur in the interannual case. It is an issue near the tripole, where the grid boxes are small, but it can't be avoided: the tripole has to be somewhere, and it was hidden by land in the 1 and 0.25 degree grids.

aekiss added a commit to aekiss/01deg_jra55_ryf that referenced this issue Feb 7, 2019
@aekiss
Contributor

aekiss commented Feb 8, 2019

The test run with wind scaling in /home/156/aek156/payu/01deg_jra55v13_ryf8485_spinup6_newtopog_scalewind worked - got through July-August with dt=720 and ndtd=2 which has never been possible before with RYF8485. So now we can use dt=720 and ndtd=2 for all months in RYF. This allows 2mo/submit with minimal config (2064 CPUs), and presumably 3mo/submit will be achievable with more CPUs.

Here are the wind stress components for the same day as above (but a year later) - note the reduction near the tripole but nowhere else. I used these scaling files which are Gaussian in x,y,t with xpos, ypos, tpos, xscale, yscale, tscale, amp = 260, 68, 1900-08-12 09:00:00, 4, 1, 0.5, -0.6. It is applied only briefly in time, since tscale=0.5 days.

01deg_jra55v13_ryf8485_spinup6_newtopog_scalewind_aug_crash_strairx_1939-08-12

01deg_jra55v13_ryf8485_spinup6_newtopog_scalewind_aug_crash_strairy_1939-08-12

@aidanheerdegen
Contributor

Awesome!

The wall time was 4h30m, but you still have daily diagnostics turned on, and a lot of 3D diagnostics, so maybe that time could be brought down for spin up type runs?

Also daily ice diagnostics are still turned on. Might be good to try one of @nichannah's latest cice builds with ice compression to test it is working correctly.

It would be great to aim for 6 months/submit.

@aekiss
Contributor

aekiss commented Feb 8, 2019

@nichannah can you point me to a minimal 0.1deg CICE build with diag compression so I can test it out?

@aekiss
Contributor

aekiss commented Jun 5, 2019

Documentation on using this YATM-based forcing scaling is here:
https://github.com/COSIMA/access-om2/wiki/Tutorials#Scaling-the-forcing-fields

@aekiss
Contributor

aekiss commented Aug 6, 2019

I think we can close this now
