2020 MARBL Dev team meetings
- Hi-Res run
- `CESM0010` expired, but I'm continuing the run with `UGIT0016`
- Currently running Oct - Dec of year 0015, should finish during this meeting
- UPDATE: job died due to machine error before finishing Oct
- Success last weekend: 3 months Friday morning before reservation started + 21 more during the reservation (no hiccups!)
- Globus has time series through 0014, as well as all logs and annual restarts. Restarts are typically from either December or January.
-
- Hi-Res diagnostics
- Merged #29 (Gary's scripts for converting history files -> time series + some `CaseClass` updates / testing the time series data)
- Can now verify that all variables from history files are in time series as well
- No `da.identical()` check, so no guarantee that fields match. I'm not sure this is a big enough concern to warrant comparison?
- `plot_suite_maps` notebooks are way behind what's available (they only run through year 0007)
- Have not yet started addressing #31 (supporting data outside my `scratch` space), but hope to find time this week
- SE1 Hire: I don't think there is anything to discuss, but placeholder just in case
- CESM Port to GreenPlanet
- Updated `config_machines.xml` and `config_compilers.xml`, can build the model with Intel 18 compiler + openmpi, netcdf, and pnetcdf
- Working on `config_batch.xml`, have not yet successfully run the model
- Need write permission to inputdata on that machine
- I think there are two different queues that actually run jobs on physically different nodes (12 cores vs 16 cores, etc), so far working towards 16-core
- Will probably need to talk to Keith M to figure out how to share setup with his group
- Everyone can run CESM 2.2.0 from my sandbox
- Everyone can copy necessary files to their own `~/.cime/` directory
- I can set up CIME PR so future releases support the machine out of the box
- Started work on `interior_tendency_compute()` call
- Everything in place except forcing fields
- Pretty long to-do list even after tendencies are computed / applied
- Surface forcing is only applying January values for ndep
- Requested diagnostic is hard-coded, need to interact with `MARBL_generate_diagnostics_file.py`
- Need to generate and read `marbl_in`
- May need to introduce options for source of some forcings
- CO2 / alt_CO2 from coupler, not just namelist
- Dust and iron from file, not just coupler?
- Merged `development` into `stable` and made cesm2.2-n00 release tag
- Need to add better interface to allow updates to `marbl_domain_type` outside of `init()`
- E.g. MOM6 will need to update `zt`, `zw`, and `delta_z` for every column before calling `interior_tendency_compute()`
- I thought we had an issue ticket for this? I couldn't find it
- Hi-res run
- Got 21 months run over the weekend, going through January 0012
- We've used ~7.9 million core hours from CESM0010, have 7.1 million left
- Feb 0012 - May 0012 is running now; started ~6:30a (first mid-week run in a while; should cost ~300k core-hours since it's in `premium`)
- Time series through year 0010 is on campaign; next time short-term archiver runs I'll reshape 0011.
- Hi-res diagnostics
- `CaseClass` can now handle time series data (if time series and history both exist for the same variable / year, preference goes to reading time series)
- Making post-review changes to PR #29 that provides Gary's reshaping scripts
- Coming up next: address #31, updating `CaseClass` to work for cases outside my scratch space / archive directory; then we can compare `004` to Kristen's 1 degree runs
- Still to do: verify time series contains all history file fields and nothing was corrupted when written to disk
- Kevin P pointed me to an old (py2.7-old) `pyReshaper` tag that did this verification step, but tool now relies on regression tests instead
- Keith, Anderson, and I chatted a little bit about possible `dask`-centric methods to easily do this verification
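
A minimal sketch of the two ideas above: preferring time series over history when both exist, and listing history fields that never made it into time series. This is not the actual `CaseClass` code; the function names and the `(variable, year) -> path` dictionaries are hypothetical stand-ins for whatever catalog the real tool keeps, and the paths are placeholders.

```python
def pick_source(variable, year, timeseries, history):
    """Return (kind, path) for a variable / year, or None if neither exists.

    `timeseries` and `history` are hypothetical dicts mapping
    (variable, year) -> file path. Time series wins when both exist.
    """
    key = (variable, year)
    if key in timeseries:
        return "timeseries", timeseries[key]
    if key in history:
        return "history", history[key]
    return None


def missing_from_timeseries(timeseries, history):
    """List (variable, year) pairs present in history but absent from time series."""
    return sorted(set(history) - set(timeseries))


# Placeholder catalogs for illustration only.
history = {("NO3", 10): "hist/0010.nc", ("NO3", 11): "hist/0011.nc"}
timeseries = {("NO3", 10): "tseries/NO3.0010.nc"}

print(pick_source("NO3", 10, timeseries, history))   # time series preferred
print(pick_source("NO3", 11, timeseries, history))   # falls back to history
print(missing_from_timeseries(timeseries, history))  # not yet reshaped
```

This only checks that fields are present; it says nothing about whether the values match, which is the part that would need a field-by-field comparison.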
-
- No new progress, plan to get back to tackling `interior_tendency_compute()` this week.
- Hi-Res run / diagnostics
- Run update
- we're through November of year 8; I had a job start this morning in the middle of the glade / PBS issues, but it was killed a couple hours in
- we also lost ~12 hours of compute time in our reservation (a job hung Saturday night, but I didn't find out until email came in Sunday morning; then the last job finished at an awkward time where I couldn't squeeze out one more month before the reservation ended)
- After conversation with Keith, we're running four month sections that begin December, April, and August to distribute computation across jobs. Otherwise we either have {May, June, July, August} or {July, August, September, October} runs which both encompass 123 sim-days; current setup is 2x 122-day runs and 1x 121-day run. We'll be keeping December restart files on campaign.
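
The segment lengths quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming a no-leap (365-day) calendar; the function name is made up for illustration.

```python
# Days per month on a no-leap (365-day) calendar; assumed for illustration.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def segment_lengths(start_months, months_per_segment=4):
    """Return simulated days in each segment, given 1-based start months."""
    lengths = []
    for start in start_months:
        # Wrap around the end of the year (e.g. December -> March).
        months = [(start - 1 + k) % 12 for k in range(months_per_segment)]
        lengths.append(sum(DAYS_IN_MONTH[m] for m in months))
    return lengths


# Segments beginning December, April, and August.
print(segment_lengths([12, 4, 8]))  # [121, 122, 122]
# Segments beginning January, May, and September include a 123-day run.
print(segment_lengths([1, 5, 9]))   # [120, 123, 122]
# A July start also gives 123 days.
print(segment_lengths([7]))         # [123]
```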
- I have a somewhat kludgy way to read in both time series and history files that I am testing (PR #30)
- Some urgency, as I'm using 56 TB of my 60 TB scratch quota (I could ask for more, but I'd rather start using data from campaign)
- I think the goal is to eventually replace most of my logic with `intake-esm`, but nothing I've done makes plugging in `intake` any harder
- I also have an open PR to bring in the scripts that Gary provided
- Still to-do: more configurable directory roots rather than hard-coding my scratch and archive dirs (and now the bgcwg space on `campaign`)
- Game plan for presenting tool to OS meeting tomorrow?
- I'm happy to walk through `CaseClass` and my Sanity Check notebook, and then either show some of the other plots or hand it off to someone else
- I think Frank was interested in Anderson's notebook to display images as well
- Done with first pass of surface flux computation
- A few points about my ndep forcing file: took some of Keith's advice, but couldn't figure out how to avoid masking out resulting fields on MOM grid
- Kristen is working with a group that is running with CISO + cocco, and it looks like marbl_ciso_mod.F90 is missing support for explicit calcifiers... I'll help them put together a PR after they fix it
- Hi-res run
- Working on a single script that submits Gary's scripts to slurm for converting `004` history files to time series
- Merged Keith's latest branch (updating `004` through year 5)
- Current run status: last four months of year 6 are in the `premium` queue
- Anderson, Keith: anything to add?
- Kristen's issue with multiple zooplankton
- Mentioned on zulip
- She's out today, otherwise would have come to this meeting to talk about it
- Probably will set up a separate meeting soon to discuss it
- Saved state is in restart files and my branch passes ERS tests
- Surface fluxes are applied in call to `tracer_vertdiff()`
- model doesn't crash, and answers do change, so there's a chance I did it right
- I'll double check with Andrew next week
- Forcing field updates for surface fluxes
- Still need to read ndep file
- Need to get dust flux and iron flux via coupler (latter will be derived from black carbon)
- Other than that, done with first pass at calling `surface_flux_compute()` (still need to come back and clean up some parts)
- Next step: `interior_tendency_compute()`
- Hi-res run
- Keith and I decided to put run output in `/glade/campaign/cesm/development/bgcwg/projects/hi-res_JRA`
- Still need to reach out to Gary S about best way to move / reshape / compress data
- `004` is through August 0005, latest run died in September 0005 but I think it's machine issues
- Update on analysis tools? I'm a little behind
- `pop-tools`
- Lots of discussion on zulip re: budget (I could reach out to Riley and Anna-Lena, but haven't done so yet)
- Kevin Paul has a fix for the issue with writing grids to netCDF: PR #64
- Frank asked for a new release
- I opened PRs for both NCAR/MOM6 and ESCOMP/MOM_interface
- Former is to show Andrew S where things stand, latter is to keep the NCAR MOM6 devs in the loop
- Neither is ready to be merged yet
- Driver progress:
- Almost done with first pass of loading surface flux forcing fields
- Still need to add saved state to restart files
- Second pass clean-up (I think I'll do this after getting the call to `interior_tendency_compute()`)
- call `surface_flux_compute()` for multiple columns instead of one at a time
- back-up options for forcing fields (some can come from namelist or coupler, others from coupler or file; hard-coding in primary option first)
- Hi-res runs
- Progress report
- Increased node count, getting 3 months in a little over 8 hours of wallclock (should I push my luck and go for 4 months / 12 hours?)
- `003` is through June 0002, `004` is through May 0002
- Permanent location for output?
- Each run is using 11 TB of scratch space; ~5 TB for history (including CICE) and the rest are restarts
- POP history files reach 200 TB total
- CICE history files will be another 28 TB
- January 1st restarts are 350 GB, 1st of other months are 429 GB (due to POP annual stream; once we add 5-day output that'll affect some months as well)
- I only have 20 TB free on scratch
- Does `/glade/campaign/cesm` make sense for it?

| Directory | Space Used | Quota | % Full | # Files |
| --- | --- | --- | --- | --- |
| /glade/campaign/collections/cmip/CMIP6 | 3016.69 TB | 4096.00 TB | 73.65 % | 5871031 |
| /glade/campaign/cesm | 4300.58 TB | 5120.00 TB | 84.00 % | 8123020 |
| /glade/campaign/cgd/oce | 444.54 TB | 550.00 TB | 80.82 % | 1141773 |

- Diagnostics
- I submitted a PR to improve testing: encourages users to set up `pre-commit` to run `black`; adds GitHub Actions for `black` and `pytest`
- Keith is working on a PR to add more plots: he's pointed out that the notebooks are getting extremely large, maybe I should get `papermill` running to break up the notebooks?
- `pop-tools`
- Kevin P asked me to give an update on this repo in next week's Xdev meeting (this came up while I was on the MOM call, so I'm not totally sure what he's expecting the update to look like :)
- There's an Xdev mini-hack session to tackle low-hanging fruit tomorrow afternoon, I was going to try to fix #45 (`get_grid()` does not return something that can be written to netCDF)
- After helping Frank get his new fill tool merged and then updating the tests, I'm starting to find my way around the code... hoping to keep that momentum going by trying to tackle the occasional issue ticket or PR
- I've emailed Andrew S to try to set up a meeting later this week or next week to answer some questions
- I'd like to finish up the `surface_flux_compute()` call, which still needs to:
- Read some forcing fields from files
- Apply computed surface fluxes in MOM
- Once those questions are answered, I think `interior_tendency_compute()` will get implemented much faster
- I also need to update the call to `surface_flux_compute()` so it's done once per task rather than column by column (don't need Andrew's help for this)
- Should I also use my branch to set up CESM-MOM to build / run cobalt? I was thinking this could be useful for the FEISTY work
- Hi-res runs
- Slow progress
- 1 mo per 7 wallclock hours is tedious (`003` and `004` just finished Nov 0001), should we increase the PE count?
- Long queue waits are terrible; spending days in the queue to get 7 hours on the machine
- Diagnostics
- Python package for development
- I'm doing lots of infrastructure, Keith has started making plots
- Current issue: binning `ocn.log` output by model day
- July 28, 2020: Matt out of town
- High-res run
- Have one month with two different output sets (one with a 5-day stream, one with most of those fields in monthly stream instead)
- Can I do anything to help analyze this output? It's available on the CGD machine in `/project/oce02/mlevy/high-res_BGC_1mo/`
- Looks like 0.68 SYPD including output, which is 124 simulated days per twelve hours [max cheyenne walltime]
- rather than push the limits with a 4-month run, I'm thinking 3-month runs with 10 hour walltime?
- 3-month runs means 264 job submissions to get through 66 years, which is 2x 5-year with different initial conditions then continuing one of them for the last 56 years
- Any possibility of getting extension on the computer allocation? Even with a dedicated chunk of the machine there's not enough time to finish before September 30 (75-ish days remaining once Cheyenne maintenance period ends)
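
The throughput numbers above follow from simple arithmetic, sketched here. The helper names are made up for illustration, a 365-day model year is assumed, and 92 days is taken as the longest three-month stretch.

```python
import math

DAYS_PER_YEAR = 365  # assumes a no-leap calendar


def simulated_days(sypd, wallclock_hours):
    """Simulated days completed in the given wallclock time at a given SYPD."""
    return sypd * DAYS_PER_YEAR * wallclock_hours / 24.0


def wallclock_hours(sim_days, sypd):
    """Wallclock hours needed to simulate sim_days at a given SYPD."""
    return sim_days * 24.0 / (sypd * DAYS_PER_YEAR)


def jobs_needed(total_years, months_per_job):
    """Number of job submissions to cover total_years of simulation."""
    return math.ceil(total_years * 12 / months_per_job)


print(round(simulated_days(0.68, 12)))     # 124 simulated days in 12 hours
print(round(wallclock_hours(92, 0.68), 1)) # 8.9 hours for a 92-day segment
print(jobs_needed(66, 3))                  # 264 submissions for 66 years
```

At 0.68 SYPD a 92-day segment needs roughly 8.9 wallclock hours, which is why a 10 hour walltime leaves some margin for 3-month runs.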
- Release update
- Kristen's tuning updates are on `master`
- For the high-res compset, I need to
- Move inputdata to correct location (currently in a `tmp/` directory)
- Run `aux_pop` and `aux_pop_MARBL`
- Question: the 1-month test is using `settings_latest+cocco.yaml`; do we need a 1-month run with `settings_latest.yaml` before creating the compset?
- Xdev update: we're trying to highlight issues in the backlog queue that should be easy fixes; two issues from `pop-tools` appear to fit the mold. Would it be useful if xdev tackled these in a hackathon next week:
- #45: `get_grid` returns a file that cannot be written to netCDF (would use Keith's proposal from the most recent comment)
- #49: non-default `tol` value not propagated through fill call tree
- marbl0.39.0 contains latest tunings (Kristen's `005` run)
- Nothing else in the pipeline for the CESM 2.2.0 release
- I don't think I have much progress to report (with `glade` down I can't log in to see where things stand, but I've been focused on the CESM 2.2.0 freeze)
- Upcoming CESM 2.2 freeze: need to figure out order of POP tags
- Added complication, the entrainment update may be more than just round-off level changes
- My preferred path forward
- Qing's entrainment update but keep default scheme in place (`langmuir_opt = 'vr12-ma'`)
- New tunings (need to verify that above PR doesn't require re-tuning: another cycle? Shorter run?)
- JRA / BGC high-res run
- Hoping to run a couple of 1-month simulations (one using new compset out of the box, other configured for our experiment)
- Do we need to figure out `langmuir_opt` first, or is that only going to affect the 1 degree?
- If `langmuir_opt = 'lf17'` is back on the table, it should be re-tested after the tuning update
- Listed last only because I'm uncertain if it's necessary; can be done ahead of high-res compset (that may actually be preferable)
- CESM 2.1.4 release still needs a few things from POP
- Update namelist defaults to use cdf5 files rather than netcdf-4 files
- Update dt_count for SSP extension compsets
- I'm hoping to avoid thinking about these until after my MOM6 webinar talk in August (the first is actually ready to be merged, but the second may need a little more testing); I think the 2.1.4 code freeze won't happen until after the 2.2.0 release, but I'm not 100% certain about that.
- new ndep datasets for CAM (i.e., non-WACCM) SSP extension compsets (KL in charge of this)
- POP PR for updated tunings is waiting on corresponding MARBL PR
- Will also need to update the stable branch
- Can run a full month with reasonable surface forcings (includes using `T` and `S` from the model physics) with correct surface values of tracers
- Starting to put together talk for MOM6 webinar, but mostly focused on CESM 2.2 release
- June 16, 2020: CESM Workshop
- Status for CESM 2.2 release
- Plans call for several new tags
- WW3 entrainment update (Alper's responsibility)
- iron flux forcing bug (see below)
- new tunings for BGC (Kristen is waiting on iron flux forcing bug fix)
- new compset for high-res w/ BGC (see below; will include new tunings from Kristen but I have other aspects to attend to as well)
- bug in selecting `dt_count` default (POP issue #28)
- Open PR: update iron flux forcing (tied to PR in marbl-forcing)
- Current definition:

      <compset>
        <!-- latest JRA forcing, ecosys, high-res -->
        <alias>GIAFECO_JRA_HR</alias>
        <lname>2000_DATM%JRA-1p4-2018_SLND_CICE%CICE4_POP2%ECO_DROF%JRA-1p4-2018_SGLC_SWAV</lname>
      </compset>

- The existing eco + interannual forcing compset is

      <compset>
        <!-- latest JRA forcing -->
        <alias>G1850ECOIAF_JRA</alias>
        <lname>1850_DATM%JRA-1p4-2018_SLND_CICE_POP2%ECO_DROF%JRA-1p4-2018_SGLC_WW3</lname>
      </compset>

- I think that means we really want our compset to be

      <compset>
        <!-- latest JRA forcing, ecosys, high-res -->
        <alias>G1850ECOIAF_JRA_HR</alias>
        <lname>1850_DATM%JRA-1p4-2018_SLND_CICE%CICE4_POP2%ECO_DROF%JRA-1p4-2018_SGLC_SWAV</lname>
      </compset>

- `JRA_HR` w/ MARBL
- using three autotrophs (no coccolithophores yet), seeing 0.77 SYPD (260k pe-hrs / simulated_year) on largest task count Alper provided:

      <decomp nproc="7507" res="tx0.1v3">
        <maxblocks>1</maxblocks>
        <bsize_x>25</bsize_x>
        <bsize_y>32</bsize_y>
        <decomptype>spacecurve</decomptype>
      </decomp>

- Number above is a single day run with no output (I also ran for two days to verify initialization isn't included).
- Just got text file from him outlining other task counts to try
- Do we have a target SYPD? CICE is running at 1.7 SYPD (23 nodes); would need to increase task count there as well to get any faster
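
The throughput (SYPD) and cost (pe-hrs per simulated year) quoted above are linked through the total core count: cost = cores x 24 / SYPD. The sketch below just inverts that relation. The function names are illustrative, and since "260k" is a rounded figure the result is approximate.

```python
def cost_per_sim_year(cores, sypd):
    """PE-hours charged per simulated year when running on `cores` at `sypd`."""
    wallclock_hours_per_sim_year = 24.0 / sypd
    return cores * wallclock_hours_per_sim_year


def implied_cores(pe_hours_per_sim_year, sypd):
    """Core count implied by a cost per simulated year and a throughput."""
    return pe_hours_per_sim_year * sypd / 24.0


# 0.77 SYPD at ~260k pe-hrs per simulated year implies roughly 8.3k cores in total.
print(round(implied_cores(260_000, 0.77)))
```

The implied total is larger than the 7507 POP tasks in the decomposition, presumably because the cost covers every component in the job and not only the ocean; that interpretation is an assumption.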
- Still struggling with two issues in surface flux forcing
- With surface tracer values set to 0, I tried to set forcing fields to the following (unlisted forcings set to 0):

      u10_sqr = 2.5e5
      atm_press = 1
      xco2 = 284.7
      xco2_alt_co2 = 284.7
      sss = 35

  But the run crashes during day 5; setting `sss = 0` instead lets me run for a full month.
- Setting surface tracer values to "true" values (`CS%tr(i,j,1,m)`) causes run to crash in day 7 (assuming `sss=0`)
- May 19, 2020: Matt unavailable
- JRA high-res
- Trying to track progress in real-time on Zulip
- CICE and CIME pull requests handle a few minor issues in those components
- Run is throwing errors in POP
- `NaN` in tracer tendencies was traced to bad copy of `DZT` with partial bottom cells
- something in PFTs? `NaN` in `diazChl` but really small values in all PFT fields... (see #50); a few possibilities for fixing this to discuss:
- Keith's suggestion (if so, can we remove `PAR_threshold`?)
- Move `PAR_threshold` to settings file, and make it resolution-dependent?
- Cleaned up configuration of MARBL per call with GFDL folks
- Pulled MARBL out of submodules
- This caused Travis CI failures (building without access to MARBL), so I added the `_USE_MARBL_TRACERS` cpp macro
- Currently, CESM interface always builds with `-D_USE_MARBL_TRACERS`; will put in logic to be smarter about that (and to only build MARBL itself) towards end of project
- Have registered diagnostics with the model, though I may need to be smarter about pointing out 2D variables?

      "ocean_model", "ECOSYS_IFRAC"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: xh:mean yh:mean zl:mean area:mean
      "ocean_model", "ECOSYS_IFRAC_xyave"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: zl:mean
      "ocean_model_z", "ECOSYS_IFRAC"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: xh:mean yh:mean z_l:mean area:mean
      "ocean_model_z", "ECOSYS_IFRAC_xyave"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: z_l:mean
      "ocean_model_rho2", "ECOSYS_IFRAC"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: xh:mean yh:mean rho2_l:mean area:mean
      "ocean_model_rho2", "ECOSYS_IFRAC_xyave"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: rho2_l:mean
      "ocean_model_d2", "ECOSYS_IFRAC"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: xh:mean yh:mean zl:mean area:mean
      "ocean_model_d2", "ECOSYS_IFRAC_xyave"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: zl:mean
      "ocean_model_z_d2", "ECOSYS_IFRAC"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: xh:mean yh:mean z_l:mean area:mean
      "ocean_model_z_d2", "ECOSYS_IFRAC_xyave"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: z_l:mean
      "ocean_model_rho2_d2", "ECOSYS_IFRAC"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction
          ! cell_methods: xh:mean yh:mean rho2_l:mean area:mean
      "ocean_model_rho2_d2", "ECOSYS_IFRAC_xyave"  [Unused]
          ! long_name: Ice Fraction for ecosys fluxes
          ! units: fraction

- JRA high-res
- Division of labor for forcing / initial condition files? I'm happy to take some pre-existing scripts for generating x1 files and modify them for use w/ 0.1 degree, but don't want to duplicate labor if others are already on it
- More glade corruption?
- Mike Mills was running into file-system troubles that seemed reminiscent of errors I saw last month when testing merge of single-column MARBL branch in CESM
- CISL has re-opened my original ticket (if that link doesn't work, perhaps this one will)
- Tracking progress via github project board
- To-do: break down these big tasks into many smaller tasks. E.g. instead of add MARBL output to history file, create issues for
- running MARBL python script to generate diag list
- adding MARBL list to diag_table
- modifying fortran in driver to accumulate desired diagnostics correctly
- Emailed Alistair and Andrew about bringing in MARBL as git submodule rather than using generic tracer
- Hi-res + BGC
- #25 has been merged (brings Keith's CESM 2.1 updates for JRA to POP master) and the tag is in the plans for `cesm2_2_alpha04g`; this will let us start out of `cesm2_2_beta04` (need to add `_HR` compset, put together emissions dataset, etc)
- Keith asked to discuss output for the project
- #338 has been merged.
- Yay!
- Three issues that were waiting for single-column test:
- #53: migrate `k` loop (mostly done, three or four more function calls)
- #176: loop to `kmt` instead of `k`
- #336: clean up stand-alone timer results
- I could see spending some time on #53 and / or #176, though my feeling is that it's a low priority. #336 is a wishlist issue item, not something that needs attention right now
- Not BGC related: I was asked to help create a new compset for extending SSP runs
- I don't want to fall down this rabbit hole, though I'm probably now in a better spot to clean this up than Alper
- I did generate initial conditions for tracers on the MOM grid last summer, and updated the slides from previous meeting accordingly
- Not much progress to report, but I really want to stop putting out small fires and attack the big fire
- March 24, 2020
- March 10, 2020 (No urgent need to meet, other projects taking precedence)
- February 25, 2020 (CGD Town Hall)
- #338 is the stand-alone test of the `compute()` functions, just needs more documentation
- Read through user guide, make sure it is all up-to-date with examples from the stand-alone driver (I think I finished this section last fall)
- Link to a page with details of POP's saved state implementation from the general saved state page?
- unit testing: need more detail on what the tests are doing
- Write up regression testing page
- More talk about CESM2 papers
- Updated nutrient plots (using different region mask for zonal means)
- `xpersist`: caching data in `/glade/p/cgd/oce/projects/cesm2-marbl`
- Geocat hack-a-thon
- Produce python notebooks to mimic popular examples from http://ncl.ucar.edu
- Lots of plots here (code in NCAR/GeoCAT-examples)
- Progress on tables / plots for CESM2 papers
- `cesm2-marbl`
- Using `xpersist` to store time series of global averages in flux table (will also be used for time series plots)
- Nutrient plots need to use the `pop-tools` region mask
- Keith update?
- `cesm2-marbl`
- Working with Precious to get him using `intake-esm` for LENS study
for LENS study - Been talking to Matt about how to turn the cesm2-marbl repository into a more general analysis package (or packages)