2020 MARBL Dev team meetings

Michael Levy edited this page Nov 3, 2020 · 56 revisions

November 3, 2020

General discussion

  1. Hi-Res run
    • CESM0010 expired, but I'm continuing the run with UGIT0016
    • Currently running Oct - Dec of year 0015, should finish during this meeting
      • UPDATE: job died due to machine error before finishing Oct
    • Success last weekend: 3 months Friday morning before reservation started + 21 more during the reservation (no hiccups!)
    • Globus has time series through 0014, as well as all logs and annual restarts. Restarts are typically from either December or January.
  2. Hi-Res diagnostics
    • Merged #29 (Gary's scripts for converting -> time series + some CaseClass updates / testing the time series data)
      • Can now verify that all variables from history files are in time series as well
      • No da.identical() check, so no guarantee that fields match. I'm not sure this is a big enough concern to warrant a comparison?
    • plot_suite_maps notebooks are way behind what's available (they only run through year 0007)
    • Have not yet started addressing #31 (supporting data outside my scratch space), but hope to find time this week
  3. SE1 Hire: I don't think there is anything to discuss, but placeholder just in case
  4. CESM Port to GreenPlanet
    • Updated config_machines.xml and config_compilers.xml, can build the model with Intel 18 compiler + openmpi, netcdf, and pnetcdf
    • Working on config_batch.xml, have not yet successfully run the model
      • Need write permission to inputdata on that machine
      • I think there are two different queues that actually run jobs on physically different nodes (12 cores vs 16 cores, etc), so far working towards 16-core
    • Will probably need to talk to Keith M to figure out how to share setup with his group
      • Everyone can run CESM 2.2.0 from my sandbox
      • Everyone can copy necessary files to their own ~/.cime/ directory
      • I can set up CIME PR so future releases support the machine out of the box
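
For the time series verification under "Hi-Res diagnostics" above, here is a minimal sketch of the two levels of checking: that every history variable shows up in the time series, and that the fields actually match. It is not the CaseClass code. Plain dictionaries of lists stand in for the datasets, and the function name and toy values are made up for illustration. With xarray, the per-variable comparison would be the `identical()` check mentioned above.

```python
def compare_outputs(history, time_series):
    """Compare two {variable name: values} mappings.

    Returns (missing, mismatched): variables absent from time_series, and
    variables present in both whose values differ.
    """
    missing = sorted(name for name in history if name not in time_series)
    mismatched = sorted(
        name
        for name, values in history.items()
        if name in time_series and time_series[name] != values
    )
    return missing, mismatched


# Toy data only: one variable is missing and one differs
history = {"TEMP": [1.0, 2.0], "SALT": [35.0, 35.1], "Fe": [0.5, 0.6]}
time_series = {"TEMP": [1.0, 2.0], "SALT": [35.0, 35.2]}

print(compare_outputs(history, time_series))  # (['Fe'], ['SALT'])
```

The first list is the check that already exists after merging #29; the second list is what a field-by-field comparison would add.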

MOM Software Updates

  1. Started work on interior_tendency_compute() call
    • Everything in place except forcing fields
  2. Pretty long to-do list even after tendencies are computed / applied
    • Surface forcing is only applying January values for ndep
    • Requested diagnostic is hard-coded, need to interact with MARBL_generate_diagnostics_file.py
    • Need to generate and read marbl_in
    • May need to introduce options for source of some forcings
      1. CO2 / alt_CO2 from coupler, not just namelist
      2. Dust and iron from file, not just coupler?

MARBL Software Updates

  1. Merged development into stable and made cesm2.2-n00 release tag
  2. Need to add better interface to allow updates to marbl_domain_type outside of init()
    • E.g. MOM6 will need to update zt, zw, and delta_z for every column before calling interior_tendency_compute()
    • I thought we had an issue ticket for this? I couldn't find it
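
A rough sketch of the per-column update a driver would need to do before calling interior_tendency_compute(). It assumes zw is the depth of each layer's bottom interface and zt is the depth of each layer's midpoint, both measured down from the surface. The function name is hypothetical and this is plain Python, not the MARBL interface.

```python
from itertools import accumulate


def column_depths(delta_z):
    """Return (zt, zw) for one column given layer thicknesses delta_z.

    zw[k] is the depth of the bottom interface of layer k and zt[k] is the
    depth of its midpoint (assumed convention, see the text above).
    """
    zw = list(accumulate(delta_z))
    zt = [bottom - 0.5 * dz for bottom, dz in zip(zw, delta_z)]
    return zt, zw


# Hypothetical three-layer column; units are whatever the driver uses
zt, zw = column_depths([10.0, 10.0, 20.0])
print(zt)  # [5.0, 15.0, 30.0]
print(zw)  # [10.0, 20.0, 40.0]
```

Since layer thicknesses in MOM6 change in time, all three arrays would need to be refreshed together for every column, which is why setting them only in init() is not enough.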

POP Software Updates


October 20, 2020

General discussion

  1. Hi-res run
    • Got 21 months run over the weekend, going through January 0012
    • We've used ~7.9 million core hours from CESM0010, have 7.1 million left
    • Feb 0012 - May 0012 is running now; started ~6:30a (first mid-week run in a while; should cost ~300k core-hours since it's in premium)
    • Time series through year 0010 is on campaign; next time short-term archiver runs I'll reshape 0011.
  2. Hi-res diagnostics
    • CaseClass can now handle time series data (if time series and history both exist for same variable / year, preference to reading time series)
    • Making post-review changes to PR #29 that provides Gary's reshaping scripts
    • Coming up next: address #31, updating CaseClass to work for cases outside my scratch space / archive directory; then we can compare 004 to Kristen's 1 degree runs
    • Still to do: verify time series contains all history file fields and nothing was corrupted when written to disk
      1. Kevin P pointed me to an old (py2.7-old) pyReshaper tag that did this verification step, but tool now relies on regression tests instead
      2. Keith, Anderson, and I chatted a little bit about possible dask-centric methods to easily do this verification
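
The "preference to reading time series" rule described above could look something like the sketch below. This is not the CaseClass implementation: the function name, the dictionary layout, and the paths are all placeholders chosen for illustration.

```python
def choose_source(variable, year, time_series, history):
    """Pick the file to read for one variable and year.

    time_series and history map (variable, year) -> path. Time series wins
    when both exist; a KeyError is raised when neither does.
    """
    key = (variable, year)
    if key in time_series:
        return "time_series", time_series[key]
    if key in history:
        return "history", history[key]
    raise KeyError(f"no output found for {variable} in year {year}")


# Hypothetical catalog entries; the paths are placeholders, not real files
time_series = {("TEMP", 10): "ts/TEMP.0010.nc"}
history = {("TEMP", 10): "hist/0010.nc", ("TEMP", 11): "hist/0011.nc"}

print(choose_source("TEMP", 10, time_series, history))  # ('time_series', 'ts/TEMP.0010.nc')
print(choose_source("TEMP", 11, time_series, history))  # ('history', 'hist/0011.nc')
```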

MOM Software Updates

  1. No new progress, plan to get back to tackling interior_tendency_compute() this week.

October 6, 2020

General discussion

  1. Hi-Res run / diagnostics
    • Run update
      • we're through November of year 8; I had a job start this morning in the middle of the glade / PBS issues, but it was killed a couple hours in
      • we also lost ~12 hours of compute time in our reservation (a job hung Saturday night, but I didn't find out until email came in Sunday morning; then the last job finished at an awkward time where I couldn't squeeze out one more month before the reservation ended)
      • After conversation with Keith, we're running four month sections that begin December, April, and August to distribute computation across jobs. Otherwise we either have {May, June, July, August} or {July, August, September, October} runs which both encompass 123 sim-days; current setup is 2x 122-day runs and 1x 121-day run. We'll be keeping December restart files on campaign.
    • I have a somewhat kludgy way to read in both time series and history files that I am testing (PR #30)
      1. Some urgency, as I'm using 56 TB of my 60 TB scratch quota (I could ask for more, but I'd rather start using data from campaign)
      2. I think the goal is to eventually replace most of my logic with intake-esm but nothing I've done makes plugging in intake any harder
    • I also have an open PR to bring in the scripts that Gary provided
    • Still to-do: more configurable directory roots rather than hard-coding my scratch and archive dirs (and now the bgcwg space on campaign)
    • Game plan for presenting tool to OS meeting tomorrow?
      • I'm happy to walk through CaseClass and my Sanity Check notebook, and then either show some of the other plots or hand it off to someone else
      • I think Frank was interested in Anderson's notebook to display images as well
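
The day counts behind the four-month segments in the run update above can be checked with a few lines. This assumes a 365-day calendar with a 28-day February, which is consistent with the 121-day total quoted for the December start. The function name is made up for this sketch.

```python
# Days per month in a 365-day (no-leap) calendar; an assumption, see above
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def segment_lengths(start_months, n_months=4):
    """Simulated days in each n_months-long segment starting at the given
    1-based months (12 = December), wrapping around the year end."""
    return [
        sum(DAYS_IN_MONTH[(start - 1 + i) % 12] for i in range(n_months))
        for start in start_months
    ]


print(segment_lengths([12, 4, 8]))  # [121, 122, 122]
print(segment_lengths([1, 5, 9]))   # [120, 123, 122]
print(segment_lengths([11, 3, 7]))  # [120, 122, 123]
```

Starting in December, April, and August is the only one of these three layouts that avoids a 123-day segment.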

MOM Software Updates

  1. Done with first pass of surface flux computation
    • A few points about my ndep forcing file: took some of Keith's advice, but couldn't figure out how to avoid masking out resulting fields on MOM grid

MARBL Software Updates

  1. Kristen is working with a group that is running with CISO + cocco, and it looks like marbl_ciso_mod.F90 is missing support for explicit calcifiers... I'll help them put together a PR after they fix it

September 22, 2020

General discussion

  1. Hi-res run
    • Working on a single script that submits Gary's scripts to slurm for converting 004 history files to time series
    • Merged Keith's latest branch (updating 004 through year 5)
    • Current run status: last four months of year 6 are in the premium queue
    • Anderson, Keith: anything to add?
  2. Kristen's issue with multiple zooplankton
    • Mentioned on zulip
    • She's out today, otherwise would have come to this meeting to talk about it
    • Probably will set up a separate meeting soon to discuss it

MOM Software Updates

  1. Saved state is in restart files and my branch passes ERS tests
  2. Surface fluxes are applied in call to tracer_vertdiff()
    • model doesn't crash, and answers do change, so there's a chance I did it right
    • I'll double check with Andrew next week
  3. Forcing field updates for surface fluxes
    • Still need to read ndep file
    • Need to get dust flux and iron flux via coupler (latter will be derived from black carbon)
    • Other than that, done with first pass at calling surface_flux_compute() (still need to come back and clean up some parts)
  4. Next step: interior_tendency_compute()

September 8, 2020

General discussion

  1. Hi-res run
    • Keith and I decided to put run output in /glade/campaign/cesm/development/bgcwg/projects/hi-res_JRA
    • Still need to reach out to Gary S about best way to move / reshape / compress data
    • 004 is through August 0005, latest run died in September 0005 but I think it's machine issues
    • Update on analysis tools? I'm a little behind
  2. pop-tools
    • Lots of discussion on zulip re: budget (I could reach out to Riley and Anna-Lena, but haven't done so yet)
    • Kevin Paul has a fix for the issue with writing grids to netCDF: PR #64
    • Frank asked for a new release

MOM Software Updates

  1. I opened PRs for both NCAR/MOM6 and ESCOMP/MOM_interface
    • Former is to show Andrew S where things stand, latter is to keep the NCAR MOM6 devs in the loop
    • Neither is ready to be merged yet
  2. Driver progress:
    • Almost done at first pass of loading surface flux forcing fields
    • Still need to add saved state to restart files
    • Second pass clean-up (I think I'll do this after getting the call to interior_tendency_compute())
      • call surface_flux_compute() for multiple columns instead of one at a time
      • back-up options for forcing fields (some can come from namelist or coupler, others from coupler or file; hard-coding in primary option first)

August 25, 2020

General discussion

  1. Hi-res runs

    1. Progress report

      • Increased node count, getting 3 months in a little over 8 hours of wallclock (should I push my luck and go for 4 months / 12 hours?)
      • 003 is through June 0002, 004 is through May 0002
    2. Permanent location for output?

      • Each run is using 11 TB of scratch space; ~5 TB for history (including CICE) and the rest are restarts

        • POP history files reach 200 TB total
        • CICE history files will be another 28 TB
        • January 1st restarts are 350 GB, 1st of other months are 429 GB (due to POP annual stream; once we add 5-day output that'll affect some months as well)
      • I only have 20 TB free on scratch

      • Does /glade/campaign/cesm make sense for it?

          Space                                      Used       Quota    % Full      # Files
        --------------------------------------- ----------- ----------- --------- -----------
        /glade/campaign/collections/cmip/CMIP6   3016.69 TB  4096.00 TB   73.65 %     5871031
        /glade/campaign/cesm                     4300.58 TB  5120.00 TB   84.00 %     8123020
        /glade/campaign/cgd/oce                   444.54 TB   550.00 TB   80.82 %     1141773
    3. Diagnostics

      • I submitted a PR to improve testing: encourages users to setup pre-commit to run black; adds Github Actions for black and pytest
      • Keith is working on a PR to add more plots: he's pointed out that the notebooks are getting extremely large, maybe I should get papermill running to break up the notebooks?
  2. pop-tools

    • Kevin P asked me to give an update on this repo in next week's Xdev meeting (this came up while I was on the MOM call, so I'm not totally sure what he's expecting the update to look like :)
    • There's an Xdev mini-hack session to tackle low-hanging fruit tomorrow afternoon, I was going to try to fix #45 (get_grid() does not return something that can be written to netCDF)
    • After helping Frank get his new fill tool merged and then updating the tests, I'm starting to find my way around the code... hoping to keep that momentum going by trying to tackle the occasional issue ticket or PR

MOM Software Updates

  1. I've emailed Andrew S to try to set up a meeting later this week or next week to answer some questions
    • I'd like to finish up the surface_flux_compute() call, which still needs
      • Read some forcing fields from files
      • Apply computed surface fluxes in MOM
    • Once those questions are answered, I think interior_tendency_compute() will get implemented much faster
    • I also need to update the call to surface_flux_compute() so it's done once per task rather than column by column (don't need Andrew's help for this)
  2. Should I also use my branch to set up CESM-MOM to build / run cobalt? I was thinking this could be useful for the FEISTY work

August 11, 2020

General discussion

  1. Hi-res runs
    1. Slow progress
      • 1 mo per 7 wallclock hours is tedious (003 and 004 just finished Nov 0001); should we increase the PE count?
      • Long queue waits are terrible; spending days in the queue to get 7 hours on the machine
    2. Diagnostics
      • Python package for development
      • I'm doing lots of infrastructure, Keith has started making plots
      • Current issue: binning ocn.log output by model day

No Meeting

  • July 28, 2020: Matt out of town

July 14, 2020

POP Software Updates

  1. High-res run
    • Have one month with two different output sets (one with a 5-day stream, one with most of those fields in monthly stream instead)
      • Can I do anything to help analyze this output? It's available on the CGD machine in /project/oce02/mlevy/high-res_BGC_1mo/
    • Looks like 0.68 SYPD including output, which is 124 simulated days per twelve hours [max cheyenne walltime]
      • rather than push the limits with a 4-month run, I'm thinking 3-month runs with 10 hour walltime?
      • 3-month runs means 264 job submissions to get through 66 years, which is 2x 5-year runs with different initial conditions, then continuing one of them for the last 56 years
      • Any possibility of getting extension on the computer allocation? Even with a dedicated chunk of the machine there's not enough time to finish before September 30 (75-ish days remaining once Cheyenne maintenance period ends)
  2. Release update
    • Kristen's tuning updates are on master
    • For the high-res compset, I need to
      1. Move inputdata to correct location (currently in a tmp/ directory)
      2. Run aux_pop and aux_pop_MARBL
      3. Question: the 1-month test is using settings_latest+cocco.yaml; do we need a 1-month run with settings_latest.yaml before creating the compset?
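
The throughput numbers in the high-res run item above follow from simple conversions, sketched here. The 365-day year and the 92-day worst case for a 3-month segment are assumptions, and the function names are made up for illustration.

```python
DAYS_PER_YEAR = 365  # assumes a no-leap model calendar


def sim_days(sypd, hours):
    """Simulated days completed in the given wallclock hours."""
    return sypd * DAYS_PER_YEAR * hours / 24.0


def hours_needed(days, sypd):
    """Wallclock hours needed to simulate the given number of days."""
    return days * 24.0 / (sypd * DAYS_PER_YEAR)


# 0.68 SYPD over a 12-hour job
print(round(sim_days(0.68, 12)))          # 124

# Longest possible 3-month segment (e.g. July through September)
print(round(hours_needed(92, 0.68), 1))   # 8.9

# 2 x 5 years + 56 years, four 3-month jobs per year
years = 2 * 5 + 56
print(years, years * 12 // 3)             # 66 264
```

At roughly 8.9 hours for the longest segment, a 10-hour walltime leaves about an hour of slack, whereas a 4-month run of up to 123 days would sit right at the 12-hour limit.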

General discussion

  1. Xdev update: we're trying to highlight issues in the backlog queue that should be easy fixes; two issues from pop-tools appear to fit the mold. Would it be useful if xdev tackled these in a hackathon next week:
    • #45: get_grid returns a file that cannot be written to netCDF (would use Keith's proposal from the most recent comment)
    • #49: non-default tol value not propagated through fill call tree

MARBL Software Updates

  1. marbl0.39.0 contains latest tunings (Kristen's 005 run)
  2. Nothing else in the pipeline for the CESM 2.2.0 release

MOM Software Updates

  1. I don't think I have much progress to report (with glade down I can't log in to see where things stand, but I've been focused on the CESM 2.2.0 freeze)

June 30, 2020

POP Software Updates

  1. Upcoming CESM 2.2 freeze: need to figure out order of POP tags
    1. New tunings
    2. JRA / BGC high-res run
    3. Qing's entrainment update
  2. Added complication: the entrainment update may be more than just round-off level changes
  3. My preferred path forward
    1. Qing's entrainment update but keep default scheme in place (langmuir_opt = 'vr12-ma')
    2. New tunings (need to verify that above PR doesn't require re-tuning: another cycle? Shorter run?)
    3. JRA / BGC high-res run
      • Hoping to run a couple of 1-month simulations (one using new compset out of the box, other configured for our experiment)
      • Do we need to figure out langmuir_opt first, or is that only going to affect the 1 degree?
    4. If langmuir_opt = 'lf17' is back on the table, it should be re-tested after the tuning update
      • Listed last only because I'm uncertain if it's necessary; can be done ahead of high-res compset (that may actually be preferable)
  4. CESM 2.1.4 release still needs a few things from POP
    1. Update namelist defaults to use cdf5 files rather than netcdf-4 files
    2. Update dt_count for SSP extension compsets
    3. I'm hoping to avoid thinking about these until after my MOM6 webinar talk in August (the first is actually ready to be merged, but the second may need a little more testing); I think the 2.1.4 code freeze won't happen until after the 2.2.0 release, but I'm not 100% certain about that.
    4. new ndep datasets for CAM (i.e., non-WACCM) SSP extension compsets (KL in charge of this)

MARBL Software Updates

  1. POP PR for updated tunings is waiting on corresponding MARBL PR
  2. Will also need to update the stable branch

MOM Software Updates

  1. Can run a full month with reasonable surface forcings (includes using T and S from the model physics) with correct surface values of tracers
  2. Starting to put together talk for MOM6 webinar, but mostly focused on CESM 2.2 release

General discussion


No Meeting

  • June 16, 2020: CESM Workshop

June 2, 2020

POP Software Updates

  1. Status for CESM 2.2 release

    • Plans call for several new tags

      1. WW3 entrainment update (Alper's responsibility)
      2. iron flux forcing bug (see below)
      3. new tunings for BGC (Kristen is waiting on iron flux forcing bug fix)
      4. new compset for high-res w/ BGC (see below; will include new tunings from Kristen but I have other aspects to attend to as well)
      5. bug in selecting dt_count default (POP issue #28)
    • Open PR: update iron flux forcing (tied to PR in marbl-forcing)

    • high-res compset

      • Current definition:

        <compset>
          <!-- latest JRA forcing, ecosys, high-res -->
          <alias>GIAFECO_JRA_HR</alias>
          <lname>2000_DATM%JRA-1p4-2018_SLND_CICE%CICE4_POP2%ECO_DROF%JRA-1p4-2018_SGLC_SWAV</lname>
        </compset>
      • The existing eco + interannual forcing compset is

        <compset>
          <!-- latest JRA forcing -->
          <alias>G1850ECOIAF_JRA</alias>
          <lname>1850_DATM%JRA-1p4-2018_SLND_CICE_POP2%ECO_DROF%JRA-1p4-2018_SGLC_WW3</lname>
        </compset>
        
      • I think that means we really want our compset to be

        <compset>
          <!-- latest JRA forcing, ecosys, high-res -->
          <alias>G1850ECOIAF_JRA_HR</alias>
          <lname>1850_DATM%JRA-1p4-2018_SLND_CICE%CICE4_POP2%ECO_DROF%JRA-1p4-2018_SGLC_SWAV</lname>
        </compset>
  2. JRA_HR w/ MARBL

    • using three autotrophs (no coccolithophores yet), seeing 0.77 SYPD (260k pe-hrs / simulated_year) on largest task count Alper provided:

      <decomp nproc="7507" res="tx0.1v3" >
        <maxblocks >1</maxblocks>
        <bsize_x   >25</bsize_x>
        <bsize_y   >32</bsize_y>
        <decomptype>spacecurve</decomptype>
      </decomp>
    • Number above is a single day run with no output (I also ran for two days to verify initialization isn't included).

    • Just got text file from him outlining other task counts to try

    • Do we have a target SYPD? CICE is running at 1.7 SYPD (23 nodes); would need to increase task count there as well to get any faster
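
For reference, SYPD and cost are tied together by pe-hrs per simulated year = PEs x 24 / SYPD. The sketch below applies that conversion to the numbers above. The notes do not state the total PE count, so the implied value is an inference, and the function names are hypothetical.

```python
def pe_hours_per_sim_year(n_pes, sypd):
    """Cost of one simulated year when n_pes run at the given SYPD."""
    return n_pes * 24.0 / sypd


def implied_pes(pe_hours, sypd):
    """PE count implied by a cost per simulated year and a throughput."""
    return pe_hours * sypd / 24.0


# 260k pe-hrs per simulated year at 0.77 SYPD
print(round(implied_pes(260_000, 0.77)))          # 8342

# The 7507 POP tasks alone at the same throughput
print(round(pe_hours_per_sim_year(7507, 0.77)))   # 233984
```

The implied count is larger than the 7507 POP tasks, presumably because the 260k figure also charges for the PEs used by the other components (CICE, for example), but that is a guess rather than something recorded in these notes.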

MOM Software Updates

  1. Still struggling with two issues in surface flux forcing

    1. With surface tracer values set to 0, I tried to set forcing fields to the following (unlisted forcings set to 0):

      u10_sqr = 2.5e5
      atm_press = 1
      xco2 = 284.7
      xco2_alt_co2 = 284.7
      sss = 35

      But the run crashes during day 5; setting sss = 0 instead lets me run for a full month.

    2. Setting surface tracer values to "true" values (CS%tr(i,j,1,m)) causes run to crash in day 7 (assuming sss = 0)

General discussion

MARBL Software Updates


No Meeting

  • May 19, 2020: Matt unavailable

May 5, 2020

General discussion

  1. JRA high-res
    • Trying to track progress in real-time on Zulip
    • CICE and CIME pull requests handle a few minor issues in those components
    • Run is throwing errors in POP
      1. NaN in tracer tendencies was traced to bad copy of DZT with partial bottom cells
      2. something in PFTs? NaN in diazChl but really small values in all PFT fields... (see #50) -- few possibilities for fixing this to discuss:
        • Keith's suggestion (if so, can we remove PAR_threshold?)
        • Move PAR_threshold to settings file, and make it resolution-dependent?

MARBL Software Updates

POP Software Updates

MOM Software Updates

  1. Cleaned up configuration of MARBL per call with GFDL folks

  2. Pulled MARBL out of submodules

    • This caused Travis CI failures (building without access to MARBL), so I added the _USE_MARBL_TRACERS cpp
    • Currently, CESM interface always builds with -D_USE_MARBL_TRACERS; will put in logic to be smarter about that (and to only build MARBL itself) towards end of project
  3. Have registered diagnostics with the model, though I may need to be smarter about pointing out 2D variables?

    "ocean_model", "ECOSYS_IFRAC"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: xh:mean yh:mean zl:mean area:mean
    "ocean_model", "ECOSYS_IFRAC_xyave"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: zl:mean
    "ocean_model_z", "ECOSYS_IFRAC"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: xh:mean yh:mean z_l:mean area:mean
    "ocean_model_z", "ECOSYS_IFRAC_xyave"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: z_l:mean
    "ocean_model_rho2", "ECOSYS_IFRAC"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: xh:mean yh:mean rho2_l:mean area:mean
    "ocean_model_rho2", "ECOSYS_IFRAC_xyave"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: rho2_l:mean
    "ocean_model_d2", "ECOSYS_IFRAC"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: xh:mean yh:mean zl:mean area:mean
    "ocean_model_d2", "ECOSYS_IFRAC_xyave"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: zl:mean
    "ocean_model_z_d2", "ECOSYS_IFRAC"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: xh:mean yh:mean z_l:mean area:mean
    "ocean_model_z_d2", "ECOSYS_IFRAC_xyave"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: z_l:mean
    "ocean_model_rho2_d2", "ECOSYS_IFRAC"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction
        ! cell_methods: xh:mean yh:mean rho2_l:mean area:mean
    "ocean_model_rho2_d2", "ECOSYS_IFRAC_xyave"  [Unused]
        ! long_name: Ice Fraction for ecosys fluxes
        ! units: fraction

April 21, 2020

General discussion

  1. JRA high-res
    • Division of labor for forcing / initial condition files? I'm happy to take some pre-existing scripts for generating x1 files and modify them for use w/ 0.1 degree, but don't want to duplicate labor if others are already on it
  2. More glade corruption?
    • Mike Mills was running into file-system troubles that seemed reminiscent of errors I saw last month when testing merge of single-column MARBL branch in CESM
    • CISL has re-opened my original ticket (if that link doesn't work, perhaps this one will)

MARBL Software Updates

POP Software Updates

MOM Software Updates

  1. Tracking progress via github project board
    • To-do: break down these big tasks into many smaller tasks. E.g. instead of "add MARBL output to history file", create issues for
      1. running MARBL python script to generate diag list
      2. adding MARBL list to diag_table
      3. modifying fortran in driver to accumulate desired diagnostics correctly
  2. Emailed Alistair and Andrew about bringing in MARBL as git submodule rather than using generic tracer

April 7, 2020

General discussion

  1. Hi-res + BGC
    • #25 has been merged (brings Keith's CESM 2.1 updates for JRA to POP master) and the tag is in the plans for cesm2_2_alpha04g; this will let us start out of cesm2_2_beta04 (need to add _HR compset, put together emissions dataset, etc)
    • Keith asked to discuss output for the project

MARBL Software Updates

  1. #338 has been merged.

    • Yay!

    • Three issues that were waiting for single-column test:

      1. #53: migrate k loop (mostly done, three or four more function calls)
      2. #176: loop to kmt instead of k
      3. #336: clean up stand-alone timer results

      I could see spending some time on #53 and / or #176, though my feeling is that it's a low priority. #336 is a wishlist issue item, not something that needs attention right now

POP Software Updates

  1. Not BGC related: I was asked to help create a new compset for extending SSP runs

    • Not too much work to put together PR #27
    • Testing uncovered issue #28

    I don't want to fall down this rabbit hole, though I'm probably now in a better spot to clean this up than Alper

MOM Software Updates

  1. I did generate initial conditions for tracers on the MOM grid last summer, and updated the slides from previous meeting accordingly
  2. Not much progress to report, but I really want to stop putting out small fires and attack the big fire

No Meetings

  • March 24, 2020
  • March 10, 2020 (No urgent need to meet, other projects taking precedence)
  • February 25, 2020 (CGD Town Hall)

February 11, 2020

General discussion

MARBL Software Updates

  1. #338 is the stand-alone test of the compute() functions, just needs more documentation
    • Read through user guide, make sure it is all up-to-date with examples from the stand-alone driver (I think I finished this section last fall)
    • Link to a page with details of POP's saved state implementation from the general saved state page?
    • unit testing: need more detail on what the tests are doing
    • Write up regression testing page

POP Software Updates

MOM Software Updates

  1. Putting together slides on the process
    • Still waiting on #338 before making more progress

January 28, 2020

General discussion

  1. More talk about CESM2 papers
    • Updated nutrient plots (using different region mask for zonal means)
    • xpersist: caching data in /glade/p/cgd/oce/projects/cesm2-marbl
  2. Geocat hack-a-thon

MARBL Software Updates

POP Software Updates

MOM Software Updates


January 14, 2020

General discussion

  1. Progress on tables / plots for CESM2 papers
    • cesm2-marbl
      1. Using xpersist to store time series of global averages in flux table (will also be used for time series plots)
      2. Nutrient plots need to use the pop-tools region mask
    • Keith update?
  2. Working with Precious to get him using intake-esm for LENS study
  3. Been talking to Matt about how to turn the cesm2-marbl repository into a more general analysis package (or packages)

MARBL Software Updates

POP Software Updates

MOM Software Updates