
ESGF_Node|LUCIDexample

Stephen Pascoe edited this page Apr 9, 2014 · 8 revisions
Wiki Reorganisation
This page has been classified for reorganisation. It has been given the category MOVE.
The content of this page will be revised and moved to one or more other pages in the new wiki structure.

LUCID

This is an example of how to tweak the publisher so that it can be used by a CMIP5-related project.

These are notes I made while configuring the node and publishing data for the LUCID project. They might be incomplete, and there might be better or easier ways to achieve the same goal. Feel free to correct or comment on anything in here, thanks. --estani

Summary

  • add handler
  • lucid project configuration
  • lucid model/project
  • thredds_root to new lucid root
  • thredds url too?

LUCID procedure for a standard ESGF datanode

  1. Create directory to hold project catalogs at /esg/content/thredds/lucid (make sure the user publishing has write access to it)

  2. Add a catalog reference to /esg/content/thredds/catalog.xml that points to the main location of the lucid catalog

  3. Add the model name and project to /usr/local/cdat/lib/python2.6/site-packages/esgcet-2.8.5-py2.6.egg/esgcet/config/etc/esgcet_models_table.txt (or any other file pointed at by the config file (esg.ini) in use)

lucid | MPI-ESM-LR | http://www.ileaps.org/index.php?option=com_content&task=view&id=99 | LUCID
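
Step 1 asks that the publishing user can write to the catalog directory. A minimal sketch of that check, run as the publishing user, might look like the following. The helper name `ensure_writable_dir` is hypothetical, and the demo runs on a temporary directory rather than on /esg/content/thredds/lucid.

```python
import os
import tempfile


def ensure_writable_dir(path):
    """Create path if it is missing and report whether the current user can write to it."""
    os.makedirs(path, exist_ok=True)
    # Writing into a directory needs both write and search (execute) permission.
    return os.access(path, os.W_OK | os.X_OK)


# Demonstrated on a throwaway directory; on a real node the target
# would be /esg/content/thredds/lucid.
demo_root = tempfile.mkdtemp()
demo_catalog_dir = os.path.join(demo_root, "thredds", "lucid")
print(ensure_writable_dir(demo_catalog_dir))
```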

  4. Copy a valid esg.ini to use for this project (the commands below call it lucid.esg.ini)

  5. In that esg.ini, alter the following values:

thredds_root = /esg/content/thredds/lucid
thredds_url = http://cmip2.dkrz.de/thredds/lucid
thredds_root_catalog_name = LUCID catalog
thredds_dataset_roots =
  esg_dataroot | /esg/data
  ... #don't delete anything you have had here! If you do, those catalogs will get erased too!
  lucid | /gpfs_750/projects/LUCID/data/lucid

project_options =
  cmip5 | CMIP5 / IPCC Fifth Assessment Report | 1
  ipcc4 | IPCC Fourth Assessment Report | 2
  test | Test Project | 3
  lucid | Land-Use and Climate, Identification of robust impacts | 4

Don't remove anything from thredds_dataset_roots; just add what you need. If you do remove entries, the publisher will erase the catalogs of all missing entries when the TDS restarts.
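
Before restarting the TDS, it may help to confirm that the edited thredds_dataset_roots still contains every entry of the old one. The following is a minimal sketch that compares the two option values as plain text. The function names are hypothetical, and the parsing assumes nothing beyond the `name | path` lines shown above.

```python
def parse_pipe_table(value):
    """Turn a multi-line 'name | path' option value into a dict."""
    table = {}
    for line in value.splitlines():
        if "|" not in line:
            continue  # skip blank lines and placeholders such as '...'
        name, path = (part.strip() for part in line.split("|", 1))
        table[name] = path
    return table


def missing_roots(old_value, new_value):
    """Return the root names present in the old value but absent from the new one."""
    old, new = parse_pipe_table(old_value), parse_pipe_table(new_value)
    return sorted(set(old) - set(new))


old_roots = """
  esg_dataroot | /esg/data
"""
new_roots = """
  esg_dataroot | /esg/data
  lucid | /gpfs_750/projects/LUCID/data/lucid
"""
print(missing_roots(old_roots, new_roots))  # an empty list means nothing was dropped
```

A non-empty result lists the roots whose catalogs would be at risk.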

  6. Add the following lucid project description to the esg.ini:

#------------------------------------------------------------------------------------------
# Project-specific configuration
# LUCID
[project:lucid]

# LUCID experiments
# project | experiment_name | experiment_description
experiment_options =
  lucid | L2A26 | model run without landuse change (after yr 2005) and with atmospheric CO2 from RCP2.6 scenario
  lucid | L2A85 | model run without landuse change (after yr 2005) and with atmospheric CO2 from RCP8.5 scenario

# Define the categories to be used for this project:
#   name | category_type | is_mandatory | is_thredds_property | display_order

categories =
  project | enum | true | true | 0
  experiment | enum | true | true | 1
  product | enum | true | true | 2
  model | string | true | true | 3
  time_frequency | enum | true | true | 4
  realm | enum | true | true | 5
  cmor_table | enum | true | true | 6
  ensemble | string | true | true | 7
  institute | enum | true | true | 8
  forcing | string | false | true | 9
  title | string | false | true | 10
  creator | enum | false | false | 11
  publisher | enum | false | false | 12
  creation_time | string | false | true | 13
  format | fixed | false | true | 14
  source | text | false | false | 15
  drs_id | string | false | true | 16
  description | text | false | false | 99

category_defaults =
  product | requested

# Enumerated values
realm_options = atmos, ocean, land, landIce, seaIce, aerosol, atmosChem, ocnBgchem
time_frequency_options = yr, mon, day, 6hr, 3hr, subhr, monClim, fx
cmor_table_options = 3hr, 6hrLev, 6hrPlev, Amon, LImon, Lmon, OImon, Oclim, Omon, Oyr, aero, cf3hr, cfDay, cfMon, cfOff, cfSites, day, fx, grids
institute_options =  BCC, CAWCR, CCCMA, CMCC, CNRM-CERFACS, CSIRO-QCCCE, EC-EARTH, GFDL, GISS, INM, IPSL, LASG, MIROC, MOHC, MPI-M, MRI, NCAR, NCC, NIMR, PCMDI

product_options = output1, output2, output

# Class name of the LUCID project handler.
handler = esgcet.config.lucid_handler:LUCIDHandler

# Format of generated dataset IDs
parent_id = wdcc.lucid
dataset_id = lucid.%(product)s.%(institute)s.%(model)s.%(experiment)s.%(time_frequency)s.%(realm)s.%(cmor_table)s.%(ensemble)s

# Directory format. This is used to determine field values by matching directory names.
#directory_format = /data/publish_test/cmip5_test #not used
dataset_name_format = lucid.%(product)s.%(institute)s.%(model)s.%(experiment)s.%(time_frequency)s.%(realm)s.%(cmor_table)s.%(ensemble)s.v%(version)s

# Exclude these variables from THREDDS catalogs. They are still added to the database.
thredds_exclude_variables = a, a_bnds, alev1, alevel, alevhalf, alt40, b, b_bnds, basin, bnds, bounds_lat, bounds_lon, dbze, depth, depth0m, depth100m, depth_bnds, geo_region, height, height10m, height2m, lat, lat_bnds, latitude, latitude_bnds, layer, lev, lev_bnds, location, lon, lon_bnds, longitude, longitude_bnds, olayer100m, olevel, oline, p0, p220, p500, p560, p700, p840, plev, plev3, plev7, plev8, plev_bnds, plevs, pressure1, region, rho, scatratio, sdepth, sdepth1, sza5, tau, tau_bnds, time, time1, time2, time_bnds, vegtype

# Maps
maps = institute_map, las_time_delta_map

institute_map = map(model : institute)
  MPI-ESM-LR | MPI-M

las_time_delta_map = map(time_frequency : las_time_delta)
  yr      | 1 year
  mon     | 1 month
  day     | 1 day
  6hr     | 6 hours
  3hr     | 3 hours
  subhr   | 1 minute
  monClim | 1 month
  fx      | fixed

# Set true if files follow the IPCC standard of one variable per file.
# If set, the THREDDS metadata is organized as per-variable datasets.
# Otherwise, the datasets are assumed to be per-time.
variable_per_file = true
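
The dataset_id option uses `%(name)s` placeholders, the same syntax as Python's named string formatting. As a rough illustration of how the pattern expands (not the publisher's actual code path, which is not shown here), the sketch below fills it from a dictionary and derives the institute from the model through institute_map. The ensemble value `r1i1p1` and the function name are assumptions made for the example.

```python
# Pattern copied from the dataset_id option of the project section.
DATASET_ID = ("lucid.%(product)s.%(institute)s.%(model)s.%(experiment)s."
              "%(time_frequency)s.%(realm)s.%(cmor_table)s.%(ensemble)s")

# Mirrors institute_map = map(model : institute).
INSTITUTE_MAP = {"MPI-ESM-LR": "MPI-M"}


def build_dataset_id(fields):
    """Fill the pattern, looking up the institute from the model if it is not given."""
    values = dict(fields)
    values.setdefault("institute", INSTITUTE_MAP[values["model"]])
    return DATASET_ID % values


example_fields = {
    "product": "output1",
    "model": "MPI-ESM-LR",
    "experiment": "L2A26",
    "time_frequency": "mon",
    "realm": "atmos",
    "cmor_table": "Amon",
    "ensemble": "r1i1p1",  # hypothetical ensemble member
}
print(build_dataset_id(example_fields))
# lucid.output1.MPI-M.MPI-ESM-LR.L2A26.mon.atmos.Amon.r1i1p1
```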

  7. Create the lucid handler by copying the ipcc5 one

    cp /usr/local/cdat/lib/python2.6/site-packages/esgcet-2.8.5-py2.6.egg/esgcet/config/ipcc5_handler.py /usr/local/cdat/lib/python2.6/site-packages/esgcet-2.8.5-py2.6.egg/esgcet/config/lucid_handler.py

  8. Then alter the copy slightly (mostly replacing cmip5 with lucid, but beware: there is a cmip5_product that must stay as it is!)

    sed -e 's#cmip5\.#lucid.#' -e 's#IPCC5#LUCID#' -e 's#CMIP5#LUCID#' -i /usr/local/cdat/lib/python2.6/site-packages/esgcet-2.8.5-py2.6.egg/esgcet/config/lucid_handler.py
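
In a sed pattern an unescaped dot matches any character, so `cmip5.` also matches `cmip5_` and would rewrite cmip5_product, whereas `cmip5\.` matches only a literal dot. The sketch below illustrates the difference with Python's `re` module as a stand-in for sed (the dot behaves the same way in both). The helper name and the two sample lines are made up for the demonstration.

```python
import re


def sed_first(pattern, replacement, text):
    """Mimic sed's s#pattern#replacement# without the g flag: first match on each line."""
    return "\n".join(
        re.sub(pattern, replacement, line, count=1) for line in text.split("\n")
    )


sample = "cmip5_product\ncmip5.output1"  # hypothetical handler fragments

# Unescaped dot: the identifier that must be preserved is rewritten too.
print(sed_first(r"cmip5.", "lucid.", sample))
# Escaped dot: only the literal 'cmip5.' prefix is replaced.
print(sed_first(r"cmip5\.", "lucid.", sample))
```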

  • I published the GeoMIP data by following the steps shown in http://esgf.org/wiki/ESGF_Node/LUCIDexample , using "geomip" instead of "lucid" and "GeoMIP" instead of "LUCID". One point needs care at the 8th step. After running sed on geomip_handler.py as

    sed -e 's#cmip5\.#geomip.#' -e 's#IPCC5#GeoMIP#' -e 's#CMIP5#GeoMIP#' -i /usr/local/cdat/lib/python2.6/site-packages/esgcet-2.8.5-py2.6.egg/esgcet/config/geomip_handler.py

we need to change "result = (project_id[:5]=="GeoMIP")" to "result = (project_id[:6]=="GeoMIP")" in line 144 of the geomip_handler.py file. Otherwise, publishing the GeoMIP data with "--project geomip" fails with the error "project_id must be GeoMIP", because project_id[:5] evaluates to "GeoMI", not "GeoMIP".

Regards,

Qizhong Wu 2012/04/16
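
The slice length in that check has to equal the length of the project name, which is why it worked unchanged for the five-letter "LUCID" but not for "GeoMIP". A small sketch of the pitfall, with a length-independent variant, is given below. The function name `matches_project` is hypothetical and is not part of the handler.

```python
def matches_project(project_id, expected):
    """Compare the leading characters of project_id with expected, whatever its length."""
    return project_id[:len(expected)] == expected


# The fixed-width slice copied from the CMIP5 handler cuts the name short.
print("GeoMIP"[:5])                 # GeoMI
print("GeoMIP"[:5] == "GeoMIP")     # False

# Deriving the width from the expected name works for any project.
print(matches_project("GeoMIP", "GeoMIP"))  # True
print(matches_project("LUCID", "LUCID"))    # True
```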

  9. Alter the __init__.py to point to the new handler (there is certainly a simpler way to achieve this, but you'll have to find out how. Feel free to correct this entry if you do!)

    from ipcc4_handler import IPCC4Handler
    from ipcc5_handler import IPCC5Handler
    from tamip_handler import TAMIPHandler
    from obs4mips_handler import Obs4mipsHandler
    from lucid_handler import LUCIDHandler
    builtinProjectHandlers = {
        'basic_builtin' : BasicHandler,
        'ipcc4_builtin' : IPCC4Handler,
        'ipcc5_builtin' : IPCC5Handler,
        'lucid_builtin' : LUCIDHandler,
        'tamip_builtin' : TAMIPHandler,
        'obs4mips_builtin' : Obs4mipsHandler,
        }
    builtinFormatHandlers = {
        'netcdf_builtin' : CdunifFormatHandler,
        }
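
The handler option in the project section names the class in `module:Class` form. As a general illustration of how such a string can be resolved in Python (this is not the publisher's own loader, which is not shown on this page), consider the sketch below. The function name is hypothetical, and a standard-library class stands in for esgcet.config.lucid_handler:LUCIDHandler so the example runs anywhere.

```python
import importlib


def resolve_handler(spec):
    """Import the module named before ':' and return the attribute named after it."""
    module_name, class_name = spec.split(":", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for 'esgcet.config.lucid_handler:LUCIDHandler'.
handler_class = resolve_handler("collections:OrderedDict")
print(handler_class.__name__)
```

If the import or the attribute lookup fails for the real spec, the new handler module is most likely misnamed or not on the Python path.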

  10. Now add the new project and model names to the database by pointing to the esg.ini you created, which holds the information on the project

    esginitialize -c -i lucid.esg.ini

  11. If you use a map file, start ingesting the data into the variables

    esgpublish --map test.dataset.map -i lucid.esg.ini

  12. If everything looks fine, proceed to creating the TDS catalogs

    esgpublish --map test.dataset.map --project lucid -i lucid.esg.ini --noscan --thredds

  13. And finally, try to publish to the gateway

    esgpublish --map test.dataset.map --project lucid -i lucid.esg.ini --publish
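
The last four commands have to run in this order, and each one only makes sense if the previous one succeeded. A minimal sketch of a wrapper that builds exactly these command lines and, optionally, runs them one by one is shown below. The function names are hypothetical, and by default it only prints the commands.

```python
import subprocess


def publish_commands(ini, map_file, project):
    """Return the command lines from the steps above, in order."""
    return [
        ["esginitialize", "-c", "-i", ini],
        ["esgpublish", "--map", map_file, "-i", ini],
        ["esgpublish", "--map", map_file, "--project", project,
         "-i", ini, "--noscan", "--thredds"],
        ["esgpublish", "--map", map_file, "--project", project,
         "-i", ini, "--publish"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; when dry_run is False, also execute it."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check_call raises on a non-zero exit, so later steps are skipped.
            subprocess.check_call(argv)


run_all(publish_commands("lucid.esg.ini", "test.dataset.map", "lucid"))
```

Keeping the dry run as the default makes it easy to review the commands before anything touches the database or the gateway.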

The configuration is very tricky, so check the FAQ and the Publisher documentation if anything fails.
