Matthew Harris edited this page Jan 8, 2014 · 35 revisions

Partnerships for the development of next-generation software that will provide distributed access and analysis of simulated, observed, and reanalysis data from the climate and weather communities.


## Agenda

#### Joint DOE, NASA, NOAA, and IS-ENES Meeting

#### Day 1: Tuesday, 3 December 2013

##### Session One: Welcome, Safety, Introductions, Opening Remarks, and Awards

Project Requirements and ESGF Technical Presentations and Discussions

| When | What | Who | Length |
|---|---|---|---|
| 08:00-08:30 | Arrival time | Dean N. Williams (Host) | 30 min |
| 08:30-08:40 | Welcome, logistics, and introductions | Dean N. Williams (DOE/LLNL) | 10 min |
| 08:40-08:50 | Opening remarks | Dean N. Williams, ESGF & UV-CDAT Principal Investigator (DOE/LLNL) | 10 min |
| 08:50-09:00 | ESGF & UV-CDAT Awards | Dean N. Williams (DOE/LLNL) | 10 min |

##### Session Two: Project Presentations

Community Projects & Requirements

| When | What | Who | Length |
|---|---|---|---|
| 09:00-09:15 | Coupled Model Intercomparison Project, phase 6 (CMIP6) and other MIPs | Karl Taylor (DOE/LLNL) | 15 min |
| 09:20-09:30 | DOE Next-Generation Earth System Model: Development Enterprise | David Bader (DOE/LLNL) | 10 min |
| 09:35-09:45 | NASA-DOE Connections and Program Strategy | Tsendgar Lee (NASA) | 10 min |
| 09:50-10:00 | ENES Community Projects | Sébastien Denvil (IS-ENES/IPSL) | 15 min |
| 10:05-10:15 | NOAA-Affiliated Community Projects | Cecelia DeLuca (NOAA/ESRL) | 10 min |
| 10:20-10:30 | From obs4MIPs to Routine Benchmarking of Climate Models | Peter Gleckler (DOE/LLNL) | 10 min |
| 10:35-10:50 | BREAK | | 15 min |
| 10:50-11:00 | Distributing Reanalysis Data on ESGF | Jerry Potter (NASA/Goddard) | 10 min |
| 11:05-11:15 | ES-DOC Requirements for ESGF | Eric Guilyardi (IS-ENES/IPSL) | 10 min |
| 11:20-11:35 | Introduction to the NASA Computational Modeling & Cyberinfrastructure (CMAC) program and architecture; and Regional Climate Model Evaluation System (RCMES) | Chris Mattmann (NASA/JPL) | 15 min |
| 11:40-11:50 | Climate Impact Portal | Stephen Pascoe (IS-ENES/BADC) | 10 min |
| 11:55-12:05 | Scaling the Comparison of Models and Observations for CMIP6 | Dan Crichton (NASA/JPL) | 10 min |
| 12:10-12:30 | DISCUSSION | | 20 min |
| 12:30-13:20 | LUNCH | | 50 min |

##### Session Three: ESGF Technical Presentations

ESGF Community Improvements

| When | What | Who | Length |
|---|---|---|---|
| 13:20-13:30 | ESGF Infrastructure: Next Steps and Gaps | Luca Cinquini (NASA/JPL) | 10 min |
| 13:35-13:45 | Demonstration of Update Manager for ESGF | Prashanth Dwarakanath (IS-ENES/NSC) | 10 min |
| 13:50-14:00 | Reproducible Offline ESGF Node Installation: Towards a Multi-Release Installation System for ESGF | Stephen Pascoe (IS-ENES/BADC) | 10 min |
| 14:05-14:15 | ESGF Publication: Improving User Experience | Rachana Ananthakrishnan (DOE/ANL) and Luca Cinquini (NASA/JPL) | 10 min |
| 14:20-14:30 | Observational Data and Metadata Publication into ESGF | Misha Krassovski et al. (DOE/ORNL) | 10 min |
| | ESGF Data Versioning and Fixity: Past, Present, and Future | Stephen Pascoe (IS-ENES/BADC) | 10 min |
| 14:35-14:45 | Automated Replication Procedures | Stephan Kindermann (IS-ENES/DKRZ) and Jeff Painter (DOE/LLNL) | 10 min |
| 14:50-15:00 | Globus for Data Transfer in ESGF | Rachana Ananthakrishnan (DOE/ANL) and Eric Blau (DOE/ANL) | 10 min |
| 15:05-15:20 | BREAK | | 15 min |
| 15:20-15:30 | Persistent Identifiers for Data Management and Data Citation: from PIDs to DOIs | Stephan Kindermann (IS-ENES/DKRZ) | 10 min |
| 15:35-15:45 | ESGF Security Infrastructure | Rachana Ananthakrishnan (DOE/ANL) and Phil Kershaw (IS-ENES/BADC) | 10 min |
| 15:50-16:00 | Collecting Downloads and User Statistics from Federated Node Sites; and System Monitoring | Gavin M. Bell and Sandro Fiore | 10 min |
| 16:05-16:15 | Enhancing the User Interface (UI) with CoG | Luca Cinquini (NASA/JPL), Cecelia DeLuca (NOAA/ESRL), and Sylvia Murphy (NOAA/ESRL) | 10 min |
| 16:20-17:20 | DISCUSSION | Gavin M. Bell | 60 min |
| 17:20-17:30 | Closing Comments and Adjourn Day 1 | | 10 min |
| 18:00-21:00 | Workshop Dinner: Self-Organizing | | 2-3 hrs |

#### Day 2: Wednesday, 4 December 2013

##### Session Four: Welcome, Safety

Data Center Requirements and UV-CDAT Technical Presentations and Discussions

| When | What | Who | Length |
|---|---|---|---|
| 08:00-08:30 | Arrival time | Dean N. Williams (Host) | 30 min |
| 08:30-08:35 | Recap of Day 1; Introduction of Day 2 | Dean N. Williams (DOE/LLNL) | 5 min |

##### Session Five: Modeling and Data Center Presentations

Modeling and Data Center Requirements

| When | What | Who | Length |
|---|---|---|---|
| 08:35-08:45 | Australia | Ben Evans (ANU/NCI) | 10 min |
| 08:50-09:00 | Germany | Stephan Kindermann (IS-ENES/DKRZ) | 10 min |
| 09:05-09:15 | France | Sébastien Denvil (IS-ENES/IPSL) | 10 min |
| 09:20-09:30 | The Status of ESGF-BNU Node in China | Qizhong Wu (China) | 10 min |
| 09:35-09:45 | Publishing Non-Conforming Data Products into ESGF | Eric Nienhouse, Douglas Schuster, Don Middleton, Steve Worley (NSF/NCAR) | 10 min |
| 09:50-10:00 | European Collaborative Data Infrastructure (EUDAT) | Stephan Kindermann (IS-ENES/DKRZ) | 10 min |
| 10:05-10:20 | BREAK | | 15 min |
| 10:20-10:40 | DISCUSSION | | 20 min |

##### Session Six: UV-CDAT Technical Presentations

UV-CDAT Community Improvements

| When | What | Who | Length |
|---|---|---|---|
| 10:40-10:50 | UV-CDAT Infrastructure: Next Steps and Gaps | Dean Williams (DOE/LLNL) and Claudio Silva (NYU-Poly) | 10 min |
| 10:55-11:05 | Attaining Software Quality in Large Projects with Numerous Developers | Dave DeMarle (Kitware) | 10 min |
| 11:10-11:20 | Building the Next Generation UV-CDAT: A Web-based Informatics Platform for Climate Data Analysis and Visualization | Elo Leung (DOE/LLNL) | 10 min |
| 11:25-11:35 | Remote Rendering and Analysis for Big Data using ParaViewWeb | Aashish Chaudhary | 10 min |
| 11:40-11:50 | Diagnostics and Metrics | Jeff Painter and Charles Doutriaux (DOE/LLNL), and Brian Smith (DOE/ORNL) | 10 min |
| 11:55-12:05 | UV-CDAT Live Exploratory Analysis | John Harney (DOE/ORNL) | 10 min |
| 12:10-12:20 | Streaming Processing for Remote Data Exploration and Analysis | Timo Bremer (DOE/LLNL) | 10 min |
| 12:25-13:15 | LUNCH | | 50 min |
| 13:15-13:25 | VisTrails Workflow Parallelism & Scripting | David Koop et al. (NYU-Poly) | 10 min |
| 13:30-13:40 | Current and Future Directions for Parallel Processing in the UV-CDAT Framework | Brian Smith (DOE/ORNL) | 10 min |
| 13:45-13:55 | Spatio-Temporal Pipeline for Parallel Visualization and Analysis in ParaView | Curt Canada (DOE/LLNL) | 10 min |
| 14:00-14:10 | Parallel Rendering and Data Processing in ParaView | Dave DeMarle | 10 min |
| 14:15-14:25 | Unstructured Climate Data Visualization with DV3D: Next-Generation DV3D | Tom Maxwell and Jerry Potter (NASA/Goddard) | 10 min |
| 14:30-14:40 | Data Analysis Tools (DAT) | Ben Barnett (NYU-Poly) | 10 min |
| 14:45-15:45 | DISCUSSION | Charles Doutriaux | 60 min |
| 15:45-16:00 | BREAK | | 15 min |
| 16:00-17:00 | DEMO | | 60 min |
| 17:00-17:20 | Open Discussion: Feedback from Demos & Collaboration Opportunities | | 20 min |
| 17:20-17:30 | Closing Comments and Adjourn Day 2 | | 10 min |
| 18:00-21:00 | Workshop Dinner: Self-Organizing | | 2-3 hrs |

#### Day 3: Thursday, 5 December 2013

##### Session Seven: Welcome, Safety

Technical Interoperability Discussions

| When | What | Who | Length |
|---|---|---|---|
| 08:00-08:30 | Arrival time | Dean N. Williams (Host) | 30 min |
| 08:30-08:35 | Recap of Days 1 & 2; Introduction of Day 3 | Dean N. Williams (DOE/LLNL) | 5 min |

##### Session Eight: Community Improvements Technical Presentations

UV-CDAT Community Improvements

| When | What | Who | Length |
|---|---|---|---|
| 08:35-08:45 | Data Transfer Performance in Preparing for CMIP6 | Eli Dart (DOE/ESnet) | 10 min |
| 08:50-09:00 | ES-DOC: Documentation Eco-System Progress | Mark Greenslade (IS-ENES/IPSL) | 10 min |
| 09:05-09:15 | Developing a systematic database of observations for evaluating climate models | Huikyo Lee, Chris Mattmann, Duane Waliser, and Daniel Crichton (NASA/JPL) | 10 min |
| 09:20-09:30 | Driving DOE ESM: The Game Changing GUI | Renata McCoy (DOE/LLNL) | 10 min |
| 09:35-09:45 | Data Reduction Methods for Large Streaming Data | Alex Sim (DOE/LBNL) | 10 min |
| 09:50-10:00 | Cloud Installation | Ben Evans (ANU/NCI) | 10 min |
| 10:05-10:15 | ESGF Cluster “Reign Clouds” | Gavin M. Bell (DOE/LLNL) | 10 min |
| 10:20-10:35 | BREAK | | 15 min |
| 10:35-10:45 | Overview of OpenClimateGIS and ClimateTranslator: A Python Library and Web Interface for Geospatial Manipulations of CF Climate Datasets | Benjamin Koziol et al. (NASA/JPL) | 10 min |
| 10:50-11:00 | Using a next-generation climate architecture in education | Jae Young Bang (USC) | 10 min |
| 11:05-11:15 | Server-side Data Processing with PyWPS | Luca Cinquini (NASA/JPL) | 10 min |
| 11:20-11:30 | ClimatePipes: User-Friendly Data Access, Manipulation, Analysis & Visualization of Community Climate Models | Aashish Chaudhary (Kitware) | 10 min |
| 11:35-12:30 | DISCUSSION | Charles Doutriaux | 55 min |
| 12:30-13:20 | LUNCH | | 50 min |
| 13:20-14:45 | Open Discussion: Feedback & Collaboration Opportunities | | 85 min |
| 14:45-15:00 | Closing Remarks and Adjourn Meeting | | 15 min |
| 15:00-17:00 | Optional Working Group Breakout Sessions | | 2 hrs |

## Presentation List

  • Presentation Title
  • Author(s)
  • E-Mail
  • Organization(s)
  • Abstract

#### COMMUNITY PROJECTS & REQUIREMENTS

  • The Increasing WCRP Reliance on ESGF Poses Challenges

    • Karl Taylor
    • [email protected]
    • DOE/LLNL
    • The World Climate Research Programme’s (WCRP’s) increasing reliance on ESGF to satisfy the climate community’s demand for easy access to climate model and observational data poses new developmental and operational challenges for ESGF. Some of the immediate and longer-term priorities are presented, with particular emphasis on CMIP and “satellite” MIPs. Following a review of the successes of CMIP5 and a summary of ESGF-related feedback from the CMIP5 research community, some near-term requirements have become evident. Priorities include: 1) simplifying procedures for obtaining needed data, and reducing the time required to obtain it, 2) better engagement of and communication with modeling center data nodes, 3) new capabilities for quantifying usage and impact of data, 4) UIs specialized to individual projects, and 5) documenting, recording, and citing datasets used in research.

  • DOE Earth System Model (Project Enterprise)

  • NASA-DOE Connections and Program Strategy

  • ENES Community Projects

    • Sébastien Denvil
    • [email protected]
    • IS-ENES/IPSL
    • This talk will present the overall strategy of the ENES (European Network for Earth System modeling) community and will focus particularly on its infrastructure activity and project, IS-ENES (InfraStructure for ENES). As an infrastructure project, IS-ENES: 1. coordinates and operates the European deployment of ESGF and ES-DOC; 2. supports a selected set of projects (such as CMIP5, CORDEX, ...); 3. contributes developments to ESGF and ES-DOC software components; and 4. contributes to bridging the gap toward the impact community (climate4impact application/portal).
  • NOAA-Affiliated Community Projects

    • Cecelia DeLuca
    • [email protected]
    • NOAA/ESRL
    • This talk highlights a set of NOAA-affiliated community projects that are utilizing ESGF, CoG, and ES-DOC resources. Projects include the High Impact Weather Prediction Project (HIWPP), a MIP designed to evaluate new approaches in numerical weather prediction; the National Climate Predictions and Projections Platform (NCPP), which is focused on evaluation and translation of climate data products; the ongoing atmospheric Dynamical Core Model Intercomparison Project (DCMIP); and the NOAA Environmental Software Infrastructure and Interoperability (NESII) group.
  • From obs4MIPs to Routine Benchmarking of Climate Models

    • Peter Gleckler
    • [email protected]
    • DOE/LLNL
    • Through its distributed approach to data delivery, ESGF serves the needs of CMIP5 and related MIPs. By adhering to the Climate and Forecast (CF) metadata convention for model output, these MIPs ensure that critical metadata can be both readily searched via ESGF and efficiently analyzed by scientists. These advancements in the organization and delivery of climate model output are now being applied to observational datasets in the obs4MIPs project. Select NASA products routinely used for model evaluation are now accessible on ESGF via the obs4MIPs project, as are some from CFMIP-OBS, with others also becoming available. This presentation will describe efforts underway to exploit these infrastructural advancements to improve how routine model benchmarking is performed in MIPs. Technological challenges to this endeavor will be highlighted.
  • Distributing Reanalysis Data on ESGF

    • Jerry Potter
    • [email protected]
    • NASA/GSFC
    • Reanalysis has become an important tool for the atmospheric science community, and the data available from the various reanalysis centers are offered in a variety of formats and structures. This variety among reanalysis efforts makes intercomparison a laborious process. To make the data more easily accessible, a new community project, ana4MIPs, will be available from the ESGF distributed archive and will include selections from the major reanalysis centers. The data are formatted similarly to the CMIP5 archive and will be distributed through ESGF. They adhere to all the standards used by CMIP5, allowing easy comparison among the various reanalyses and between reanalyses and CMIP5 model output. I will also discuss the strengths and weaknesses of reanalysis and the need for a reanalysis intercomparison.
  • ES-DOC and ESGF

    • Eric Guilyardi, Cecelia DeLuca, Sébastien Denvil, Bryan Lawrence, Mark Morgan, Sylvia Murphy, Karl Taylor
    • [email protected]
    • IS-ENES/IPSL
    • An update of ES-DOC (Earth System DOCumentation, http://earthsystemcog.org/projects/es-doc-models/), the legacy project of METAFOR and CURATOR is presented. This includes the status of CMIP5 metadata organisation and exploitation, community organisation and governance of related standards (CIM, controlled vocabulary), CIM tools (viewer and comparator) and future plans (CMIP6). The links with ESGF, including the contents, technical and human coordination, are then discussed.
  • Introduction to the NASA Computational Modeling & Cyberinfrastructure (CMAC) Program and Architecture; and Regional Climate Model Evaluation System (RCMES)

  • Climate Impact Portal

  • Scaling the Comparison of Models and Observations for CMIP6

    • Dan Crichton
    • [email protected]
    • NASA/JPL
    • The explosion of data has been identified as one of the challenges of CMIP6. Bringing together observational data and model output requires new approaches in the era of big data. The paradigm of users downloading all data in order to perform analysis needs to be replaced by a shift toward distributing the computation across both model and data repositories. New architectural approaches need to be explored to develop a scalable architecture that will aid researchers in improving the return on analyzing data. This talk will discuss efforts at NASA/JPL to begin exploring alternative paradigms for model-to-data comparison.

#### MODELING AND DATA CENTER REQUIREMENTS

  • Australia

    • Ben Evans
    • [email protected]
    • ANU/NCI
    • CMIP5 has presented some unique challenges, and there are more to come. Many requirements are common: the transition of data from modelling group to publication, and international replication of data that can deliver and maintain updates. Emerging requirements include a stronger provenance record from data generation and storage; well-maintained, high-performance, and flexible environments for analysing the data in situ; and the ability to republish this data with a provenance record. More broadly, our centre is also supporting the drive for two changes: the alignment of climate and weather research, and the co-location of other reference data such as satellite and other environmental data. One example is our Earth Observation Data Cube.
  • Germany

    • Stephan Kindermann
    • [email protected]
    • IS-ENES/DKRZ
    • Key requirements for ESGF from a long-term archival perspective are summarized: ESGF legacy system integration; ESGF publication of annotations (e.g. data-quality related); ESGF data quality control integration; and ESGF administrative tools for data integrity and data completeness. First experiences with a prototype implementation of ESGF/DKRZ legacy data system integration are presented, which might be of interest to others.
  • France

    • Sébastien Denvil
    • [email protected]
    • IS-ENES/IPSL
    • As climate modeling centers, IPSL and CNRM-CERFACS need to efficiently ingest, find, and analyse their everyday simulations. Furthermore, they need to compare those results to state-of-the-art reference databases such as CMIP, ECMWF reanalyses, and selected sets of observations. This process needs a very high level of automation, and it deeply involves several national facilities (HPC and data centers). This talk will (1) present French requirements and contributions to ensure that such a process can be built on top of ESGF, and (2) present French requirements and contributions with respect to the CMIP6 perspective.
  • The Status of ESGF-BNU Node in China

    • Qizhong Wu
    • [email protected]
    • China/BNU
    • This presentation will introduce the status of the BNU node and other ESGF nodes in China, present some download statistics from the THREDDS log files at the BNU node, and discuss what we should prepare for CMIP6, including hardware and software.

  • European Collaborative Data Infrastructure (EUDAT)

    • Stephan Kindermann
    • [email protected]
    • IS-ENES/DKRZ
    • EUDAT is building a cross-community data service infrastructure connecting data centers and HPC centers in Europe. The European Network for Earth System modeling (ENES) is one of the pilot communities defining initial use cases and contributing its data management experience. The talk will provide an overview of the initial services deployed, which include a data staging service connecting data centers with HPC centers, a safe replication service to connect data centers, and a metadata portal. In the underlying software stack, iRODS and a persistent identifier service are central components, which might also be of interest for future ESGF developments.
  • Publishing Non-Conforming Data Products into ESGF

    • Eric Nienhouse, Douglas Schuster, Don Middleton, Steve Worley
    • [email protected]
    • NSF/NCAR
    • As a national center, NSF-NCAR manages over 5 PB of climate and related data products. Integrating these products with distribution systems such as ESGF removes barriers to scientific data discovery and access. In this talk we will present key requirements for including these valuable data products in ESGF: publication of existing data product catalogs; access to non-conforming data formats (e.g. GRIB); aggregation of usage metrics from distributed systems; remote publication of geographically distributed products; and integration with legacy and tape storage systems.

#### COMMUNITY SOFTWARE EFFORTS

  • ESGF & UV-CDAT Community Review
    • Dean N. Williams
    • [email protected]
    • DOE/LLNL
    • An overview of ESGF and UV-CDAT and their importance to the climate research community. We will touch on the future direction of both software products and their interoperability as we move toward CMIP6.

#### ESGF COMMUNITY IMPROVEMENTS

  • ESGF Infrastructure: Next Steps and Gaps

    • Luca Cinquini
    • [email protected]
    • NASA/JPL
    • This talk will present a list of the most critical areas for improvement on which ESGF development should focus to improve the overall user experience and better serve the climate community in the future (for CMIP6 and other projects).
  • Demonstration of Update Manager for ESGF

    • Prashanth Dwarakanath
    • [email protected]
    • IS-ENES/NSC
    • The Update Manager is a tool that would simplify the installation script and allow the components of an ESGF node to be updated easily, either individually or all together. It would allow easy deployment of bug fixes and patches and would foster further development efforts. It is designed to be a robust, fault-tolerant, fully federated system.
  • Reproducible Offline ESGF Node Installation: Towards a Multi-Release Installation System for ESGF

    • Stephen Pascoe
    • [email protected]
    • IS-ENES/BADC
    • In order for the ESGF software stack to grow and mature, ESGF developers need to be able to build nodes in multiple configurations and from multiple versions. We need to be able to integrate pre-production components into test deployments without these pre-production components being part of the official stable and development releases. Similarly, sites operating production ESGF services need to be able to scale out their ESGF infrastructure from a consistent version without having to upgrade their entire deployment to the latest version. We will present a mechanism for caching installation artefacts, allowing repeated installation of the same node version independent of the versions supported by the central ESGF installation server. Using the results from running installations in caching mode, we analyse which artefacts are downloaded, deducing what proportion of the ESGF installation is openly version controlled. This analysis leads us to recommend steps to make the ESGF installation system support multiple artefact sources and therefore multiple versions.
  • ESGF Publication: Improving User Experience

    • Rachana Ananthakrishnan and Luca Cinquini
    • [email protected]
    • DOE/ANL, NASA/JPL
    • We'll present some of the new publication services for harvesting catalogs, and a new service-based solution for publishing to ESGF that would simplify the researcher's publication workflow.
  • Observational Data and Metadata Publication into ESGF

    • Misha Krassovski, John Harney, Sigurd Christensen, Tom Boden, Raymond McCord, Giri Palanisamy
    • [email protected]
    • DOE/ORNL
    • The current ESGF publisher is designed for model output and is not well suited to observational data because of different metadata formats, the lack of a seamless mapping of hierarchies, and the many model parameters that do not apply to observational data. This presentation will show how to overcome these and other difficulties by using a THREDDS server coupled with the ESGF stack.
  • ESGF Data Versioning and Fixity: Past, Present, and Future

    • Stephen Pascoe
    • [email protected]
    • IS-ENES/BADC
    • The CMIP5 archive introduced several new requirements for ESGF which had far-reaching repercussions for the system architecture, including data versioning, notification, and DOI publication. Several years on, ESGF has partially implemented these features, but much remains to be done to enable robust multi-version publishing with distributed consistency checking. Many features that are highly valued by users, such as notification and DOIs, rely on reliable version tracking. To meet the needs of our users, we have to improve our underlying technology in this area. This presentation will briefly review the status quo and historical context of version support within the ESGF publisher, the DRS, esgf-drslib, and esgf-search before presenting a vision for a robust distributed versioning system for ESGF data.
  • Automated Replication Procedures

    • Stephan Kindermann, Jeff Painter
    • [email protected], [email protected]
    • IS-ENES/DKRZ, DOE/LLNL
    • The CMIP5 replication experience is summarized, as well as the tools used for replication in CMIP5. The key bottlenecks and missing parts for a stable replication solution in the existing ESGF system are identified. First steps toward a more sustainable and manageable solution are presented: the introduction and use of persistent identifiers as part of the replication process, as well as standardization efforts supporting the development of automatic PID-based processing services.
  • Globus for Data Transfer in ESGF

    • Rachana Ananthakrishnan and Eric Blau
    • [email protected], [email protected]
    • DOE/ANL
    • We will present the use of Globus Transfer for data downloads and transfers within the ESGF infrastructure, and discuss what has been done to date and some planned improvements.
  • Persistent Identifiers for Data Management and Data Citation: from PIDs to DOIs

    • Stephan Kindermann
    • [email protected]
    • IS-ENES/DKRZ
    • After a summary of the CMIP5 experience with data quality control and DOI-based data citation, a more lightweight approach to obtaining stable, citable data references is described. It is based on introducing lightweight persistent identifiers early in the data life cycle, e.g. at ESGF data publication time. These identifiers for referable data entities later become the basis for DOI-based data citations.
  • ESGF Security Infrastructure

    • Rachana Ananthakrishnan and Phil Kershaw
    • [email protected], [email protected]
    • DOE/ANL, IS-ENES/BADC
    • This talk describes current methods and suggests some directions for simplifying the ESGF security infrastructure.
  • ESGF Dashboard and Desktop

    • Gavin M. Bell, Sandro Fiore
    • [email protected], [email protected]
    • DOE/LLNL, IS-ENES/U. of Salento, Italy
    • The ESGF Dashboard is the distributed monitoring system of the Earth System Grid Federation. It is responsible for collecting historical information about the status of the ESGF federation in terms of network topology (peer-group composition), node type (host/services mapping), registered users (including their IdP-based distribution), downloaded data (at both site and federation level), and system metrics (round-trip time, service availability, CPU, memory, disk, processes, etc.). Architecturally, the ESGF Dashboard is composed of three parts: the information provider, the dashboard catalog, and the user interface. The information provider is the back end of the system. It is responsible for retrieving and storing all of the P2P metrics, and it interacts closely with the node manager to synchronously retrieve updated snapshots of the federation's status and store them in the dashboard catalog. This module can be easily extended to collect new metrics by implementing additional sensors for new classes of information. The new engine can also manage and store long-term metrics. Finally, the user interface is a web application developed in Java, JSP, and JavaScript following the MVC design pattern. It makes heavy use of Web 2.0 concepts such as mash-ups, Google Maps, and permalinks, and it provides several views at four hierarchical granularity levels: federation, peer-group, host, and service. The new GUI, named ESGF Desktop, is a web-based desktop interface with several gadgets presenting the different views provided by the system.
  • Enhancing the User Interface (UI) with CoG

    • Luca Cinquini, Cecelia DeLuca, Sylvia Murphy
    • [email protected]
    • NASA/JPL, NOAA/ESRL, NOAA/ESRL
    • CoG will be the next generation web front end to the ESGF services. This talk will give an overview of the features and functionality of CoG, and outline the development roadmap and timeline for replacing the current ESGF web front end.
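The ESGF Data Versioning and Fixity talk above stresses that notification and DOIs depend on reliable version tracking. As a minimal sketch of what file-level fixity checking can look like, the Python below records a SHA-256 checksum for every file in one dataset-version directory and later reports files that are missing or whose checksums have changed. The function names, the in-memory manifest layout, and the choice of SHA-256 are illustrative assumptions; none of it is taken from the ESGF publisher or esgf-drslib.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(version_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the version directory) to its checksum."""
    return {
        str(p.relative_to(version_dir)): file_checksum(p)
        for p in sorted(version_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(version_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum no longer matches."""
    problems = []
    for rel_path, expected in manifest.items():
        target = version_dir / rel_path
        if not target.is_file() or file_checksum(target) != expected:
            problems.append(rel_path)
    return problems
```

Checksumming each file once at publication time and re-verifying replicas against the stored manifest is one simple route to distributed consistency checks; a production system would also have to persist the manifest and bind it to a dataset version identifier.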

#### UV-CDAT COMMUNITY IMPROVEMENTS

  • UV-CDAT Infrastructure: Next Steps and Gaps

    • Dean N. Williams, Claudio Silva
    • [email protected], [email protected]
    • DOE/LLNL, NYU-Poly
    • A list of the most critical areas for improvement in UV-CDAT, and where UV-CDAT development should focus to improve the overall user experience and better serve the climate community in the future (for CMIP6, Project Enterprise, and other projects).
  • Attaining Software Quality in Large Projects with Numerous Developers

    • Dave DeMarle
    • [email protected]
    • Kitware
    • Ambitious software goals often necessitate sizable software projects. Large software projects require many developers, many dependencies, or both. Careful management of the software lifecycle in this case is crucial for success since the likelihood of bugs grows with both factors. Software processes that are not overly burdensome, but at the same time ensure sufficient code quality are required. In this talk we will compare the software development processes in use in the UV-CDAT, ParaView and VTK projects. Best practices for dependency handling, coding standards, regression testing, revision control, bug tracking and release management will be described.
  • Building the Next Generation UV-CDAT: a Web-based Informatics Platform for Climate Data Analysis and Visualization

    • Elo Leung
    • [email protected]
    • DOE/LLNL
    • The proposed web-based informatics platform is a client-server application that builds on the existing computing packages in UV-CDAT, with compute-intensive tasks performed on the server side. In this architecture, rendering can be done on the server side using existing UV-CDAT plots via CDAT, DV3D, ParaView, and VisIt, and on the client side using WebGL, SVG, and the Canvas 2D API.
  • Remote Rendering and Analysis for Big Data using ParaViewWeb

    • Aashish Chaudhary
    • [email protected]
    • Kitware
    • ParaViewWeb is a collection of components that enables the use of server- and client-side visualization and data analysis capabilities within web applications. Using the latest HTML5-based technologies, such as WebSocket and WebGL, ParaViewWeb communicates with a ParaView server running on a remote visualization node or cluster through a lightweight JavaScript API. Using this API, web applications can easily embed interactive 3D visualization components. Application developers can write simple Python scripts to extend the server capabilities, including creating custom visualization pipelines. ParaViewWeb makes it possible to extend web-based scientific workflows with the ability to visualize and analyze big datasets efficiently on thin clients.
  • Diagnostics and Metrics

    • Jeff Painter, Charles Doutriaux, Brian E. Smith
    • [email protected]
    • DOE/LLNL, DOE/ORNL
    • The UV-CDAT Diagnostics package is designed to compare model output with observations or with the output of other models. Generally, a diagnostic is a plot or table of variables averaged over time and over some spatial directions. These variables may be available directly from model output or derived from it by calculation. The UV-CDAT Diagnostics will encompass the capabilities of the NCAR Diagnostics used with CAM, CCSM, and CESM, while being highly flexible and extensible. The utility has already computed diagnostics on CMIP5 data, for example, and several thousand diagnostic plots are available so far. The plots can be selected and manipulated either with the regular UV-CDAT GUI or with the new web-based GUI described in Elo Leung's talk "Web Informatics", or they can be selected and computed in a batch script.
  • UV-CDAT Live Exploratory Analysis

    • John Harney
    • [email protected]
    • ORNL
    • In this talk, we describe a lightweight enhancement that introduces exploratory analysis capabilities to UV-CDAT Live (the web application of UV-CDAT). Exploratory analytics provides a kind of interactivity that is necessary to transform data into insight, thereby improving understanding of Earth system processes. It also offers a new way to obtain many views of, and correlations within, the data quickly and seamlessly. Our new component will use a richly defined RESTful service interface that wraps the newly composed climate diagnostics tools, which produce diagnostic information from model-output-driven climatology files. We then demonstrate exploratory analysis using the widely adopted Data-Driven Documents (D3) JavaScript library, leveraging its mature tree, map, and time-series visualizations.
  • Streaming Processing for Remote Data Exploration and Analysis

    • Timo Bremer
    • [email protected]
    • LLNL/CASC/AIMS
    • ViSUS is a framework for streaming processing and remote data exploration that was recently integrated into UV-CDAT. Using a simple spatial reordering of data, ViSUS enables fast, streaming access to massive amounts of data. In particular, ViSUS allows users to remotely access one or more large-scale climate simulations, compute their mean or standard deviation or otherwise process the ensemble, and transfer the results to an interactive remote visualization client. At any point, the user can interactively choose which models to include in the ensemble and which operations to perform while browsing the entire time series. In this way, ViSUS provides a simple and intuitive way to determine the parameters of a final analysis step that can subsequently be performed on large-scale computational resources.
  • VisTrails Workflow Parallelism & Scripting

    • David Koop, Ben Burnett, Rémi Rampin, Tommy Ellqvist, Juliana Freire, Cláudio Silva
    • NYU-Poly
    • In support of UV-CDAT, the VisTrails workflow and provenance infrastructure has been evolving to improve support for executing workflow modules in parallel when possible. In addition, a new VisTrails package supports job scheduling on HPC machines so workflows can orchestrate such runs in a larger analysis pipeline. Finally, to better integrate existing and new scripts, VisTrails has made strides to improve translations from scripts to workflows and vice versa.
  • Current and Future Directions for Parallel Processing in the UV-CDAT Framework

    • Brian Smith
    • ORNL
    • Data sets are increasing in size, detail, and scope at an incredible rate. Tools for statistical analysis, visualization, and processing need to take advantage of parallel processing so that scientists can work with the datasets generated for their scientific goals. The UV-CDAT framework has several existing parallel projects and plans to increase the amount of parallelism available. We will review the current parallel processing projects in UV-CDAT and discuss future parallel efforts in the Project Enterprise timeframe.
  • Spatio-Temporal Pipeline for Parallel Visualization and Analysis in ParaView

    • Curt Canada
    • LANL
    • As computational resources have become more powerful, the availability of large-scale data has exploded, with datasets greatly increasing in spatial and temporal resolution. For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. Read time ultimately depends on how the file is stored and on the access pattern used to read it. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem, given the file access pattern. The type of parallel decomposition used directly dictates what the file access pattern will be, and the spatio-temporal pipeline gives greater flexibility in choosing that pattern. The spatio-temporal pipeline combines spatial and temporal parallelism to create a parallel decomposition for a task. Within the pipeline, all available processes are divided into groups called time compartments. Temporal parallelism comes from different timesteps being processed independently by separate time compartments, and spatial parallelism comes from dividing each timestep over all processes within a time compartment. The ratio between spatial and temporal parallelism is controlled by adjusting the size of a time compartment. Using the model, we were able to configure the spatio-temporal pipeline to create optimized read access patterns, resulting in a speedup factor of approximately 400 over traditional file access patterns.
  • Parallel Rendering and Data Processing in ParaView

    • Dave DeMarle
    • Kitware
    • ParaView is a popular application for visualization and analysis of high-resolution scientific datasets. It is built on a client-server architecture and uses scalable, distributed-memory parallel processing techniques to process, and even interactively visualize, arbitrarily large data sets, given access to a sufficiently capable back-end supercomputer. This talk will focus on ParaView's architecture, including spatial and temporal domain decomposition and concurrent processing, remote data visualization, and practical issues that arise when using ParaView on HPC resources.
  • Unstructured Climate Data Visualization with DV3D: Next-Generation DV3D

    • Tom Maxwell and Jerry Potter
    • NASA/Goddard
    • The new DV3D point cloud plotter allows unstructured climate data to be viewed directly, without requiring any preprocessing. It makes no assumptions about the geometrical layout of the points; it visualizes the points directly, with each point colored by the value of the variable at that location. With this display method it is very easy to switch projections, for example to toggle between a lat-lon-level rectangular projection and a spherical projection.
  • Data Analysis Tools (DAT)

    • Ben Burnett, Rémi Rampin, David Koop, Juliana Freire, Cláudio Silva
    • NYU-Poly
    • In light of UV-CDAT's success and growing user base, we have been working to refine some of the concepts introduced in UV-CDAT and extend them to the broader, simplified DAT framework. In addition to the variable and plot interactions built on VisTrails' provenance, workflow engine, and reproducibility infrastructure, DAT introduces operations as a core ingredient, allowing clearer variable composition and transformation. With a generalized framework, we expect that DAT will make it easier to integrate tools with UV-CDAT and will prove useful in other branches of science.
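The UV-CDAT Diagnostics abstract defines a diagnostic as variables "averaged over time and some spatial directions". The sketch below illustrates that idea in plain Python; it is not UV-CDAT code. It assumes a variable stored as nested lists indexed `[time][lat][lon]` on a regular grid, and the helper names (`time_mean`, `zonal_mean`, `area_weighted_mean`) are hypothetical. The last step weights each latitude band by cos(latitude), a common approximation of a band's relative area on a sphere, assumed here rather than taken from the talk.

```python
import math
from typing import List, Sequence

# Hypothetical layout: field[t][i][j] = value at time t, latitude i, longitude j.
Field3D = Sequence[Sequence[Sequence[float]]]


def time_mean(field: Field3D) -> List[List[float]]:
    """Average over the time axis, returning a [lat][lon] climatology."""
    nt = len(field)
    nlat, nlon = len(field[0]), len(field[0][0])
    return [[sum(field[t][i][j] for t in range(nt)) / nt
             for j in range(nlon)]
            for i in range(nlat)]


def zonal_mean(field2d: Sequence[Sequence[float]]) -> List[float]:
    """Average over longitude, leaving one value per latitude band."""
    return [sum(row) / len(row) for row in field2d]


def area_weighted_mean(zonal: Sequence[float],
                       lats_deg: Sequence[float]) -> float:
    """Average latitude bands, weighting each band by cos(latitude)."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(w * v for w, v in zip(weights, zonal)) / sum(weights)


# Two timesteps on a 2 x 2 (lat x lon) grid at latitudes 0 and 60 degrees.
field = [[[1.0, 3.0], [5.0, 5.0]],
         [[3.0, 5.0], [5.0, 5.0]]]
clim = time_mean(field)   # [[2.0, 4.0], [5.0, 5.0]]
zm = zonal_mean(clim)     # [3.0, 5.0]
print(area_weighted_mean(zm, [0.0, 60.0]))  # roughly 3.67 (= 5.5 / 1.5)
```

Each step reduces one dimension, so the same helpers can produce a map (time mean only), a line plot (time and zonal mean), or a single table entry (all three).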
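The ViSUS abstract attributes its fast streaming access to "a simple spatial re-ordering of data". ViSUS's IDX format is based on a hierarchical variant of Z-order (Morton) indexing; the sketch below shows only plain 2D Morton order, as a simplified illustration of why such a reordering helps. It is not the ViSUS implementation, and the function names are hypothetical. Interleaving the bits of the x and y indices places cells that are close in space close together in the linear order, so a spatial region can be read with few contiguous requests.

```python
from typing import List, Tuple


def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so a zero bit sits between each pair."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton2d(x: int, y: int) -> int:
    """Interleave bits: x occupies even bit positions, y the odd ones."""
    return part1by1(x) | (part1by1(y) << 1)


def z_order(width: int, height: int) -> List[Tuple[int, int]]:
    """Return every (x, y) cell of a grid sorted into Z-order."""
    cells = [(x, y) for y in range(height) for x in range(width)]
    return sorted(cells, key=lambda c: morton2d(c[0], c[1]))


order = z_order(4, 4)
print(order[:8])
```

On a power-of-two grid, every aligned 2x2, 4x4, 8x8, ... block occupies a contiguous run of Z-order indices, which is what lets a client fetch a sub-region with a handful of sequential reads instead of one read per row.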
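The spatio-temporal pipeline in the LANL ParaView abstract splits processes into time compartments: timesteps are spread across compartments (temporal parallelism), and each timestep is split across the processes inside its compartment (spatial parallelism). The sketch below is a minimal illustration of that decomposition, assuming round-robin assignment of timesteps to compartments and contiguous 1D slabs of cells within a timestep; the function `assign` and both of those policies are hypothetical, not taken from the talk.

```python
from typing import Dict, List, Tuple


def assign(num_procs: int, compartment_size: int, num_timesteps: int,
           num_cells: int) -> Dict[int, List[Tuple[int, int, int]]]:
    """Map each process rank to its (timestep, start_cell, stop_cell) items.

    Ranks are grouped into time compartments of `compartment_size`
    consecutive ranks. Timestep t goes to compartment t % num_compartments,
    and that timestep's cells are split into contiguous slabs across the
    ranks of the compartment (slab sizes differ by at most one cell).
    """
    if num_procs % compartment_size != 0:
        raise ValueError("num_procs must be a multiple of compartment_size")
    num_compartments = num_procs // compartment_size
    work: Dict[int, List[Tuple[int, int, int]]] = {
        rank: [] for rank in range(num_procs)}
    for t in range(num_timesteps):
        # Temporal parallelism: round-robin timesteps over compartments.
        first_rank = (t % num_compartments) * compartment_size
        for local in range(compartment_size):
            # Spatial parallelism: contiguous slab for this rank.
            start = local * num_cells // compartment_size
            stop = (local + 1) * num_cells // compartment_size
            work[first_rank + local].append((t, start, stop))
    return work


# 4 processes in 2 compartments of 2, 4 timesteps of 10 cells each.
for rank, items in assign(4, 2, 4, 10).items():
    print(rank, items)
```

Setting `compartment_size` equal to `num_procs` gives a purely spatial decomposition, and setting it to 1 gives a purely temporal one; the talk's point is that intermediate sizes change the resulting file access pattern.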
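The DV3D abstract mentions toggling a point cloud between a lat-lon-level rectangular projection and a spherical projection. In both cases each point keeps its own (longitude, latitude, level); only the mapping to 3D display coordinates changes. The sketch below is a hypothetical illustration of that mapping, not DV3D code: the rectangular view uses the coordinates directly, and the spherical view places each point at radius r = radius + level_scale * level using the standard spherical-to-Cartesian formulas. `radius` and `level_scale` are assumed display parameters.

```python
import math
from typing import Tuple


def to_display(lon_deg: float, lat_deg: float, level: float,
               spherical: bool, radius: float = 1.0,
               level_scale: float = 0.01) -> Tuple[float, float, float]:
    """Map one (lon, lat, level) point to 3D display coordinates.

    `radius` and `level_scale` are hypothetical display parameters.
    """
    if not spherical:
        # Rectangular view: plot longitude, latitude, level as x, y, z.
        return (lon_deg, lat_deg, level)
    # Spherical view: levels become shells above the base sphere.
    r = radius + level_scale * level
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))


# The same point in both projections.
print(to_display(45.0, 30.0, 10.0, spherical=False))
print(to_display(45.0, 30.0, 10.0, spherical=True))
```

Because no connectivity between points is assumed, switching projections only requires re-evaluating this per-point mapping.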

####COMMUNITY IMPROVEMENTS

  • Data Transfer Performance in Preparation for CMIP6

    • Eli Dart
    • ESnet
    • I will discuss national and international data transfer performance in support of climate science and the implications of increasing data set sizes. I will also present ideas for increasing data transfer performance between major data centers.
  • ES-DOC: Documentation Eco-System Progress

  • Developing a systematic database of observations for evaluating climate models

    • Huikyo Lee, Chris Mattmann, Duane Waliser and Daniel Crichton
    • NASA/JPL
    • Accurate simulation of key variables such as temperature, precipitation, and clouds remains a major challenge for regional climate models. Hence, systematic comparison of models with observations from various sources is required to understand model uncertainties and to guide model improvement. The Regional Climate Model Evaluation System (RCMES) is an open source software package developed to facilitate climate model evaluation. The RCMES database (RCMED) provides single-point access to a wide range of global and regional observational datasets, with emphasis on NASA's satellite datasets. RCMED has made the evaluation of regional climate models simpler, quicker, and physically more comprehensive. In the future, RCMES will have direct access to ESGF. With convenient access to observational and model data on ESGF, researchers could spend more time analyzing results and less time writing code and worrying about transferring and processing observational and model data.
  • Data Reduction Methods for Large Streaming Data

    • Alex Sim
    • LBNL
    • Large streaming data is an essential part of science and engineering experiments and computational simulations. However, it is generally intractable to store, compute on, search, and retrieve large streaming data, mostly because of the data volume and the high frequency of data generation. This presentation addresses a fundamental issue: reducing the size of large streaming data while still obtaining accurate information from it for analysis. For example, all network devices collect network traffic monitoring logs; in high-speed networks the collected monitoring data grows rapidly, and in-depth network analysis becomes very challenging. We present a new dynamic algorithm that reduces the size of data records on an exponential scale while still providing accurate information about the large streaming data for analysis.
  • Cloud Installation

    • Ben Evans
    • ANU/NCI
    • The ESG software is just one of a large number of data services required to support the climate community. The ESG service requires a flexible way to upgrade either the full software package or its sub-components, and to scale out. Many users wish to analyse the data at the source rather than download it; hence the need to provide tools and computational environments in situ. In this talk I will describe our installation in the OpenStack cloud environment, the core technology of the NCI HPC cloud.
  • ESGF Cluster "Reign Clouds"

    • Gavin M. Bell
    • LLNL/AIMS
    • We are using a small cluster to demonstrate the power and utility of "Big Data" distributed computing for climate science. We have experimented with several different frameworks, investigated the pros and cons of each, and settled on a few promising technologies that, in aggregate, create a single system with the flexibility, speed, and durability needed for today's and tomorrow's climate science. Beyond applying Big Data to climate science, we also took a fresh look at the entire user and maintenance experience, focusing on ease of use, scalability, and maintainability.
  • Overview of OpenClimateGIS and ClimateTranslator: A Python Library and Web Interface for Geospatial Manipulations of CF Climate Datasets

    • Benjamin Koziol, Luca Cinquini, Richard Rood, Cecelia DeLuca, Sylvia Murphy
    • NASA/JPL
    • OpenClimateGIS is an open source Python library exposing vector GIS operations for CF-compliant netCDF datasets. The software also supports a variety of subsetting, conversion, and computational functions on single and multiple netCDF files. This presentation will provide an overview of the software, examples of its application, directions for future development, and an introduction to the ClimateTranslator web interface, which uses OpenClimateGIS as a backend.
  • Driving DOE ESM: The Game Changing GUI

    • Renata B. McCoy
    • LLNL/AIMS
    • The DOE ESM End-to-End Workflow and its Graphical User Interface will change the way climate models are developed, diagnosed, and improved, and the way their data are analyzed. The combination of workflows with complete data provenance, strong knowledge-discovery-based automated diagnostics, performance metrics, and UQ analysis with state-of-the-art web informatics and visualization will create a fast and furious, Cadillac kind of system. To drive it, one needs a simple and intuitive GUI that first of all serves model developers' needs and enables fast, agile work: automating and simplifying routine operations, performing complex diagnostics and analysis automatically, and presenting results in simplified forms. The workflows and data provenance will enable unprecedented data transparency and reproducibility, and will change how the worldwide community collaborates on scientific endeavors.
  • ClimatePipes: User-Friendly Data Access, Manipulation, Analysis & Visualization of Community Climate Models

    • Aashish Chaudhary
    • Kitware
    • The impact of climate change will resonate through a broad range of fields, including public health, infrastructure, water resources, and many others. Long-term coordinated planning, funding, and action are required for climate change adaptation and mitigation. Unfortunately, widespread use of climate data (simulated and observed) in non-climate science communities is impeded by factors such as large data size, lack of adequate metadata, poor documentation, and lack of sufficient computational and visualization resources.
  • Server side data processing with PyWPS

    • Luca Cinquini
    • NASA/JPL
    • We will present a recent project at JPL that has enabled on-the-fly gridding of satellite observations onto user-defined geospatial and temporal grids. We will argue that this standards-based architecture is well suited for adoption by the ESGF infrastructure.
  • Using a next-generation climate architecture in education

    • Jae Younb
    • We used a next-generation climate architecture, designed to share some of the features of the NASA Computational Modeling and Cyberinfrastructure (CMAC) program, for a set of domain-specific collaborative software modeling tasks in a graduate-level Software Architecture class. We present what we learned from the experience.
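The JPL PyWPS abstract describes on-the-fly gridding of satellite observations onto a user-defined grid. The abstract does not say which gridding method was used; one simple and common approach, shown here only as an assumed illustration, is to bin each observation into the lat-lon cell that contains it and average the values per cell. The function `grid_observations` and its parameters are hypothetical, and only the spatial part is shown (a time axis could be binned the same way). A WPS process would expose something like this behind its service interface.

```python
import math
from typing import List, Optional, Sequence, Tuple


def grid_observations(obs: Sequence[Tuple[float, float, float]],
                      lat_min: float, lat_max: float,
                      lon_min: float, lon_max: float,
                      dlat: float, dlon: float
                      ) -> List[List[Optional[float]]]:
    """Average point observations (lat, lon, value) onto a regular grid.

    Returns an [nlat][nlon] list of cell means. Cells with no observations
    are None, and observations outside the bounding box are ignored.
    """
    nlat = math.ceil((lat_max - lat_min) / dlat)
    nlon = math.ceil((lon_max - lon_min) / dlon)
    sums = [[0.0] * nlon for _ in range(nlat)]
    counts = [[0] * nlon for _ in range(nlat)]
    for lat, lon, value in obs:
        if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
            continue
        # Index of the cell containing this observation; clamp to guard
        # against floating-point round-off near the upper boundary.
        i = min(int((lat - lat_min) // dlat), nlat - 1)
        j = min(int((lon - lon_min) // dlon), nlon - 1)
        sums[i][j] += value
        counts[i][j] += 1
    return [[sums[i][j] / counts[i][j] if counts[i][j] else None
             for j in range(nlon)]
            for i in range(nlat)]


# Three observations inside a 2 x 2 grid of 1-degree cells, one outside.
obs = [(0.5, 0.5, 1.0), (0.2, 0.9, 3.0), (1.5, 0.5, 10.0), (5.0, 5.0, 99.0)]
print(grid_observations(obs, 0.0, 2.0, 0.0, 2.0, 1.0, 1.0))
```

Averaging is only one choice; a service could equally offer nearest-neighbor or weighted interpolation as a user-selectable option.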

####LLNL EXCURSION Tour of the National Ignition Facility (NIF)

