Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ObsOperator: proposal for a generic observation operator diagnostic #2636

Open
mbertolacci opened this issue Dec 13, 2024 · 0 comments
Open
Labels
category: Question Further information is requested topic: Diagnostics Related to output diagnostic data

Comments

@mbertolacci
Copy link

Your name

Michael Bertolacci

Your affiliation

The University of Western Australia

Please provide a clear and concise description of your question or discussion topic.

For a flux inversion project I'm involved in I've been working on a generic observation operator diagnostic for GEOS-Chem which I've tentatively termed ObsOperator.

I'm curious about the views of the GEOS-Chem community on the value of such a diagnostic, and what features it ought to have beyond what is already implemented.

Per offline discussions with @aschuh and @yantosca, I am tagging the following people who may be interested:
@msulprizio @lizziel @kbowman77 @dbajones @ltmurray @drewpendergrass @laestrada @djxjacob @randallvmartin @djvaron @nicholasbalasus @eastjames

The code is in a fork of GEOS-Chem. Most of it is here:
https://github.com/mbertolacci/geos-chem/tree/feature/obsoperator-diagnostic/ObsOperator

The basic idea is to return a (potentially weighted) average of the GEOS-Chem species concentration field. This could be a vertical average, box average, time average, or any combination thereof.

This is sufficient to do what the existing ObsPack diagnostic does. Due to the vertical averaging, it can also implement part of the sampling for OCO-2 or TCCON. I give examples of these operators below in a draft specification.

The rest of this is a draft specification, as currently implemented by the code above.

ObsOperator diagnostic

Overview

This ObsOperator diagnostic for GEOS-Chem is a flexible way to define and run observation operators over the GEOS-Chem species concentration field. The user specifies "observation operators" in YAML files, which are then used to calculate a set of output values for each hour of the run.

A single observation operator is defined as an operation on the GEOS-Chem species concentration field that produces a single number. A YAML file or files is used to define the set of observation operators for a given run, with a format described later in this document.

Every observation operator can apply to one or more species, and is the composition of a time operator, a horizontal operator, and a vertical operator. Each operator either samples a single value, or computed a (potentially weighted) average over the a set of values (conceptually, however, a point value is a single-value average with a weight of 1). The resulting calculated value is thus equal to

output[s] = sum(
  time_weight[t] * sum(
    horizontal_weight[i, j] * sum(
      vertical_weight[k] * value[t, i, j, k, s]
      for k in vertical_indices
    )
    for j in horizontal_indices
  )
  for t in time_indices
)

where s is the species index. The vertical indices can be defined directly as levels, by altitude, or by pressure. The weights can be fixed, or can be pressure weighted (and thus dynamically updated).

Once all time points have been processed, the output values are written to a CSV file.

geoschem_config.yml entry

The entry in the geoschem_config.yml file is as follows:

extra_diagnostics:
  obsoperator:
    activate: true  # Whether to activate the obsoperator diagnostic
    verbose: true  # Whether to print verbose output
    input_file: obsoperator-YYYYMMDD.yml  # The YAML file(s) containing the observation operators
    output_file: OutputDir/obsoperator_output-YYYYMMDD.csv  # The output file name

The input_file and output_file are expanded to include the year, month, and day if they contain YYYY, MM, or DD, as per usual.

YAML file format examples

The YAML file contains a list of observation operators under the entries key as follows:

entries:
  - # first entry
  - # second entry

Examples of entries are below, and full specification of the format is given later.

ObsPack-style time averaging

The ObsPack diagnostic implements either point or time-averaged sampling at a specific location and altitude. For example, suppose the ObsPack entry is:

obspack_id =  obspack_co2_1_GLOBALVIEWplus_v6.1_2021-03-01~co2_smo_surface-flask_1_representative~19583684 
model_sample_window_start =  2014-12-31 23:39:00 
model_sample_window_end =  2015-01-01 00:39:00 
longitude =  -170.5644 
latitude =  -14.2474 
altitude =  60.3 

The corresponding YAML entry could be:

- id: obspack_co2_1_GLOBALVIEWplus_v6.1_2021-03-01~co2_smo_surface-flask_1_representative~19583684
  species: CO2
  time_operator:
    type: range
    unit: time
    start: [20141231, 2339]
    end: [20150101, 0039]
    weights: normalized
  horizontal_operator:
    type: point
    unit: degrees
    longitude: -170.5644
    latitude: -14.2474
  vertical_operator:
    type: point
    unit: altitude
    value: 60.3

OCO-2 retrieval vertical averaging

For OCO-2, the observation operator for a retrieval at a specific time and location has the following form:

value = xco2_apriori + h^T (vco2_model - vco2_apriori)
      = xco2_apriori - h^T vco2_apriori + h^T vco2_model

where xco2_apriori is a given scalar, h is a vector of weights (a combination of the averaging kernel and the pressure weights), vco2_model is a vector of concentrations from GEOS-Chem at certain pressure levels, and vco2_apriori is the apriori vector of concentrations.

The ObsOperator diagnostic can compute h^T vco2_model; the user would then add the constant offset xco2_apriori - h^T vco2_apriori to this to get the final sampled value.

A corresponding YAML entry could be:

- id: oco2_20190701005646
  species: CO2
  time_operator:
    type: point
    unit: time
    time: [20190701, 52]
  horizontal_operator:
    type: point
    unit: degrees
    longitude: -166.4299927
    latitude: -38.4300003
  vertical_operator:
    type: exact
    unit: pressure
    values:
    - 0.102817
    - 54.1142132
    - 108.2284265
    - 162.3426321
    - 216.456853
    - 270.5710739
    - 324.6852641
    - 378.799485
    - 432.9137059
    - 487.0279268
    - 541.1421477
    - 595.256338
    - 649.3705283
    - 703.4847798
    - 757.5989701
    - 811.7132216
    - 865.8274119
    - 919.9416021
    - 974.0558537
    - 1028.1700439
    weights:
    - 6319.1598432
    - 19984.860155
    - 29375.7039415
    - 36636.2610202
    - 42524.5607233
    - 46564.1217474
    - 49287.5914229
    - 51220.5129134
    - 52574.6420179
    - 53401.2996122
    - 53860.9510212
    - 54059.2251766
    - 53992.2235359
    - 53769.4608382
    - 53338.4316021
    - 52930.0816228
    - 52541.1533714
    - 51655.6313083
    - 52050.4418275
    - 26359.1997691

(Here the weights have been multiplied by 1e6 to convert the output concentration from v/v to ppm.)

YAML file format specification

The YAML file contains a list of observation operators under the entries key as follows:

entries:
  - # first entry
  - # second entry

An individual observation operator is defined as a dictionary with the following keys:

id: myentryid
species:
  - species1
  - species2
  ...
time_operator:
  # The time operator definition
  ...
horizontal_operator:
  # The horizontal operator definition
  ...
vertical_operator:
  # The vertical operator definition
  ...

If only one species is specified, it can be given as a single string instead of a list.

Each of the operators is described below

Time operator

The time operator can apply to a single point in time or a range of times.

A point in time is defined as follows:

time_operator:
  type: point
  unit: time  # or time_index
  time: [YYYYMMDD, HHMM]  # or a single integer time index

where YYYYMMDD is the year, month, and day, and HHMM is the hour and minute, both as integers.

A range of times is defined as follows:

time_operator:
  type: range
  unit: time  # or time_index
  start: [YYYYMMDD, HHMM]  # or a single integer time index
  end: [YYYYMMDD, HHMM]  # or a single integer time index
  weights: normalized  # or 'equal'

The weight can be either normalized or equal. Normalized are the default, and mean that the weights are normalized to sum to 1 (thus we are averaging the values over the time range). Equal means that the weights are equal to one for all time points (thus we are summing the values over the time range).

Horizontal operator

The horizontal operator can apply to a single point or a box.

A point is defined as follows:

horizontal_operator:
  type: point
  unit: degrees  # or grid_index
  longitude: 90.0  # or a single integer grid index
  latitude: 10.0  # or a single integer grid index

Degrees are degrees north for latitude (-90 to 90), and degrees east for longitude (-180 to 180). The grid index depends on the choice of horizontal grid.

A box is defined as follows:

horizontal_operator:
  type: box
  unit: degrees  # or grid_index
  longitude_start: 90.0  # or a single integer grid index
  longitude_end: 90.0  # or a single integer grid index
  latitude_start: 10.0  # or a single integer grid index
  latitude_end: 10.0  # or a single integer grid index
  weights: 'normalized_area'  # or 'area', 'normalized', 'equal'

The weight can be either normalized area, area, normalized, or equal. Normalized means that the weights sum up to 1. Area means that the weights are proportional to area (or equal for weights = "area" itself). Equal means that the weights are equal to one for all grid points.

Vertical operator

The vertical operator can apply to a single point, a range of points, or a specific set of points.

A point vertical operator is defined as follows:

vertical_operator:
  type: point
  unit: pressure  # or altitude, pressure_level
  value: 1000  # or a single integer pressure level

Pressure is in hPa. Altitude is in meters. Pressure level is the integer pressure level index, which depends on the choice of vertical grid.

A range of points is defined as follows:

vertical_operator:
  type: range
  unit: pressure  # or altitude, pressure_level
  start: 0  # or a single integer pressure level
  end: 1000  # or a single integer pressure level
  weights: normalized_pressure  # or pressure, normalized, equal

As usual, normalized means that the weights sum up to 1. Pressure means that the weights are proportional to the pressure difference across the sampled vertical levels. Equal means that the weights are equal to one for all vertical levels.

The final vertical operator is a set of points, and is defined as follows:

vertical_operator:
  type: exact
  unit: pressure  # or altitude, pressure_level
  values: [0, 1000]  # or a list of integer pressure levels
  weights: [0.5, 0.5]  # or a list of weights

Here, the weight must be specified for each value.

Output format

The output is a CSV file with the following columns:

  • id: The id of the observation operator
  • species: The species name of the observation operator
  • value: The final accumulated value of the observation operator for the given species
@mbertolacci mbertolacci added the category: Question Further information is requested label Dec 13, 2024
@yantosca yantosca added the topic: Diagnostics Related to output diagnostic data label Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Question Further information is requested topic: Diagnostics Related to output diagnostic data
Projects
None yet
Development

No branches or pull requests

2 participants