
Add support for basic data summary #362

Open
m-burgoyne opened this issue Apr 27, 2022 · 6 comments

Comments

@m-burgoyne
Collaborator

A common use case for the data returned by an EDR query will be to generate a summary value for the area and time of interest. Adding optional support for basic data aggregation methods could improve performance for services by reducing the volume of data returned by queries.

This could be achieved by adding optional support for new query parameters that describe the aggregation methods and the axes to aggregate across.

The proposal is to add functionality that allows an EDR service to advertise aggregation functionality (if it supports it), with any support defined at the level of an individual query type.

This information could be added to the DataQuery metadata for collections in a service that supports data aggregation, by adding a new property that lists the available methods and valid axis combinations.

An example of the suggested metadata description can be seen below:

"area": {
  "link": {
    "href": "http://example.service.org/collections/demo/area",
    "hreflang": "en",
    "rel": "data",
    "variables": {
      "title": "Area query",
      "query_type": "area",
      "output_formats": [
        "CoverageJSON",
        "GeoJSON"
      ],
      "default_output_format": "GeoJSON",
      "crs_details": [
        {
          "crs": "EPSG:4326",
          "wkt": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.01745329251994328,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]]"
        }
      ],
      "aggregation": {
        "agg_method": [
          { 
            "name": "sum",
            "desc": "Compute a total from the requested data"
          },
          { 
            "name": "average",
            "desc": "Compute the average value from the requested data"
          },
          { 
            "name": "Max",
            "desc": "Compute the Maximum value from the requested data"
          },
          { 
            "name": "Min",
            "desc": "Compute the Minimum value from the requested data"
          }
        ],
        "agg_axis": [
          { 
            "name": "x,y",
            "desc": "Aggregates across spatial dimensions"
          },
          { 
            "name": "x,y,t",
            "desc": "Aggregates across spatial and time dimensions"
          },
          { 
            "name": "x,y,z",
            "desc": "Aggregates across spatial and vertical dimensions"
          },
          { 
            "name": "t",
            "desc": "Aggregates across the time dimension"
          },
          { 
            "name": "z",
            "desc": "Aggregates across the vertical dimension"
          }
        ]
      }
    }
  }
}

The agg_method property contains a list of the supported aggregation methods with descriptions and the agg_axis property contains a list of the valid axis combinations for the query with descriptions.
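To make the discovery step concrete, here is a minimal Python sketch of how a client might read the proposed `aggregation` object and check a requested `agg_method`/`agg_axis` pair before sending a query. The helper names are hypothetical, and two behaviours are assumptions that this proposal does not settle: method names are compared case-insensitively (the example mixes `sum` with `Max`), and axis combinations are treated as unordered, so `y,x` matches `x,y`. Descriptions are omitted from the embedded metadata for brevity.

```python
import json

# Trimmed copy of the proposed "aggregation" metadata (descriptions omitted).
METADATA = json.loads("""
{
  "aggregation": {
    "agg_method": [
      {"name": "sum"}, {"name": "average"}, {"name": "Max"}, {"name": "Min"}
    ],
    "agg_axis": [
      {"name": "x,y"}, {"name": "x,y,t"}, {"name": "x,y,z"},
      {"name": "t"}, {"name": "z"}
    ]
  }
}
""")


def supported_aggregations(variables: dict) -> tuple[set[str], set[frozenset[str]]]:
    """Return advertised method names (lower-cased) and axis combinations.

    Hypothetical helper: an empty result means the query type does not
    advertise aggregation at all.
    """
    agg = variables.get("aggregation")
    if agg is None:
        return set(), set()
    methods = {m["name"].lower() for m in agg.get("agg_method", [])}
    # Assumption: axis order is irrelevant, so store each combination as a frozenset.
    axes = {frozenset(a["name"].split(",")) for a in agg.get("agg_axis", [])}
    return methods, axes


def is_valid_request(variables: dict, agg_method: str, agg_axis: str) -> bool:
    """Check a client's agg_method/agg_axis values against the metadata."""
    methods, axes = supported_aggregations(variables)
    requested_axes = frozenset(s.strip() for s in agg_axis.split(","))
    return agg_method.lower() in methods and requested_axes in axes
```

Validating on the client side like this only avoids obviously unsupported requests; the service would still need to reject invalid combinations itself.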

A client application could then specify the required aggregation in the query by adding agg_method and agg_axis query parameters.

For example:

http://example.server.org/collections/demo/area?coords=POLYGON((-2.052 52.925,-0.476 51.017,0.887 51.566,-0.608 52.911,-2.052 52.925))&parameter-name=Air Temperature&datetime=2022-04-25T22:00Z/2022-04-27T10:00Z&crs=EPSG:4326&f=CoverageJSON&agg_method=sum&agg_axis=x,y
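The example URL is shown unencoded and contains spaces (in the WKT polygon and in `Air Temperature`), which a real client would need to percent-encode. Below is a minimal Python sketch using the standard library's `urllib.parse.urlencode`. `build_area_query` is a hypothetical helper, and `agg_method`/`agg_axis` are the parameters proposed in this issue rather than part of a published EDR version. Leaving `,:/()` unescaped is a readability choice; percent-encoding them too would also be valid.

```python
from urllib.parse import quote, urlencode


def build_area_query(base_url: str, wkt_polygon: str, parameter_name: str,
                     datetime: str, agg_method: str, agg_axis: str,
                     crs: str = "EPSG:4326", f: str = "CoverageJSON") -> str:
    """Build an area query URL carrying the proposed aggregation parameters."""
    params = {
        "coords": wkt_polygon,
        "parameter-name": parameter_name,
        "datetime": datetime,
        "crs": crs,
        "f": f,
        # Proposed (not yet standard) aggregation parameters.
        "agg_method": agg_method,
        "agg_axis": agg_axis,
    }
    # quote_via=quote encodes spaces as %20 instead of '+';
    # the safe characters keep WKT punctuation, times and axis lists readable.
    return base_url + "?" + urlencode(params, quote_via=quote, safe=",:/()")


url = build_area_query(
    "http://example.server.org/collections/demo/area",
    "POLYGON((-2.052 52.925,-0.476 51.017,0.887 51.566,-0.608 52.911,-2.052 52.925))",
    "Air Temperature",
    "2022-04-25T22:00Z/2022-04-27T10:00Z",
    agg_method="sum",
    agg_axis="x,y",
)
print(url)
```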
@m-burgoyne added the labels enhancement (New feature or request), API, and EDR V1.2 (Non-breaking change for Version 1.2) on Apr 27, 2022
@chris-little
Contributor

EDR API SWG 75 clarified that this aggregation is a summary aggregation, not 'regridding' (which could be a separate enhancement).

@chris-little
Contributor

Discussion at EDR API SWG 76 clarified that the summary is for the full domain and should specifically exclude sub-selection within the domain. The name of the issue needs improving.

@chris-little changed the title from "Add support for basic data aggregation" to "Add support for basic data summary" on May 19, 2022
@solson-nws
Collaborator

@dblodgett-usgs @chris-little @m-burgoyne -- We probably need to define where the boundary lies between using a processing service and extending EDR capabilities.

@chris-little
Contributor

chris-little commented May 26, 2022

@solson-nws Requiring data from more than one collection is definitely out of scope for summary aggregation, and in scope for API-Processes. For example, picking out a max value is a summary, whereas combining wind components (u,v) to get speed and direction (ff,ddd) is processing.

  • It was suggested that a summary is allowed over one or multiple axes in one collection, such as over x and y together.
  • It was suggested that a summary statistic that cannot be calculated solely from the collection is not allowed. So this would allow mean, min, max, median, lower quartile, etc., but not the nth percentile, as n has to be supplied. Of course, the service could pre-calculate and provide the 95th percentile without providing the other 98.
  • It would be up to the service provider to decide whether values were calculated in advance or on-the-fly.
  • @dblodgett-usgs suggested that the summary value should be for the whole domain, but @m-burgoyne suggested that if an extent is supplied via a query anyway, then it is reasonable to do the summary over that subset, on-the-fly of course.
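The rules discussed above (summaries over one or several axes of a single collection, using only statistics that need no extra parameter) can be sketched in pure Python. This is illustrative, not a proposed implementation: the data are modelled as a dict from coordinate tuples to values, and `summarise`, the `SUMMARIES` registry and the `lower_quartile` name are hypothetical. The cells passed in are whatever the query returned, so a query with an extent summarises that subset, while collapsing every axis yields a single value for the whole returned domain. Parameterised statistics such as an arbitrary nth percentile are deliberately absent and are rejected.

```python
import statistics
from collections import defaultdict
from typing import Callable

# Hypothetical registry of parameter-free summary statistics.
SUMMARIES: dict[str, Callable[[list[float]], float]] = {
    "sum": sum,
    "average": statistics.mean,
    "max": max,
    "min": min,
    "median": statistics.median,
    # Exclusive-method quartile from the standard library; needs >= 2 values.
    "lower_quartile": lambda values: statistics.quantiles(values, n=4)[0],
}


def summarise(cells: dict[tuple, float], axes_order: tuple[str, ...],
              agg_axis: str, agg_method: str) -> dict[tuple, float]:
    """Collapse the axes listed in agg_axis, keeping the others.

    cells maps coordinate tuples (ordered as axes_order) to values; the result
    is keyed by the coordinates of the axes that were not collapsed.
    """
    if agg_method not in SUMMARIES:
        raise ValueError(f"unsupported agg_method: {agg_method}")
    collapse = {a.strip() for a in agg_axis.split(",")}
    unknown = collapse - set(axes_order)
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    # Indices of the axes that survive the aggregation.
    keep = [i for i, name in enumerate(axes_order) if name not in collapse]
    groups: dict[tuple, list[float]] = defaultdict(list)
    for coord, value in cells.items():
        groups[tuple(coord[i] for i in keep)].append(value)
    summary = SUMMARIES[agg_method]
    return {key: summary(values) for key, values in groups.items()}


# Example: a 2 x 2 spatial grid at two times, axes ordered (x, y, t).
cells = {
    (0, 0, "T1"): 1.0, (0, 1, "T1"): 2.0, (1, 0, "T1"): 3.0, (1, 1, "T1"): 4.0,
    (0, 0, "T2"): 5.0, (0, 1, "T2"): 6.0, (1, 0, "T2"): 7.0, (1, 1, "T2"): 8.0,
}
print(summarise(cells, ("x", "y", "t"), "x,y", "sum"))    # {('T1',): 10.0, ('T2',): 26.0}
print(summarise(cells, ("x", "y", "t"), "x,y,t", "max"))  # {(): 8.0}
```

Grouping by the surviving axes means `agg_axis=x,y` returns one value per time step, while `agg_axis=x,y,t` returns a single summary value.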

@chris-little
Contributor

EDR API SWG 81 encourages implementers to develop a proof-of-concept to explore the feasibility of the proposal.

@chris-little
Contributor

The API-Coverages group is also interested in summary statistics.


3 participants