Support for observational error measurements in data #994

steko · 2016-08-15T19:04:05Z


Hey all, based on a discussion with @danfowler I'm submitting this proposal to add support for observational error measurements in data, a rather common occurrence in scientific datasets. I can't draft a full spec at the moment but I hope others will chime in with comments from their specific experience. Examples below are archaeology-based.

While the idea came up in the context of Data Packages, JSON Table Schema seems to be the right place to add this kind of support.

Examples

Radiocarbon dates

As can be seen in the Mediterranean Radiocarbon dates dataset (one of the largest open datasets of this kind), radiocarbon dates need to be expressed at least by the conventional radiocarbon age and the error. While it's common to write 3340 ± 45 in text, datasets usually record the two separately. However, the radiocarbon age has no meaning without the attached error.

Neutron activation analysis

Compositional data from INAA (instrumental neutron activation analysis) are expressed as parts per million with an attached measurement error, as can be seen in the Chemical Composition by Neutron Activation Analysis (INAA) of Neo-Assyrian Palace Ware dataset (a rather common case). In this case, measurement and error are recorded in a single column, separated by ±.

Existing implicit conventions

Separate columns

id, data, error
0, 34, 0.2

Single column

id, data
0, 34 ± 0.2
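
To make the difference between the two conventions concrete, here is a minimal Python sketch of how a consumer might split a single-column cell into the two values used by the separate-columns layout. The helper name `split_measurement` and the accepted separators (`±` or `+/-`) are assumptions for illustration, not part of any spec.

```python
import re

# Hypothetical pattern: a number, then "±" (or "+/-"), then a non-negative number.
_PLUS_MINUS = re.compile(
    r"^\s*([-+]?\d+(?:\.\d+)?)\s*(?:±|\+/-)\s*(\d+(?:\.\d+)?)\s*$"
)


def split_measurement(cell):
    """Return (value, error) for a cell such as '34 ± 0.2'."""
    match = _PLUS_MINUS.match(cell)
    if match is None:
        raise ValueError(f"not a 'value ± error' cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


print(split_measurement("34 ± 0.2"))  # (34.0, 0.2)
```

Without metadata, a reader has to guess that this parsing step is needed at all, which is the gap the proposal tries to close.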

Proposed approach

Add a field descriptor in the JSON schema to explicitly mark the values in one field as linked to another field, e.g.:

{
    "fields": [
      {
        "name": "measurement",
        "title": "The numeric value",
        "type": "number"
      },
      {
        "name": "error",
        "title": "The error attached to the numeric value",
        "type": "number",
        "errorOf": "measurement"
      }
    ]
}

An alternate approach:

{
    "fields": [
      {
        "name": "measurement",
        "title": "The numeric value",
        "type": "number",
        "errorField": "error"
      },
      {
        "name": "error",
        "title": "The error attached to the numeric value",
        "type": "number"
      }
    ]
}
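
As a rough illustration of how a consumer could use either variant, the following Python sketch builds a mapping from each measurement field to its error field. The properties `errorOf` and `errorField` are the ones proposed above; the function name `error_pairs` is hypothetical and nothing here is an existing library API.

```python
def error_pairs(schema):
    """Map each measurement field name to the name of its error field."""
    pairs = {}
    for field in schema["fields"]:
        if "errorOf" in field:
            # First variant: the error field points at its measurement.
            pairs[field["errorOf"]] = field["name"]
        if "errorField" in field:
            # Alternate variant: the measurement points at its error field.
            pairs[field["name"]] = field["errorField"]
    return pairs


schema_error_of = {
    "fields": [
        {"name": "measurement", "type": "number"},
        {"name": "error", "type": "number", "errorOf": "measurement"},
    ]
}
schema_error_field = {
    "fields": [
        {"name": "measurement", "type": "number", "errorField": "error"},
        {"name": "error", "type": "number"},
    ]
}

row = {"measurement": 34, "error": 0.2}
for name, err in error_pairs(schema_error_of).items():
    print(f"{name} = {row[name]} ± {row[err]}")  # measurement = 34 ± 0.2
```

Both descriptors resolve to the same pairing, so the choice between them is mostly about which field should carry the link.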

This is just a basic description of the issue to get the discussion started, with no presumption of formal correctness nor exhaustive coverage of the various issues in other disciplines.

djvanderlaan · 2016-08-17T14:43:43Z


I work mainly with statistical output tables (unemployment figures and such), where we sometimes also have the uncertainty. However, most often this is specified using the lower and upper bounds of the confidence interval. We currently encode this in the variable names (e.g. "measurement_lb" and "measurement_ub"), and it has been on our to-do list for a while to encode it in the metadata. So +1.

However, I think we need more than errorOf. As mentioned above, we often have a lower and an upper bound. Relative errors (%) are also used. The most flexible way would be to allow arbitrary relations between columns to be specified. Perhaps something along the lines of:

{
    "fields": [
      {
        "name": "measurement",
        "title": "The numeric value",
        "type": "number"
      },
      {
        "name": "error",
        "title": "The error attached to the numeric value",
        "type": "number",
        "relation" : { "type": "errorOf", "column": "measurement"}
      }
    ]
}

This would also allow people to specify custom relations, although a list of suggested or default supported relations would be nice.
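
A minimal Python sketch of how such generic relations could be resolved, using the confidence-interval case as the example. The `relation` descriptor follows the shape suggested above; the relation type names `lowerBoundOf` and `upperBoundOf` and the function `relations_by_target` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def relations_by_target(schema):
    """Group related fields by the column they describe.

    Returns {target_column: {relation_type: field_name}}.
    """
    grouped = defaultdict(dict)
    for field in schema["fields"]:
        relation = field.get("relation")
        if relation:
            grouped[relation["column"]][relation["type"]] = field["name"]
    return dict(grouped)


schema = {
    "fields": [
        {"name": "measurement", "type": "number"},
        {
            "name": "measurement_lb",
            "type": "number",
            # Hypothetical relation type for a confidence-interval lower bound.
            "relation": {"type": "lowerBoundOf", "column": "measurement"},
        },
        {
            "name": "measurement_ub",
            "type": "number",
            # Hypothetical relation type for a confidence-interval upper bound.
            "relation": {"type": "upperBoundOf", "column": "measurement"},
        },
    ]
}

print(relations_by_target(schema))
```

With this shape, the `_lb` and `_ub` naming convention is no longer needed to recover which columns belong together.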


rufuspollock · 2016-08-22T06:42:16Z

Maintainer

@steko @djvanderlaan I think this is a perfect candidate for a "pattern" proposal. A pattern offers a suggestion for how to solve a particular problem (in this case, linking error information to the main measurement) without being a formal spec.

