This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Support for observational error measurements in data #281
Comments
I work mainly with statistical output tables (unemployment figures and such) where we sometimes also have the uncertainty. Most often, however, this is specified using the lower and upper bounds of the confidence interval. We currently encode this in the variable names (e.g. "measurement_lb" and "measurement_ub"), and it has been on our to-do list for a while to encode it in the metadata. So +1. However, I think we need more than a plain error value; something like:
{
  "fields": [
    {
      "name": "measurement",
      "title": "The numeric value",
      "type": "number"
    },
    {
      "name": "error",
      "title": "The error attached to the numeric value",
      "type": "number",
      "relation": { "type": "errorOf", "column": "measurement" }
    }
  ]
}
This will also allow people to specify custom relations, although a list of suggested/default supported relations would be nice.
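As a rough illustration of how a consumer might read such a descriptor, here is a minimal Python sketch. The `relation` key and the `errorOf` type are taken from the example in this comment and are not part of any published spec; the helper name `linked_fields` is hypothetical.

```python
import json

# Schema copied from the example in this comment.
# "relation" is a proposed, non-standard key (an assumption, not a spec feature).
SCHEMA = json.loads("""
{
  "fields": [
    {"name": "measurement", "title": "The numeric value", "type": "number"},
    {"name": "error", "title": "The error attached to the numeric value",
     "type": "number",
     "relation": {"type": "errorOf", "column": "measurement"}}
  ]
}
""")


def linked_fields(schema):
    """Map each target column to the fields that declare a relation to it.

    Returns e.g. {"measurement": {"errorOf": "error"}}.
    """
    names = {field["name"] for field in schema["fields"]}
    links = {}
    for field in schema["fields"]:
        relation = field.get("relation")
        if relation is None:
            continue  # ordinary field, no link declared
        target = relation["column"]
        if target not in names:
            raise ValueError(
                f"{field['name']!r} refers to unknown column {target!r}"
            )
        links.setdefault(target, {})[relation["type"]] = field["name"]
    return links


print(linked_fields(SCHEMA))  # {'measurement': {'errorOf': 'error'}}
```

Because the relation type is just a string, the same lookup would work for custom types such as hypothetical lower-bound and upper-bound relations, which is the case described above for confidence intervals.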
@steko @djvanderlaan I think this is a perfect candidate for a "pattern" proposal. A pattern offers a suggestion of how to solve a particular problem (in this case, linking error information to the main measurement) without being a formal spec.
Hey all, based on a discussion with @danfowler I'm submitting this proposal to add support for observational error measurements in data, a rather common occurrence in scientific datasets. I can't draft a full spec at the moment but I hope others will chime in with comments from their specific experience. Examples below are archaeology-based.
While the idea came out in the context of data packages, it seems JSON table schema is the area where this kind of support should be added.
Examples
Radiocarbon dates
As can be seen in the Mediterranean Radiocarbon dates dataset (one of the largest open datasets of this kind), radiocarbon dates need to be expressed at least by the conventional radiocarbon age and the error. While it's common to write 3340 ± 45 in text, datasets usually record the two separately. However, the radiocarbon age has no meaning without the attached error.
Neutron activation analysis
Compositional data from INAA (Neutron Activation Analysis) are expressed as parts per million with an attached measurement error, as can be seen in the Chemical Composition by Neutron Activation Analysis (INAA) of Neo-Assyrian Palace Ware dataset (a rather common case). In this case, measurement and error are recorded in a single column, separated by ±.
Existing implicit conventions
Separate columns
Single column
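To make the single-column convention concrete, here is a minimal Python sketch that splits a cell such as "3340 ± 45" into a value and its error. The function name `split_measurement` is hypothetical, and accepting "+/-" as an ASCII fallback for "±" is an assumption about how such data may be written, not something taken from the datasets mentioned above.

```python
import re

# One number, a separator ("±" or the assumed ASCII fallback "+/-"),
# then a non-negative error term.
_PATTERN = re.compile(
    r"^\s*([-+]?\d+(?:\.\d+)?)\s*(?:±|\+/-)\s*(\d+(?:\.\d+)?)\s*$"
)


def split_measurement(cell):
    """Split a 'value ± error' string into a (value, error) tuple of floats."""
    match = _PATTERN.match(cell)
    if match is None:
        raise ValueError(f"not a 'value ± error' cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


print(split_measurement("3340 ± 45"))  # (3340.0, 45.0)
```

A tool that understands only the separate-columns convention could run a step like this first and then treat both conventions the same way.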
Proposed approach
Add a field descriptor in the JSON schema to explicitly mark the values in one field as linked to another field, e.g.:
An alternate approach:
This is just a basic description of the issue to get the discussion started, with no presumption of formal correctness nor exhaustive coverage of the various issues in other disciplines.