Add support for labeled missingness
Resolves frictionlessdata/datapackage#880. When
paired with the `categorical` type, this gives us full support for the
value labels found in many statistical software packages.

I have included it here rather than a separate PR because these
issues are intertwined and there's a synergy in addressing them
simultaneously. That said, if you would rather see this in a separate PR, let me
know and I can revert.
khusmann committed Apr 3, 2024
1 parent ceff8bf commit d66cdb1
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions content/docs/specifications/table-schema.md
@@ -125,9 +125,18 @@ Many datasets arrive with missing data values, either because a value was not co

`missingValues` dictates which string values `MUST` be treated as `null` values. This conversion to `null` is done before any other attempted type-specific string conversion. The default value `[ "" ]` means that empty strings will be converted to null before any other processing takes place. Providing the empty list `[]` means that no conversion to null will be done, on any value.

-`missingValues` `MUST` be an `array` where each entry is a `string`.
+`missingValues` `MUST` be an `array` where each entry is a `string`, or an `array` where each entry is an `object`.

-**Why strings**: `missingValues` are strings rather than being the data type of the particular field. This allows for comparison prior to casting and for fields to have missing value which are not of their type, for example a `number` field to have missing values indicated by `-`.
+If an `array` of `object`s is provided, each object `MUST` have a `value` and optional `label` property. The `value` property `MUST` be a `string` that matches the physical value of the field. The optional `label` property `MUST` be a `string` that provides a human-readable label for the missing value. For example:
+
+```json
+"missingValues": [
+{ "value": "", "label": "OMITTED" },
+{ "value": "-99", "label": "REFUSED" }
+]
+```
+
+**Why strings**: `missingValues` are specified as strings rather than being the data type of the particular field. This allows for comparison prior to casting and for fields to have missing value which are not of their type, for example a `number` field to have missing values indicated by `-`.

Examples:

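For readers who want to see how the two accepted shapes of `missingValues` might be handled by a consumer, here is a minimal Python sketch. It is not taken from any Frictionless implementation: the function names `normalize_missing_values` and `parse_cell` are hypothetical, and rejecting arrays that mix strings and objects is an assumption based on the wording "an `array` where each entry is a `string`, or an `array` where each entry is an `object`". The sketch normalizes either form into a mapping from physical value to optional label, then checks for missingness before any type cast, as the spec text requires.

```python
from __future__ import annotations

from typing import Callable, Optional


def normalize_missing_values(missing_values: Optional[list] = None) -> dict[str, Optional[str]]:
    """Map each physical missing value to its label (None when unlabeled).

    Hypothetical helper; accepts either a list of strings or a list of
    {"value": ..., "label": ...} objects, but not a mix (an assumption).
    """
    if missing_values is None:
        missing_values = [""]  # default stated in the spec text
    if all(isinstance(entry, str) for entry in missing_values):
        return {entry: None for entry in missing_values}
    if all(isinstance(entry, dict) for entry in missing_values):
        labels: dict[str, Optional[str]] = {}
        for entry in missing_values:
            value, label = entry.get("value"), entry.get("label")
            if not isinstance(value, str):
                raise ValueError("each object needs a string `value`")
            if label is not None and not isinstance(label, str):
                raise ValueError("`label` must be a string when present")
            labels[value] = label
        return labels
    raise ValueError("entries must be all strings or all objects")


def parse_cell(raw: str, missing: dict[str, Optional[str]], cast: Callable = float):
    """Return (value, label); the null check runs before the type cast."""
    if raw in missing:
        return None, missing[raw]
    return cast(raw), None
```

With the example from the diff, `parse_cell("-99", missing)` yields a null value tagged `REFUSED`, while `"3.5"` is cast to a number as usual. Keeping the comparison on the raw string is what lets a `number` field carry a marker such as `-` that would never survive casting.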
