Promote the Missing Values per Field pattern to the Table Schema spec? #861

roll · 2024-01-03T14:03:23Z

Overview

Pattern - https://datapackage.org/patterns/missing-values-per-field/

This pattern is already supported by frictionless-py and, in general, is easy to implement.
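The sketch below shows how an implementation might resolve missing values per field. It assumes the pattern's semantics: a field's own `missingValues` list, when present, replaces the schema-level `missingValues`, which defaults to `[""]`. The example schema and the helpers `effective_missing_values` and `read_with_missing_values` are hypothetical. The code uses only the standard library `csv` module and does not cast types.

```python
import csv
import io

# Hypothetical Table Schema: "score" overrides the schema-level missing values.
SCHEMA = {
    "missingValues": [""],
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "score", "type": "number", "missingValues": ["-99", "NA"]},
    ],
}


def effective_missing_values(field, schema):
    """Use the field-level list if present, else the schema-level list (default [""])."""
    if "missingValues" in field:
        return set(field["missingValues"])
    return set(schema.get("missingValues", [""]))


def read_with_missing_values(text, schema):
    """Read CSV text and replace each field's own missing-value markers with None.

    Values are left as strings; type casting is out of scope for this sketch.
    """
    lookup = {f["name"]: effective_missing_values(f, schema) for f in schema["fields"]}
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({name: (None if value in lookup[name] else value)
                     for name, value in row.items()})
    return rows


# An empty "score" cell is NOT treated as missing, because the field-level list
# replaces the schema-level [""].
print(read_with_missing_values("id,score\n1,-99\n,NA\n3,\n", SCHEMA))
# [{'id': '1', 'score': None}, {'id': None, 'score': None}, {'id': '3', 'score': ''}]
```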


peterdesmet · 2024-01-03T17:14:30Z

While this is probably a useful pattern, I don’t think it will be easy to implement in frictionless-r, since it is not supported by its dependency readr::read_delim() (https://readr.tidyverse.org/reference/read_delim.html), which only supports global missing values.

roll · 2024-01-04T09:44:59Z

@peterdesmet
What do you think our general strategy should be in cases like this?

Quickly checking the status:

Pandas (Python) ✔️
Polars (Python/JS/Scala) ✔️
PyArrow (Python) ❌ - see apache/arrow#34637 (Support separate null_values per column in pyarrow.csv.ConvertOptions)
readr (R) ❌

Shall we mark this issue as blocked and create (or watch) issues in the backends, or promote it to the specs anyway, since it is clearly in demand in Python (#551)?

This problem is obviously broader: some other features, e.g. lineTerminator in Table Dialect, are in the same situation.

khusmann · 2024-01-04T18:52:26Z

In my efforts to implement enumLabels in frictionless-r, I've been creating wrappers for readr that could be extended to support field-level missing values as well. So I think this is feasible for R, although it does require a little extra effort.

I think the larger question here re: implementation compatibility with frictionless features is: To what extent do we incorporate features into frictionless that are not universally available natively across backends, and require implementations to build adapters for?

If we only support the "lowest common denominator" of the features available across implementations, I think that puts us in an overconstrained spot. Instead, I think the spec should try to reflect an encoding for the data that makes sense for the data, independent of the backend / implementation. Because field-level missingness is something that commonly exists in data, I think it should be included in the spec, and then it should be up to the implementation (e.g. frictionless-py, frictionless-r, etc.) to do its best to make those data available in whatever form the backend (e.g. petl, polars, readr) supports. (Or throw an informative error if the feature has not been implemented or adapted for that backend, or throw a warning if information is being lost in the conversion).

The challenge with this, of course, is it means we'd have to keep track of support matrices for frictionless props across implementations (I'm imagining something like browser compatibility lists in web standards).
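The sketch below illustrates the adapter strategy described above for a backend that only accepts one global list of missing values, as noted earlier for readr::read_delim(). If every field has the same effective list, the adapter passes that list to the backend natively. Otherwise it reads raw strings and applies the per-field lists itself. When post-processing is not possible, it falls back to a lossy union of all lists and emits a warning. The function `plan_missing_values` and the `can_post_process` flag are hypothetical. The override semantics (field list replaces schema list, default `[""]`) are assumed from the pattern.

```python
import warnings


def effective_missing_values(field, schema):
    # Field-level list overrides the schema-level list (default [""]).
    return frozenset(field.get("missingValues", schema.get("missingValues", [""])))


def plan_missing_values(schema, can_post_process=True):
    """Map per-field missing values onto a backend that only supports a global list.

    Returns (global_na, per_field): global_na is passed to the backend, and
    per_field maps field names to markers the adapter must replace with null itself.
    """
    per_field = {f["name"]: effective_missing_values(f, schema) for f in schema["fields"]}
    distinct = set(per_field.values())
    if len(distinct) <= 1:
        # All fields agree, so the backend's global option is lossless.
        return (sorted(distinct.pop()) if distinct else [], {})
    if can_post_process:
        # Ask the backend for verbatim strings, then apply per-field lists afterwards.
        return ([], per_field)
    # Last resort: the union may turn legitimate values of some fields into nulls.
    warnings.warn("Backend only supports global missing values; "
                  "using the union of all field-level lists may lose data.")
    return (sorted(frozenset().union(*distinct)), {})
```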

peterdesmet · 2024-01-29T08:29:43Z

I was hesitant about supporting this (since it’s not straightforward in R), but then I realized frictionless-r already doesn’t support all options when reading data (search for “not supported” in https://docs.ropensci.org/frictionless/reference/read_resource.html). So I think it is OK to add properties without (immediate) support, if they have real use cases and are carefully considered. I can’t assess whether that is the case here.

I think a compatibility list is a good idea.
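A compatibility list like this could start as plain data. The hypothetical sketch below records only the status reported earlier in this thread for per-field missing values. The `SUPPORT` structure and the helper functions are illustrative and are not part of any existing frictionless API.

```python
# Hypothetical support matrix: feature -> backend -> natively supported?
# Entries reflect only the status reported in this thread (January 2024).
SUPPORT = {
    "field.missingValues": {
        "pandas": True,
        "polars": True,
        "pyarrow": False,  # see apache/arrow#34637
        "readr": False,
    },
}


def supported_backends(feature):
    """Backends that support a feature natively, sorted by name."""
    return sorted(name for name, ok in SUPPORT.get(feature, {}).items() if ok)


def unsupported_backends(feature):
    """Backends known to lack native support, so implementations need an adapter."""
    return sorted(name for name, ok in SUPPORT.get(feature, {}).items() if not ok)
```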

roll added the Table Schema label Jan 3, 2024

roll added this to the v2 milestone Jan 3, 2024

roll mentioned this issue Jan 3, 2024: How to define missing values in a single field? #551 (closed)

roll self-assigned this Jan 6, 2024

roll added the discussion label Jan 6, 2024

roll mentioned this issue Jan 6, 2024: Frictionless Standards (v2). Jan 1 - Jan 7 #860 (closed)

roll removed the discussion label Jan 25, 2024

roll removed their assignment Jan 26, 2024

roll mentioned this issue Jan 26, 2024: Added field.missingValues frictionlessdata/datapackage-v2-draft#24 (merged)

roll self-assigned this Jan 26, 2024

roll added the proposal label Jan 29, 2024

pschumm mentioned this issue Feb 17, 2024: Promote the Enum Labels and Ordering pattern to the Table Schema spec? #875 (closed)

roll closed this as completed in frictionlessdata/datapackage-v2-draft#24 Feb 19, 2024

roll mentioned this issue Feb 19, 2024: Frictionless Standards (v2). Feb 19 - Feb 25 #881 (closed)

peterdesmet mentioned this issue Feb 19, 2024: Support field.missingValues frictionlessdata/frictionless-r#174 (open, 3 tasks)

roll modified the milestones: v2.0-draft, v2.0 Sep 25, 2024
