
Promote the Missing Values per Field pattern to the Table Schema spec? #861

Closed
roll opened this issue Jan 3, 2024 · 4 comments · Fixed by frictionlessdata/datapackage-v2-draft#24

Comments

@roll
Member

roll commented Jan 3, 2024

Overview

Pattern - https://datapackage.org/patterns/missing-values-per-field/

This pattern is already supported by frictionless-py and is, in general, easy to implement.
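For illustration, here is a minimal Python sketch of how a reader might resolve which missing values apply to each field. It assumes the semantics described by the pattern: a field-level `missingValues` list, when present, replaces the schema-level list for that field, and otherwise the schema-level list (or the Table Schema default `[""]`) applies. The function name `missing_values_for` is hypothetical and is not frictionless-py's API.

```python
from typing import Any

# Table Schema default when no missingValues is declared anywhere.
DEFAULT_MISSING_VALUES = [""]


def missing_values_for(field: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return the missing-value strings that apply to a single field.

    Assumption: a field-level "missingValues" overrides (does not extend)
    the schema-level "missingValues".
    """
    if "missingValues" in field:
        return list(field["missingValues"])
    return list(schema.get("missingValues", DEFAULT_MISSING_VALUES))


schema = {
    "missingValues": ["", "NA"],
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "score", "type": "number", "missingValues": ["-99"]},
    ],
}

for field in schema["fields"]:
    print(field["name"], missing_values_for(field, schema))
# id ['', 'NA']
# score ['-99']
```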

@roll roll added this to the v2 milestone Jan 3, 2024
@peterdesmet
Member

While probably a useful pattern, I don’t think it will be easy to implement in frictionless-r, since it is not supported in its dependency readr::read_delim() (https://readr.tidyverse.org/reference/read_delim.html), which only supports global missing values.

@roll
Member Author

roll commented Jan 4, 2024

@peterdesmet
What do you think our general strategy should be in cases like this?

Quickly checking the status:

Shall we mark this issue as blocked and create (or watch) issues in the backends, or promote it to the spec anyway, since it is actively requested in Python (#551)?


This problem is obviously broader, as some features, e.g. in Table Dialect, are in the same situation (e.g. lineTerminator).

@khusmann
Contributor

khusmann commented Jan 4, 2024

In my efforts to implement enumLabels in frictionless-r, I've been creating wrappers for readr that could also be extended to support field-level missing values. So I think this is pretty feasible for R, although it does require a little extra effort.
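The wrapper approach can be sketched in Python as an analogy (this is not frictionless-r's code). The assumption is a backend that can only apply one global set of missing values: read every cell as a raw string with no missing-value conversion (Python's `csv` module never converts), then map each field's own missing values to `None` afterwards. The function name `read_with_field_missing_values` is hypothetical.

```python
import csv
import io
from typing import Any, Optional


def read_with_field_missing_values(text: str, schema: dict[str, Any]) -> list[dict[str, Optional[str]]]:
    """Read CSV text as raw strings, then apply per-field missing values.

    Assumption: a field-level "missingValues" overrides the schema-level list.
    """
    default = schema.get("missingValues", [""])
    # One lookup set per field name, precomputed once.
    missing = {f["name"]: set(f.get("missingValues", default)) for f in schema["fields"]}
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in record.items():
            if name is None:  # extra cells beyond the header row; ignored in this sketch
                continue
            field_missing = missing.get(name, set(default))
            row[name] = None if value in field_missing else value
        rows.append(row)
    return rows


schema = {
    "missingValues": ["", "NA"],
    "fields": [{"name": "id"}, {"name": "score", "missingValues": ["-99"]}],
}
csv_text = "id,score\n1,-99\nNA,7\n"
rows = read_with_field_missing_values(csv_text, schema)
print(rows)
# [{'id': '1', 'score': None}, {'id': None, 'score': '7'}]
```

Under this assumption an "NA" in the `score` column would stay a string, because that field's list replaces the schema-level one.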

I think the larger question here regarding implementation compatibility with frictionless features is: to what extent do we incorporate features into frictionless that are not natively available across all backends and that require implementations to build adapters?

If we only support the "lowest common denominator" of the features available across implementations, I think that puts us in an overconstrained spot. Instead, I think the spec should try to reflect an encoding for the data that makes sense for the data, independent of the backend / implementation. Because field-level missingness is something that commonly exists in data, I think it should be included in the spec, and then it should be up to the implementation (e.g. frictionless-py, frictionless-r, etc.) to do its best to make those data available in whatever form the backend (e.g. petl, polars, readr) supports. (Or throw an informative error if the feature has not been implemented or adapted for that backend, or throw a warning if information is being lost in the conversion).

The challenge with this, of course, is it means we'd have to keep track of support matrices for frictionless props across implementations (I'm imagining something like browser compatibility lists in web standards).
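A support matrix plus the "informative error or warning" policy described above could look roughly like the following Python sketch. The matrix contents, implementation names (`impl-a`, `impl-b`), status labels and `UnsupportedFeatureError` are all hypothetical placeholders, not real support data for any frictionless implementation.

```python
import warnings


class UnsupportedFeatureError(NotImplementedError):
    """Raised when a descriptor uses a property the implementation cannot handle."""


# Hypothetical matrix, in the spirit of browser compatibility tables:
# property -> implementation -> "supported" | "lossy" | "unsupported".
SUPPORT_MATRIX = {
    "field.missingValues": {"impl-a": "supported", "impl-b": "lossy"},
    "dialect.lineTerminator": {"impl-a": "supported", "impl-b": "unsupported"},
}


def check_feature(feature: str, implementation: str) -> None:
    """Raise if unsupported, warn if information may be lost, else do nothing."""
    status = SUPPORT_MATRIX.get(feature, {}).get(implementation, "unsupported")
    if status == "unsupported":
        raise UnsupportedFeatureError(f"{implementation} does not implement {feature}")
    if status == "lossy":
        warnings.warn(f"{implementation} only partially supports {feature}; information may be lost")


check_feature("field.missingValues", "impl-a")  # supported: no warning, no error
```

Treating an unknown feature/implementation pair as "unsupported" is a deliberate choice here: it fails loudly instead of silently misreading data.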

@peterdesmet
Member

I was hesitant about supporting this (since it’s not straightforward in R), but then I realized frictionless-r already doesn’t support all options when reading data (search for “not supported” in https://docs.ropensci.org/frictionless/reference/read_resource.html). So I think it is OK to add properties without (immediate) support, if they have real use cases and are carefully considered. I can’t assess whether that is the case here.

I think a compatibility list is a good idea.

3 participants