[epic] Exemplar and test data files and data packages #28
Test data files and data packages could be organised in many different ways. In my data-package-examples repo I created a data package for each data [...]. I created a goodtables.yml file that I could edit to validate one, some, or all of the data packages. Unfortunately, GoodTables can only provide a badge for the whole repo rather than a badge for each data package in the repo (see forum question). Other data packages could test for
Feedback on what's useful is welcome.
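The goodtables.yml approach above selects which packages to validate (one, some, or all). The same selection can be mirrored locally. Below is a minimal Python sketch, assuming each package lives in its own directory containing a datapackage.json; the function name `find_data_packages` and the directory-name filter are hypothetical and are not part of goodtables or the goodtables.yml format.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def find_data_packages(root: Path, names: Optional[Iterable[str]] = None) -> List[Path]:
    """Return datapackage.json paths under root.

    Hypothetical helper: if names is None, every package is returned ("all");
    otherwise only packages whose directory name is in names ("one" or "some").
    """
    wanted = set(names) if names is not None else None
    found = []
    # rglob walks the whole tree; sorting keeps the output order stable.
    for descriptor in sorted(root.rglob("datapackage.json")):
        if wanted is None or descriptor.parent.name in wanted:
            found.append(descriptor)
    return found
```

For example, `find_data_packages(Path("."))` lists every package in the repo, while `find_data_packages(Path("."), ["some-package"])` (a made-up directory name) restricts the result to that one package.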
How is this different from the test suite repositories we have here?
@pwalsh which test suite repositories? Getting a list of existing ones would be useful 😄 (And was your comment directed at the general issue or at @Stephen-Gates?) For me, the motivation for this issue is having one clear reference repo with test data. I know there are various sources and I'm not sure which is the best one, and I've personally ended up creating my own test data in, e.g., https://github.com/datahq/data.js
@pwalsh I had ignored testsuite-basic and testsuite-extended as they were labelled [...]. Specifically for Data Curator, as the focus was on local processing, I needed datapackage.zip files, as that's the only way it can create or open a data package at present.
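Since Data Curator can only create or open a data package from a datapackage.zip, a small script can build one per package. Below is a minimal Python sketch using only the standard library. It assumes datapackage.json sits at the archive root and each resource keeps its relative `path`; this layout is an assumption, because the bundle pattern was still unresolved in this thread. `zip_data_package` is a hypothetical helper, not a Data Curator or Frictionless API.

```python
import json
import zipfile
from pathlib import Path
from typing import List


def zip_data_package(package_dir: Path, zip_path: Path) -> List[str]:
    """Write datapackage.json plus the local files it references into zip_path.

    Assumes resource "path" values are relative local paths (a string or a
    list of strings). Remote http(s) URLs are skipped, not downloaded.
    Returns the archive member names in the order they were written.
    """
    descriptor = json.loads((package_dir / "datapackage.json").read_text(encoding="utf-8"))
    members = ["datapackage.json"]
    for resource in descriptor.get("resources", []):
        paths = resource.get("path", [])
        if isinstance(paths, str):
            paths = [paths]
        for p in paths:
            if p.startswith(("http://", "https://")):
                continue  # remote data stays remote
            if p not in members:  # avoid duplicate archive entries
                members.append(p)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in members:
            # arcname keeps the path relative to the package root.
            zf.write(package_dir / member, arcname=member)
    return members
```

Keeping the descriptor at the root and resource paths unchanged means the descriptor inside the zip needs no rewriting, which is the main reason for choosing this layout in the sketch.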
@Stephen-Gates those are useful and I've added them to the research section. If you have any sample data in your repos, please link to it.
@roll could you comment on where we could boot this and start work on it? This is something I think community members (including myself) could contribute to if it were clear where to start. Also, could you please document all the sources you know of? 😄
Existing resources:
I think the best repo for this work will be https://github.com/frictionlessdata/example-data-packages; we could just continue Dan's work. Related to Paul's words about duplication with
Would you like me to contribute the data packages that go with the Point location data in CSV files guide to https://github.com/frictionlessdata/example-data-packages?
If everyone (especially @rufuspollock, as the facilitator of this work) agrees on the repo selection, that would be great! Please let me (or OKI) know if we need to grant any GitHub rights to simplify the process.
I've started PR #2, which currently contains 6 of the 7 examples in the Point location data in CSV files guide. I've changed the goodtables.yml file to test only these new data packages.
@rufuspollock would you like .zip files for each data package? Perhaps a directory called [...]. Lastly, these packages have been hand-crafted, as I thought it would be good for the property order to mimic the specification and for the JSON to be "beautified" (Data Curator doesn't do that yet). If you have thoughts on how you'd like contributions, perhaps you could update the readme.md?
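Hand-crafting descriptors so that the property order follows the specification and the JSON is "beautified" can be automated. Below is a minimal Python sketch; `PACKAGE_KEY_ORDER` is a hypothetical, illustrative order and is not taken from the specification, and `beautify_descriptor` is likewise a made-up helper. Properties not in the list are kept, in their original order, after the listed ones.

```python
import json

# Hypothetical preferred order for top-level Data Package properties.
# Illustration only: adjust it to match the order used in the specification.
PACKAGE_KEY_ORDER = ("name", "title", "description", "licenses", "sources", "resources")


def beautify_descriptor(descriptor: dict, key_order=PACKAGE_KEY_ORDER, indent: int = 2) -> str:
    """Return descriptor as indented JSON with known keys first, in key_order."""
    ordered = {k: descriptor[k] for k in key_order if k in descriptor}
    # Append any remaining properties in their original order.
    ordered.update({k: v for k, v in descriptor.items() if k not in ordered})
    # ensure_ascii=False keeps non-ASCII titles readable; trailing newline
    # keeps diffs clean when the file is committed.
    return json.dumps(ordered, indent=indent, ensure_ascii=False) + "\n"
```

Python dicts preserve insertion order, so building a new dict in the desired order is enough for `json.dumps` to emit keys in that order.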
@roll, can you clarify the role and purpose of https://github.com/frictionlessdata/testsuite-extended? The README is not very informative to me. @Stephen-Gates, as per the original issue thread, I think we probably want two distinct sets of material:
I think the best thing right now would be to draft the READMEs for these repos (even though they don't exist yet) from the point of view of someone using them 😄 I've booted a HackMD here: https://hackmd.io/CYBgnBBGCMCmC0BjYwBm8AsIQA56TFQVTAFZTUcBmCWAQwDYg=== Please dive in and contribute. Once we've got a repo set up, we'll move there.
@rufuspollock, just to confirm: https://github.com/frictionlessdata/example-data-packages is for example/exemplar data packages and not test data? I suggest there should be data packages supporting
Should the repo contain "experimental" data packages, for example the concepts being proposed in the Spatial Data Package Research? Should the repo contain .zip versions of each package? I've started making notes in the readme.
@rufuspollock
The structure of the test suites reflects this:
@Stephen-Gates somehow I missed your comment when you wrote it last week!
Yes, that is right.
I think that makes sense.
If needed for guidance, sure, but I note we have not yet settled on our bundle spec (we really should settle on a pattern for that: frictionlessdata/datapackage#132).
@Stephen-Gates @roll I've now booted a test data repo here: https://github.com/frictionlessdata/test-data We can start opening issues and consolidating material there for test data. |
Thanks @rufuspollock, I will focus on example data packages for now. There are 2 PRs for review, although one includes .zips I made before your answer above. I personally find .zips helpful and I believe most implementations support them. I also need some help with instructions for local validation, as discussed in https://github.com/frictionlessdata/example-data-packages/issues/6 I've fixed the data packages referenced by the Guides, but there are others that need to be fixed, archived, or deleted (frictionlessdata/examples#4). Thoughts?
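Until proper instructions for local validation exist, a few structural checks can be run without any online service. Below is a hypothetical Python sketch, not goodtables and not a Frictionless API: it only checks that the descriptor parses as a JSON object, that `resources` is a non-empty list, that each resource has a `name` and either `path` or `data`, and that local `path` files exist. It does not check table contents or schemas.

```python
import json
from pathlib import Path
from typing import List


def check_data_package(descriptor_path: Path) -> List[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    try:
        descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read descriptor: {exc}"]
    if not isinstance(descriptor, dict):
        return ["descriptor must be a JSON object"]
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        return ["'resources' must be a non-empty list"]

    problems = []
    for i, resource in enumerate(resources):
        if not isinstance(resource, dict):
            problems.append(f"resource {i}: not an object")
            continue
        if "name" not in resource:
            problems.append(f"resource {i}: missing 'name'")
        if "path" not in resource and "data" not in resource:
            problems.append(f"resource {i}: needs 'path' or 'data'")
            continue
        paths = resource.get("path", [])
        if isinstance(paths, str):
            paths = [paths]
        for p in paths:
            if p.startswith(("http://", "https://")):
                continue  # remote files are not checked here
            # Local paths are resolved relative to the descriptor's directory.
            if not (descriptor_path.parent / p).is_file():
                problems.append(f"resource {i}: file not found: {p}")
    return problems
```

Returning a list of messages rather than raising on the first error lets one run report every problem in a package, which suits a quick pre-PR check.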
@roll @rufuspollock as suggested in #28 (comment), [...]. If not, please look at PR frictionlessdata/examples#8 and the license proposed in frictionlessdata/examples#5.
FIXED. We've started work on the test data repo: https://github.com/frictionlessdata/test-data @Stephen-Gates @roll please take a look and open issues for any suggested improvements. We've also got an in-progress exemplar repo: https://github.com/frictionlessdata/example-data-packages
I would still like the request above resolved: #28 (comment)
@Stephen-Gates
@Stephen-Gates just wanted to flag https://github.com/datapackage-examples; these are more focused on view examples but may be useful too.
Creating a single unified source of test and exemplar data for the frictionless data community.
UPDATES: we need two repos, one for test data (packages) and one for exemplar data (packages):
HackMD for drafting READMEs, etc.:
https://hackmd.io/CYBgnBBGCMCmC0BjYwBm8AsIQA56TFQVTAFZTUcBmCWAQwDYg===
User Stories
Thinking about requirements for example (exemplar and test) data packages:
My sense is that the "exemplar" and "test" use cases are somewhat different. 1+3+4 are exemplar and want "nice" data packages. 2 (+1) are more about testing: they need to cover the real range of situations and be super simple to test against.
I think we should focus on the test (lib developer) case to start with.
TODO: create a separate issue for exemplar data files and packages.
Comment: we probably want versioning and the ability to include the repo as a git submodule, so that users of the test data can pin the data they are developing against (e.g. if the data package spec gets upgraded, they can still keep old spec versions if they need them).
Acceptance criteria
Tasks
Research