[super] Continuous Data Integration workflows around Data Packages #198

Closed
5 of 8 tasks
rufuspollock opened this issue Oct 12, 2015 · 3 comments

rufuspollock commented Oct 12, 2015

For a long time we've discussed "continuous data integration" and doing this with Data Packages. This is a discussion space for these ideas.

What do we mean by "Continuous Data Integration"?

Continuous Integration with code means automated running of tests (and sometimes deployments) whenever new code is pushed.

For data, this would mean automatically validating (testing) data and metadata on each new contribution and running any deployment tests.

A concrete example: automatically validating a Data Package's data against its schema on each commit (assuming the data is stored in git).

Sub parts

Initial Implementation

We could start out by piggybacking on existing code CI services such as Travis: instead of running code tests, we run data validation as if it were a code test.

By hand version

  • create a demo repo - probably clone a core data package dataset
  • implement a tiny test script in scripts/, called datatest.py or similar, that uses goodtables to test the data
  • turn on Travis CI and run the test script
    • bonus points for somehow showing pretty output from goodtables
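
To make the by-hand version concrete, here is a minimal sketch of what a datatest.py could look like. It is only an illustration: it uses the Python standard library as a stand-in for goodtables (whose API is not shown in this thread), and the names check_table, main and CASTS are hypothetical. It assumes a datapackage.json whose resources each have a path and a schema with a fields list.

```python
import csv
import io
import json
import os

# Hypothetical, deliberately tiny set of type checks; goodtables would do far more.
CASTS = {"integer": int, "number": float, "string": str}


def check_table(stream, fields):
    """Return a list of error messages for a CSV stream checked against `fields`.

    `fields` is assumed to look like [{"name": ..., "type": ...}, ...].
    """
    errors = []
    reader = csv.reader(stream)
    header = next(reader, [])
    expected = [field["name"] for field in fields]
    if header != expected:
        return [f"header {header} does not match schema {expected}"]
    # Data rows start on line 2 because line 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(fields):
            errors.append(f"row {row_number}: expected {len(fields)} cells, got {len(row)}")
            continue
        for field, value in zip(fields, row):
            type_name = field.get("type", "string")
            try:
                CASTS.get(type_name, str)(value)
            except ValueError:
                errors.append(f"row {row_number}: {value!r} is not a valid {type_name} for {field['name']}")
    return errors


def main(descriptor_path="datapackage.json"):
    """Check every resource in the descriptor; return 1 if any error was found, else 0."""
    with open(descriptor_path) as f:
        descriptor = json.load(f)
    base = os.path.dirname(descriptor_path)
    errors = []
    for resource in descriptor.get("resources", []):
        with open(os.path.join(base, resource["path"]), newline="") as f:
            for message in check_table(f, resource["schema"]["fields"]):
                errors.append(f"{resource['path']}: {message}")
    for message in errors:
        print(message)
    return 1 if errors else 0
```

Ending the script with `sys.exit(main())` (after `import sys`) would make the process exit non-zero on invalid data, which is what lets a code CI service such as Travis report the build as failed.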

Standardize:

@rufuspollock rufuspollock changed the title Continuous Integration workflows around Data Packages Continuous Data Integration workflows around Data Packages May 1, 2016
rufuspollock commented

Working pretty well.

@rufuspollock rufuspollock added this to the Current milestone May 6, 2016
@rufuspollock rufuspollock changed the title Continuous Data Integration workflows around Data Packages [super] Continuous Data Integration workflows around Data Packages May 6, 2016
@roll roll removed this from the Current milestone Aug 8, 2016
@roll roll added the backlog label Aug 8, 2016
danfowler commented

This issue was moved to #48

@danfowler danfowler removed the backlog label Aug 10, 2016
@roll roll moved this to Done in Open Knowledge Dec 26, 2021