This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Sources support in job API #166

Closed
amercader opened this issue Mar 3, 2017 · 12 comments · Fixed by #243

@amercader
Member

We currently have the following endpoints:

GET /api/job - Job list
GET /api/job/<id> - Job details
POST /api/job - Create job

To allow third-party apps to create validation jobs, we need to enforce that jobs created via the API have a Source (and provide a way to create sources). This is important to aggregate the jobs conceptually. For instance, a source could be a single app (eg the Global Open Data Index) or a single dataset within an app (eg each of the tabular resources in a CKAN instance).

We need to add:

GET /api/source - (Your) Sources list
GET /api/source/<id> - Source details
POST /api/source - Create source

(optionally)
POST /api/source/<id> - Update source (or PUT)
DELETE /api/source/<id> - Delete source (but this needs thought, eg what happens to existing jobs, other users)

Then, to create a job we have two options:

  1. We keep the POST /api/job and enforce a source_id input parameter
  2. We replace the current endpoint with POST /api/source/<id>/job

2 seems more RESTful but I don't mind it that much.
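
To make the difference between the two options concrete, here is a minimal sketch of what each request could look like from a client. It only builds the requests and does not send them. The host name, the JSON body and the shape of the job configuration (`files` / `source`) are assumptions for illustration, not something fixed in this issue.

```python
import json
from urllib.request import Request

# Hypothetical host, used only to build example URLs.
BASE_URL = "https://goodtables.example"


def job_request_option_1(source_id, conf):
    """Option 1: POST /api/job, with source_id as a mandatory input parameter."""
    body = dict(conf, source_id=source_id)
    return Request(
        f"{BASE_URL}/api/job",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def job_request_option_2(source_id, conf):
    """Option 2: POST /api/source/<id>/job, with the source taken from the URL."""
    return Request(
        f"{BASE_URL}/api/source/{source_id}/job",
        data=json.dumps(conf).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed job configuration shape, for illustration only.
example_conf = {"files": [{"source": "https://example.com/data.csv"}]}
```

With option 1 the source travels in the request body, so the server has to reject bodies without `source_id`. With option 2 a job cannot be addressed at all without naming its source.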

@amercader
Member Author

I'll add a separate ticket about API authorization (user tokens)

@amercader amercader added this to the Beta milestone Mar 3, 2017
@amercader amercader added the [1d] label Mar 3, 2017
@amercader
Member Author

@roll I think this will actually take less than 1 day but just in case. Let me know what you think.

@brew Would that fit your use case? Note that the create job endpoint would return an id that allows you to check the status of the job and the report once finished. Eventually I want to add support for webhooks as well.

@roll
Member

roll commented Mar 3, 2017

@amercader
Honestly, I think that on the contrary it's more than 1d (maybe much more). The idea is cool, but it looks like a whole new concept to introduce to the system.

For example, adding a new source: what does it mean for random non-GitHub/S3 files? A new taxonomy of these source types and/or a special format for storing source configurations would have to be designed. Or something like this.

Also, even the basic task description requires some user authentication system, eg to get a user's source list. Without it, I'm not sure how to regulate access to sources.

So based on our initial discussion, I'm not sure why we don't just repair the api/job endpoint for Beta. It provides one-time validation for any random files/datapackages supported by goodtables without any additional actions. Also, I guess one-time jobs could be even better for GODI for now, because there is no need to store meta-information about the source list etc: just post a job and get the report by job_id.

We need only a few hours to make it work again and restrict access by api_access_key for example.

@amercader
Member Author

@roll sorry, perhaps I wasn't clear on my original description of the issue.

I think there should not be any special taxonomy or model for API sources: you can validate web-accessible files by passing a job configuration. There is no path computation, so paths in files or inside data packages must be absolute.

Jobs created via POST /api/job currently work perfectly fine, we just need to make sure a source is attached to them.

  • We add a new integration entry on the table called api.
  • All jobs and sources created via the API will have integration_name = 'api'
  • Users will create a source via POST /api/source. Currently they just need to provide a name (validated for uniqueness among sources of type api). If we need more fields in the future these can be stored in the source conf
  • Users create jobs against this source using POST /api/source/<id>/job
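
The bullets above could be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the actual goodtables.io models: the `Source` fields, the `SourceRegistry` class and the dict returned for a job are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from itertools import count

API_INTEGRATION = "api"  # the new integration entry described above


@dataclass
class Source:
    # Hypothetical fields; the real model may differ.
    id: int
    name: str
    owner: str
    integration_name: str = API_INTEGRATION
    conf: dict = field(default_factory=dict)  # room for future extra fields


class SourceRegistry:
    """In-memory stand-in for the sources table."""

    def __init__(self):
        self._sources = {}
        self._ids = count(1)

    def create_source(self, name, owner, conf=None):
        # Names must be unique among sources of type api.
        for existing in self._sources.values():
            if existing.integration_name == API_INTEGRATION and existing.name == name:
                raise ValueError(f"an api source named {name!r} already exists")
        source = Source(id=next(self._ids), name=name, owner=owner, conf=conf or {})
        self._sources[source.id] = source
        return source

    def create_job(self, source_id, job_conf):
        # Every API job is attached to a source and inherits its integration.
        source = self._sources[source_id]
        return {
            "source_id": source.id,
            "integration_name": source.integration_name,
            "conf": job_conf,
        }
```

Keeping any extra per-source settings in `conf` means the uniqueness rule and the Integration > Source > Job relation stay unchanged if more fields are needed later.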

Yes, all these endpoints require authorization even listings or object details. This will be done via an API token that must be included in a header on all requests (Authorization: token ... not API_ACCESS_KEY) like you describe in #168. To create a job you must be the creator of the source (ie use the same token or a token for the same user).
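
A minimal sketch of that check, assuming the header value has the form `token <value>` and that tokens can be looked up in a simple mapping to a user. The function names and the `tokens_to_user` mapping are hypothetical and only illustrate the rule that the job creator must own the source.

```python
def parse_token(headers):
    """Return the token from an 'Authorization: token <value>' header, or None."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    token = token.strip()
    if scheme.lower() != "token" or not token:
        return None
    return token


def can_create_job(headers, tokens_to_user, source_owner):
    """Allow job creation only when the token belongs to the source's creator."""
    token = parse_token(headers)
    if token is None:
        return False
    user = tokens_to_user.get(token)
    # Any token belonging to the same user is accepted, not only the one
    # that was used to create the source.
    return user is not None and user == source_owner
```

Requests without the header, with another scheme, or with an unknown token are all rejected in the same way, so listings and detail endpoints could reuse `parse_token` unchanged.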

We don't bother about UI for these sources for now (the job reports will be available if you navigate to them directly, and we can restrict access by default to all users except the one who created the source).

This seems pretty straightforward to me: perhaps not 1 day of work if we include authorization, but nothing too complicated if we stick to a simple use case for the first iteration.

In any case don't work on this quite yet because GODI doesn't need it immediately so you can pick other beta tasks.

@roll
Member

roll commented Mar 9, 2017

@amercader
Yes, I was confused by the use of the term source, because for now a source contains all the information needed to start a job. What you're describing is something like a virtual source that just provides the user -> source -> job relations.

I'm still not sure why we need such a mechanism for Beta, but it sounds promising as a general mechanism for random jobs, if you prefer it over just providing access to api/job directly (as atomic, "stateless" on-demand calls).

@amercader
Member Author

Yes, I think that matching the API jobs to the integrations model (Integration > Source > Job) will make our lives easier in the long run.

Unless @pwalsh or @brew need this in the next 2 weeks let's park this for now and focus on what we had agreed on for Beta first.

@roll
Member

roll commented Mar 9, 2017

@brew
@pwalsh
Also, #168 describes how GODI could already use goodtables.io. So if for now we need to temporarily restrict access to api/job, we could whitelist or use an api_key. If not - just close #168 and we're good to go to serve GODI jobs anyway.

@pwalsh
Member

pwalsh commented Mar 16, 2017

@amercader I'm having trouble following between this and #168 which @roll links to as a possibility that already works, and the above comment from @roll that says "If not - just close #168 and we're good to go to serve GODI jobs anyway."

I need a tl;dr: can GODI hit an endpoint on gt.io to validate data, and then extract the report, and if so, how, and if not, when.

@amercader
Member Author

@pwalsh

TL;DR:

So GODI can start using the endpoint now, but eventually will have to change the requests slightly when #166 is implemented.

@pwalsh
Member

pwalsh commented Mar 16, 2017

thanks @amercader and cc @brew

@roll
Member

roll commented Mar 16, 2017

So possible actions:

  1. do nothing for beta because it already works for GODI
  2. implement Restrict access to /api/job by API_ACCESS_KEY #168 (estimated as 2 hours) to restrict access in the simplest way possible
  3. implement Sources support in job API #166 - the full solution - could take some extended time

So based on our situation, 1 or 2 should be an easy choice (with #166 backlogged).

@amercader
Member Author

We'll do 2. 3 is backlogged, but it is a requirement for whenever we want to make the API public (which will require proper auth anyway).

@amercader amercader modified the milestones: Backlog, Beta Mar 16, 2017
@amercader amercader modified the milestones: Backlog, Gamma Apr 27, 2017
@amercader amercader mentioned this issue May 16, 2017
7 tasks
@amercader amercader added [2d] and removed [1d] labels May 16, 2017
@amercader amercader self-assigned this May 17, 2017
@amercader amercader assigned roll and unassigned amercader May 26, 2017