What sort of APIs do we want to have? #57

Closed
hcayless opened this issue Mar 29, 2017 · 43 comments

@hcayless
Contributor

During the discussion around #56, we realized we have a fundamental disagreement as to API philosophy between two camps (not listed in order of preference):

  1. An API should always tell its client where it can go next from any API response, using link relations
  2. An API client should know what sort of API it is accessing, and so can infer where to go next based on the metadata provided in an API response

The resolution of this question is necessary in order to determine the form of the solution for #56, #54, and #39. If we choose (1), then the properties in question would be in the form of links (location, references, etc.). If (2) they would be in the form of boolean properties (referenceable, etc.).
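A minimal sketch of the two options follows. It assumes hypothetical property names (`links`, `rel`, `href`, `referenceable`) and a hypothetical fixed route `https://example.org/dts/references/{id}`, none of which is agreed DTS vocabulary. The same member is described both ways, and one client helper finds the references URL from either shape:

```python
from urllib.parse import quote

# Option (1): the response carries link relations telling the client where to go next.
# All field names and URLs are hypothetical.
member_with_links = {
    "id": "urn:cts:latinLit:phi1294.phi002.perseus-lat2",
    "links": [
        {"rel": "references",
         "href": "https://example.org/dts/references/urn:cts:latinLit:phi1294.phi002.perseus-lat2"},
    ],
}

# Option (2): the response carries a boolean; the client already knows the route.
member_with_flags = {
    "id": "urn:cts:latinLit:phi1294.phi002.perseus-lat2",
    "referenceable": True,
}

# Hypothetical route the client would have to learn from documentation (option 2).
KNOWN_REFERENCES_ROUTE = "https://example.org/dts/references/{id}"


def references_url(member):
    """Return the URL where a member's references can be fetched, or None."""
    # Option (1): follow the link relation if the server supplied one.
    for link in member.get("links", []):
        if link.get("rel") == "references":
            return link["href"]
    # Option (2): combine the flag with out-of-band knowledge of the route.
    if member.get("referenceable"):
        return KNOWN_REFERENCES_ROUTE.format(id=quote(member["id"], safe=":"))
    return None


print(references_url(member_with_links))
print(references_url(member_with_flags))
```

With option (1), only the server needs to know the URL layout. With option (2), client and server must share the route convention out of band.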

@balmas
Contributor

balmas commented Mar 31, 2017

My two cents: I'm not convinced it's a disagreement about what an API should do. I think what we've been defining up until now is the API functionality that is specific to a distributed text collections service. Adding HATEOAS-style links for navigation definitely needs to be done; the question is whether we want to bake that into the model or not. My personal vote would be to do this via the implementation. Swagger/OpenAPI 3.0 provides a way to do it with its links feature, and I think we should try to take advantage of that and see if it meets our needs.

@hcayless
Contributor Author

I'm confused though. I can see how this might work for plugging into the reference API: the API defines a reference lookup function, into which you plug an identifier, which you'd get from the collection metadata, and the collection metadata has a property that tells you it's "referenceable", so you know you can plug the id into your ref lookup function, yes?

I don't see how this can work for file download locations, which could be anywhere—they might be an API function, but could equally well be any URI.

Is the argument really over whether the API should have a fixed, known set of URI schemes as opposed to the API telling you where you can go next?

@PonteIneptique
Member

PonteIneptique commented Mar 31, 2017

To me this is where the argument lies. I want "simple" properties that tell you that you can use other routes (i.e. if referenceable == true, then you can go to /dts/v1.0/references/{id}).

I would usually model this as a collection of references. If you want the client to specify an ID, I would do that using a URI template - the client can provide the id, but should not be required to know the URI or parse it.
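Here is a minimal sketch of the URI-template idea. It assumes the server sends an RFC 6570 template and the client implements only level-1 (`{var}`) expansion. The template URL is hypothetical. The client fills in the identifier but never parses the resulting URI:

```python
import re
from urllib.parse import quote

# Matches level-1 expressions such as {id}; operators like {+id} or {?ref} are not handled.
_EXPRESSION = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand(template, **values):
    """Expand level-1 URI template expressions with percent-encoded values."""
    def substitute(match):
        value = values.get(match.group(1))
        if value is None:
            return ""  # undefined variables expand to the empty string
        # Level-1 expansion leaves only unreserved characters (A-Z a-z 0-9 - . _ ~) unencoded.
        return quote(str(value), safe="")
    return _EXPRESSION.sub(substitute, template)


# The server would send the template; the client only supplies the identifier.
template = "https://example.org/dts/references/{id}"
print(expand(template, id="urn:cts:latinLit:phi1294.phi002.perseus-lat2"))
```

A client that needs the other operators (`{+var}`, `{?var}`, and so on) would use a complete RFC 6570 implementation instead.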

@balmas
Contributor

balmas commented Mar 31, 2017

Maybe we can get closer to a solution if we can confirm what we do agree upon.

Can we all agree on the following statements:

  1. Members of a DTS Collection may
    1a. be readable at the DTS Passages API
    1b. be readable at the DTS References API
    1c. have one or more related resources of various mime types which can be directly retrieved at a URL

  2. If any of 1a-1c is true for a member of a collection, the DTS Collections API should state it explicitly

  3. For 1c, it's not enough for the API to say there is a related resource at URL X; it needs to say what the format (mimetype) of that resource is and what its relationship to the collection member is.

  4. The locations of the DTS Passages API and DTS References API endpoints may be the same for all members of a collection, but may also be different for different members

  5. Unnecessarily repeating a potentially lengthy URL string (e.g. http://mycollectionservice.org/references) across many members of a DTS collection is not desirable, particularly for responses which may contain thousands of items
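One way to model statements 1-5 is sketched below as Python dataclasses. The class and field names, the example URLs and the relation name `rendering` are all hypothetical. Endpoints are declared once on the collection and overridden only on members that differ:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RelatedResource:
    url: str
    media_type: str  # point 3: the format of the resource...
    relation: str    # ...and its relationship to the member


@dataclass
class Member:
    id: str
    passages: bool = False    # point 1a
    references: bool = False  # point 1b
    related: List[RelatedResource] = field(default_factory=list)  # point 1c
    # Point 4: set only when this member's endpoint differs from the collection default.
    passages_endpoint: Optional[str] = None
    references_endpoint: Optional[str] = None


@dataclass
class Collection:
    # Point 5: default endpoints are stated once, not repeated on every member.
    default_passages_endpoint: str
    default_references_endpoint: str
    members: List[Member] = field(default_factory=list)

    def passages_endpoint_for(self, member: Member) -> Optional[str]:
        if not member.passages:
            return None  # point 2: availability is stated explicitly per member
        return member.passages_endpoint or self.default_passages_endpoint

    def references_endpoint_for(self, member: Member) -> Optional[str]:
        if not member.references:
            return None
        return member.references_endpoint or self.default_references_endpoint


collection = Collection(
    default_passages_endpoint="https://example.org/dts/passages",
    default_references_endpoint="https://example.org/dts/references",
)
text_a = Member("text-a", passages=True, references=True)
text_b = Member(
    "text-b",
    references=True,
    references_endpoint="https://refs.example.net/references",
    related=[RelatedResource("https://example.org/files/text-b.pdf", "application/pdf", "rendering")],
)
collection.members.extend([text_a, text_b])

for m in collection.members:
    print(m.id, collection.passages_endpoint_for(m), collection.references_endpoint_for(m))
```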

@jonathanrobie
Contributor

I think we should take one of three "purish" positions so that our design follows a well-known approach. Let me list them:

  1. Pure Swagger HTTP API - the URI structure is meaningful and drives the design. This is a tightly coupled approach, all servers must implement the same endpoints, and the server and client cannot evolve independently. The client API is defined by the documentation, and message responses do not tell you what you can do from a given state.
  2. Pure REST approach - the service has one entry point, which lists all available choices using link relations. A client needs to know the link relations and how to parse a message to find links and link relations. A client never needs to parse a URI. When URI parameters are provided, they are provided via URI templates. Versioning is done by changing the payloads, so a well behaved client continues to work as the service evolves.
  3. Layered approach - a pure REST API is built on top of a pure swagger API. A client can actually be written using either approach. Because some clients will take the pure Swagger approach, the server is tightly coupled to such clients, but clients that are RESTful continue to work as the service evolves.

I strongly prefer a #2 API. I would not be able to support a #1 API. Swagger 3.0 supports #3. Can it also document a #3 API as though it were a #2 API?

Worse than any of these is an ad-hoc approach, where our API has its own quirky conventions.
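The sketch below shows the difference between approaches 1 and 2. In-memory dictionaries stand in for HTTP, and the URLs and link-relation names are hypothetical. The approach-2 client depends only on the entry point and the relation name, so it works against both URI layouts. The approach-1 client works only where the fixed route exists:

```python
# Canned responses for two hypothetical servers with different URI layouts; no real HTTP.
SERVER_A = {
    "https://a.example.org/dts": {"links": {"collections": "https://a.example.org/dts/collections"}},
    "https://a.example.org/dts/collections": {"title": "Server A collections"},
}
SERVER_B = {
    "https://b.example.org/index.php": {"links": {"collections": "https://b.example.org/collections.php"}},
    "https://b.example.org/collections.php": {"title": "Server B collections"},
}


def get(server, url):
    """Stand-in for an HTTP GET that returns parsed JSON; raises KeyError for unknown URLs."""
    return server[url]


def follow(server, entry_point, rel):
    """Approach 2: start at the single entry point and follow a named link relation."""
    links = get(server, entry_point)["links"]
    return get(server, links[rel])


def fixed_route(server, base):
    """Approach 1: assume every server exposes {base}/collections."""
    return get(server, base + "/collections")


print(follow(SERVER_A, "https://a.example.org/dts", "collections")["title"])
print(follow(SERVER_B, "https://b.example.org/index.php", "collections")["title"])
print(fixed_route(SERVER_A, "https://a.example.org/dts")["title"])
```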

@jeffreycwitt

I prefer number 2 as well, but I think 3 is possible, though always derivative of number 2.

@hcayless
Contributor Author

hcayless commented Apr 5, 2017

I think I agree with all of Bridget's points, except I'm uneasy about 5. Do we know providing links will add massive overhead?

hcayless closed this as completed Apr 5, 2017
hcayless reopened this Apr 5, 2017
@PonteIneptique
Member

Notice: I am teaching in Lyon today, which was not really expected, so I won't be able to join the meeting today.

@jonathanrobie
Contributor

I also agree with all of Bridget's points except #5. I doubt that providing links in responses will result in massive overhead. If that is a concern, and we use JSON-LD, we can use @context to shorten the links, e.g.

  {
    "@context": {
      "ical": "http://www.w3.org/2002/12/cal/ical#",
      "xsd": "http://www.w3.org/2001/XMLSchema#",
      "ical:dtstart": {
        "@type": "xsd:dateTime"
      }
    }
  }

But each link is surrounded by metadata and other data that probably adds up to more bytes than the link itself, so I'm not sure this is a concern. If you have thousands of results, you are probably paging or querying results, or both.
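The client-side expansion can be sketched in a deliberately simplified form. Real JSON-LD processing has many more rules, and string values are only treated as IRIs where the context declares them with `"@type": "@id"`. The `p` prefix and its URL are hypothetical:

```python
def expand_compact_iri(value, context):
    """Expand a 'prefix:suffix' compact IRI using the prefixes in a JSON-LD @context.

    Deliberately simplified: terms, @vocab, @base and the full JSON-LD
    expansion algorithm are ignored.
    """
    prefix, sep, suffix = value.partition(":")
    if not sep or suffix.startswith("//"):
        return value  # no prefix, or an absolute IRI such as http://...
    base = context.get(prefix)
    return base + suffix if isinstance(base, str) else value


context = {
    "ical": "http://www.w3.org/2002/12/cal/ical#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    # Hypothetical prefix for passage links.
    "p": "https://example.org/dts/passages/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/",
}

print(expand_compact_iri("ical:dtstart", context))
print(expand_compact_iri("p:1.1", context))
```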

@PonteIneptique
Member

PonteIneptique commented Apr 19, 2017

So, here is the long-awaited critique of where some of us want to go.

1. Coherence

I think we can all agree here. Whatever we decide (use full links, use canonical API routes [all the same wherever your implementation lies], etc.) for this should be applied to both other routes ( https://github.com/distributed-text-services/references-api , https://github.com/distributed-text-services/passage-api ). In this situation, the realities of both those routes should be taken into account.

2. Providing links: an overhead

Do we know providing links will add massive overhead?

If you have thousands of results, you are probably paging or querying results, or both.

I think this has long been my argument: providing full links is an awful overhead. There are use cases where you want to retrieve all references from a text. Paging would actually create more overhead, because you would add HTTP queries on top of the data weight (with no paging: 12000 references + 1 HTTP request; with paging: 12100 references + 12 HTTP requests*).

I wrote a small example that speaks for itself: I queried the Iliad of Homer for its line numbers. A good reason to want that is simply to compute how many lines should be grouped together to get something pleasant to read. This is currently what we do with Nemo (get all resources at the lowest level, then compute how we should group these references).

https://gist.github.com/PonteIneptique/959bb902299ccdb9090221b3982327b4

| Resource | Size | Comparison (base: No Link, 132 kB) |
| --- | --- | --- |
| CTS GetValidReff | 964.3 kB | 730.30 % |
| Full Link Potential DTS | 1,716.756 kB | 1300 % |
| No Link Potential DTS | 132.9 kB | 100 % |
| No Link Prefix DTS | 237 kB | 179.54 % |

Both JSON outputs were minified.
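The three DTS shapes in the table can be approximated with synthetic data, as a rough sketch. The reference list, the base URL and the `p` prefix are hypothetical. The absolute numbers will therefore not match the measurements above; only the ordering is expected to hold:

```python
import json

# Hypothetical base URL and prefix; the synthetic references are not the real Iliad.
BASE = "http://example.org/api/dts/texts/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/passage/"


def synthetic_refs(books=24, lines_per_book=600):
    """Evenly sized books, unlike any real edition."""
    return [f"{book}.{line}" for book in range(1, books + 1) for line in range(1, lines_per_book + 1)]


def minified_size(obj):
    """Size in bytes of the minified JSON serialisation."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


refs = synthetic_refs()
no_link = {"references": refs}
prefixed = {"@context": {"p": BASE}, "references": [f"p:{r}" for r in refs]}
full_link = {"references": [BASE + r for r in refs]}

baseline = minified_size(no_link)
for name, payload in [("no link", no_link), ("prefix", prefixed), ("full link", full_link)]:
    size = minified_size(payload)
    print(f"{name:10} {size:>9} bytes {100 * size / baseline:8.1f} %")
```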

Benchmark of parsing to obtain the full URI (comparison between prefixed and unprefixed URNs). A regexp is necessary to deal with the prefix, because the prefix is unknown before the response reaches the client and could quite easily change from one query to another, depending on the implementation.
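A rough Python analogue of the comparison follows, assuming a hypothetical prefix `p` and base URL. It checks that regex-based prefix expansion, a `startswith`/slicing variant and plain concatenation produce identical URLs, then times them. Timings depend on the machine and runtime, and the jsperf figures here were measured in a browser, not with this code:

```python
import re
import timeit

PREFIX = "p"                               # hypothetical prefix declared by the server
BASE = "http://example.org/dts/passages/"  # hypothetical expansion of that prefix
refs = [f"{book}.{line}" for book in range(1, 25) for line in range(1, 601)]
prefixed = [f"{PREFIX}:{ref}" for ref in refs]


def expand_with_regex(values, prefix, base):
    # The prefix is only known at runtime, so the pattern is built from it.
    pattern = re.compile("^" + re.escape(prefix) + ":")
    # A function replacement stops backslashes or group references in `base` being interpreted.
    return [pattern.sub(lambda _m: base, value, count=1) for value in values]


def expand_with_slice(values, prefix, base):
    # Same result without a regex: check for and strip the "prefix:" head.
    head = prefix + ":"
    return [base + value[len(head):] if value.startswith(head) else value for value in values]


def expand_with_concat(values, base):
    # No prefix at all: the client already knows the base and joins it to each reference.
    return [base + value for value in values]


expected = expand_with_concat(refs, BASE)
assert expand_with_regex(prefixed, PREFIX, BASE) == expected
assert expand_with_slice(prefixed, PREFIX, BASE) == expected

for name, fn in [
    ("regex", lambda: expand_with_regex(prefixed, PREFIX, BASE)),
    ("slice", lambda: expand_with_slice(prefixed, PREFIX, BASE)),
    ("concat", lambda: expand_with_concat(refs, BASE)),
]:
    print(f"{name:7} {timeit.timeit(fn, number=20):.4f} s")
```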

[Screenshot: jsperf results for check-time-for-dts-link, 2017-04-19 13:20:32]

98% slower is the important score.

3. Similar routes

What really got me into CTS is the fact that the routes are imposed. Not giving links also forces us to have imposed URIs. What's great about that? Wherever I go, CTS should work the same way: I can try to query directly by knowing an identifier (say urn:cts:latinLit:phi1294.phi002.perseus-lat2), whether it's on Perseus or Perseids, because I know I have to do GetValidReff or GetPassage. Enforcing this kind of structure makes it easy to swap one endpoint for another without expecting many differences, and without having to reparse and go back through the /collections route.

I hope this proved why full link is MUCH heavier.

@hcayless
Contributor Author

Some thoughts:

this isn't an issue of the collections api though, is it? This looks like what you'd get if you called whatever will replace GetValidReff. Is there the same level of problem if we give a URI for each member of a collection?

Your example is comparing the overhead of applying a regular expression to a string to doing string concatenation. And, yeah, it's more expensive. I'd note that it still may be acceptably fast.

Even so, could the URIs in your reference list not be relative instead of absolute? If the API request URI is http://ctsstage.dh.uni-leipzig.de/api/dts/texts/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/references, why not have passage/1.1 etc. in the reference list?
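Resolving relative references is standard RFC 3986 behaviour. Here is a quick check with Python's `urllib.parse.urljoin` against the request URI quoted above. A bare `1.1` is also a relative reference, but it resolves to a different URL. A CTS URN is treated as an absolute URI and returned unchanged:

```python
from urllib.parse import urljoin

request_uri = ("http://ctsstage.dh.uni-leipzig.de/api/dts/texts/"
               "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/references")

# "passage/1.1" replaces the last path segment ("references") of the request URI.
print(urljoin(request_uri, "passage/1.1"))
# A bare "1.1" is also resolvable, but to a different URL.
print(urljoin(request_uri, "1.1"))
# "urn:" is a URI scheme, so a URN is absolute and is returned as-is.
print(urljoin(request_uri, "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1"))
```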

I do wonder if there's a sensible way to deliver a passage URI pattern that would tell the client not just what the big list of possible references is, like GetValidReff does, but what passage query URIs look like.

@PonteIneptique
Member

this isn't an issue of the collections api though, is it? This looks like what you'd get if you called whatever will replace GetValidReff. Is there the same level of problem if we give a URI for each member of a collection?

For this point, I would like to refer back to my point 1: Coherence. If we do move to links, we move to links everywhere. It does not make sense otherwise.

Your example is comparing the overhead of applying a regular expression to a string to doing string concatenation. And, yeah, it's more expensive. I'd note that it still may be acceptably fast.

Of course, but how else are you going to do the prefix replacement? This has to be taken into account. 300 times slower is not an acceptable difference to me, even if it "still may be acceptably fast".

Even so, could the URIs in your reference list not be relative instead of absolute? If the API request URI is http://ctsstage.dh.uni-leipzig.de/api/dts/texts/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/references, why not have passage/1.1 etc. in the reference list?

If you have only "/passage/1.1", it technically says as much as "1.1", except it takes more space, because it is not per se a URI, and as such you cannot use it "as a Web page", as Jonathan said. Nothing in the string itself differentiates "/passage/1.1" from "1.1", so it still forces the person to check the documentation. Then why should they bear the burden of more bytes per reff?

I do wonder if there's a sensible way to deliver a passage URI pattern that would tell the client not just what the big list of possible references is, like GetValidReff does, but what passage query URIs look like.

I actually thought about that too: what if I want to know which other reffs there are in 1.1? Do I need to add two links? :) And double the size of the answer? If we do not use links, why not simply enforce a route scheme that would make everything MUCH simpler and lighter for end users to program with?

@jonathanrobie
Contributor

jonathanrobie commented Apr 20, 2017 via email

@PonteIneptique
Member

Could you give an example of what you're talking about? Otherwise it doesn't clear away much of the fog...

@jonathanrobie
Contributor

jonathanrobie commented Apr 20, 2017 via email

@jonathanrobie
Contributor

jonathanrobie commented Apr 20, 2017 via email

@PonteIneptique
Member

PonteIneptique commented Apr 20, 2017

My use cases are:

  1. Ask what references are available for a collection
  2. Ask for all instances of these references in a query (would a user really do that?)

Yes they would

That's all that is needed. A list of references

@PonteIneptique
Member

PonteIneptique commented Apr 24, 2017

@hcayless I reworked your edit to make it run: https://jsperf.com/check-time-for-dts-link/9

[Screenshot: jsperf results for check-time-for-dts-link, 2017-04-24 16:03:41]

It's still a 1/8 ratio. As a probable future user of the API, I find this unacceptable.

I would really like us to take the path of an API that is light to produce and to use. I look forward to another proposal from @jonathanrobie.

As for the routes, we have to remember that if we had solid, shared routes, reuse and navigation from one of our APIs to another would be much, much easier to do on the client side. In the end, I want people (not the providers) to use the API as well...

Edit:

For the sake of the argument, here is a new screenshot of the test, taken in the same session with the same Firefox version.
[Screenshot: jsperf results for check-time-for-dts-link, 2017-04-24 16:08:56]

@hcayless
Contributor Author

I just wanted to see what the actual overhead of the Regex was, but then it wouldn't let me publish the fixed version, so I gave up :-). I still think this only proves that doing more work is more expensive, not that using link relations is bad.

@balmas
Contributor

balmas commented Apr 24, 2017

My preference is for a layered API approach - allowing people to build a purely RESTful API on top of the Swagger API if that's what they want to do, but using Swagger/OpenAPI as the base for defining and describing the functionality. For better or worse, that's where we are now. We made an attempt to start with the pure RESTful approach and it stalled so I went the Swagger route and that's what we have.

I disagree with the point about not having predefined routes. The link relations approach is nice from a theoretical perspective, but in practice, I don't really believe that predefined routes are difficult to implement, support or write clients to.

To borrow from the IIIF terminology, we might think of what we have right now in the Collections API swagger spec as a collection 'manifest'. It describes the collection and its members, and allows for member items to describe their relationship to the parent collection using vocabulary that is specific to the publisher of the collection.

The only navigation that is really part of it at the moment is getting from a list of all collections to retrieving one collection and I think that's fairly non-controversial.

Where we are getting tripped up is in describing how to navigate into a collection for the passages and references API.

I think @PonteIneptique has demonstrated that using full URIs for the equivalent of the CTS GetValidReff command is probably not viable. I think more benchmarking is needed to determine whether the relative path or pattern approach is reasonable. One issue I have with the benchmark tests is that, for the use case that retrieves all of the references at once, you probably don't also need to do pattern replacement on all of those URIs. I.e. if you're doing it to compute grouping, you only need to complete the pattern once you've done the grouping, so it applies to a much smaller number of cases.
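Here is a sketch of that point. It assumes a hypothetical fixed group size and hypothetical `start`/`end` query parameters, which are not part of any agreed DTS API. The client groups the bare reference list first and then builds only one passage URL per group:

```python
from urllib.parse import urlencode


def group_refs(refs, size):
    """Split an ordered reference list into consecutive groups of at most `size` items."""
    return [refs[i:i + size] for i in range(0, len(refs), size)]


def group_urls(refs, size, endpoint):
    """Build one range URL per group instead of one URL per reference."""
    return [
        endpoint + "?" + urlencode({"start": group[0], "end": group[-1]})
        for group in group_refs(refs, size)
    ]


refs = [f"1.{line}" for line in range(1, 612)]  # hypothetical book of 611 lines
endpoint = "https://example.org/dts/passages/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2"
urls = group_urls(refs, 30, endpoint)
print(len(urls), urls[0])
```

Grouping by a fixed count is only a stand-in for whatever rule a reading environment such as Nemo actually applies.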

My preference would be to move forward by looking at how to use OpenAPI 3.0 to describe navigation into the collection, taking advantage of the pattern matching it offers. I will volunteer to make a proposal using that in the coming days, probably not by this week's standup, but definitely by the next one (i.e. by May 3rd).

I do think that requiring a pure RESTful approach would be the point at which I jump off the effort. Or at least, I would step back and let someone else take the lead on defining what that would look like both for the collections API and the passages/references APIs.

@PonteIneptique
Member

PonteIneptique commented Apr 24, 2017

@hcayless As a potential heavy user of such an API, a 1/8 or 1/40 ratio is a pretty big deal to me. Call me crazy :') Though I do agree with you that the site is awful when it comes to editing 👍

Until @jonathanrobie shows his proposal, I won't comment much more. I do agree with @balmas on some points. I actually still think that full or relative URIs on collections are going to be just as expensive, albeit with less data to parse (the way @hcayless's implementation and mine compare, there will ALWAYS be a 1/8 ratio, a ratio that will grow for small amounts of data because of dict access, though that effect could be ignored).

I care about one thing: an API that is light to parse and light to produce/transmit. If it becomes much heavier over philosophical points, I am not sure I'll be interested as a user (given that I won't be a provider anymore :) ).

@jonathanrobie
Contributor

jonathanrobie commented Apr 24, 2017 via email

@jonathanrobie
Contributor

jonathanrobie commented Apr 24, 2017 via email

@PonteIneptique
Member

Well, I definitely think this proposal is outside what I'd like to support, because it completely negates access via unified routes and routes that represent objects, i.e. a collection has references, a collection has readable passages.

I think RESTful design and this kind of model are out of scope for me. We are moving really far away from a unified API that would be easy to communicate with and standard in its answers. In the end, CTS fits my needs much better.

If the result of this effort is that everyone implements their own routes, their own prefixing or non-prefixing system, their own full response, I won't support it much further, because I won't take the time to write such a client and maintain it over time.

@balmas
Contributor

balmas commented Apr 24, 2017

First, I don't think my participation should be the touchstone by which any decisions are made. The needs of the community of users and developers should be the main issue here. I'm not sure I qualify as either user or developer of DTS right now.

We started this effort because of these primary problems with the CTS model:

  1. it didn't allow for text collections which couldn't adhere to the rigid CTS textgroup/work/edition model

  2. the XML overhead of the request/responses was too high, particularly for the GetValidReff and GetCapabilities calls

  3. the routes were not RESTful

I think we didn't all have the same idea about what the last point meant. It does seem clear now that for some of us the priority was more web-friendly routes, whereas for others the full HATEOAS approach is critical.

It could be that there is no meeting of the minds to be had. I am hopeful that OpenAPI 3.0 might offer us a middle ground. Most compromises leave everyone at least a little unhappy though.

@hcayless
Contributor Author

So thought experiment: if my collections API, when it reaches an edition (i.e. something you can grab passages from, could be a work with a default edition), has a) a link relation that says "go here for valid references on this work", and b) a link relation that says "query this link with a reference to retrieve a passage", and a) gives you a list of references that you can plug into b), have we suddenly become horrible?
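The thought experiment can be sketched from the client side. In-memory responses stand in for HTTP, and the keys `references` and `passage`, the `{ref}` template variable and all URLs are hypothetical. The client follows link (a) to get the reference list, then plugs each reference into link (b):

```python
from urllib.parse import quote

# Canned responses; no HTTP is performed. Keys, URLs and {ref} are hypothetical.
RESPONSES = {
    "https://example.org/collections/iliad": {
        "id": "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2",
        "references": "https://example.org/refs/iliad",             # link relation (a)
        "passage": "https://example.org/passages/iliad?ref={ref}",  # link relation (b)
    },
    "https://example.org/refs/iliad": {"references": ["1.1", "1.2", "1.3"]},
}


def get(url):
    """Stand-in for an HTTP GET returning parsed JSON."""
    return RESPONSES[url]


def passage_urls(edition_url):
    edition = get(edition_url)
    refs = get(edition["references"])["references"]  # follow (a)
    template = edition["passage"]                      # (b): a template with a {ref} slot
    return [template.replace("{ref}", quote(ref, safe="")) for ref in refs]


for url in passage_urls("https://example.org/collections/iliad"):
    print(url)
```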

I don't think the passage API can be super RESTful in a way, because it's really about querying, just with a contextually-constrained "language" (e.g. you can have ranges, which aren't something you'd get from your big list of references, and what you can query depends on the structure of the work). But I do think the collections API is very amenable to the RESTful treatment and I don't want to give that up.

@jonathanrobie
Contributor

jonathanrobie commented Apr 24, 2017 via email

@jonathanrobie
Contributor

jonathanrobie commented Apr 24, 2017 via email

@balmas
Contributor

balmas commented Apr 24, 2017

The suggestion by @hcayless seems reasonable to me, and that is essentially what I was going to try to represent with OpenAPI 3.0.

@jonathanrobie
Contributor

jonathanrobie commented Apr 24, 2017 via email

@hcayless
Contributor Author

I think one motivator for this is that, particularly in the case of specific editions, the citation scheme may not be totally regular. An edition may change the order of lines, or mark interpolated or repositioned lines as 1a, 1b, etc., or start at line 10 (or do the same with larger units). So it's not necessarily enough to know the citation scheme and end point—you may actually need a list of all the referenceable units in order. This is a pretty common circumstance. Along with this issue (and because of it), if you want to align two editions, you'll probably need to proceed with full lists of references for both.
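The following illustration uses hypothetical reference lists to show why a citation scheme alone is not enough. Generating references from the scheme misses the interpolated line and invents one that edition A lacks. Comparing two editions also needs both full lists:

```python
# Hypothetical reference lists for two editions of the same poem, in document order.
edition_a = ["1", "2", "3", "3a", "4", "5"]  # an interpolated line marked 3a
edition_b = ["1", "2", "4", "3", "5", "6"]  # lines 3 and 4 transposed, an extra line 6


def from_scheme(count):
    """What a client could generate from 'lines numbered 1..count' alone."""
    return [str(n) for n in range(1, count + 1)]


def compare(a, b):
    """Compare two full reference lists: refs unique to each, and whether shared refs are reordered."""
    in_a, in_b = set(a), set(b)
    shared_in_a_order = [r for r in a if r in in_b]
    shared_in_b_order = [r for r in b if r in in_a]
    return {
        "only_a": [r for r in a if r not in in_b],
        "only_b": [r for r in b if r not in in_a],
        "reordered": shared_in_a_order != shared_in_b_order,
    }


print(from_scheme(len(edition_a)))  # misses "3a" and invents "6"
print(compare(edition_a, edition_b))
```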

@PonteIneptique
Member

Okay, so given the current situation, would people be happy with the following compromise:

  1. The DTS API has standard routes / URIs. The location of the DTS API depends on the implementer (e.g. /dts, /api/dts, /api/dts/1.0, /api, /text-api), but the routes after this base are fixed (e.g. /dts would have, if we agree on these routes, /dts/collections, /dts/references, /dts/passages)
  2. The DTS API has full links (can use prefix in the context of JSON-LD) on the /collections API but not on the /references and /passages. (to me, this would mean losing coherence, but if that's what it takes to get 1, I'll go with it)
  3. The DTS API has short, light answer and not URI when it comes to references. (Kind of a repetition of 2. though.)
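Point 1 can be sketched from the client side. Only the base location varies by implementer, and the route names (taken from the suggestion above, not yet agreed) are appended to it. The example base URLs are hypothetical. The trailing slash matters, because RFC 3986 resolution drops the last path segment of a base URL that lacks one:

```python
from urllib.parse import urljoin

# Route names as suggested above, pending agreement.
ROUTES = ("collections", "references", "passages")


def dts_routes(base):
    """Build the fixed DTS routes under an implementer-chosen base location."""
    # urljoin drops the last path segment unless the base ends with '/',
    # so normalise it first: 'https://x/api/dts' -> 'https://x/api/dts/'.
    if not base.endswith("/"):
        base += "/"
    return {name: urljoin(base, name) for name in ROUTES}


print(dts_routes("https://example.org/api/dts"))
# Pitfall without the normalisation: the 'dts' segment is lost.
print(urljoin("https://example.org/api/dts", "collections"))
```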

To me, 3 and 1 are the most important. I can give up on full links in the /collections API if that leaves those two points open.

@PonteIneptique
Member

(On another note, @jonathanrobie, as @hcayless said, what you envision would only work for numeric, incremental references. A lot of systems have alphanumeric citation schemes. In Perseus, we have some "pr" poems, we have inverted numbering [a lot in drama], etc. :) )

@hcayless
Contributor Author

@PonteIneptique Re your point 1, I can think of a couple of objections: first, I think we're already starting to see that the three APIs may work in different ways: collections is about browsing, references is about discovering the internal structure of a document, and passages is about identifier (and reference) resolution. Do we know they'll want similar kinds of endpoint? I'm not so sure... This is also why I think we'd not necessarily be inconsistent if we adopt this way of doing things.

Second, your fixed routes rule out certain kinds of implementation. When I worked at the UNC Library, the systems team mostly wouldn't let us have new rewrite rules—so my endpoint would have had to be collections.php or something like that (I will happily stipulate that this was mad, but nonetheless it was an operational constraint). I don't see why that sort of implementation style should be illegal. An implementation might equally well defer passage lookup to a different service, so maybe it's on a different port or virtual host than the collections service. Again, I don't see why that's bad. Your suggested routes look perfectly fine, and I might expect a reference implementation to look like that, but I'm not sure why they need to be mandatory.

@PonteIneptique
Member

PonteIneptique commented Apr 25, 2017

@hcayless My point is that users want to work with a standard API without having to figure out where the route for passages or references is. If people do not want passages and references, then I do not know why they are here, because DTS was driven by the need to adapt CTS to more kinds of structure, as stated by @balmas.

When I know someone has a CTS API, I go there and I know what requests to make; I do not need to read about the exact changes that this person made to the original standard. Technically, IIIF does the same by saying "here is how a transformation route should be implemented".

To me, this compromise is the minimal base I can accept as a user. I think that if we do not meet this really minimal base (I gave up a large chunk for this consensus), I'll simply leave the project.

Addendum: Do not take that as blackmail or the like. It would just mean that my goals as a library provider and CMS provider would no longer be aligned with those of the project. And as such, it would make no sense at all for me to continue to spend time on this :)

@hcayless
Contributor Author

If we were going to be all drawing lines in the sand about this, it's a shame it took this long to get there. For my part, CTS is pretty useless for the sorts of material I work with, so I suppose I can go back to ignoring it.

@PonteIneptique
Member

@hcayless I agree. I just would not have thought that having structured and standard URI would be so much trouble for some of us.

I'd just add that removing myself from this does not mean it can't continue. It seems that at least 3 of you agree with each other. I don't know about Bridget.

I also think it is a shame that we lost so many people along the way, because their voices would have been an interesting factor to weigh in.

@balmas
Contributor

balmas commented Apr 25, 2017

I would really hate to see all the effort we've put in so far go to waste over philosophical differences.

@balmas
Contributor

balmas commented Apr 25, 2017

I really would like a chance to see if we can use OpenAPI 3.0 to get mostly where we need to. The route question is one that I would like a little time to think about more.

@jonathanrobie
Contributor

jonathanrobie commented Apr 25, 2017

Let's go back to the requirements. Bridget listed them as follows:

We started this effort because of these primary problems with the CTS model:

  1. it didn't allow for text collections which couldn't adhere to the rigid CTS textgroup/work/edition model

  2. the XML overhead of the request/responses was too high, particularly for the GetValidReff and GetCapabilities calls

  3. the routes were not RESTful

From my perspective, if we replace the rigid CTS model with rigid URLs of our own, that means we really haven't improved over #1, because people want to organize collections in a wide variety of ways, and implementation constraints may affect the URIs a particular server wants to offer.

From my perspective, if the real API is about knowing the structure of the URIs because you read the documentation somewhere and going there directly instead of finding the URIs via discovery, that's rejecting requirement #3.

Meeting only requirement #2 is an improvement, I guess, but not enough of one to make me want to implement the API. I think we are basically asking if we believe in the original requirements or not. I still do.

@PonteIneptique
Member

  1. I do not see how rigid URLs would be a problem for organizing collections as you like. I'd be happy to have a user story proving this point.
  2. It might be rejecting 3, as you see RESTful, but it was also a GREAT strength of CTS. Today, with tools like MyCapytain, I don't have to browse a lot of pages to get what I want, and that makes sense. I don't know many people who browse an API like a web page. Maybe I don't know the right ones :)

This is most probably my last comment, because I am fed up with this and with being told that I cannot possibly understand things. I want to underline how "constructive" some of the comments made here were: "Let's go back to the requirements.", "I don't think anyone here would design the API the way that you did in your example.". I cannot possibly have a nice discussion when I am treated either as an idiot or as a newcomer.

I want to remind everyone that we came here to make CTS lighter and more compatible with all of our collections. We were between 10 and 15 people. Today, only 5 of us are speaking, and in this discussion only 4 are discussing things at length. The overly technical turn, with its English-language expressions, has excluded a lot of people. My experience (only mine) is that strict standards are much easier to adopt, because most of our domain (DH) does not have really advanced engineers, and clear, imposed standards work well (as CTS did, and even CTS was sometimes implemented wrongly because of a lack of clarity about the meaning of some items).

I spent a lot of time on this DTS project, including writing an export for MyCapytain and writing benchmarks, but it feels like it's not worth much. I also spent a lot of time implementing clients for projects, and, again in my experience, I do not see how DTS, by going down this road, will be easy to implement from the client perspective. I have tried to argue this, but it seems to be dismissed. I really feel like only the provider's point of view is taken into account here, while the point of having standard APIs is also to get them reused. We do not have the firepower of W3C / IIIF, nor should we think we do. But even if we did, text transformation in IIIF is a standard route.

@jonathanrobie
Contributor

jonathanrobie commented Apr 26, 2017 via email

@PonteIneptique
Member

This debate was settled a long time ago.

5 participants