Design doc: Embed APIv3 #8222

humitos · 2021-05-31T11:15:29Z

I made so many changes to the original document that I'm creating a new PR based on Santos' PR: #8039. It basically extends and replaces it.

📝 read the rendered version at: https://docs--8222.org.readthedocs.build/en/8222/development/design/embed-api.html

Closes #7963
Closes #8039

docs/development/design/embed-api.rst

stsewd · 2021-05-31T15:48:45Z

docs/development/design/embed-api.rst

+* Rewrite all links returned in the content to make them absolute
+* Always return valid HTML structure
+* Delete HTML tags from the original document if needed
+* Support ``?nwords=`` and ``?nparagraphs=`` to return chunked content


I think we were still discussing this. I don't think we can support it without handling specific cases: not all returned HTML will have p tags, a word won't always end inside a valid tag, and we also need to skip tags when computing the length.

<p>Cut <strong>here?</strong></p>

Having the client use CSS to "hide" long texts is easier to handle.

The example you shared would return:

with ?nwords=1: <p>Cut</p>

with ?nwords=2: <p>Cut <strong>here?</strong></p>

with ?nparagraphs=1: <p>Cut <strong>here?</strong></p>
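To make the expected `?nwords=` behavior concrete, here is a minimal sketch (not code from this PR) that truncates HTML to a number of words while keeping the structure valid. It uses only the standard library `html.parser`; the names `WordTruncator` and `truncate_words` are made up for illustration, and it assumes reasonably well-nested input.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class WordTruncator(HTMLParser):
    """Copy HTML until `nwords` words of text were emitted (hypothetical helper)."""

    def __init__(self, nwords):
        super().__init__()
        self.remaining = nwords
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if self.remaining <= 0:
            return  # word budget used up: ignore the rest of the document
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.remaining <= 0:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.remaining <= 0:
            return
        words = data.split()
        if len(words) < self.remaining:
            # Whole text node fits: keep it untouched (tags are never counted).
            self.remaining -= len(words)
            self.out.append(escape(data, quote=False))
            return
        # Budget ends inside this text node: keep only the words that fit.
        leading = data[: len(data) - len(data.lstrip())]
        kept = " ".join(words[: self.remaining])
        self.out.append(escape(leading + kept, quote=False))
        self.remaining = 0


def truncate_words(html_text, nwords):
    parser = WordTruncator(nwords)
    parser.feed(html_text)
    parser.close()
    # Close whatever was still open when the budget ran out.
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing
```

With the snippet from this thread, `truncate_words(..., 1)` and `truncate_words(..., 2)` give the two results listed above. It does not address styling concerns such as half-emitted tables or definition lists.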

Yeah, but the problem is calculating the words while taking the tags into account: you'll be removing tags from the original HTML, possibly breaking some styles, like with tables or definition lists. Also, how would you know whether there is a paragraph when the content is wrapped in other tags like lists or divs?

We have been playing with this already in https://github.com/readthedocs/readthedocs-ext/pull/304/. How to do it is an implementation detail and I'm sure there are going to be some problems we will need to solve. However, I don't think it's impossible.

I've been playing a little with BeautifulSoup already and this seems to work close enough:

# nparagraphs.py
from bs4 import BeautifulSoup

with open('install.html') as f:
    soup = BeautifulSoup(f, 'html.parser')

nparagraphs = 3
section = soup.find('div', attrs={'id': 'development-installation'})
for element in section.find_all():
    # Once the paragraph budget is used up, drop every remaining element.
    if nparagraphs == 0:
        element.replace_with('')
    if element.name == 'p' and nparagraphs > 0:
        nparagraphs -= 1

$ wget https://docs.readthedocs.io/en/stable/development/install.html
$ python nparagraphs.py

Output:

<div class="section" id="development-installation">
  <h1>
    Development Installation
    <a class="headerlink" href="#development-installation" title="Permalink to this headline">¶</a>
  </h1>
  <p>
    These are development setup and
    <a class="hoverxref tooltip reference internal" data-doc="development/install" data-docpath="/development/install.html" data-project="docs" data-section="core-team-standards" data-version="stable" href="#core-team-standards">
      <span class="std std-ref">standards</span>
    </a>
    that are adhered to by the core development team while developing Read the Docs and related services. If you are a contributor to Read the Docs, it might a be a good idea to follow these guidelines as well.
  </p>
  <div class="admonition note">
    <p class="admonition-title">Note</p>
    <p>
      We do not recommend to follow this guide to deploy an instance of Read the Docs for production usage. Take into account that this setup is only useful for developing purposes.
    </p>
  </div>
</div>

stsewd · 2021-05-31T15:51:50Z

docs/development/design/embed-api.rst

+      }
+
+
+   When used together with ``?expand=identifiers`` the follwing field is also returned:


I'm not sure about returning this information on the same endpoint. It's confusing to return the content of a section plus the list of sections; we should have a different endpoint to return data about the page itself.

Can you expand here why it may cause confusion and how having a different endpoint to only show the available id= for a page is better? I think I'm not against a different endpoint, but I want to understand your position better.

Because you are requesting information for a section but getting information about the page too, and you don't want to have to query a random section just to get information about the page. This also allows us to return more information about the page itself, like the title.

Pinging @agjohnson here since he wrote:

> More metadata around headers, such as heading level. I'd like to display the topics in a nested menu, as they don't make as much sense in sequential order

in #7117

@agjohnson can you take a look at this proposal and suggest what the ideal response would be for you and how you would like to use it?

@agjohnson returning headers may not be useful if those headers don't have an id= that we can find. Also, I realized that it's not trivial to "find the title for a particular id=" (we have this hack currently implemented that only works over a tags). The id= could be in the div surrounding the h1 tag, in the dl tag, or in any other.

<div id="configuration">
  <h1>Configuration<a class="headerlink" href="#configuration" title="Permalink to this headline">¶</a></h1>
  <p>...</p>
</div>

For this case, we could get the .next() from the id=configuration. However, we can't guarantee that this will always work. Even if we get it, we will want to remove the trailing ¶ from its name as well in this case (but it could be a different char)

<dt id="confval-hoverxref_modal_class">
  <code class="sig-name descname">hoverxref_modal_class</code>
  <a class="headerlink" href="#confval-hoverxref_modal_class" title="Permalink to this definition">¶</a>
</dt>

This one is similar, but the .next() element is exactly the title we are looking for. It does not contain the ¶ char.

However, if the title we want is exactly on the h1 or dl (instead of their child), we will fail to detect it and we would return something invalid.

So, I think it's still useful to return all the available id=s from the page so developers can explore them, but it's not easy (maybe impossible) to know the exact title for that particular id= and the exact hierarchy as well.

I moved this to a dedicated endpoint, /api/v3/embed/identifiers/, and defined the initial response: "return all the available identifiers for a specific page".

We can expand it later if we need something else and there is a good way to do it.
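As a rough illustration of "return all the available identifiers for a specific page" (a sketch only, not the implementation proposed here), collecting every `id=` from a page can be done with the standard library parser. The names `IdentifierCollector` and `list_identifiers` are hypothetical.

```python
from html.parser import HTMLParser


class IdentifierCollector(HTMLParser):
    """Collect every non-empty ``id`` attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.identifiers = []

    def handle_starttag(self, tag, attrs):
        # `attrs` is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "id" and value:
                self.identifiers.append(value)


def list_identifiers(html_text):
    collector = IdentifierCollector()
    collector.feed(html_text)
    collector.close()
    return collector.identifiers
```

Note that this deliberately returns only the identifiers: as discussed above, mapping each one to a title or to a position in the heading hierarchy is the hard part.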

We talked more here, and the use case or feature that I'm describing is currently in a strange place.

More explicitly, I require a list of headings -- basically the toctree on a document -- with heading text, URL to link to the heading, and the heading nesting level or heading nesting as a data structure. This gives me what I need to basically expose the toctree in our application. This is the feature that I had wanted to expose for commercial use -- embedding documentation metadata/headings/etc in customer applications. A hoverxref type extension might be useful in addition here, but is separate. Customers would still need to get metadata out of a particular document in order to inject a toctree into an application view.
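To illustrate the "heading nesting as a data structure" part of this request, here is a small sketch that turns a flat, document-ordered list of headings into a nested tree. The input shape (dicts with `level`, `title` and `url`) and the function name `nest_headings` are assumptions made for this example; nothing in this thread defines that format.

```python
def nest_headings(headings):
    """Nest a flat list of headings (levels >= 1, in document order)."""
    root = {"level": 0, "children": []}
    stack = [root]  # path from the root to the most recently added heading
    for heading in headings:
        node = {**heading, "children": []}
        # Walk back up until we find a heading shallower than this one.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


flat = [
    {"level": 1, "title": "Install", "url": "#install"},
    {"level": 2, "title": "Linux", "url": "#linux"},
    {"level": 2, "title": "macOS", "url": "#macos"},
    {"level": 1, "title": "Usage", "url": "#usage"},
]
tree = nest_headings(flat)
```

Here `tree` has two top-level entries, and the first one carries "Linux" and "macOS" as children, which is the shape a nested menu would need.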

So, where I'm at is that I feel like this feature is more of a relic of where the embed API started and it is dragging the direction of the embed API down, though this might just be my interpretation. The embed API is more focused on generic support, and therefore on parsing HTML, while what I want is basically to expose the contents of objects.inv. I could be talking about a separate feature, or we could still be talking about keeping this as a separate API endpoint.

I am going to research this a bit more, I may be talking about a separate feature entirely at this point.

objects.inv is already indexed & exposed w/ the SphinxDomain modeling, isn't it?

Yup, so we talked about exposing a separate API for that instead.

We need to decide if this feature is going to be a separate product and how users will interact with it. I think we'd be maintaining 1 client JS library that does both embed of hover cards and injection of document header/refs from Sphinx, as the use case is similar.

@humitos Did we talk about what mechanism we'd use if this function was a subpath on the embed API? Are we using Sphinx refs via the SphinxDomain modeling?

> @humitos Did we talk about what mechanism we'd use if this function was a subpath on the embed API? Are we using Sphinx refs via the SphinxDomain modeling?

Nope, we didn't talk about this and I haven't thought about it too much either. I usually forget that we have the SphinxDomain model, and I haven't used it yet.

Something like /api/v3/embed/headers/ may work for this use case. However, we have to keep in mind that:

- it will only work with Sphinx if we use the SphinxDomain model
- making it generic by parsing the HTML is probably impossible
- we can use the HTTP request arguments for other doctools, as ?doctool=mkdocs&version=1.0.1 and parse a known HTML structure

humitos · 2021-06-03T14:22:11Z

Notes to myself: it seems I missed some points from #7117 that @agjohnson wants, like language as argument and/or HTTP header. So, I'll come back to this document and update it with those points.

agjohnson · 2021-06-07T23:44:15Z

docs/development/design/embed-api.rst

+      }
+
+
+   When used together with ``?expand=identifiers`` the follwing field is also returned:


I am using the embed API list of sections to inject a (currently flat, not nested) list of headings for help topics. Displaying a list of headings would be an important feature of an embed client, as it gives customers a way to inject topics into their application.

- re-order goals
- allow CORS only for public projects
- new section with the definition of the contract
- define `/api/v3/embed/identifiers/` endpoint
- remove `title` field from it because it's not easy to get it
- return only available identifiers
- add `_links` to make the API browseable
- handle project's domain changes querying for 3xx status codes

humitos · 2021-06-09T13:31:32Z

@ericholscher @stsewd @agjohnson I updated the document with the latest conversation we had. I think I covered all the concerns raised here. Let's see if you agree with what I wrote 😄

nguyenminhhoang888

Good

ericholscher

Lots of small language nitpicks. I think in general this is heading in a good direction. I like having a way to know when to apply sphinx magic as needed.

ericholscher · 2021-06-09T21:15:48Z

docs/development/design/embed-api.rst

+   only under ``?doctool=`` and ``?version=`` arguments.
+
+If no ``id`` selector is sent to the request, the content of the first meaningfull HTML tag
+(``<main>``, ``<div role="main">``, etc) identifier found is returned.


Is this useful? Seems like a lot of content to return -- is the use case here for small pages or browsing?

I kept it because it's how it works currently. I don't have a strong opinion and I'm happy to remove it if there is no well-defined use case yet. In fact, I didn't add this originally but then I came back and added this paragraph because I realized that I was removing a feature that we currently have.

Yea, I'm not sure. I use this functionality a lot when testing (e.g. I just throw a URL at the API and it returns the whole page content), but I'm not sure if this is generally useful. It definitely makes the API easier to browse and understand, so I think that's valuable. A lot of users will probably do this instead of using any metadata endpoint to get the page sections.

I think this functionality probably makes sense, but we should be very clear about what it does (no etc :D)

> but I'm not sure if this is generally useful

I'm not sure we are already using the feature as is right now. However, I opened a PR in sphinx-hoverxref to support :doc: that would require this: readthedocs/sphinx-hoverxref#68

> but we should be very clear about what it does (no etc :D)

He he, I left the "etc" here because I didn't do the research required to grab the most used ones. The two that I mentioned here are the HTML5 one and the Sphinx default, but there may be others. This will be clear in the documentation of the endpoint itself, without magic/guessing involved, tho.

Also, this feature combined with ?nparagraphs= makes more sense.
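A sketch of what "first meaningful HTML tag" could mean once the list is made explicit. It assumes only the two selectors named in this thread, `<main>` and `role="main"`, and well-nested markup; the class name `MainContent` and the helper `first_main_content` are hypothetical, and this is not the behavior of the current endpoint.

```python
from html import escape
from html.parser import HTMLParser

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MainContent(HTMLParser):
    """Capture the markup of the first <main> or role="main" element."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while inside the captured element
        self.done = False  # True once the captured element was closed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            # Explicit selector list: nothing else counts as "main" here.
            if tag == "main" or dict(attrs).get("role") == "main":
                self.depth = 1
                self.out.append(self.get_starttag_text())
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.done or self.depth == 0 or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.out.append(escape(data, quote=False))


def first_main_content(html_text):
    parser = MainContent()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

An empty string means no candidate container was found, which the endpoint would then have to report explicitly rather than guess.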

ericholscher · 2021-06-09T22:07:41Z

docs/development/design/embed-api.rst

+
+.. note::
+
+   This leaves the door open to be able to support more special cases (e.g. for other doctools) without breaking the actual behavior.


This seems like a better solution than doing it for all requests, and gives us a way to deprecate things 👍
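The opt-in idea can be sketched as a small dispatch: the generic path is the default, and doctool-specific handling only runs when the request carries both `?doctool=` and `?version=`. Everything here beyond those two argument names is an assumption for illustration: the `SPECIAL_CASES` registry, the strategy labels, the `url` parameter and the example host.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical registry: doctool name -> label of its special handling.
SPECIAL_CASES = {"sphinx": "sphinx-specific post-processing"}


def select_strategy(request_url):
    """Return the special-case strategy only when the client asked for it."""
    query = parse_qs(urlsplit(request_url).query)
    doctool = query.get("doctool", [None])[0]
    version = query.get("version", [None])[0]
    if doctool in SPECIAL_CASES and version:
        return SPECIAL_CASES[doctool]
    # No explicit arguments (or an unknown doctool): generic behavior.
    return "generic"
```

Dropping an entry from the registry is then one way to deprecate a special case without touching the generic path.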

ericholscher · 2021-06-09T22:12:03Z

docs/development/design/embed-api.rst

+Unanswered questions
+--------------------
+
+* How do we distinguish between our APIv3 for resources (models in the database) from these "feature API endpoints"?


Definitely a good question.

Does it make sense to switch the version and API name so that the version comes after the name? That way we can release a new version of a particular API without touching the others.

/api/search/v3/

/api/footer/v3/

/api/resources/v2/ and /api/resources/v3/

/api/embed/v3/

etc

cc @stsewd @ericholscher

That naming feels weird to me, I think we just need to document which endpoints are part of what.

@stsewd can you expand how that would be and give some examples?

This is already done in part for search, for example (see https://docs.readthedocs.io/en/stable/server-side-search.html#api). We just need to do the same with the embed API and the footer.

@stsewd I don't follow you. That search endpoint is under /api/v2/, so we are not differentiating the "Feature APIs" from the "Resources API" there.

print(stsewd.dump("brain").verbose())

We want to make this distinction in our docs, not in the URLs

agjohnson

Noting a few questions left over after talking through this with @humitos

humitos · 2021-06-14T09:59:15Z

I think we are ready for a final review (cc @stsewd, @ericholscher, @agjohnson) and to start with its implementation. We have discussed a lot of things and it seems that we may add extra features in the future when we have clearer use cases, but I think that's fine. I'm happy that we are covering the general case in a simple way and the special cases with an explicit argument.

ericholscher · 2021-06-16T18:07:24Z

I think this looks good to get started on. I still have a couple implementation questions, and think we should focus on the core implementation first and then worry about Sphinx edge cases later.

agjohnson

If we are to separate out the conversation around embedding Sphinx refs/headings, then I think this particular document is very close and we can continue. We need to come back to a separate conversation on a replacement API next.

agjohnson · 2021-06-22T21:16:40Z

docs/development/design/embed-api.rst

+
+* Add a note in the documentation mentioning this endpoint is deprecated
+* Promote the usage of the new Embed APIv3
+* Migrate the ``sphinx-hoverxref`` extension to use the new endpoint


If we are specifying deprecation in this document, then the use case for embedding Sphinx refs should also be mentioned here. Deprecation should depend on surfacing an API to expose documentation refs. We need to have a discussion about this endpoint and how it is implemented next, but that can be separate.

stsewd and others added 11 commits March 22, 2021 16:14

Embed: design doc for new embed API

9d4ebe5

Merge branch 'master' into design-doc-embed-api

fe633ec

Improvements from review

66d839e

Merge branch 'master' into design-doc-embed-api

06c8a97

Small update

7448158

Mention intersphinx

1d3c097

Merge branch 'master' into design-doc-embed-api

1d43423

List of extra js/css: rejected

81edc9c

Merge branch 'master' into design-doc-embed-api

89a5795

Small updates

4efd3ec

Updates after our roadmap planning meeting

adebe39

humitos changed the title ~~Embed: design doc for new embed API~~ Design doc: Embed APIv3 May 31, 2021

stsewd reviewed May 31, 2021

stsewd reviewed May 31, 2021

stsewd reviewed May 31, 2021

Remove # from the id's field response

e307563

humitos commented Jun 1, 2021

humitos requested a review from a team June 1, 2021 16:27

stsewd reviewed Jun 1, 2021

humitos mentioned this pull request Jun 3, 2021

Embed: replace pyquery with selectolax #7988

Closed

Update docs/development/design/embed-api.rst

f783339

humitos requested a review from a team June 7, 2021 13:45

agjohnson reviewed Jun 8, 2021

humitos added 2 commits June 9, 2021 15:26

Clarify embedding content from pages hosted outside readthedocs

efe5abc

stsewd reviewed Jun 9, 2021

stsewd reviewed Jun 9, 2021

Rename endpoint to be /metadata/

5063bbc

nguyenminhhoang888 reviewed Jun 9, 2021

ericholscher reviewed Jun 9, 2021

humitos mentioned this pull request Jun 10, 2021

Add more context data to embed API #7117

Open

Apply suggestions from code review

84c7ce0

Co-authored-by: Eric Holscher <[email protected]>

agjohnson reviewed Jun 10, 2021

humitos added 2 commits June 14, 2021 11:38

Updates from feedback

3b320e8

Improve API-docs render

b34fce0

humitos mentioned this pull request Jun 17, 2021

Hover/mouseover tooltips for references. jupyter-book/jupyter-book#1297

Open

ericholscher approved these changes Jun 22, 2021

agjohnson approved these changes Jun 22, 2021

humitos merged commit c86f18a into master Jun 23, 2021

humitos deleted the humitos/design-doc-embed-api branch June 23, 2021 09:43

humitos mentioned this pull request Jul 6, 2021

Embed APIv3: initial implementation #8319

Merged

9 tasks
