-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Embed: design doc for new embed API #8039
Changes from all commits
9d4ebe5
fe633ec
66d839e
06c8a97
7448158
1d3c097
1d43423
81edc9c
89a5795
4efd3ec
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,221 @@ | ||||||
Embed API | ||||||
========= | ||||||
|
||||||
The Embed API allows to embed content from docs pages in other sites. | ||||||
For a while it has been as an *experimental* feature without public documentation or real applications, | ||||||
but recently it has been used widely (mainly because we created a Sphinx extension). | ||||||
|
||||||
This improvement is part of the `CZI grant`_. | ||||||
Due to this we need to have more friendly to use API, | ||||||
and general and stable enough to support it for a long time and with external sites. | ||||||
|
||||||
.. _CZI grant: https://blog.readthedocs.com/czi-grant-announcement/ | ||||||
|
||||||
.. contents:: | ||||||
:local: | ||||||
:depth: 3 | ||||||
|
||||||
Current implementation | ||||||
---------------------- | ||||||
|
||||||
The current implementation of the API is partially documented in :doc:`/guides/embedding-content`. | ||||||
Some characteristics/problems are: | ||||||
|
||||||
- There are three ways of querying the API, and some rely on Sphinx's concepts like ``doc``. | ||||||
- Doesn't support MkDocs. | ||||||
- It returns all sections from the current page on every request. | ||||||
- Lookups are slow (~500 ms). | ||||||
- IDs returned aren't well formed (like empty IDs `#`). | ||||||
- The content is always an array of one element. | ||||||
- The section can be an identifier or any other four variants or the title of the section. | ||||||
- It doesn't return valid HTML for definition lists (``dd`` tags without a ``dt`` tag). | ||||||
- It doesn't support external sites. | ||||||
|
||||||
New API | ||||||
------- | ||||||
|
||||||
The API would be split into two endpoints, and only have one way of querying the API. | ||||||
|
||||||
Get page | ||||||
-------- | ||||||
|
||||||
Allow us to query information about a page, like its list of sections. | ||||||
|
||||||
.. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html | ||||||
|
||||||
:query project: (required) | ||||||
:query version: (required) | ||||||
:query path: (required) | ||||||
|
||||||
.. sourcecode:: json | ||||||
|
||||||
{ | ||||||
"project": "docs", | ||||||
"version": "latest", | ||||||
"path": "install.html", | ||||||
"title": "Installation Guide", | ||||||
"url": "https://docs.readthedocs.io/en/latest/install.html", | ||||||
"sections": [ | ||||||
{ | ||||||
"title": "Installation", | ||||||
"id": "installation" | ||||||
}, | ||||||
{ | ||||||
"title": "Examples", | ||||||
"id": "examples" | ||||||
} | ||||||
] | ||||||
Comment on lines
+58
to
+67
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's the use case for this field? Are we returning nested sections here or only the first level? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. All sections, nested as well, so the client knows what sections it can query, this is only from h elements, but I guess it could be expanded for more tags. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In that case, why only show In any case, I think we should add a request GET argument to expand this and avoid returning a lot of data when it's not going to be used. Example: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is in another endpoint, so I don't think we need that feature here. |
||||||
} | ||||||
|
||||||
Get section | ||||||
----------- | ||||||
|
||||||
Allow us to query the content of the section, with all links re-written as absolute. | ||||||
|
||||||
.. http:get:: /_/api/v3/embed/sections?project=docs&version=latest&path=install.html#examples | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure about the naming conventions here. We are naming a resource, so it should be plural, but we only have a single resource to get the data from. |
||||||
|
||||||
:query project: (required) | ||||||
:query version: (required) | ||||||
:query path: Path with or without fragment (required) | ||||||
|
||||||
.. sourcecode:: json | ||||||
|
||||||
{ | ||||||
"project": "docs", | ||||||
"version": "latest", | ||||||
"path": "install.html", | ||||||
"url": "https://docs.readthedocs.io/en/latest/install.html#examples", | ||||||
"id": "examples", | ||||||
"title": "Examples", | ||||||
"content": "<div>I'm a html block!<div>" | ||||||
} | ||||||
|
||||||
Implemention | ||||||
------------ | ||||||
|
||||||
If a section or page doesn't exist, we return 404. | ||||||
This guarantees that the client requesting this resource has a way of knowing the response is correct. | ||||||
|
||||||
All links are re-written to be absolute. | ||||||
Allow the content to be located in any page and in external sites | ||||||
(this is already done). | ||||||
|
||||||
All sections listed are from html tags that are linkeable. | ||||||
This is, they have an ``id`` | ||||||
(we don't rely on the toctree from the fjson file anymore). | ||||||
This way is more easy to parse and get the wanted section, | ||||||
instead of restricting to some types of contents. | ||||||
|
||||||
The IDs returned don't contain the redundant ``#`` symbol. | ||||||
The fragment part could be used in external tools. | ||||||
|
||||||
The content is an string with a well formed HTML block. | ||||||
Malformed HTML can cause the content to be rendered in unexpected ways. | ||||||
Some HTML tags are required to be be inside other tags or be surrounded by other tags, | ||||||
examples are ``li`` tags inside ``ul`` or ``dd`` tags inside ``dl`` and having a ``dt`` tag. | ||||||
|
||||||
For example extracting the ``title`` section from this snipped: | ||||||
|
||||||
.. code:: html | ||||||
|
||||||
<dl class="some-class"> | ||||||
... | ||||||
|
||||||
<dt id="foo">Foo</dt> | ||||||
<dd>Some definition</dd> | ||||||
|
||||||
<dt id="title">Title<dt> | ||||||
<dd>Some definition</dd> | ||||||
|
||||||
... | ||||||
</dl> | ||||||
|
||||||
Would result in | ||||||
|
||||||
.. code:: html | ||||||
|
||||||
<dl class="some-class"> | ||||||
<dt id="title">Title<dt> | ||||||
<dd>Some definition</dd> | ||||||
</dl> | ||||||
|
||||||
Instead of | ||||||
|
||||||
.. code:: html | ||||||
|
||||||
<dd>Some definition</dd> | ||||||
|
||||||
Note that we only try to keep the current structure, | ||||||
if the page contains malformed HTML, we don't try to *fix it*. | ||||||
This improvement can be shared with the current API (v2). | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't back-port this since it's a breaking change from a user's perspective: the render of the content in the tooltip will be completely different (see #8039 (comment)) |
||||||
|
||||||
Parse the HTML page itself rather than the relying on the fjson files. | ||||||
This allow us to use the embed API with any page or tool, and outside Read the Docs. | ||||||
We can re-use code from the search parsing to detect the main content. | ||||||
This improvement can be shared with the current API (v2). | ||||||
|
||||||
.. note:: | ||||||
|
||||||
We should probably make a distinction between our general API that handles Read the Docs resources, | ||||||
vs our APIs that expose features (like server side search, footer, and embed, all of them proxied). | ||||||
This way we can version each endpoint separately. | ||||||
stsewd marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
Support for external sites | ||||||
-------------------------- | ||||||
|
||||||
Currently this document uses ``project``, ``version``, and ``path`` to query the API, | ||||||
but since the CZI grant and intersphinx support requires this to work with external sites, | ||||||
those arguments can be replaced with ``url`` (or have two ways of querying the API). | ||||||
|
||||||
Considerations | ||||||
`````````````` | ||||||
|
||||||
If a project changes its custom domain, current usage of the API would break. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we could check There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could also save all the historic domains for a project. Instead of delete them, we can mark them as inactive. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That seems very difficult to scale. What if another project reuses the domain? |
||||||
|
||||||
We would need to check if the domain belongs to a project inside RTD and fetch the file from storage, | ||||||
and if it's from an external site fetch it from the internet. | ||||||
|
||||||
The API could be missused. | ||||||
This is already true if we don't support external sites, | ||||||
since we host arbitrary HTML already. | ||||||
But it can be abussed to do requests to external sites without the consent of the site owner (SSRF_). | ||||||
We can integrate support for external sites in a later stage, | ||||||
or have a list of allowed sites. | ||||||
|
||||||
.. _SSRF: https://en.wikipedia.org/wiki/Server-side_request_forgery | ||||||
|
||||||
We would need to make our parsing code more generic. | ||||||
This is already proposed in this document, | ||||||
but testing is going to be done with Sphinx and MkDocs mainly. | ||||||
|
||||||
If we want to support external site to use the API, | ||||||
then we would need to expose it in a general public endpoint | ||||||
instead of the proxied API. | ||||||
|
||||||
Deprecation | ||||||
----------- | ||||||
|
||||||
We should have a section in our docs instead of a guide where the embed API is documented. | ||||||
There we can list v2 as deprecated. | ||||||
We would need to migrate our extension as well. | ||||||
Most of the parsing code could be shared between the two APIs, so it shouldn't be a burden to maintain. | ||||||
|
||||||
API Client | ||||||
---------- | ||||||
|
||||||
Do we really need a JS client? | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think so. IIRC @agjohnson idea here was to include the popup/modal form of this into the JS client. Besides, we could eventually be able to use this JS client from our extension itself, instead of maintaining the front-end in our own extension (or adding it as another popup extension together with the ones we currently support) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea, the JS client is going to be doing a lot of work in addition to just doing the GET. We definitely want nice default widgets, along with helper methods that make this a much nicer UX for users. |
||||||
The API client is a JS script to allow users to use our API in any page. | ||||||
Using the fetch and DOM API should be easy enough to make this work. | ||||||
Having a guide on how to use it would be better than having to maintain and publish a JS package. | ||||||
|
||||||
Most users would use the embed API in their docs in form of an extension (like sphinx-hoverxref). | ||||||
Users using the API in other pages would probably have the sufficient knowledge to use the fetch and DOM API. | ||||||
|
||||||
Rejected/posponed ideas | ||||||
----------------------- | ||||||
|
||||||
Including a list of extra js/css files that may be required to make the embedded content work. | ||||||
The client should be aware of the content it's embedding, | ||||||
and it's their responsibility to include the required js/css to make it work. | ||||||
We can't guarantee that the given files are necessary, | ||||||
and could present a security threat. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we will have only one way of querying the API, we should probably use
?url=
instead of sending multiple attributes. The implementation of the extensions is easier (and there are things not possible to implement without it --e.g. intersphinx) and the usage of the endpoint will be simplified too.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a URL makes it depend on the domain, if the domain changes everything is broken. I'm fine supporting both to support intersphinx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure if we want to make this a v3 API, given that it works totally differently than the others :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why my note in #8039 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is super bad. The other endpoints are for resources and this one is not.
However, I'm starting to think that this could be a different service as well? But maybe that's too much? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. Maybe (?), I guess. If they use
https://readthedocs.org/api/v3/embed/?url=
endpoint and it shouldn't break if they don't remove the Domain for that Project from our platform. Even if the DNS for that domain is dead.