Embed: design doc for new embed API #8039

stsewd · 2021-03-22T21:17:32Z

Mostly based on my notes from #7963.

humitos

I think there are good ideas here.

I don't think we should have 2 separate endpoints for this and we should only rely on ?url= attribute unless there is a good reason to not to.

Also, we should support nparagraph= or nwords= or both to be able to chunk the content of the section returned (see https://github.com/readthedocs/readthedocs-ext/pull/304)

humitos · 2021-03-29T15:19:08Z

docs/development/design/embed-api.rst

+- There are three ways of querying the API, and some rely on Sphinx's concepts like ``doc``.
+- Doesn't cache responses or doesn't purge the cache on build.
+- Doesn't support MkDocs.
+- It returns all sections from the current page.


You can send section= to get only the section you want.

What I mean here is that it always returns all the sections (title and id), not the section content.

humitos · 2021-03-29T15:33:54Z

docs/development/design/embed-api.rst

+These improvements aren't breaking changes, so we can implement them in the old and new API.
+
+- Support for MkDocs.
+- Always return a valid/well formed HTML block.


If this refers to the dd/dt issue, I'm not sure if this will be easily doable for all the cases. We may end up re-parsing the dd to convert that HTML tag to another one, but that will break the style based on dd tags. So, this will be tricky to do it right and make it work for all the themes --which is also related with the new extras field.

We just need to extract only that element. For example

<dl class="foo"> .. more.. <dt>foo</dt> <dd>bar</dd> .. more... </dl>

becomes

<dl class="foo"> <dt>foo</dt> <dd>bar</dd> </dl>

Extracting content isn't hard to do here.

Hrm... I think that has the same problem.

When parsing citation/glossary we don't need to return the title/definition (dt), we only want the description (dd). Example in code

readthedocs.org/readthedocs/embed/views.py

Lines 329 to 335 in eb682d8

# Structure:

# <dl class="glossary docutils">

# <dt id="term-definition">definition</dt>

# <dd>Text definition for the term</dd>

# ...

# </dl>

query_result = query_result.next()

Following, your example, to make it valid it should be:

<dl class="foo"> <dt></dt> <dd>bar</dd> </dl>

However, this won't probably render correctly inside the tooltip, since in that context is not a description list --it's not a list at all-- but just the definition of one of the term from that list.

In those cases, citation/glossary will be rendered in the same than any other definition list and I'm not sure we want that. We need to think how we will support this. Example:

Glossary (only includes the definition)

Config value (includes the definition list completely repeating the term hovered)

I think we should repeat the title, it doesn't necessary match the text where the tooltip is present. Removing the title if it's the same as the text can be done in the client side.

Glossary and Citation special cases can be managed from the frontend/client since we have all the data we need (note that we return the .next() HTML element). We will just need to parse it remove the content we don't want to display.

However, for the dt case (definition of a config, Python method/function, etc) it won't work because we need the .parent() of HTML element picked by ID. Take a look at this example:

<dl class="std confval"> <dt id="confval-hoverxref_role_types"> <code class="sig-name descname">hoverxref_role_types</code> <a class="headerlink" href="#confval-hoverxref_role_types" title="Permalink to this definition">¶</a> </dt> <dd> <p>Description: Style to use by default when hover each type of reference (role).</p>  </dd> </dl>

(example from https://sphinx-hoverxref.readthedocs.io/en/latest/configuration.html#confval-hoverxref_role_types)

With the id confval-hoverxref_role_types we are currently returning the whole dl because we access to the .parent() element. If we return just the HTML tag matched, we will be returning only the dt which does not include the description nor any content.

Without handling this special case properly, we will show tooltips like:

We need to find a general solution for these special cases.

@humitos not sure to understand the problem there? that's the same example we have in the document. The id is on the dt element, so we fetch the dt and the next dd enclosed in the dl tag.

I may not have expressed myself properly.

This is a special case because we are manipulating the HTML to manage a particular case on Sphinx: return the parent of a particular ID, instead of itself. Since we are going to support external sites that will not be generated with Sphinx (*), we can't assume that we always have to do this special HTML manipulation. That's why I'm saying that we need a general solution to differentiate these special cases from regular (non-Sphinx) pages.

I'm thinking that we will need to communicate this to the backend somehow. Maybe a request GET argument: ?doctool=sphinx&version=3.5.1

(*) or even with a different version of Sphinx that may adds the id= in the dl.

that's part of how we expect the html to be, following semantic html and using correct tags, if a site has malformed tags or if the id is in a different element we don't try to fix it, we just return it as is.

humitos · 2021-03-29T15:39:48Z

docs/development/design/embed-api.rst

+
+Allow us to query information about a page, like its list of sections.
+
+.. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html


If we will have only one way of querying the API, we should probably use ?url= instead of sending multiple attributes. The implementation of the extensions is easier (and there are things not possible to implement without it --e.g. intersphinx) and the usage of the endpoint will be simplified too.

Using a URL makes it depend on the domain, if the domain changes everything is broken. I'm fine supporting both to support intersphinx.

Suggested change

.. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html

.. http:get:: /_/api/v3/embed/page/?project=docs&version=latest&path=install.html

I'm also not sure if we want to make this a v3 API, given that it works totally differently than the others :/

I'm also not sure if we want to make this a v3 API, given that it works totally differently than the others :/

That's why my note in #8039 (comment)

I don't think this is super bad. The other endpoints are for resources and this one is not.

However, I'm starting to think that this could be a different service as well? But maybe that's too much? 🤔

Using a URL makes it depend on the domain, if the domain changes everything is broken

Yes and no. Maybe (?), I guess. If they use https://readthedocs.org/api/v3/embed/?url= endpoint and it shouldn't break if they don't remove the Domain for that Project from our platform. Even if the DNS for that domain is dead.

humitos · 2021-03-29T15:42:49Z

docs/development/design/embed-api.rst

+         "sections": [
+            {
+               "title": "Installation",
+               "id": "installation"
+            },
+            {
+               "title": "Examples",
+               "id": "examples"
+            }
+         ],


I like the idea of listing all the sections. I'd also like to see a way to explore/surf the API just by clicking these links, similar as we do with the _links field in other endpoints. This way, you can discover the API by clicking on those, but also expose these links to developers.

humitos · 2021-03-29T15:50:50Z

docs/development/design/embed-api.rst

+API Client
+----------
+
+Do we really need a JS client?


I think so. IIRC @agjohnson idea here was to include the popup/modal form of this into the JS client. Besides, we could eventually be able to use this JS client from our extension itself, instead of maintaining the front-end in our own extension (or adding it as another popup extension together with the ones we currently support)

Yea, the JS client is going to be doing a lot of work in addition to just doing the GET. We definitely want nice default widgets, along with helper methods that make this a much nicer UX for users.

stsewd · 2021-03-30T00:51:28Z

Also, we should support nparagraph= or nwords= or both to be able to chunk the content of the section returned

Feels like that's a client thing, to be able to cut the content exactly where they want and show a "See more" link. And maybe can be done with css, without messing around the content itselff.

humitos · 2021-03-30T09:35:29Z

Also, we should support nparagraph= or nwords= or both to be able to chunk the content of the section returned

Feels like that's a client thing, to be able to cut the content exactly where they want and show a "See more" link

I don't think we want to return/download a whole page just to show the first paragraph of the first section. That's my point.

ericholscher

I think this is a good start, but I think we need a lot more thought and explanation behind some of the design decisions. I'd love to see more structure for a design doc here, explicitly stating the goals in a more detailed way. There is a ton of implied context in each "list of 10 items" that we should be explaining more fully.

Most importantly, we should have a sections of goals, and why they are valuable. For example, why is "Always return a valid/well formed HTML block." important? How are we going to ensure that at the code level?

docs/development/design/embed-api.rst

ericholscher · 2021-03-30T21:38:18Z

docs/development/design/embed-api.rst

+
+Allow us to query information about a page, like its list of sections.
+
+.. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html


Suggested change

.. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html

.. http:get:: /_/api/v3/embed/page/?project=docs&version=latest&path=install.html

ericholscher · 2021-03-30T21:38:33Z

docs/development/design/embed-api.rst

+
+Allow us to query the content of the section, with all links re-written as absolute.
+
+.. http:get:: /_/api/v3/embed/sections?project=docs&version=latest&path=install.html#examples


Suggested change

.. http:get:: /_/api/v3/embed/sections?project=docs&version=latest&path=install.html#examples

.. http:get:: /_/api/v3/embed/section/?project=docs&version=latest&path=install.html#examples

I'm not sure about the naming conventions here. We are naming a resource, so it should be plural, but we only have a single resource to get the data from.

docs/development/design/embed-api.rst

ericholscher · 2021-03-30T21:46:38Z

docs/development/design/embed-api.rst

+API Client
+----------
+
+Do we really need a JS client?


Yea, the JS client is going to be doing a lot of work in addition to just doing the GET. We definitely want nice default widgets, along with helper methods that make this a much nicer UX for users.

ericholscher · 2021-03-30T21:48:24Z

docs/development/design/embed-api.rst

+  but them querying a section would require to query a page to get the extra js/css files.
+- We could not return the ``title`` of the page/section as it would require more parsing to do
+  (but we can re-use the code from search).
+  Titles can be useful to build an UI like https://readthedocs.org/projects/docs/tools/embed/.


How would a user build something like this without the ability to also get all the pages in a project? We aren't exposing an API for that in this design doc. Should we?

Hmm, good point, we could have an endpoint for that, but now that you mentioned we need to support external sites, I don't think it would be possible. Or it would be rtd only feature.

docs/development/design/embed-api.rst

humitos · 2021-03-31T15:40:26Z

docs/development/design/embed-api.rst

+
+  Note that we only try to keep the current structure,
+  if the page contains malformed HTML, we don't try to *fix it*.
+  This improvement can be shared with the current API (v2).


We can't back-port this since it's a breaking change from a user's perspective: the render of the content in the tooltip will be completely different (see #8039 (comment))

docs/development/design/embed-api.rst

humitos · 2021-03-31T15:45:42Z

docs/development/design/embed-api.rst

+Considerations
+``````````````
+
+If a project changes its custom domain, current usage of the API would break.


I think we could check Referer/Origin header as a fallback to try to find the project in case they changed the domain (sent via ?url=).

We could also save all the historic domains for a project. Instead of delete them, we can mark them as inactive.

That seems very difficult to scale. What if another project reuses the domain?

humitos · 2021-03-31T15:47:38Z

docs/development/design/embed-api.rst

+The API could be missused.
+This is already true if we don't support external sites,
+since we host arbitrary HTML already.
+But it can be abussed to crawl external sites without the consent.
+We can integrate support for external sites in a later stage,
+or have a list of allowed sites.


I'm not sure to understand what's the problem or abuse we can face here.

I think we definitely don't have a list of allowed sites, here. We want this to be general/global.

We would be exposed to SSRF, basically an attacker can use our servers to make get requests to other insecure sites or DDOS another site.

humitos · 2021-05-27T15:12:36Z

docs/development/design/embed-api.rst

+         "sections": [
+            {
+               "title": "Installation",
+               "id": "installation"
+            },
+            {
+               "title": "Examples",
+               "id": "examples"
+            }
+         ]


What's the use case for this field? Are we returning nested sections here or only the first level?

All sections, nested as well, so the client knows what sections it can query, this is only from h elements, but I guess it could be expanded for more tags.

In that case, why only show h tags when the backend can reply to any anchor link? If we want expose all the possible queries we should return all the valid anchors.

In any case, I think we should add a request GET argument to expand this and avoid returning a lot of data when it's not going to be used. Example: ?expand=sections

This is in another endpoint, so I don't think we need that feature here.

Embed: design doc for new embed API

9d4ebe5

stsewd requested a review from a team March 22, 2021 21:17

stsewd mentioned this pull request Mar 26, 2021

Flyout and Footer API design document #8052

Merged

humitos reviewed Mar 29, 2021

View reviewed changes

ericholscher reviewed Mar 30, 2021

View reviewed changes

stsewd added 2 commits March 30, 2021 18:04

Merge branch 'master' into design-doc-embed-api

fe633ec

Improvements from review

66d839e

humitos reviewed Mar 31, 2021

View reviewed changes

astrojuanlu mentioned this pull request Apr 1, 2021

Embedding Content From Your Documentation, blog post readthedocs/blog#83

Closed

humitos mentioned this pull request Apr 19, 2021

Config to add JS/CSS to all generated HTML pages sphinx-doc/sphinx#9115

Closed

stsewd added 7 commits April 19, 2021 17:53

Merge branch 'master' into design-doc-embed-api

06c8a97

Small update

7448158

Mention intersphinx

1d3c097

Merge branch 'master' into design-doc-embed-api

1d43423

List of extra js/css: rejected

81edc9c

Merge branch 'master' into design-doc-embed-api

89a5795

Small updates

4efd3ec

humitos reviewed May 27, 2021

View reviewed changes

humitos mentioned this pull request May 31, 2021

Design doc: Embed APIv3 #8222

Merged

humitos closed this in #8222 Jun 23, 2021

stsewd deleted the design-doc-embed-api branch October 11, 2023 21:39

	# Structure:
	# <dl class="glossary docutils">
	# <dt id="term-definition">definition</dt>
	# <dd>Text definition for the term</dd>
	# ...
	# </dl>
	query_result = query_result.next()


		Allow us to query information about a page, like its list of sections.

		.. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html

	.. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html
	.. http:get:: /_/api/v3/embed/page/?project=docs&version=latest&path=install.html


		Allow us to query the content of the section, with all links re-written as absolute.

		.. http:get:: /_/api/v3/embed/sections?project=docs&version=latest&path=install.html#examples

	.. http:get:: /_/api/v3/embed/sections?project=docs&version=latest&path=install.html#examples
	.. http:get:: /_/api/v3/embed/section/?project=docs&version=latest&path=install.html#examples

Embed: design doc for new embed API #8039

Embed: design doc for new embed API #8039

Conversation

stsewd commented Mar 22, 2021

humitos left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Glossary (only includes the definition)

Config value (includes the definition list completely repeating the term hovered)

Choose a reason for hiding this comment

humitos May 27, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stsewd commented Mar 30, 2021 • edited Loading

humitos commented Mar 30, 2021

ericholscher left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

humitos May 27, 2021 •

edited

Loading

stsewd commented Mar 30, 2021 •

edited

Loading