From 9d4ebe535997b5312b8e05e7fdae765cb327149a Mon Sep 17 00:00:00 2001
From: Santos Gallegos <stsewd@protonmail.com>
Date: Mon, 22 Mar 2021 16:13:45 -0500
Subject: [PATCH 01/15] Embed: design doc for new embed API

---
 docs/development/design/embed-api.rst | 154 ++++++++++++++++++++++++++
 1 file changed, 154 insertions(+)
 create mode 100644 docs/development/design/embed-api.rst
diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
new file mode 100644
index 00000000000..b375b9af54d
--- /dev/null
+++ b/docs/development/design/embed-api.rst
@@ -0,0 +1,154 @@
+Embed API
+=========
+
+The embedded API allows to embed content from docs pages in other sites.
+For a while it has been as an *experimental* feature without public documentation or real applications,
+but recently it has been used widely (mainly because we created a Sphinx extension).
+
+Due to this we need to have more friendly to use API,
+and general and stable enough to support it for a long time.
+
+.. contents::
+   :local:
+   :depth: 3
+
+Current implementation
+----------------------
+
+The current implementation of the API is partially documented in :doc:`/guides/embedding-content`.
+Some characteristics/problems are:
+
+- There are three ways of querying the API, and some rely on Sphinx's concepts like ``doc``.
+- Doesn't cache responses or doesn't purge the cache on build.
+- Doesn't support MkDocs.
+- It returns all sections from the current page.
+- Lookups are slow (~500 ms).
+- IDs returned aren't well formed (like empty IDs `#`).
+- The content is always an array of one element.
+- The section can be an identifier or any other four variants or the title of the section.
+- It doesn't return valid HTML for definition lists (``dd`` tags without a ``dt`` tag).
+- The client doesn't know if the page requires extra JS or CSS in order to make it work or look nice.
+
+Improvements
+------------
+
+These improvements aren't breaking changes, so we can implement them in the old and new API.
+
+- Support for MkDocs.
+- Always return a valid/well formed HTML block.
+
+New API
+-------
+
+The API would be split into two endpoints, and only have one way of querying the API.
+
+Get page
+--------
+
+Allow us to query information about a page, like its list of sections.
+
+.. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html
+
+   :query project: (required)
+   :query version: (required)
+   :query path: (required)
+
+   .. sourcecode:: json
+
+      {
+         "project": "docs",
+         "version": "latest",
+         "path": "install.html",
+         "title": "Installation Guide",
+         "url": "https://docs.readthedocs.io/en/latest/install.html",
+         "sections": [
+            {
+               "title": "Installation",
+               "id": "installation"
+            },
+            {
+               "title": "Examples",
+               "id": "examples"
+            }
+         ],
+         "extras": {
+            "js": ["https://docs.readthedocs.io/en/latest/index.js"],
+            "css": ["https://docs.readthedocs.io/en/latest/index.css"],
+         }
+      }
+
+Get section
+-----------
+
+Allow us to query the content of the section, with all links re-written as absolute.
+
+.. http:get:: /_/api/v3/embed/sections?project=docs&version=latest&path=install.html#examples
+
+   :query project: (required)
+   :query version: (required)
+   :query path: Path with or without fragment (required)
+
+   .. sourcecode:: json
+
+      {
+         "project": "docs",
+         "version": "latest",
+         "path": "install.html",
+         "url": "https://docs.readthedocs.io/en/latest/install.html#examples",
+         "id": "examples",
+         "title": "Examples",
+         "content": "<div>I'm a html block!<div>",
+         "extras": {
+            "js": ["https://docs.readthedocs.io/en/latest/index.js"],
+            "css": ["https://docs.readthedocs.io/en/latest/index.css"],
+         }
+      }
+
+Notes
+-----
+
+- If a section or page doesn't exist, we return 404.
+- All links are re-written to be absolute (this is already done).
+- All sections listed are from html tags that are linkeable, this is, they have an ``id``
+  (we don't rely on the toctree from the fjson file anymore).
+- The IDs returned don't contain the redundant ``#`` symbol.
+- The content is an string with a well formed HTML block.
+- We could also support only ``url`` as argument for ``/sections`` and ``/pages``,
+  but this introduces another way of querying the API.
+  Having two ways of querying the API makes it *less-cacheable*.
+- Returning the extra js and css requires parsing the HTML page itself,
+  rather than only the content extracted from the fjson files (this is for sphinx).
+  We can use both, the html file and the json file, but we could also just start parsing the full html page
+  (we can re-use code from the search parsing to detect the main content).
+- ``extras`` could be returned only on ``/pages``, or only on ``/sections``.
+  It makes more sense to be only on ``/pages``,
+  but them querying a section would require to query a page to get the extra js/css files.
+- We could not return the ``title`` of the page/section as it would require more parsing to do
+  (but we can re-use the code from search).
+  Titles can be useful to build an UI like https://readthedocs.org/projects/docs/tools/embed/.
+- MkDocs support can be added easily as we make our parsing code more general.
+
+.. note::
+
+   We should probably make a distinction between our general API that handles Read the Docs resources,
+   vs our APIs that expose features (like server side search, footer, and embed, all of them proxied).
+   This way we can version each endpoint separately.
+
+Deprecation
+-----------
+
+We should have a section in our docs instead of guide where the embed API is documented.
+There we can list v2 as deprecated.
+We would need to migrate our extension as well.
+Most of the parsing code could be shared between the two APIs, so it shouldn't be a burden to maintain.
+
+API Client
+----------
+
+Do we really need a JS client?
+The API client is a js script to allow users to use our API in any page.
+Using the fetch and DOM API should be easy enough to make this work.
+Having a guide on how to use it would be better than having to maintain and publish a JS package.
+
+Most users would use the embed API in their docs in form of an extension (like sphinx-hoverxref).
+Users using the API in other pages would probably have the sufficient knowledge to use the fetch and DOM API.

From 66d839ed5652f11e588dd99f0d85e094821f7382 Mon Sep 17 00:00:00 2001
From: Santos Gallegos <stsewd@protonmail.com>
Date: Tue, 30 Mar 2021 19:10:45 -0500
Subject: [PATCH 02/15] Improvements from review

---
 docs/development/design/embed-api.rst | 135 ++++++++++++++++++++------
 1 file changed, 103 insertions(+), 32 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index b375b9af54d..ebc36c1c7aa 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -1,12 +1,15 @@
 Embed API
 =========
 
-The embedded API allows to embed content from docs pages in other sites.
+The Embed API allows to embed content from docs pages in other sites.
 For a while it has been as an *experimental* feature without public documentation or real applications,
 but recently it has been used widely (mainly because we created a Sphinx extension).
 
+This improvement is part of the `CZI grant`_.
 Due to this we need to have more friendly to use API,
-and general and stable enough to support it for a long time.
+and general and stable enough to support it for a long time and with external sites.
+
+.. _CZI grant: https://blog.readthedocs.com/czi-grant-announcement/
 
 .. contents::
    :local:
@@ -19,23 +22,15 @@ The current implementation of the API is partially documented in :doc:`/guides/e
 Some characteristics/problems are:
 
 - There are three ways of querying the API, and some rely on Sphinx's concepts like ``doc``.
-- Doesn't cache responses or doesn't purge the cache on build.
 - Doesn't support MkDocs.
-- It returns all sections from the current page.
+- It returns all sections from the current page on every request.
 - Lookups are slow (~500 ms).
 - IDs returned aren't well formed (like empty IDs `#`).
 - The content is always an array of one element.
 - The section can be an identifier or any other four variants or the title of the section.
 - It doesn't return valid HTML for definition lists (``dd`` tags without a ``dt`` tag).
 - The client doesn't know if the page requires extra JS or CSS in order to make it work or look nice.
-
-Improvements
-------------
-
-These improvements aren't breaking changes, so we can implement them in the old and new API.
-
-- Support for MkDocs.
-- Always return a valid/well formed HTML block.
+- It doesn't support external sites.
 
 New API
 -------
@@ -45,7 +40,7 @@ The API would be split into two endpoints, and only have one way of querying the
 Get page
 --------
 
-Allow us to query information about a page, like its list of sections.
+Allow us to query information about a page, like its list of sections and extra js/css scripts.
 
 .. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html
 
@@ -104,29 +99,80 @@ Allow us to query the content of the section, with all links re-written as absol
          }
       }
 
-Notes
------
+Implemention
+------------
+
+If a section or page doesn't exist, we return 404.
+  This guarantees that the client requesting this resource has a way of knowing the response is correct.
 
-- If a section or page doesn't exist, we return 404.
-- All links are re-written to be absolute (this is already done).
-- All sections listed are from html tags that are linkeable, this is, they have an ``id``
+All links are re-written to be absolute.
+  Allow the content to be located in any page and in external sites
+  (this is already done).
+
+All sections listed are from html tags that are linkeable.
+  This is, they have an ``id``
   (we don't rely on the toctree from the fjson file anymore).
-- The IDs returned don't contain the redundant ``#`` symbol.
-- The content is an string with a well formed HTML block.
-- We could also support only ``url`` as argument for ``/sections`` and ``/pages``,
-  but this introduces another way of querying the API.
-  Having two ways of querying the API makes it *less-cacheable*.
-- Returning the extra js and css requires parsing the HTML page itself,
-  rather than only the content extracted from the fjson files (this is for sphinx).
-  We can use both, the html file and the json file, but we could also just start parsing the full html page
-  (we can re-use code from the search parsing to detect the main content).
-- ``extras`` could be returned only on ``/pages``, or only on ``/sections``.
+  This way is more easy to parse and get the wanted section,
+  instead of restricting to some types of contents.
+
+The IDs returned don't contain the redundant ``#`` symbol.
+  The fragment part could be used in external tools.
+
+The content is an string with a well formed HTML block.
+  Malformed HTML can cause the content to be rendered in unexpected ways.
+  Some HTML tags are required to be be inside other tags or be surrounded by other tags,
+  examples are ``li`` tags inside ``ul`` or ``dd`` tags inside ``dl`` and having a ``dt`` tag.
+
+  For example extracting the ``title`` section from this snipped:
+
+  .. code:: html
+
+     <dl>
+      ...
+
+      <dt id="foo">Foo</dt>
+      <dd>Some definition</dd>
+
+      <dt id="title">Title<dt>
+      <dd>Some definition</dd>
+
+      ...
+     </dl>
+
+  Would result in
+
+  .. code:: html
+
+     <dl>
+      <dt id="title">Title<dt>
+      <dd>Some definition</dd>
+     </dl>
+
+  Instead of
+
+  .. code:: html
+
+     <dd>Some definition</dd>
+
+  Note that we only try to keep the current structure,
+  if the page contains malformed HTML, we don't try to *fix it*.
+  This improvement can be shared with the current API (v2).
+
+Parse the HTML page itself rather than the relying on the fjson files.
+  This allow us to use the embed API in any page and tool, and outside Read the Docs.
+  We can re-use code from the search parsing to detect the main content.
+  This improvement can be shared with the current API (v2).
+
+Return extra js and css that may be required to render the page correctly.
+  We return a list of js and css files that are included in the page ``style`` and ``script`` tags.
+  The returned js and css files aren't guaranteed to be required in order to render the content,
+  but a decision for the client to make. Of course users can also anticipate the kind of content
+  they want to embed and extract the correct css and js in order to make it work.
+  We won't check for inline scripts.
+
+``extras`` could be returned only on ``/pages``, or only on ``/sections``.
   It makes more sense to be only on ``/pages``,
   but them querying a section would require to query a page to get the extra js/css files.
-- We could not return the ``title`` of the page/section as it would require more parsing to do
-  (but we can re-use the code from search).
-  Titles can be useful to build an UI like https://readthedocs.org/projects/docs/tools/embed/.
-- MkDocs support can be added easily as we make our parsing code more general.
 
 .. note::
 
@@ -134,6 +180,31 @@ Notes
    vs our APIs that expose features (like server side search, footer, and embed, all of them proxied).
    This way we can version each endpoint separately.
 
+Support for external sites
+--------------------------
+
+Currently this document uses ``project``, ``version``, and ``path`` to query the API,
+but since the CZI grant requires this to work with external sites, those arguments can be replaced with ``url``.
+
+Considerations
+``````````````
+
+If a project changes its custom domain, current usage of the API would break.
+
+We would need to check if the domain belongs to a project inside RTD and fetch the file from storage,
+and if it's from an external site fetch it from the internet.
+
+The API could be missused.
+This is already true if we don't support external sites,
+since we host arbitrary HTML already.
+But it can be abussed to crawl external sites without the consent.
+We can integrate support for external sites in a later stage,
+or have a list of allowed sites.
+
+We would need to make our parsing code more generic.
+This is already proposed in this document,
+but testing is going to be done with Sphinx and MkDocs mainly.
+
 Deprecation
 -----------
 

From 74481580087d5c9de9eac0388389633739f1dad1 Mon Sep 17 00:00:00 2001
From: Santos Gallegos <stsewd@protonmail.com>
Date: Mon, 19 Apr 2021 17:59:12 -0500
Subject: [PATCH 03/15] Small update

---
 docs/development/design/embed-api.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index ebc36c1c7aa..637a08fa1f1 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -197,7 +197,7 @@ and if it's from an external site fetch it from the internet.
 The API could be missused.
 This is already true if we don't support external sites,
 since we host arbitrary HTML already.
-But it can be abussed to crawl external sites without the consent.
+But it can be abussed to crawl external sites without the consent of the site admin.
 We can integrate support for external sites in a later stage,
 or have a list of allowed sites.
 
@@ -205,6 +205,10 @@ We would need to make our parsing code more generic.
 This is already proposed in this document,
 but testing is going to be done with Sphinx and MkDocs mainly.
 
+If we want to support external site to use the API,
+then we would need to expose it in a general public endpoint
+instead of the proxied API.
+
 Deprecation
 -----------
 

From 1d3c097aa4b4e033f301b1cbb1394944b16a9ffd Mon Sep 17 00:00:00 2001
From: Santos Gallegos <stsewd@protonmail.com>
Date: Mon, 19 Apr 2021 18:09:27 -0500
Subject: [PATCH 04/15] Mention intersphinx

---
 docs/development/design/embed-api.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index 637a08fa1f1..46faf6af44d 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -184,7 +184,8 @@ Support for external sites
 --------------------------
 
 Currently this document uses ``project``, ``version``, and ``path`` to query the API,
-but since the CZI grant requires this to work with external sites, those arguments can be replaced with ``url``.
+but since the CZI grant and intersphinx support requires this to work with external sites,
+those arguments can be replaced with ``url``.
 
 Considerations
 ``````````````

From 81edc9c5d449eec691908601c2748891e1fe80bc Mon Sep 17 00:00:00 2001
From: Santos Gallegos <stsewd@protonmail.com>
Date: Thu, 22 Apr 2021 18:31:31 -0500
Subject: [PATCH 05/15] List of extra js/css: rejected

---
 docs/development/design/embed-api.rst | 35 +++++++++------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index 46faf6af44d..5715f1749f0 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -29,7 +29,6 @@ Some characteristics/problems are:
 - The content is always an array of one element.
 - The section can be an identifier or any other four variants or the title of the section.
 - It doesn't return valid HTML for definition lists (``dd`` tags without a ``dt`` tag).
-- The client doesn't know if the page requires extra JS or CSS in order to make it work or look nice.
 - It doesn't support external sites.
 
 New API
@@ -40,7 +39,7 @@ The API would be split into two endpoints, and only have one way of querying the
 Get page
 --------
 
-Allow us to query information about a page, like its list of sections and extra js/css scripts.
+Allow us to query information about a page, like its list of sections.
 
 .. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html
 
@@ -65,11 +64,7 @@ Allow us to query information about a page, like its list of sections and extra
                "title": "Examples",
                "id": "examples"
             }
-         ],
-         "extras": {
-            "js": ["https://docs.readthedocs.io/en/latest/index.js"],
-            "css": ["https://docs.readthedocs.io/en/latest/index.css"],
-         }
+         ]
       }
 
 Get section
@@ -92,11 +87,7 @@ Allow us to query the content of the section, with all links re-written as absol
          "url": "https://docs.readthedocs.io/en/latest/install.html#examples",
          "id": "examples",
          "title": "Examples",
-         "content": "<div>I'm a html block!<div>",
-         "extras": {
-            "js": ["https://docs.readthedocs.io/en/latest/index.js"],
-            "css": ["https://docs.readthedocs.io/en/latest/index.css"],
-         }
+         "content": "<div>I'm a html block!<div>"
       }
 
 Implemention
@@ -163,17 +154,6 @@ Parse the HTML page itself rather than the relying on the fjson files.
   We can re-use code from the search parsing to detect the main content.
   This improvement can be shared with the current API (v2).
 
-Return extra js and css that may be required to render the page correctly.
-  We return a list of js and css files that are included in the page ``style`` and ``script`` tags.
-  The returned js and css files aren't guaranteed to be required in order to render the content,
-  but a decision for the client to make. Of course users can also anticipate the kind of content
-  they want to embed and extract the correct css and js in order to make it work.
-  We won't check for inline scripts.
-
-``extras`` could be returned only on ``/pages``, or only on ``/sections``.
-  It makes more sense to be only on ``/pages``,
-  but them querying a section would require to query a page to get the extra js/css files.
-
 .. note::
 
    We should probably make a distinction between our general API that handles Read the Docs resources,
@@ -228,3 +208,12 @@ Having a guide on how to use it would be better than having to maintain and publ
 
 Most users would use the embed API in their docs in form of an extension (like sphinx-hoverxref).
 Users using the API in other pages would probably have the sufficient knowledge to use the fetch and DOM API.
+
+Rejected/posponed ideas
+-----------------------
+
+Including a list of extra js/css files that may be required to make the embedded content work.
+  The client should be aware of the content it's embedding,
+  and it's their responsibility to include the required js/css to make it work.
+  We can't guarantee that the given files are necessary,
+  and could present a security threat.

From 4efd3ecc3c142f920fbaf32ccc70786644120c8a Mon Sep 17 00:00:00 2001
From: Santos Gallegos <stsewd@protonmail.com>
Date: Wed, 26 May 2021 16:56:59 -0500
Subject: [PATCH 06/15] Small updates

---
 docs/development/design/embed-api.rst | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index 5715f1749f0..e736db59565 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -118,7 +118,7 @@ The content is an string with a well formed HTML block.
 
   .. code:: html
 
-     <dl>
+     <dl class="some-class">
       ...
 
       <dt id="foo">Foo</dt>
@@ -134,7 +134,7 @@ The content is an string with a well formed HTML block.
 
   .. code:: html
 
-     <dl>
+     <dl class="some-class">
       <dt id="title">Title<dt>
       <dd>Some definition</dd>
      </dl>
@@ -150,7 +150,7 @@ The content is an string with a well formed HTML block.
   This improvement can be shared with the current API (v2).
 
 Parse the HTML page itself rather than the relying on the fjson files.
-  This allow us to use the embed API in any page and tool, and outside Read the Docs.
+  This allow us to use the embed API with any page or tool, and outside Read the Docs.
   We can re-use code from the search parsing to detect the main content.
   This improvement can be shared with the current API (v2).
 
@@ -165,7 +165,7 @@ Support for external sites
 
 Currently this document uses ``project``, ``version``, and ``path`` to query the API,
 but since the CZI grant and intersphinx support requires this to work with external sites,
-those arguments can be replaced with ``url``.
+those arguments can be replaced with ``url`` (or have two ways of querying the API).
 
 Considerations
 ``````````````
@@ -178,10 +178,12 @@ and if it's from an external site fetch it from the internet.
 The API could be missused.
 This is already true if we don't support external sites,
 since we host arbitrary HTML already.
-But it can be abussed to crawl external sites without the consent of the site admin.
+But it can be abussed to do requests to external sites without the consent of the site owner (SSRF_).
 We can integrate support for external sites in a later stage,
 or have a list of allowed sites.
 
+.. _SSRF: https://en.wikipedia.org/wiki/Server-side_request_forgery
+
 We would need to make our parsing code more generic.
 This is already proposed in this document,
 but testing is going to be done with Sphinx and MkDocs mainly.
@@ -193,7 +195,7 @@ instead of the proxied API.
 Deprecation
 -----------
 
-We should have a section in our docs instead of guide where the embed API is documented.
+We should have a section in our docs instead of a guide where the embed API is documented.
 There we can list v2 as deprecated.
 We would need to migrate our extension as well.
 Most of the parsing code could be shared between the two APIs, so it shouldn't be a burden to maintain.
@@ -202,7 +204,7 @@ API Client
 ----------
 
 Do we really need a JS client?
-The API client is a js script to allow users to use our API in any page.
+The API client is a JS script to allow users to use our API in any page.
 Using the fetch and DOM API should be easy enough to make this work.
 Having a guide on how to use it would be better than having to maintain and publish a JS package.
 

From adebe392553c614e1cd88f480e29e5571ad8f295 Mon Sep 17 00:00:00 2001
From: Manuel Kaufmann <humitos@gmail.com>
Date: Mon, 31 May 2021 13:11:14 +0200
Subject: [PATCH 07/15] Updates after our roadmap planning meeting

---
 docs/development/design/embed-api.rst | 288 +++++++++++++-------------
 1 file changed, 139 insertions(+), 149 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index e736db59565..e9a34387376 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -1,221 +1,211 @@
-Embed API
-=========
+Embed APIv3
+===========
 
-The Embed API allows to embed content from docs pages in other sites.
-For a while it has been as an *experimental* feature without public documentation or real applications,
-but recently it has been used widely (mainly because we created a Sphinx extension).
+The Embed API allows embedding content from documentation pages in other sites.
+It has been treated as an *experimental* feature without public documentation or real applications,
+but recently it started to be used widely (mainly because we created a Sphinx extension).
 
-This improvement is part of the `CZI grant`_.
-Due to this we need to have more friendly to use API,
-and general and stable enough to support it for a long time and with external sites.
+The main goal of this document is to design a new version of the Embed API to be more user friendly,
+make it more stable over time, support documentation pages not hosted at Read the Docs,
+and remove some quirkiness that makes it hard to maintain and difficult to use.
+
+.. note::
+
+   This work is part of the `CZI grant`_ that Read the Docs received.
 
 .. _CZI grant: https://blog.readthedocs.com/czi-grant-announcement/
 
 .. contents::
    :local:
-   :depth: 3
+   :depth: 2
+
 
 Current implementation
 ----------------------
 
 The current implementation of the API is partially documented in :doc:`/guides/embedding-content`.
-Some characteristics/problems are:
+It has some known problems:
+
+* There are different ways of querying the API: ``?url=`` (generic) and ``?doc=`` (relies on a Sphinx-specific concept)
+* Doesn't support MkDocs
+* Lookups are slow (~500 ms)
+* IDs returned aren't well formed (like empty IDs ``"headers": [{"title": "#"}]``)
+* The content is always an array of one element
+* It tries different variations of the original ID
+* It doesn't return valid HTML for definition lists (``dd`` tags without a ``dt`` tag)
+
 
-- There are three ways of querying the API, and some rely on Sphinx's concepts like ``doc``.
-- Doesn't support MkDocs.
-- It returns all sections from the current page on every request.
-- Lookups are slow (~500 ms).
-- IDs returned aren't well formed (like empty IDs `#`).
-- The content is always an array of one element.
-- The section can be an identifier or any other four variants or the title of the section.
-- It doesn't return valid HTML for definition lists (``dd`` tags without a ``dt`` tag).
-- It doesn't support external sites.
+Goals
+-----
 
-New API
--------
+Considering the problems mentioned in the previous section,
+the inclusion of new features and the definition of a contract that works the same for all,
+this document sets the following goals for the new version of this endpoint:
 
-The API would be split into two endpoints, and only have one way of querying the API.
+* Support external documents hosted outside Read the Docs
+* Do not depend on Sphinx ``.fjson`` files
+* Query and parse the ``.html`` file directly (from our storage or from an external request)
+* Rewrite all links returned in the content to make them absolute
+* Always return valid HTML structure
+* Delete HTML tags from the original document if needed
+* Support ``?nwords=`` and ``?nparagraphs=`` to return chunked content
+* Require a valid HTML ``id`` selector
+* Handle special cases for particular doctools (e.g. Sphinx requires returning the ``.parent()`` element for ``dl``)
+* Make it explicit that the client is asking to handle the special cases (e.g. send ``?doctool=sphinx&version=4.0.1``)
+* Accept only ``?url=`` request GET argument to query the endpoint
+* Add HTTP cache headers to cache responses
+* Allow :abbr:`CORS (Cross-Origin Resource Sharing)` from everywhere
 
-Get page
---------
 
-Allow us to query information about a page, like its list of sections.
+Embed endpoint
+--------------
 
-.. http:get:: /_/api/v3/embed/pages?project=docs&version=latest&path=install.html
+Returns the exact HTML content for a specific identifier.
+If no anchor identifier is specified the content of the whole page is returned.
 
-   :query project: (required)
-   :query version: (required)
-   :query path: (required)
+.. http:get:: /api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment
+
+   :query url (required): Full URL for the documentation page with optional anchor identifier.
+   :query expand (optional): Returns extra data about the page. Currently, only ``?expand=identifiers`` is supported,
+      which returns all the identifiers that the page accepts.
 
    .. sourcecode:: json
 
       {
          "project": "docs",
          "version": "latest",
-         "path": "install.html",
-         "title": "Installation Guide",
-         "url": "https://docs.readthedocs.io/en/latest/install.html",
-         "sections": [
-            {
-               "title": "Installation",
-               "id": "installation"
-            },
-            {
-               "title": "Examples",
-               "id": "examples"
-            }
-         ]
+         "language": "en",
+         "path": "development/install.html",
+         "title": "Development Installation",
+         "url": "https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment",
+         "id": "#set-up-your-environment",
+         "content": "<div class=\"section\" id=\"development-installation\">\n<h1>Development Installation<a class=\"headerlink\" href=\"https://docs.readthedocs.io/en/stable/development/install.html#development-installation#development-installation\" title=\"Permalink to this headline\">¶</a></h1>\n ..."
       }
 
-Get section
------------
-
-Allow us to query the content of the section, with all links re-written as absolute.
 
-.. http:get:: /_/api/v3/embed/sections?project=docs&version=latest&path=install.html#examples
-
-   :query project: (required)
-   :query version: (required)
-   :query path: Path with or without fragment (required)
+   When used together with ``?expand=identifiers`` the following field is also returned:
 
    .. sourcecode:: json
 
       {
-         "project": "docs",
-         "version": "latest",
-         "path": "install.html",
-         "url": "https://docs.readthedocs.io/en/latest/install.html#examples",
-         "id": "examples",
-         "title": "Examples",
-         "content": "<div>I'm a html block!<div>"
+         "identifiers": [
+            {
+               "title": "Set up your environment",
+               "id": "#set-up-your-environment",
+               "url": "https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment"
+            },
+            {
+               "title": "Check that everything works",
+               "id": "#check-that-everything-works",
+               "url": "https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works"
+            },
+            ...
+         ]
       }
 
-Implemention
-------------
 
-If a section or page doesn't exist, we return 404.
-  This guarantees that the client requesting this resource has a way of knowing the response is correct.
+Handle specific Sphinx cases
+----------------------------
 
-All links are re-written to be absolute.
-  Allow the content to be located in any page and in external sites
-  (this is already done).
+.. https://github.com/readthedocs/readthedocs.org/pull/8039#discussion_r640670085
 
-All sections listed are from html tags that are linkeable.
-  This is, they have an ``id``
-  (we don't rely on the toctree from the fjson file anymore).
-  This way is more easy to parse and get the wanted section,
-  instead of restricting to some types of contents.
+We are currently handling some special cases for Sphinx due to how it writes the HTML output structure.
+In some cases, we look for the HTML tag with the identifier requested but we return
+the ``.next()`` HTML tag or the ``.parent()`` tag instead of the *requested one*.
 
-The IDs returned don't contain the redundant ``#`` symbol.
-  The fragment part could be used in external tools.
+Currently, we have identified that this happens for definition tags (``dl``, ``dt``, ``dd``)
+--but there may be other cases we don't know about yet.
+Sphinx adds the ``id=`` attribute to the ``dt`` tag, which contains only the title of the definition,
+but as a user, we are expecting the description of it.
 
-The content is an string with a well formed HTML block.
-  Malformed HTML can cause the content to be rendered in unexpected ways.
-  Some HTML tags are required to be be inside other tags or be surrounded by other tags,
-  examples are ``li`` tags inside ``ul`` or ``dd`` tags inside ``dl`` and having a ``dt`` tag.
+In the following example we will return the whole ``dl`` HTML tag instead of
+the HTML tag with the identifier ``id="term-name"`` as requested by the client,
+because otherwise the "Term definition for Term Name" content won't be included and the response would be useless.
 
-  For example extracting the ``title`` section from this snipped:
+.. code:: html
 
-  .. code:: html
+   <dl class="glossary docutils">
+     <dt id="term-name">Term Name</dt>
+     <dd>Term definition for Term Name</dd>
+   </dl>
 
-     <dl class="some-class">
-      ...
+If the definition list (``dl``) has more than *one definition* it will return **only the term requested**.
+Consider the following example, with the request ``?url=glossary.html#term-name``:
 
-      <dt id="foo">Foo</dt>
-      <dd>Some definition</dd>
+.. code:: html
 
-      <dt id="title">Title<dt>
-      <dd>Some definition</dd>
+   <dl class="glossary docutils">
+     ...
 
-      ...
-     </dl>
+     <dt id="term-name">Term Name</dt>
+     <dd>Term definition for Term Name</dd>
 
-  Would result in
+     <dt id="term-unknown">Term Unknown</dt>
+     <dd>Term definition for Term Unknown </dd>
 
-  .. code:: html
+     ...
+   </dl>
 
-     <dl class="some-class">
-      <dt id="title">Title<dt>
-      <dd>Some definition</dd>
-     </dl>
 
-  Instead of
+It will return the whole ``dl`` with only the ``dt`` and ``dd`` for ``id`` requested:
 
-  .. code:: html
+.. code:: html
 
-     <dd>Some definition</dd>
+   <dl class="glossary docutils">
+     <dt id="term-name">Term Name</dt>
+     <dd>Term definition for Term Name</dd>
+   </dl>
 
-  Note that we only try to keep the current structure,
-  if the page contains malformed HTML, we don't try to *fix it*.
-  This improvement can be shared with the current API (v2).
 
-Parse the HTML page itself rather than the relying on the fjson files.
-  This allow us to use the embed API with any page or tool, and outside Read the Docs.
-  We can re-use code from the search parsing to detect the main content.
-  This improvement can be shared with the current API (v2).
+However, this assumptions may not apply to documentation pages built with a different doctool than Sphinx.
+For this reason, we need to communicate to the API that we want to handle this special cases in the backend.
+This will be done by appending a request GET argument to the Embed API endpoint: ``?doctool=sphinx&version=4.0.1``.
+In this case, the backend will known that has to deal with these special cases.
 
 .. note::
 
-   We should probably make a distinction between our general API that handles Read the Docs resources,
-   vs our APIs that expose features (like server side search, footer, and embed, all of them proxied).
-   This way we can version each endpoint separately.
+   This leaves the door open to support more special cases (e.g. for other doctools) without breaking the current behavior.
 
-Support for external sites
---------------------------
 
-Currently this document uses ``project``, ``version``, and ``path`` to query the API,
-but since the CZI grant and intersphinx support requires this to work with external sites,
-those arguments can be replaced with ``url`` (or have two ways of querying the API).
+Support for external documents
+------------------------------
 
-Considerations
-``````````````
+When the ``?url=`` argument passed belongs to a documentation page not hosted on Read the Docs,
+the endpoint will do an external request to download the HTML file,
+parse it and return the content for the identifier requested.
 
-If a project changes its custom domain, current usage of the API would break.
+The whole logic should be the same, the only difference would be where the source HTML comes from.
 
-We would need to check if the domain belongs to a project inside RTD and fetch the file from storage,
-and if it's from an external site fetch it from the internet.
+.. warning::
 
-The API could be missused.
-This is already true if we don't support external sites,
-since we host arbitrary HTML already.
-But it can be abussed to do requests to external sites without the consent of the site owner (SSRF_).
-We can integrate support for external sites in a later stage,
-or have a list of allowed sites.
+   We should be carefull with the URL received from the user because those may be internal URLs and we could be leaking some data.
+   Example: ``?url=http://localhost/some-weird-endpoint`` or ``?url=http://169.254.169.254/latest/meta-data/``
+   (see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html).
 
-.. _SSRF: https://en.wikipedia.org/wiki/Server-side_request_forgery
+   This is related to SSRF (https://en.wikipedia.org/wiki/Server-side_request_forgery).
+   It doesn't seem to be a huge problem, but something to consider.
 
-We would need to make our parsing code more generic.
-This is already proposed in this document,
-but testing is going to be done with Sphinx and MkDocs mainly.
+   Also, the endpoint may need to limit the requests per-external domain to avoid using our servers to take down another site.
 
-If we want to support external site to use the API,
-then we would need to expose it in a general public endpoint
-instead of the proxied API.
 
-Deprecation
------------
+Embed APIv2 deprecation
+-----------------------
 
-We should have a section in our docs instead of a guide where the embed API is documented.
-There we can list v2 as deprecated.
-We would need to migrate our extension as well.
-Most of the parsing code could be shared between the two APIs, so it shouldn't be a burden to maintain.
+APIv2 is currently widely used by projects using the ``sphinx-hoverxref`` extension.
+Because of that, we need to keep supporting it as-is for a long time.
 
-API Client
-----------
+The next steps in this direction should be:
 
-Do we really need a JS client?
-The API client is a JS script to allow users to use our API in any page.
-Using the fetch and DOM API should be easy enough to make this work.
-Having a guide on how to use it would be better than having to maintain and publish a JS package.
+* Add a note in the documentation mentioning this endpoint is deprecated
+* Promote the usage of the new Embed APIv3
+* Migrate the ``sphinx-hoverxref`` extension to use the new endpoint
 
-Most users would use the embed API in their docs in form of an extension (like sphinx-hoverxref).
-Users using the API in other pages would probably have the sufficient knowledge to use the fetch and DOM API.
+Once we have done them, we could check our NGINX logs to find out if there are people still using APIv2,
+contact them and let them know that they have some months to migrate since the endpoint is deprecated and will be removed.
 
-Rejected/posponed ideas
------------------------
 
-Including a list of extra js/css files that may be required to make the embedded content work.
-  The client should be aware of the content it's embedding,
-  and it's their responsibility to include the required js/css to make it work.
-  We can't guarantee that the given files are necessary,
-  and could present a security threat.
+Unanswered questions
+--------------------
+
+* How do we distinguish between our APIv3 for resources (models in the database) from these "feature API endpoints"?
+* What happen if a project changes its custom domain? Do we support redirects in this case?

From e307563f47e447d9ecf4f360713d112af0924f5c Mon Sep 17 00:00:00 2001
From: Manuel Kaufmann <humitos@gmail.com>
Date: Tue, 1 Jun 2021 10:11:12 +0200
Subject: [PATCH 08/15] Remove `#` from the id's field response

---
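Note for reviewers (not part of the commit message): with the ``#`` dropped,
the ``id`` field matches the URL fragment exactly as a standard URL parser
reports it. A minimal sketch of how a client or the backend might derive it
from the ``?url=`` value; ``split_embed_url`` is a hypothetical helper name,
not something defined in this series:

```python
from urllib.parse import urldefrag


def split_embed_url(url):
    """Split a documentation URL into (page URL, element id).

    The id comes back without the leading "#"; it is an empty string
    when the URL carries no fragment.
    """
    page_url, fragment = urldefrag(url)
    return page_url, fragment


page, element_id = split_embed_url(
    "https://docs.readthedocs.io/en/latest/development/install.html"
    "#set-up-your-environment"
)
# element_id is now usable directly as the "id" value in the response.
```
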
 docs/development/design/embed-api.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index e9a34387376..55b379eb4b5 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -78,7 +78,7 @@ If no anchor identifier is specified the content of the whole page is returned.
          "path": "development/install.html",
          "title": "Development Installation",
          "url": "https://docs.readthedocs.io/en/latest/install.html#set-up-your-environment",
-         "id": "#set-up-your-environment",
+         "id": "set-up-your-environment",
          "content": "<div class=\"section\" id=\"development-installation\">\n<h1>Development Installation<a class=\"headerlink\" href=\"https://docs.readthedocs.io/en/stable/development/install.html#development-installation#development-installation\" title=\"Permalink to this headline\">¶</a></h1>\n ..."
       }
 
@@ -91,12 +91,12 @@ If no anchor identifier is specified the content of the whole page is returned.
          "identifiers": [
             {
                "title": "Set up your environment",
-               "id": "#set-up-your-environment",
+               "id": "set-up-your-environment",
                "url": "https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment"
             },
             {
                "title": "Check that everything works",
-               "id": "#check-that-everything-works",
+               "id": "check-that-everything-works",
                "url": "https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works"
             },
             ...

From f783339f63636983a1baf181dfad9a1aeb52b22b Mon Sep 17 00:00:00 2001
From: Manuel Kaufmann <humitos@gmail.com>
Date: Mon, 7 Jun 2021 15:43:56 +0200
Subject: [PATCH 09/15] Update docs/development/design/embed-api.rst

---
 docs/development/design/embed-api.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index 55b379eb4b5..b5118981a81 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -79,7 +79,7 @@ If no anchor identifier is specified the content of the whole page is returned.
          "title": "Development Installation",
          "url": "https://docs.readthedocs.io/en/latest/install.html#set-up-your-environment",
          "id": "set-up-your-environment",
-         "content": "<div class=\"section\" id=\"development-installation\">\n<h1>Development Installation<a class=\"headerlink\" href=\"https://docs.readthedocs.io/en/stable/development/install.html#development-installation#development-installation\" title=\"Permalink to this headline\">¶</a></h1>\n ..."
+         "content": "<div class=\"section\" id=\"development-installation\">\n<h1>Development Installation<a class=\"headerlink\" href=\"https://docs.readthedocs.io/en/stable/development/install.html#development-installation\" title=\"Permalink to this headline\">¶</a></h1>\n ..."
       }
 
 

From efe5abc45d09dcc2af57e352ea0e7d58c9c2d3d3 Mon Sep 17 00:00:00 2001
From: Manuel Kaufmann <humitos@gmail.com>
Date: Wed, 9 Jun 2021 15:26:17 +0200
Subject: [PATCH 10/15] Clarify embedding content from pages hosted outside
 readthedocs

---
 docs/development/design/embed-api.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index b5118981a81..c26d965458b 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -6,7 +6,7 @@ It has been treated as an *experimental* feature without public documentation or
 but recently it started to be used widely (mainly because we created a Sphinx extension).
 
 The main goal of this document is to design a new version of the Embed API to be more user friendly,
-make it more stable over time, support documentation pages not hosted at Read the Docs,
+make it more stable over time, support embedding content from pages not hosted at Read the Docs,
 and remove some quirkiness that makes it hard to maintain and difficult to use.
 
 .. note::
@@ -42,7 +42,7 @@ Considering the problems mentioned in the previous section,
 the inclusion of new features and the definition of a contract that works the same for all,
 this document set the following goals for the new version of this endpoint:
 
-* Support external documents hosted outside Read the Docs
+* Support embedding content from pages hosted outside Read the Docs
 * Do not depend on Sphinx ``.fjson`` files
 * Query and parse the ``.html`` file directly (from our storage or from an external request)
 * Rewrite all links returned in the content to make them absolute

From 145b6e808f1eded077fc407c5db5458d82332478 Mon Sep 17 00:00:00 2001
From: Manuel Kaufmann <humitos@gmail.com>
Date: Wed, 9 Jun 2021 15:27:25 +0200
Subject: [PATCH 11/15] Update document based on feedback

- re-order goals
- allow CORS only for public projects
- new section with the definition of the contract
- define `/api/v3/embed/identifiers/` endpoint
  - remove `title` field from it because it's not easy to get it
  - return only available identifiers
  - add `_links` to make the API browseable
- handle project's domain changes querying for 3xx status codes
---
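Note for reviewers (not part of the commit message): the domain-change
handling described in this patch could be sketched roughly as below. The
``fetch`` callable is a hypothetical helper that performs a single request
without following redirects and returns the status code plus the
``Location`` header; the hop limit and its default are assumptions of this
sketch.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# fetch(url) -> (status code, Location header value or None)
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def resolve_domain(url: str, fetch: Fetch, max_redirects: int = 5) -> str:
    """Follow 3xx responses and return the host of the last URL reached."""
    current = url
    for _ in range(max_redirects):
        status, location = fetch(current)
        if not (300 <= status < 400) or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        current = urljoin(current, location)
    return urlsplit(current).hostname or ""


# Usage with a fake fetcher standing in for real HTTP requests.
responses = {
    "https://docs.example.com/en/latest/": (301, "https://docs.newdomain.com/en/latest/"),
    "https://docs.newdomain.com/en/latest/": (200, None),
}
new_domain = resolve_domain("https://docs.example.com/en/latest/", responses.__getitem__)
```

The returned host would then be used for the second ``Domain`` lookup that
the document describes.
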
 docs/development/design/embed-api.rst | 84 +++++++++++++++++++++------
 1 file changed, 65 insertions(+), 19 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index c26d965458b..194127416ef 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -46,28 +46,42 @@ this document set the following goals for the new version of this endpoint:
 * Do not depend on Sphinx ``.fjson`` files
 * Query and parse the ``.html`` file directly (from our storage or from an external request)
 * Rewrite all links returned in the content to make them absolute
-* Always return valid HTML structure
-* Delete HTML tags from the original document if needed
-* Support ``?nwords=`` and ``?nparagraphs=`` to return chunked content
 * Require a valid HTML ``id`` selector
+* Accept only ``?url=`` request GET argument to query the endpoint
+* Support ``?nwords=`` and ``?nparagraphs=`` to return chunked content
 * Handle special cases for particular doctools (e.g. Sphinx requires to return the ``.parent()`` element for ``dl``)
 * Make explicit the client is asking to handle the special cases (e.g. send ``?doctool=sphinx&version=4.0.1``)
-* Accept only ``?url=`` request GET argument to query the endpoint
+* Delete HTML tags from the original document if needed
 * Add HTTP cache headers to cache responses
-* Allow :abbr:`CORS` from everywhere
+* Allow :abbr:`CORS` from everywhere *only* for public projects
+
+
+The contract
+------------
+
+Return the HTML tag (and its children) with the ``id`` selector requested
+and replace all the relative links from its content making them absolute.
+
+.. note::
+
+   Any other case outside this contract will be considered *special* and will be implemented
+   only under ``?doctool=`` and ``?version=`` arguments.
+
+If no ``id`` selector is sent to the request, the content of the first meaningfull HTML tag
+(``<main>``, ``<div role="main">``, etc) identifier found is returned.
 
 
-Embed endpoint
---------------
+Embed endpoints
+---------------
 
-Returns the exact HTML content for a specific identifier.
-If no anchor identifier is specified the content of the whole page is returned.
+This is the list of endpoints to be implemented in APIv3:
 
 .. http:get:: /api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment
 
+   Returns the exact HTML content for a specific identifier (``id``).
+   If no anchor identifier is specified the content of the first one returned.
+
    :query url (required): Full URL for the documentation page with optional anchor identifier.
-   :query expand (optional): Allows to return extra data about the page. Currently, only ``?expand=identifiers`` is supported
-      to return all the identifiers that page accepts.
 
    .. sourcecode:: json
 
@@ -83,25 +97,31 @@ If no anchor identifier is specified the content of the whole page is returned.
       }
 
 
-   When used together with ``?expand=identifiers`` the follwing field is also returned:
+.. http:get:: /api/v3/embed/identifiers/?url=https://docs.readthedocs.io/en/latest/development/install.html
+
+   Returns all the available identifiers for an specific page.
+
+   :query url (required): Full URL for the documentation page
 
    .. sourcecode:: json
 
-      {
-         "identifiers": [
+      [
             {
-               "title": "Set up your environment",
                "id": "set-up-your-environment",
                "url": "https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment"
+               "_links": {
+                 "embed": "https://docs.readthedocs.io/_/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment"
+               }
             },
             {
-               "title": "Check that everything works",
                "id": "check-that-everything-works",
                "url": "https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works"
+               "_links": {
+                 "embed": "https://docs.readthedocs.io/_/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works"
+               }
             },
             ...
-         ]
-      }
+      ]
 
 
 Handle specific Sphinx cases
@@ -188,6 +208,33 @@ The whole logic should be the same, the only difference would be where the sourc
    Also, the endpoint may need to limit the requests per-external domain to avoid using our servers to take down another site.
 
 
+Handle project's domain changes
+-------------------------------
+
+The proposed Embed APIv3 implementation only allows the ``?url=`` argument to embed content from that page.
+That URL can be:
+
+* a URL for a project hosted under ``<project-slug>.readthedocs.io``
+* a URL for a project with a custom domain
+
+In the first case, we can easily get the project's slug directly from the URL.
+However, in the second case we get the project's slug by querying our database for a ``Domain`` object
+with the full domain from the URL.
+
+Now, consider that all the links in the documentation page that uses Embed APIv3 are pointing to
+``docs.example.com`` and the author decides to change the domain to be ``docs.newdomain.com``.
+At this point there are different possible scenarios:
+
+* The user creates a new ``Domain`` object with ``docs.newdomain.com`` as the domain name.
+  In this case, old links will keep working because we still have the old ``Domain`` object in our database
+  and we can use it to get the project's slug.
+* The user *deletes* the old ``Domain`` in addition to creating the new one.
+  In this scenario, our database query for a ``Domain`` named ``docs.example.com`` will fail.
+  We will need to do a request to ``docs.example.com`` and check for a 3xx response status code and in that case,
+  we can read the ``Location:`` HTTP header to find the new domain's name for the documentation.
+  Once we have the new domain from the redirect response, we can query our database again to find out the project's slug.
+
+
 Embed APIv2 deprecation
 -----------------------
 
@@ -208,4 +255,3 @@ Unanswered questions
 --------------------
 
 * How do we distinguish between our APIv3 for resources (models in the database) from these "feature API endpoints"?
-* What happen if a project changes its custom domain? Do we support redirects in this case?

From 5063bbc6e900a5c3cbe123e81969f1f2a8e78c80 Mon Sep 17 00:00:00 2001
From: Manuel Kaufmann <humitos@gmail.com>
Date: Wed, 9 Jun 2021 16:46:46 +0200
Subject: [PATCH 12/15] Rename endpoint to be `/metadata/`

---
 docs/development/design/embed-api.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index 194127416ef..217f57eb58f 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -97,15 +97,16 @@ This is the list of endpoints to be implemented in APIv3:
       }
 
 
-.. http:get:: /api/v3/embed/identifiers/?url=https://docs.readthedocs.io/en/latest/development/install.html
+.. http:get:: /api/v3/embed/metadata/?url=https://docs.readthedocs.io/en/latest/development/install.html
 
-   Returns all the available identifiers for an specific page.
+   Returns all the available metadata for an specific page.
 
    :query url (required): Full URL for the documentation page
 
    .. sourcecode:: json
 
-      [
+      {
+        "identifiers":
             {
                "id": "set-up-your-environment",
                "url": "https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment"
@@ -121,7 +122,7 @@ This is the list of endpoints to be implemented in APIv3:
                }
             },
             ...
-      ]
+      }
 
 
 Handle specific Sphinx cases

From 84c7ce0aece10bcdbe293aa92ac3e2c953b8a478 Mon Sep 17 00:00:00 2001
From: Manuel Kaufmann <humitos@gmail.com>
Date: Thu, 10 Jun 2021 13:23:32 +0200
Subject: [PATCH 13/15] Apply suggestions from code review

Co-authored-by: Eric Holscher <25510+ericholscher@users.noreply.github.com>
---
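Note for reviewers (not part of the commit message): a minimal sketch of the
allowed-list check mentioned in this patch. The domain set and the function
name are hypothetical placeholders; the actual list is not decided in this
series. An exact-match allow list also rejects the internal URLs from the
SSRF warning (``localhost``, ``169.254.169.254``), since they are never in
the set.

```python
from urllib.parse import urlsplit

# Hypothetical allow list, for illustration only.
ALLOWED_EXTERNAL_DOMAINS = {"docs.python.org", "docs.djangoproject.com"}


def is_allowed_external_url(url):
    """Return True only for http(s) URLs whose host is on the allow list."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # hostname is already lower-cased and stripped of port/credentials.
    host = parts.hostname or ""
    return host in ALLOWED_EXTERNAL_DOMAINS
```

This check only covers the URL as given. If redirects are followed, each hop
would need the same validation.
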
 docs/development/design/embed-api.rst | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index 217f57eb58f..0d67770abc3 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -1,9 +1,9 @@
 Embed APIv3
 ===========
 
-The Embed API allows to embed content from documentation pages in other sites.
+The Embed API allows users to embed content from documentation pages in other sites.
 It has been treated as an *experimental* feature without public documentation or real applications,
-but recently it started to be used widely (mainly because we created a Sphinx extension).
+but recently it started to be used widely (mainly because we created the ``hoverxref`` Sphinx extension).
 
 The main goal of this document is to design a new version of the Embed API to be more user friendly,
 make it more stable over time, support embedding content from pages not hosted at Read the Docs,
@@ -38,9 +38,8 @@ It has some known problems:
 Goals
 -----
 
-Considering the problems mentioned in the previous section,
-the inclusion of new features and the definition of a contract that works the same for all,
-this document set the following goals for the new version of this endpoint:
+We plan to add new features and define a contract that works the same for all HTML.
+This project has the following goals:
 
 * Support embedding content from pages hosted outside Read the Docs
 * Do not depend on Sphinx ``.fjson`` files
@@ -197,6 +196,9 @@ parse it and return the content for the identifier requested.
 
 The whole logic should be the same, the only difference would be where the source HTML comes from.
 
+To start this would be an allowed list of domains for common Sphinx docs projects.
+Things like Django & Python, where hoverxref users might commonly want to embed from.
+We aren't planning to allow arbitrary HTML from any website.
 .. warning::
 
    We should be carefull with the URL received from the user because those may be internal URLs and we could be leaking some data.

From 3b320e84c5f59f316f1c056abbb7ee9f0ac06749 Mon Sep 17 00:00:00 2001
From: Manuel Kaufmann <humitos@gmail.com>
Date: Mon, 14 Jun 2021 11:38:12 +0200
Subject: [PATCH 14/15] Updates from feedback

---
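Note for reviewers (not part of the commit message): the Sphinx ``dl``
special case could be sketched as below. This uses the standard library's
``xml.etree.ElementTree`` purely for illustration, so it only works on
well-formed snippets; real pages would need a proper HTML parser. The
function name is hypothetical, and the sketch assumes each ``dt`` is
directly followed by its own ``dd`` elements.

```python
import xml.etree.ElementTree as ET


def extract_definition(dl_html, term_id):
    """Return the <dl> wrapper holding only the requested term and its <dd>s."""
    dl = ET.fromstring(dl_html)
    # Copy the wrapper tag and attributes, but none of its children yet.
    result = ET.Element(dl.tag, dl.attrib)
    keep = False
    for child in dl:
        if child.tag == "dt":
            # A new term starts here: keep it only if it is the one requested.
            keep = child.get("id") == term_id
        if keep:
            result.append(child)
    return ET.tostring(result, encoding="unicode", method="html")


GLOSSARY = (
    '<dl class="glossary docutils">'
    '<dt id="term-name">Term Name</dt>'
    "<dd>Term definition for Term Name</dd>"
    '<dt id="term-unknown">Term Unknown</dt>'
    "<dd>Term definition for Term Unknown</dd>"
    "</dl>"
)
snippet = extract_definition(GLOSSARY, "term-name")
```
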
 docs/development/design/embed-api.rst | 33 ++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index 0d67770abc3..8e5b8c4705d 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -49,8 +49,8 @@ This project has the following goals:
 * Accept only ``?url=`` request GET argument to query the endpoint
 * Support ``?nwords=`` and ``?nparagraphs=`` to return chunked content
 * Handle special cases for particular doctools (e.g. Sphinx requires to return the ``.parent()`` element for ``dl``)
-* Make explicit the client is asking to handle the special cases (e.g. send ``?doctool=sphinx&version=4.0.1``)
-* Delete HTML tags from the original document if needed
+* Make explicit the client is asking to handle the special cases (e.g. send ``?doctool=sphinx&version=4.0.1&writer=html4``)
+* Delete HTML tags from the original document (for well-defined special cases)
 * Add HTTP cache headers to cache responses
 * Allow :abbr:`CORS` from everywhere *only* for public projects
 
@@ -64,10 +64,10 @@ and replace all the relative links from its content making them absolute.
 .. note::
 
    Any other case outside this contract will be considered *special* and will be implemented
-   only under ``?doctool=`` and ``?version=`` arguments.
+   only under ``?doctool=``, ``?version=`` and ``?writer=`` arguments.
 
 If no ``id`` selector is sent to the request, the content of the first meaningfull HTML tag
-(``<main>``, ``<div role="main">``, etc) identifier found is returned.
+(``<main>``, ``<div role="main">`` or another well-defined standard tag) found is returned.
 
 
 Embed endpoints
@@ -100,6 +100,16 @@ This is the list of endpoints to be implemented in APIv3:
 
    Returns all the available metadata for an specific page.
 
+   .. note::
+
+      As it's not trivial to get the ``title`` associated with a particular ``id`` and it's not easy to get a nested list of identifiers,
+      we may not implement this endpoint in the initial version.
+
+      As it stands, the endpoint is mainly useful to explore/discover which identifiers are available for a particular page
+      --which is handy in the development process of a new tool that consumes the API.
+      Because of this, we don't have too much traction to add it in the initial version.
+
+
    :query url (required): Full URL for the documentation page
 
    .. sourcecode:: json
@@ -179,7 +189,7 @@ It will return the whole ``dl`` with only the ``dt`` and ``dd`` for ``id`` reque
 
 However, this assumptions may not apply to documentation pages built with a different doctool than Sphinx.
 For this reason, we need to communicate to the API that we want to handle this special cases in the backend.
-This will be done by appending a request GET argument to the Embed API endpoint: ``?doctool=sphinx&version=4.0.1``.
+This will be done by appending a request GET argument to the Embed API endpoint: ``?doctool=sphinx&version=4.0.1&writer=html4``.
 In this case, the backend will known that has to deal with these special cases.
 
 .. note::
@@ -196,9 +206,6 @@ parse it and return the content for the identifier requested.
 
 The whole logic should be the same, the only difference would be where the source HTML comes from.
 
-To start this would be an allowed list of domains for common Sphinx docs projects.
-Things like Django & Python, where hoverxref users might commonly want to embed from.
-We aren't planning to allow arbitrary HTML from any website.
 .. warning::
 
    We should be carefull with the URL received from the user because those may be internal URLs and we could be leaking some data.
@@ -210,6 +217,12 @@ We aren't planning to allow arbitrary HTML from any website.
 
    Also, the endpoint may need to limit the requests per-external domain to avoid using our servers to take down another site.
 
+.. note::
+
+   Due to the potential security issues mentioned, we will start with an allowed list of domains for common Sphinx docs projects,
+   such as Django and Python, which ``sphinx-hoverxref`` users might commonly want to embed from.
+   We aren't planning to allow arbitrary HTML from any website.
+
 
 Handle project's domain changes
 -------------------------------
@@ -237,6 +250,10 @@ At this point there are different possible scenarios:
   we can read the ``Location:`` HTTP header to find the new domain's name for the documentation.
   Once we have the new domain from the redirect response, we can query our database again to find out the project's slug.
 
+  .. note::
+
+     We will follow up to 5 redirects to find out the project's domain.
+
 
 Embed APIv2 deprecation
 -----------------------

From b34fce088cf358d1375e7a787a637fe00850d00a Mon Sep 17 00:00:00 2001
From: Manuel Kaufmann <humitos@gmail.com>
Date: Mon, 14 Jun 2021 11:54:44 +0200
Subject: [PATCH 15/15] Improve API-docs render

---
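Note for reviewers (not part of the commit message): in the example requests
the documentation URL is passed raw, so its ``#fragment`` becomes the
fragment of the outer request URL, which HTTP clients generally do not send
to the server. A client would likely need to percent-encode the ``url``
value. A minimal sketch, assuming the endpoint proposed in this document
(it may not exist yet); ``build_embed_request`` is a hypothetical name:

```python
from urllib.parse import urlencode

# Endpoint as proposed in this design document.
EMBED_ENDPOINT = "https://readthedocs.org/api/v3/embed/"


def build_embed_request(doc_url):
    """Return the request URL with ``doc_url`` percent-encoded in ``?url=``."""
    return f"{EMBED_ENDPOINT}?{urlencode({'url': doc_url})}"


request_url = build_embed_request(
    "https://docs.readthedocs.io/en/latest/development/install.html"
    "#set-up-your-environment"
)
# The "#" is sent as %23, so the identifier reaches the server.
```
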
 docs/development/design/embed-api.rst | 53 +++++++++++++++++----------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/docs/development/design/embed-api.rst b/docs/development/design/embed-api.rst
index 8e5b8c4705d..463e0c67400 100644
--- a/docs/development/design/embed-api.rst
+++ b/docs/development/design/embed-api.rst
@@ -75,12 +75,18 @@ Embed endpoints
 
 This is the list of endpoints to be implemented in APIv3:
 
-.. http:get:: /api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment
+.. http:get:: /api/v3/embed/
 
    Returns the exact HTML content for a specific identifier (``id``).
    If no anchor identifier is specified the content of the first one returned.
 
-   :query url (required): Full URL for the documentation page with optional anchor identifier.
+   **Example request**:
+
+   .. tabs::
+
+      $ curl https://readthedocs.org/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment
+
+   **Example response**:
 
    .. sourcecode:: json
 
@@ -95,8 +101,10 @@ This is the list of endpoints to be implemented in APIv3:
          "content": "<div class=\"section\" id=\"development-installation\">\n<h1>Development Installation<a class=\"headerlink\" href=\"https://docs.readthedocs.io/en/stable/development/install.html#development-installation\" title=\"Permalink to this headline\">¶</a></h1>\n ..."
       }
 
+   :query url (required): Full URL for the documentation page with optional anchor identifier.
+
 
-.. http:get:: /api/v3/embed/metadata/?url=https://docs.readthedocs.io/en/latest/development/install.html
+.. http:get:: /api/v3/embed/metadata/
 
    Returns all the available metadata for an specific page.
 
@@ -109,30 +117,35 @@ This is the list of endpoints to be implemented in APIv3:
       --which is handy in the development process of a new tool that consumes the API.
       Because of this, we don't have too much traction to add it in the initial version.
 
+   **Example request**:
 
-   :query url (required): Full URL for the documentation page
+   .. sourcecode:: bash
+
+      $ curl 'https://readthedocs.org/api/v3/embed/metadata/?url=https://docs.readthedocs.io/en/latest/development/install.html'
+
+   **Example response**:
 
    .. sourcecode:: json
 
       {
-        "identifiers":
-            {
-               "id": "set-up-your-environment",
-               "url": "https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment"
-               "_links": {
-                 "embed": "https://docs.readthedocs.io/_/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment"
-               }
-            },
-            {
-               "id": "check-that-everything-works",
-               "url": "https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works"
-               "_links": {
-                 "embed": "https://docs.readthedocs.io/_/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works"
-               }
-            },
-            ...
+        "identifiers": [{
+            "id": "set-up-your-environment",
+            "url": "https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment",
+            "_links": {
+                "embed": "https://docs.readthedocs.io/_/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment"
+            }
+        },
+        {
+            "id": "check-that-everything-works",
+            "url": "https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works",
+            "_links": {
+                "embed": "https://docs.readthedocs.io/_/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works"
+            }
+        }]
       }
 
+   :query url (required): Full URL for the documentation page.
+
 
 Handle specific Sphinx cases
 ----------------------------