-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design doc: Embed APIv3 #8222
Design doc: Embed APIv3 #8222
Conversation
* Rewrite all links returned in the content to make them absolute | ||
* Always return valid HTML structure | ||
* Delete HTML tags from the original document if needed | ||
* Support ``?nwords=`` and ``?nparagraphs=`` to return chunked content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still were discussing this. I don't think we can support it without having to handle specific cases, not all returned html will have p
tags or the word will end in a valid tag, we also need to skip tags from the lenght
<p>Cut <strong>here?</strong></p>
Having the client using css to "hide" long texts is more easy to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example you shared would return:
- with
?nwords=1
:<p>Cut</p>
- with
?nwords=2
:<p>Cut <strong>here?</strong></p>
- with
?nparagraphs=1
:<p>Cut <strong>here?</strong></p>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, but the problem is calculating the words and taking the tags into account, you'll be removing tags from the original html, possible breaking some styles like with tables/definition lists. And also how would you know if there is a paragraph when you have content surrounded in other tags like lists or divs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have been playing with this already in https://github.com/readthedocs/readthedocs-ext/pull/304/. How to do it is an implementation detail and I'm sure there is going to be some problems we will need to solve. However, I don't think it's impossible.
I've been playing a little with BeautifulSoup already and this seems to work close enough:
# nparagraphs.py
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('install.html'))
nparagraphs = 3
for element in soup.find('div', attrs={'id': 'development-installation'}).findAll():
if nparagraphs == 0:
element.replaceWith('')
if element.name == 'p' and nparagraphs > 0:
nparagraphs -= 1
$ wget https://docs.readthedocs.io/en/stable/development/install.html
$ python nparagraph.py
Click to see the output
<div class="section" id="development-installation">
<h1>
Development Installation
<a class="headerlink" href="#development-installation" title="Permalink to this headline">
¶
</a>
</h1>
<p>
These are development setup and
<a class="hoverxref tooltip reference internal" data-doc="development/install" data-docpath="/development/install.html" data-project="docs" data-section="core-team-standards" data-version="stable" href="#core-team-standards">
<span class="std std-ref">
standards
</span>
</a>
that are adhered to by the core development team while
developing Read the Docs and related services. If you are a contributor to Read the Docs,
it might a be a good idea to follow these guidelines as well.
</p>
<div class="admonition note">
<p class="admonition-title">
Note
</p>
<p>
We do not recommend to follow this guide to deploy an instance of Read the Docs for production usage.
Take into account that this setup is only useful for developing purposes.
</p>
</div>
</div>
} | ||
|
||
|
||
When used together with ``?expand=identifiers`` the follwing field is also returned: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about returning this information on the same endpoint, is confusing returning the content of a section plus the list of sections, we should have a different endpoint to return data about the page itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand here why it may cause confusion and how having a different endpoint to only show the available id=
for a page is better? I think I'm not against a different endpoint, but I want to understand your position better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you are requesting information for a section, but getting information about the page too, you don't want to have to query a random section just to get information about the page. This also allow us to return more information about the page itself like the title.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinging @agjohnson here since he wrote:
More metadata around headers, such as heading level. I'd like to display the topics in a nested menu, as they don't make as much sense in sequential order
in #7117
@agjohnson can you take a look at this proposal and suggest that would be the ideal response to you and how you would like to use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agjohnson returning headers may not be useful if those headers don't have an id=
that we can find. Also, I realized that it's not trivial to "find the title for a particular id=
" (we have this hack currently implemented that only works over a
tags. The id=
could be in the div
surrounding the h1
tag or the dl
tag or any other.
<div id="configuration">
<h1>Configuration<a class="headerlink" href="#configuration" title="Permalink to this headline">¶</a></h1>
<p>...</p>
</div>
For this case, we could get the .next()
from the id=configuration
. However, we can't guarantee that this will always work. Even if we get it, we will want to remove the trailing ¶
from its name as well in this case (but it could be a different char)
<dt id="confval-hoverxref_modal_class">
<code class="sig-name descname">hoverxref_modal_class</code>
<a class="headerlink" href="#confval-hoverxref_modal_class" title="Permalink to this definition">¶</a>
</dt>
This one is similar, but the .next()
element is exactly the title we are looking for. It does not contain the ¶
char.
However, if the title we want is exactly on the h1
or dl
(instead of their child), we will fail to detect it and we would return something invalid.
So, I think it's still useful to return all the available id=
s from the page so developers can explore them, but it's not easy (maybe impossible) to know the exact title for that particular id=
and the exact hierarchy as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this to a particular endpoint /api/v3/embed/identifiers/
and define the initial response: "return all the available identifiers for a specific page".
We can expand it later if we need something else and there is a good way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked more here, and the use case or feature that I'm describing is currently in a strange place.
More explicitly, I require a list of headings -- basically the toctree on a document -- with heading text, URL to link to the heading, and the heading nesting level or heading nesting as a data structure. This gives me what I need to basically expose the toctree in our application. This is the feature that I had wanted to expose for commercial use -- embedding documentation metadata/headings/etc in customer applications. A hoverxref type extension might be useful in addition here, but is separate. Customers would still need to get metadata out of a particular document in order to inject a toctree into an application view.
So, where I'm at is that I feel like this feature is more of a relic of where the embed API started and it is dragging the direction of the embed API down -- however this might just be my interpretation. The embed API is more focused on having generic support, and so therefore parsing HTML, and what I want is basically exposing the contents of objects.inv
. I could be talking about a separate feature, or we could still be talking about keeping this as a separate API endpoint.
I am going to research this a bit more, I may be talking about a separate feature entirely at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
objects.inv
is already indexed & exposed w/ the SphinxDomain modeling, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, so we talked about exposing a separate API for that instead.
We need to decide if this feature is going to be a separate product and how users will interact with it. I think we'd be maintaining 1 client JS library that does both embed of hover cards and injection of document header/refs from Sphinx, as the use case is similar.
@humitos Did we talk about what mechanism we'd use if this function was a subpath on the embed API? Are we using Sphinx refs via the SphinxDomain modeling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@humitos Did we talk about what mechanism we'd use if this function was a subpath on the embed API? Are we using Sphinx refs via the SphinxDomain modeling?
Nope, we didn't talk about this and I haven't thought too much either. I usually forget that we have SphinxDomain
model --and I haven't used it yet.
Something like /api/v3/embed/headers/
may work for this use case. However, we have to keep in mind that:
- it will only work with Sphinx if we use the
SphinxDomain
model - making it generic by parsing the HTML is probably impossible
- we can use the HTTP request arguments for other doctools, as
?doctool=mkdocs&version=1.0.1
and parse a known HTML structure
Notes to myself: it seems I missed some points from #7117 that @agjohnson wants, like language as argument and/or HTTP header. So, I'll come back to this document and update it with those points. |
} | ||
|
||
|
||
When used together with ``?expand=identifiers`` the follwing field is also returned: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using the embed API list of sections to inject a (currently flat, not nested) list of headings for help topics. Displaying heading listing would be an important feature of an embed client, for providing customers with a way to inject topics to their application.
- re-order goals - allow CORS only for public projects - new section with the definition of the contract - define `/api/v3/embed/identifiers/` endpoint - remove `title` field from it because it's not easy to get it - return only available identifiers - add `_links` to make the API browseable - handle project's domain changes querying for 3xx status codes
@ericholscher @stsewd @agjohnson I updated the document with the latest conversation we had. I think I covered all the concerns risen here. Let's see if you agree with what I wrote 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of small language nitpicks. I think in general this is heading in a good direction. I like having a way to know when to apply sphinx magic as needed.
only under ``?doctool=`` and ``?version=`` arguments. | ||
|
||
If no ``id`` selector is sent to the request, the content of the first meaningfull HTML tag | ||
(``<main>``, ``<div role="main">``, etc) identifier found is returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this useful? Seems like a lot of content to return -- is the use case here for small pages or browsing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept it because it's how it works currently. I don't have a strong opinion and I'm happy to remove it if there is no well-defined use case yet. In fact, I didn't add this originally but then I came back and added this paragraph because I realized that I was removing a feature that we currently have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I'm not sure. I use this functionality a lot when tested (eg. I just throw a URL at the API and it returns the whole page content) -- but I'm not sure if this is generally useful. It definitely makes the API easier to browse and understand, so I think that's valuable. A lot of users will probably do this instead of any metadata
endpoint to get the page sections.
I think this functionality probably makes sense, but we should be very clear about what it does (no etc
:D)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I'm not sure if this is generally useful
I'm not sure we are already using the feature as is right now. However, I opened a PR in sphinx-hoverxref to support :doc:
that would require this: readthedocs/sphinx-hoverxref#68
but we should be very clear about what it does (no
etc
:D)
He he, I left the "etc" here because I didn't do the research required to grab the most used ones. The two that I mentioned here are HTML5 and Sphinx default's one, but it may be others. This will be clear in the documentation of the endpoint itself without magic/guessing involved, tho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this feature combined with ?nparagraphs=
makes more sense.
|
||
.. note:: | ||
|
||
This leaves the door open to be able to support more special cases (e.g. for other doctools) without breaking the actual behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a better solution than doing it for all requests, and gives us a way to deprecate things 👍
Unanswered questions | ||
-------------------- | ||
|
||
* How do we distinguish between our APIv3 for resources (models in the database) from these "feature API endpoints"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely a good question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to switch the version and API name so we have the version after its name? That way we can release a new version of a particular API without touching the others.
/api/search/v3/
/api/footer/v3/
/api/resources/v2/
and/api/resources/v3/
/api/embed/v3/
- etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That naming feels weird to me, I think we just need to document which endpoints are part of what.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stsewd can you expand how that would be and give some examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is already done in part for search for example https://docs.readthedocs.io/en/stable/server-side-search.html#api we just need to do the same with the embed api and footer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stsewd I don't follow you. That search endpoint is under /api/v2/
--so we are not differentiating the "Feature APIs" from the "Resources API" there.
print(stsewd.dump("brain").verbose())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to make this distinction in our docs, not in the URLs
Co-authored-by: Eric Holscher <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting a few questions left over after talking through this with @humitos
I think we are ready for a final review (cc @stsewd, @ericholscher, @agjohnson) and to start with its implementation. We had discussed a lot of things and it seems that we may add extra features in the future when we have more clear use-cases, but I think that's fine. I'm happy that we are covering the general case in a simple way and the special cases with an explicit argument. |
I think this looks good to get started on. I still have a couple implementation questions, and think we should focus on the core implementation first and then worry about Sphinx edge cases later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are to separate out the conversation around embedding Sphinx refs/headings, then I think this particular document is very close and we can continue. We need to come back to a separate conversation on a replacement API next.
|
||
* Add a note in the documentation mentioning this endpoint is deprecated | ||
* Promote the usage of the new Embed APIv3 | ||
* Migrate the ``sphinx-hoverxref`` extension to use the new endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are specifying deprecation in this document, then the use case for embedding Sphinx refs should also be mentioned here. Deprecation should depend on surfacing an API to expose documentation refs. We need to have a discussion about this end point and how it is implemented/etc next, but that can be separate.
I made too many changes to the original document that I'm creating a new PR based on Santos' PR: #8039. It basically extends and replaces it.
📝 read the rendered version at: https://docs--8222.org.readthedocs.build/en/8222/development/design/embed-api.html
Closes #7963
Closes #8039